============================
Taint Analysis Configuration
============================

The Clang Static Analyzer uses taint analysis to detect issues related to injection vulnerabilities in code.
The backbone of taint analysis in the Clang SA is the ``TaintPropagation`` modeling checker.
The reports are emitted via the :ref:`optin-taint-GenericTaint` checker.
The ``TaintPropagation`` checker has a default taint-related configuration.
The built-in default settings are defined in code, and they are always in effect.
The checker also provides a configuration interface for extending the default settings via the ``optin.taint.TaintPropagation:Config`` checker config parameter
by providing a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
This documentation describes the syntax of the configuration file and gives the informal semantics of the configuration options.

.. contents::
   :local:

.. _clangsa-taint-configuration-overview:

Overview
________

Taint analysis works by checking for the occurrence of special operations during the symbolic execution of the program.
Taint analysis defines sources, sinks, and propagation rules. It identifies errors by detecting a flow of information that originates from a taint source, propagates along program paths via propagation rules, and reaches a taint sink.
Which operations act as a source, a sink, or a taint propagator is mainly domain-specific knowledge, but the ``TaintPropagation`` checker provides some built-in defaults.
It is possible to express that a statement sanitizes tainted values by providing a ``Filters`` section in the external configuration (see :ref:`clangsa-taint-configuration-example` and :ref:`clangsa-taint-filter-details`).
There are no default filters defined in the built-in settings.
The checker's documentation also specifies how to provide a custom taint configuration with command-line options.

.. _clangsa-taint-configuration-example:

Example configuration file
__________________________

.. code-block:: yaml

  # The entries that specify arguments use 0-based indexing when specifying
  # input arguments, and -1 is used to denote the return value.

  Filters:
    # Filter functions
    # Taint is sanitized when tainted variables are passed as arguments to filters.

    # Filter function
    #   void cleanse_first_arg(int* arg)
    #
    # Result example:
    #   int x; // x is tainted
    #   cleanse_first_arg(&x); // x is not tainted after the call
    - Name: cleanse_first_arg
      Args: [0]

  Propagations:
    # Source functions
    # The omission of SrcArgs key indicates unconditional taint propagation,
    # which is conceptually what a source does.

    # Source function
    #   size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream)
    #
    # Result example:
    #   FILE* f = fopen("file.txt", "r");
    #   char buf[1024];
    #   size_t read = fread(buf, sizeof(buf[0]), sizeof(buf)/sizeof(buf[0]), f);
    #   // both read and buf are tainted
    - Name: fread
      DstArgs: [0, -1]

    # Propagation functions
    # The presence of SrcArgs key indicates conditional taint propagation,
    # which is conceptually what a propagator does.

    # Propagation function
    #   char *dirname(char *path)
    #
    # Result example:
    #   char* path = read_path();
    #   char* dir = dirname(path);
    #   // dir is tainted if path was tainted
    - Name: dirname
      SrcArgs: [0]
      DstArgs: [-1]

  Sinks:
    # Sink functions
    # If taint reaches any of the arguments specified, a warning is emitted.

    # Sink function
    #   int system(const char* command)
    #
    # Result example:
    #   const char* command = read_command();
    #   system(command); // emit diagnostic if command is tainted
    - Name: system
      Args: [0]

In the example file above, the entries under the `Propagations` key implement the conceptual sources and propagations, and sinks have their dedicated `Sinks` key.
The user can define operations (function calls) where the tainted values should be cleansed by listing entries under the `Filters` key.
Filters model the sanitization of values done by the programmer, and providing these is key to avoiding false-positive findings.

Configuration file syntax and semantics
_______________________________________

The configuration file should have valid `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ syntax.

The configuration file can have the following top-level keys:
 - Filters
 - Propagations
 - Sinks

Under the `Filters` key, the user can specify a list of operations that remove taint (see :ref:`clangsa-taint-filter-details` for details).

Under the `Propagations` key, the user can specify a list of operations that introduce and propagate taint (see :ref:`clangsa-taint-propagation-details` for details).
A taint source is an entry under the `Propagations` key that omits the `SrcArgs` key, while a propagation specifies it.
The lack of the `SrcArgs` key means unconditional propagation, which is how sources are modeled.
The semantics of propagations are such that if any of the source arguments (specified by indexes in `SrcArgs`) are tainted, then all of the destination arguments (specified by indexes in `DstArgs`) also become tainted.
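
As a minimal sketch of this distinction, the two entries below model one source and one propagator. The function names ``read_network_packet`` and ``copy_packet`` are hypothetical and serve only as an illustration; they are not part of the built-in configuration.

.. code-block:: yaml

  Propagations:
    # Hypothetical source: int read_network_packet(char *buf, int len)
    # There is no SrcArgs key, so buf (index 0) and the return value (-1)
    # are tainted unconditionally after the call.
    - Name: read_network_packet
      DstArgs: [0, -1]

    # Hypothetical propagator: void copy_packet(char *dst, const char *src)
    # dst (index 0) becomes tainted only if src (index 1) is tainted.
    - Name: copy_packet
      SrcArgs: [1]
      DstArgs: [0]

Under the semantics described above, data written to ``buf`` would be treated as tainted, and that taint would follow it through a later call to ``copy_packet``.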

Under the `Sinks` key, the user can specify a list of operations where the checker should emit a bug report if tainted data reaches them (see :ref:`clangsa-taint-sink-details` for details).

.. _clangsa-taint-filter-details:

Filter syntax and semantics
###########################

An entry under `Filters` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   When the checker encounters this function during symbolic execution, it removes taint from the memory region referred to by the given arguments or treats the return value as sanitized.
 - `Args` is a list of numbers in the range of ``[-1..int_max]``.
   It indicates the indexes of arguments in the function call.
   The number ``-1`` signifies the return value; other numbers identify call arguments.
   The values of these arguments are considered clean after the function call.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls. It can encode not only namespaces but struct/class names as well to match member functions.
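
For illustration, filter entries with and without `Scope` might look like the sketch below. The namespace ``sanitizers`` and the functions ``escape_shell`` and ``validate_in_place`` are hypothetical names, and the ``Scope`` value is assumed to be the qualified-name prefix written with its trailing ``::``.

.. code-block:: yaml

  Filters:
    # Hypothetical: char *sanitizers::escape_shell(const char *input)
    # The return value (-1) is considered clean after the call.
    # Scope limits the match to the function inside the sanitizers namespace.
    - Name: escape_shell
      Scope: "sanitizers::"
      Args: [-1]

    # Hypothetical: void validate_in_place(char *buf)
    # The value referred to by the first argument (index 0) is considered
    # clean after the call.
    - Name: validate_in_place
      Args: [0]

Restricting a filter with `Scope` keeps an unrelated function that happens to share the same name from silently removing taint.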

.. _clangsa-taint-propagation-details:

Propagation syntax and semantics
################################

An entry under `Propagations` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   Encountering this function during symbolic execution propagates taint from one or more arguments to other arguments and possibly the return value.
   It helps model the taint-related behavior of functions that are not analyzable otherwise.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
 - `SrcArgs` is a list of numbers in the range of ``[0..int_max]`` that indicates the indexes of arguments in the function call.
   Taint propagation considers the values of these arguments during the evaluation of the function call.
   If any `SrcArgs` arguments are tainted, the checker will consider all `DstArgs` arguments tainted after the call.
 - `DstArgs` is a list of numbers in the range of ``[-1..int_max]`` that indicates the indexes of arguments in the function call.
   The number ``-1`` specifies the return value of the function.
   If any `SrcArgs` arguments are tainted, the checker will consider all `DstArgs` arguments tainted after the call.
 - `VariadicType` is a string that can be one of ``None``, ``Dst``, ``Src``.
   It is used in conjunction with `VariadicIndex` to specify arguments inside a variadic argument.
   The value of ``Src`` will treat every call site argument that is part of a variadic argument list as a source concerning propagation rules (as if specified by `SrcArgs`).
   The value of ``Dst`` will treat every call site argument that is part of a variadic argument list as a destination concerning propagation rules.
   The value of ``None`` will not consider the arguments that are part of a variadic argument list (this option is redundant but can be used to temporarily switch off handling of a particular variadic argument option without removing the `VariadicIndex` key).
 - `VariadicIndex` is a number in the range of ``[0..int_max]``. It indicates the starting index of the variadic argument in the signature of the function.
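
The variadic keys can be sketched as follows. Both functions, ``log_format`` and ``parse_fields``, are hypothetical and exist only to illustrate how `VariadicType` and `VariadicIndex` combine with `SrcArgs` and `DstArgs`.

.. code-block:: yaml

  Propagations:
    # Hypothetical: int log_format(char *out, const char *fmt, ...)
    # The variadic arguments start at index 2 and act as sources.
    # If fmt (index 1) or any variadic argument is tainted, out (index 0)
    # and the return value (-1) become tainted.
    - Name: log_format
      SrcArgs: [1]
      DstArgs: [0, -1]
      VariadicType: Src
      VariadicIndex: 2

    # Hypothetical: int parse_fields(const char *input, ...)
    # The variadic arguments start at index 1 and act as destinations.
    # If input (index 0) is tainted, every variadic argument becomes tainted.
    - Name: parse_fields
      SrcArgs: [0]
      VariadicType: Dst
      VariadicIndex: 1

Changing `VariadicType` to ``None`` in either entry would leave the variadic arguments out of the rule while keeping `VariadicIndex` in place.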


.. _clangsa-taint-sink-details:

Sink syntax and semantics
#########################

An entry under `Sinks` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   Encountering this function during symbolic execution will emit a taint-related diagnostic if any of the arguments specified with `Args` are tainted at the call site.
 - `Args` is a list of numbers in the range of ``[0..int_max]`` that indicates the indexes of arguments in the function call.
   The checker reports an error if any of the specified arguments are tainted.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
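
A scoped sink entry might look like the following sketch. The namespace ``db`` and the function ``run_query`` are hypothetical, and the ``Scope`` value is assumed to be the qualified-name prefix written with its trailing ``::``.

.. code-block:: yaml

  Sinks:
    # Hypothetical: int db::run_query(db::Connection *conn, const char *query)
    # A diagnostic is emitted if query (index 1) is tainted at the call site.
    # conn (index 0) is not listed, so its taint state is not checked.
    - Name: run_query
      Scope: "db::"
      Args: [1]

Because sink arguments start at index ``0``, the return value cannot be listed in `Args` here.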