============================
Taint Analysis Configuration
============================

The Clang Static Analyzer uses taint analysis to detect injection-related vulnerabilities in code.
The backbone of taint analysis in the Clang SA is the ``TaintPropagation`` modeling checker.
The reports are emitted via the :ref:`optin-taint-GenericTaint` checker.
The ``TaintPropagation`` checker has a default taint-related configuration.
The built-in default settings are defined in code, and they are always in effect.
The checker also provides a configuration interface for extending the default settings via the ``optin.taint.TaintPropagation:Config`` checker config parameter
by providing a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
This documentation describes the syntax of the configuration file and gives the informal semantics of the configuration options.

.. contents::
   :local:

.. _clangsa-taint-configuration-overview:

Overview
________

Taint analysis works by checking for the occurrence of special operations during the symbolic execution of the program.
Taint analysis defines sources, sinks, and propagation rules. It identifies errors by detecting a flow of information that originates from a taint source, propagates through the program paths via propagation rules, and reaches a taint sink.
Which operations act as a source, a sink, or a taint propagator is mainly domain-specific knowledge, but there are some built-in defaults provided by the ``TaintPropagation`` checker.
It is possible to express that a statement sanitizes tainted values by providing a ``Filters`` section in the external configuration (see :ref:`clangsa-taint-configuration-example` and :ref:`clangsa-taint-filter-details`).
There are no default filters defined in the built-in settings.
The checker's documentation also specifies how to provide a custom taint configuration with command-line options.

.. _clangsa-taint-configuration-example:

Example configuration file
__________________________

.. code-block:: yaml

  # The entries that specify arguments use 0-based indexing when specifying
  # input arguments, and -1 is used to denote the return value.

  Filters:
    # Filter functions
    # Taint is sanitized when tainted variables are passed as arguments to filters.

    # Filter function
    #  void cleanse_first_arg(int* arg)
    #
    # Result example:
    #   int x; // x is tainted
    #   cleanse_first_arg(&x); // x is not tainted after the call
    - Name: cleanse_first_arg
      Args: [0]

  Propagations:
    # Source functions
    # The omission of SrcArgs key indicates unconditional taint propagation,
    # which is conceptually what a source does.

    # Source function
    #  size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream)
    #
    # Result example:
    #   FILE* f = fopen("file.txt", "r");
    #   char buf[1024];
    #   size_t read = fread(buf, sizeof(buf[0]), sizeof(buf)/sizeof(buf[0]), f);
    #   // both read and buf are tainted
    - Name: fread
      DstArgs: [0, -1]

    # Propagation functions
    # The presence of SrcArgs key indicates conditional taint propagation,
    # which is conceptually what a propagator does.

    # Propagation function
    #  char *dirname(char *path)
    #
    # Result example:
    #   char* path = read_path();
    #   char* dir = dirname(path);
    #   // dir is tainted if path was tainted
    - Name: dirname
      SrcArgs: [0]
      DstArgs: [-1]

  Sinks:
    # Sink functions
    # If taint reaches any of the arguments specified, a warning is emitted.

    # Sink function
    #  int system(const char* command)
    #
    # Result example:
    #   const char* command = read_command();
    #   system(command); // emit diagnostic if command is tainted
    - Name: system
      Args: [0]

In the example file above, the entries under the `Propagations` key implement the conceptual sources and propagations, and sinks have their dedicated `Sinks` key.
The user can define operations (function calls) where the tainted values should be cleansed by listing entries under the `Filters` key.
Filters model the sanitization of values done by the programmer, and providing these is key to avoiding false-positive findings.

Configuration file syntax and semantics
_______________________________________

The configuration file should have valid `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ syntax.

The configuration file can have the following top-level keys:
 - Filters
 - Propagations
 - Sinks

Under the `Filters` key, the user can specify a list of operations that remove taint (see :ref:`clangsa-taint-filter-details` for details).

Under the `Propagations` key, the user can specify a list of operations that introduce and propagate taint (see :ref:`clangsa-taint-propagation-details` for details).
The user can mark taint sources by omitting the `SrcArgs` key from an entry under `Propagations`, while propagators specify it.
The lack of the `SrcArgs` key means unconditional propagation, which is how sources are modeled.
The semantics of propagations are such that if any of the source arguments (specified by indexes in `SrcArgs`) are tainted, then all of the destination arguments (specified by indexes in `DstArgs`) also become tainted.

Under the `Sinks` key, the user can specify a list of operations where the checker should emit a bug report if tainted data reaches it (see :ref:`clangsa-taint-sink-details` for details).

.. _clangsa-taint-filter-details:

Filter syntax and semantics
###########################

An entry under `Filters` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   When the checker encounters this function during symbolic execution, it removes taint from the memory region referred to by the given arguments, or returns a sanitized value.
 - `Args` is a list of numbers in the range of ``[-1..int_max]``.
   It indicates the indexes of arguments in the function call.
   The number ``-1`` signifies the return value; other numbers identify call arguments.
   The values of these arguments are considered clean after the function call.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls. It can encode not only namespaces but struct/class names as well to match member functions.

.. _clangsa-taint-propagation-details:

Propagation syntax and semantics
################################

An entry under `Propagations` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   When the checker encounters this function during symbolic execution, it propagates taint from one or more arguments to other arguments and possibly the return value.
   It helps model the taint-related behavior of functions that are not analyzable otherwise.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
 - `SrcArgs` is a list of numbers in the range of ``[0..int_max]`` that indicates the indexes of arguments in the function call.
   Taint-propagation considers the values of these arguments during the evaluation of the function call.
   If any `SrcArgs` arguments are tainted, the checker will consider all `DstArgs` arguments tainted after the call.
 - `DstArgs` is a list of numbers in the range of ``[-1..int_max]`` that indicates the indexes of arguments in the function call.
   The number ``-1`` specifies the return value of the function.
   If any `SrcArgs` arguments are tainted, the checker will consider all `DstArgs` arguments tainted after the call.
 - `VariadicType` is a string that can be one of ``None``, ``Dst``, ``Src``.
   It is used in conjunction with `VariadicIndex` to specify arguments inside a variadic argument.
   The value of ``Src`` will treat every call site argument that is part of a variadic argument list as a source concerning propagation rules (as if specified by `SrcArgs`).
   The value of ``Dst`` will treat every call site argument that is part of a variadic argument list as a destination concerning propagation rules.
   The value of ``None`` will not consider the arguments that are part of a variadic argument list (this option is redundant but can be used to temporarily switch off handling of a particular variadic argument option without removing the `VariadicIndex` key).
 - `VariadicIndex` is a number in the range of ``[0..int_max]``. It indicates the starting index of the variadic argument in the signature of the function.


.. _clangsa-taint-sink-details:

Sink syntax and semantics
#########################

An entry under `Sinks` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   Encountering this function during symbolic execution will emit a taint-related diagnostic if any of the arguments specified with `Args` are tainted at the call site.
 - `Args` is a list of numbers in the range of ``[0..int_max]`` that indicates the indexes of arguments in the function call.
   The checker reports an error if any of the specified arguments are tainted.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
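
As an illustration of the optional keys described in the sections above, the following sketch combines `Scope`, `VariadicType`, and `VariadicIndex` in one configuration file.
All function, namespace, and class names here (``app``, ``Validator``, ``read_fields``, ``join``, ``Database``, ``run_query``) are hypothetical and serve only as placeholders.
Writing the `Scope` value with a trailing ``::`` is an assumption based on its description as a prefix of the fully qualified name.

.. code-block:: yaml

  Filters:
    # Hypothetical sanitizer: void app::Validator::sanitize(char *s)
    # The Scope value names both the namespace and the class.
    - Name: sanitize
      Scope: "app::Validator::"
      Args: [0]

  Propagations:
    # Hypothetical variadic source: int app::read_fields(const char *fmt, ...)
    # No SrcArgs key, so the taint is introduced unconditionally.
    # Every variadic argument, starting at index 1, is a destination.
    - Name: read_fields
      Scope: "app::"
      VariadicType: Dst
      VariadicIndex: 1

    # Hypothetical variadic propagator: char *app::join(const char *sep, ...)
    # The return value becomes tainted if any variadic argument is tainted.
    - Name: join
      Scope: "app::"
      DstArgs: [-1]
      VariadicType: Src
      VariadicIndex: 1

  Sinks:
    # Hypothetical sink: void app::Database::run_query(const char *sql)
    - Name: run_query
      Scope: "app::Database::"
      Args: [0]

Changing `VariadicType` to ``None`` in either propagation entry would switch off the handling of the variadic arguments while keeping the `VariadicIndex` key in place.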