# Tensor Operator Set Architecture (TOSA) Dialect

[TOC]

## Rationale

The MLIR TOSA dialect implements the [TOSA
specification](https://developer.mlplatform.org/w/tosa/). This document
describes the decision process for how TOSA expresses operators in
high level dialects.

TOSA was developed after parallel efforts to rationalize the top-down picture
from multiple high-level frameworks, as well as a bottom-up view of different
hardware target concerns (CPU, GPU and NPU), and reflects a set of choices
that attempt to manage both sets of requirements.

## TOSA and Tensor Level Expressiveness

TOSA endeavors to provide an operator set that fulfils the following
expressiveness goals at the *tensor level of abstraction*:

### Complete

This is driven by the top-down perspective: as much as possible of multiple
high-level frameworks should be fully expressible in TOSA. The operator set
was originally derived from an operator frequency analysis of dozens of
high-level networks in different frameworks, selecting the most frequently
occurring operators to establish a common set of tensor-level operators that
could express them.

TOSA categorizes its operator set into classes and attempts to address major
functional operations at the tensor level, including compute, reduction,
elementwise transformations, comparison and control flow.

### Minimal

This takes the bottom-up approach: keep the TOSA operator set minimal in
order to bound the design of hardware, operator kernels, code generation
strategies and associated considerations that affect the executability of TOSA
content.

In this regard TOSA seeks to avoid creating compound operators, instead
leaving it to the compiler backend to fuse multiple TOSA ops if required. This
choice also benefits the numerical precision goal, since it is easier to fuse the
numerical functionality of successive operators than to split the numerical
functionality of a compound operator.

### Numerical Precision

TOSA began as a means to address operator-level numerical precision for
code generation and hardware development. It therefore incorporates precision
detail into the operator set.

In this regard, TOSA operators are best understood as a combination of the visible
quantization information embedded within an operation, together with the
functional information about how that information is used, as described in the
specification of the operation.

## TOSA Operator Rationale

The general basis for selecting the operator set that constitutes TOSA is
described in the TOSA specification document under Section 1.3 Operator
Selection. The thinking behind some operators is explained here:

### COND\_IF and WHILE\_LOOP

Several neural networks express conditional control flow at the tensor level.
A survey of multiple high-level frameworks indicated that a conditional if and
a loop construct are common to all major frameworks, with some variation.
Since TOSA endeavors to be complete in expressing tensor-level functionality,
including control flow, it implements these constructs.

The COND\_IF and WHILE\_LOOP operators implement such structured control
flow forms and should be lowerable to corresponding ops in the scf dialect.
Since the dialect seeks to remain isomorphic with an external, serialized form,
the decision was to keep these ops in the dialect (as opposed to deferring
completely to scf). This may be re-evaluated if it turns out not to yield
the expected value.

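As a rough illustration of the structured form these operators take, the
Python sketch below models COND\_IF as a choice between two regions and
WHILE\_LOOP as a condition region plus a body region. The names `cond_if` and
`while_loop`, their signatures, and the use of scalar integers in place of
tensors are all assumptions made for this sketch; they are not the dialect's
API, and the TOSA specification remains the reference for the actual semantics.

```python
from typing import Callable, Tuple

# Scalars stand in for tensors to keep the sketch small.
Values = Tuple[int, ...]


def cond_if(condition: bool,
            then_region: Callable[..., Values],
            else_region: Callable[..., Values],
            *inputs: int) -> Values:
    """Run exactly one of two regions on the same inputs (hypothetical model)."""
    region = then_region if condition else else_region
    return region(*inputs)


def while_loop(cond_region: Callable[..., bool],
               body_region: Callable[..., Values],
               *inputs: int) -> Values:
    """Re-run the body region while the condition region holds (hypothetical model)."""
    state: Values = tuple(inputs)
    while cond_region(*state):
        # The body's results become the inputs of the next iteration.
        state = body_region(*state)
    return state


# Select between an add and a subtract depending on a condition value.
picked = cond_if(True, lambda a, b: (a + b,), lambda a, b: (a - b,), 7, 3)

# Advance a counter from 0 to 4 while accumulating the values it passes through.
counted = while_loop(lambda i, acc: i < 4,
                     lambda i, acc: (i + 1, acc + i),
                     0, 0)
```

Because each region here is a self-contained callable with explicit inputs and
results, the same shape would plausibly carry over to structured ops such as
`scf.if` and `scf.while`, which is the kind of lowering the text above refers to.
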
## Using TOSA In A Compiler

The TOSA specification describes each operator in functional detail. It is
expected that compilers that use TOSA will use its builders to construct the
operators so that the quantization information for the operator is correctly
generated.

The functional steps described in the pseudocode of the specification enable
the construction of code generation for that operation, or decisions on the
design of underlying hardware. The functional pseudocode also describes
how the quantization parameters are utilized within the operation.

### Quantization Parameters in Ops vs Tensors

TOSA uses the quantization parameters embedded in the input and output
tensors to construct the quantization attributes that sit within the operator.
Once these attributes are constructed, the quantization information within
the tensors is no longer necessary for code generation.

This enables the tensors to be subsequently interpreted simply as contiguous
buffers containing raw data, with no 'meta information' in the form of the
quantization_type. Precision-related manipulation of the input or output is
instead described by the operator itself, which describes, for example, when
the zero point is applied, or when the scale multiplication is done.

However, TOSA does *not* eliminate the existing MLIR QuantOps quantization
type information within the tensors; this leaves the choice of how to handle
quantization information to later backend code generation steps.

Maintaining the ability to overlap these different representations of
quantization parameters (i.e. tensor-carried vs op-carried) is an important
capability when considering progressive lowering between uses that expect one
scheme vs the other.

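The Python sketch below illustrates the op-carried scheme described in this
section under simplifying assumptions. `QuantizedTensor`, `DotQuantInfo`,
`build_quant_info`, `quantized_dot` and `apply_scale` are hypothetical names
invented for this example, not TOSA or MLIR APIs. The dot product and the
fixed-point scaling are reduced stand-ins for the operator pseudocode in the
specification rather than a faithful copy of it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QuantizedTensor:
    """Raw integer data plus optional tensor-carried quantization metadata."""
    data: List[int]
    zero_point: Optional[int] = None  # stand-in for a quantized element type


@dataclass(frozen=True)
class DotQuantInfo:
    """Hypothetical op-carried attribute holding both operands' zero points."""
    lhs_zp: int
    rhs_zp: int


def build_quant_info(lhs: QuantizedTensor, rhs: QuantizedTensor) -> DotQuantInfo:
    """Builder step: copy the tensor-carried parameters into an op attribute."""
    if lhs.zero_point is None or rhs.zero_point is None:
        raise ValueError("both operands need tensor-carried zero points")
    return DotQuantInfo(lhs_zp=lhs.zero_point, rhs_zp=rhs.zero_point)


def quantized_dot(lhs_raw: List[int], rhs_raw: List[int],
                  info: DotQuantInfo) -> int:
    """Dot product over raw buffers; the op decides when zero points apply."""
    if len(lhs_raw) != len(rhs_raw):
        raise ValueError("operands must have the same length")
    # Zero points are subtracted before the multiply-accumulate.
    return sum((a - info.lhs_zp) * (b - info.rhs_zp)
               for a, b in zip(lhs_raw, rhs_raw))


def apply_scale(acc: int, multiplier: int, shift: int) -> int:
    """Scale an accumulator by multiplier / 2**shift, rounding half up."""
    if shift < 1:
        raise ValueError("shift must be at least 1")
    return (acc * multiplier + (1 << (shift - 1))) >> shift


lhs = QuantizedTensor(data=[130, 126, 128], zero_point=128)
rhs = QuantizedTensor(data=[3, -1, 5], zero_point=1)

info = build_quant_info(lhs, rhs)              # done once, at op construction
acc = quantized_dot(lhs.data, rhs.data, info)  # only raw buffers are needed here
scaled = apply_scale(acc, multiplier=3, shift=4)
```

In this sketch `quantized_dot` never reads `zero_point` from the tensors, yet
the `QuantizedTensor` objects still retain it, which mirrors the overlap of
tensor-carried and op-carried representations discussed above.
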
## Operation definitions

[include "Dialects/TosaOps.md"]