..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(C) 2020 Marvell International Ltd.

Graph Library and Inbuilt Nodes
===============================

The graph architecture abstracts data processing functions as ``nodes`` and
``links`` them together to create a complex ``graph``, enabling reusable and
modular data processing functions.

The graph library provides APIs for graph framework operations such as
create, lookup, dump, and destroy on a graph, and for node operations such as
clone, edge update, and edge shrink. The API also allows creating a stats
cluster to monitor per-graph and per-node statistics.

Features
--------

Features of the Graph library are:

- Nodes as plugins.
- Support for out-of-tree nodes.
- Inbuilt nodes for packet processing.
- Node-specific xstat counts.
- Multi-process support.
- Low overhead graph walk and node enqueue.
- Low overhead statistics collection infrastructure.
- Support to export the graph as a Graphviz dot file. See ``rte_graph_export()``.
- Allow having another graph walk implementation in the future by segregating
  the fast path (``rte_graph_worker.h``) and slow path code.

Advantages of Graph architecture
--------------------------------

- Memory latency is the enemy of high-speed packet processing. Moving
  similar packet processing code into a node reduces I-cache and D-cache
  misses.
- Exploits the probability that most packets will follow the same nodes in the
  graph.
- Allows SIMD instructions to be used for packet processing within the node.
- The modular scheme allows having reusable nodes for the consumers.
- The modular scheme allows us to abstract the vendor HW specific
  optimizations as a node.

Performance tuning parameters
-----------------------------

- Test with various burst size values (256, 128, 64, 32) using the
  ``RTE_GRAPH_BURST_SIZE`` config option.
  Testing shows that on x86 and arm64 servers the sweet spot is a burst size
  of 256, while on arm64 embedded SoCs it is either 64 or 128.
- Disable node statistics (using the ``RTE_LIBRTE_GRAPH_STATS`` config option)
  if not needed.

Programming model
-----------------

Anatomy of Node:
~~~~~~~~~~~~~~~~

.. _figure_anatomy_of_a_node:

.. figure:: img/anatomy_of_a_node.*

   Anatomy of a node

The node is the basic building block of the graph framework.

A node consists of:

process():
^^^^^^^^^^

The callback function is invoked by the worker thread through
``rte_graph_walk()`` when there is data to be processed by the node.
A graph node processes the data in ``process()`` and enqueues it to the next
downstream node using the ``rte_node_enqueue*()`` functions.

Context memory:
^^^^^^^^^^^^^^^

It is memory allocated by the library to store node-specific context
information. This memory is used by the ``process()``, ``init()``, and
``fini()`` callbacks.

init():
^^^^^^^

The callback function is invoked by ``rte_graph_create()`` when
a node gets attached to a graph.

fini():
^^^^^^^

The callback function is invoked by ``rte_graph_destroy()`` when a
node gets detached from a graph.

Node name:
^^^^^^^^^^

It is the name of the node. When a node registers with the graph library, the
library assigns it an ID of type ``rte_node_t``. Either the ID or the name can
be used to look up the node. ``rte_node_from_name()`` and
``rte_node_id_to_name()`` are the node lookup functions.

nb_edges:
^^^^^^^^^

The number of downstream nodes connected to this node. The ``next_nodes[]``
array stores the downstream node objects. The ``rte_node_edge_update()`` and
``rte_node_edge_shrink()`` functions shall be used to update the
``next_nodes[]`` objects. Consumers of the node APIs are free to update the
``next_nodes[]`` objects until ``rte_graph_create()`` is invoked.

next_nodes[]:
^^^^^^^^^^^^^

The dynamic array that stores the downstream nodes connected to this node.
A downstream node must not be the current node itself or a source node.

Source node:
^^^^^^^^^^^^

Source nodes are static nodes created using ``RTE_NODE_REGISTER`` by passing
``flags`` as ``RTE_NODE_SOURCE_F``.
While performing the graph walk, the ``process()`` function of all the source
nodes is called first, so that these nodes can be used as input nodes for a
graph.

nb_xstats:
^^^^^^^^^^

The number of xstats that this node can report.
The ``xstat_desc[]`` array stores the xstat descriptions, which are later
propagated to stats.

xstat_desc[]:
^^^^^^^^^^^^^

The dynamic array that stores the xstat descriptions reported by this node.

Node creation and registration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* The node implementer creates the node by implementing the ops and attributes
  of ``struct rte_node_register``.

* The library registers the node by invoking ``RTE_NODE_REGISTER`` on library
  load using the constructor scheme. The constructor scheme is used here to
  support multi-process.

Link the Nodes to create the graph topology
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _figure_link_the_nodes:

.. figure:: img/link_the_nodes.*

   Topology after linking the nodes

Once nodes are available to the program, the application or node public API
functions can link them together to create a complex packet processing graph.

There are multiple strategies to link the nodes.

Method (a):
^^^^^^^^^^^
Provide the ``next_nodes[]`` at node registration time.
See ``struct rte_node_register::nb_edges``.
This addresses the static node scheme, where the ``next_nodes[]`` of the node
are known upfront.

Method (b):
^^^^^^^^^^^
Use ``rte_node_edge_get()``, ``rte_node_edge_update()``, and
``rte_node_edge_shrink()`` to update the ``next_nodes[]`` links of the node at
runtime, but before the graph is created.

Method (c):
^^^^^^^^^^^
Use ``rte_node_clone()`` to clone an already existing node that was created
using ``RTE_NODE_REGISTER``.
When ``rte_node_clone()`` is invoked, the library clones all the attributes
of the node and creates a new one. The name of the cloned node shall be
``"parent_node_name-user_provided_name"``.

This method enables the use case of Rx and Tx nodes, where multiple such nodes
need to be cloned based on the number of CPUs available in the system.
The cloned nodes are identical except for the context memory.
For Rx and Tx ethdev nodes, the context memory holds the port and queue pair
information.

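As a rough illustration of the naming rule for cloned nodes, the sketch below
composes a clone name from a parent name and a user-provided suffix. It is
plain C and does not call into the graph library; the helper ``clone_name()``
is hypothetical and only mirrors the
``"parent_node_name-user_provided_name"`` convention.

.. code-block:: c

    #include <stdio.h>

    /*
     * Hypothetical helper: write "parent-suffix" into buf.
     * Returns 0 on success, -1 if the result did not fit in len bytes.
     */
    static int
    clone_name(char *buf, size_t len, const char *parent, const char *suffix)
    {
        int n = snprintf(buf, len, "%s-%s", parent, suffix);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

For instance, a parent named ``ethdev_rx`` and a suffix ``0-0`` would yield
``ethdev_rx-0-0``.
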
Create the graph object
~~~~~~~~~~~~~~~~~~~~~~~
Now that the nodes are linked, it is time to create a graph by including
the required nodes. The application can provide a set of node patterns to
form a graph object. The ``fnmatch()`` API is used underneath for pattern
matching to include the required nodes. After the graph is created, no changes
to the nodes or the graph are allowed.

The ``rte_graph_create()`` API shall be used to create the graph.

Example of a graph object creation:

.. code-block:: console

   {"ethdev_rx-0-0", "ip4*", "ethdev_tx-*"}

In the above example, a graph object will be created with the ethdev Rx
node of port 0 and queue 0, all ``ip4*`` nodes in the system,
and the ethdev Tx nodes of all ports.

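The pattern selection can be tried outside DPDK with the POSIX ``fnmatch()``
call. The following is a minimal sketch, not the library's implementation:
the helper ``count_selected()`` is made up for illustration, and calling
``fnmatch()`` with no flags is an assumption.

.. code-block:: c

    #include <fnmatch.h>
    #include <stddef.h>

    /* Count the node names selected by at least one of the patterns. */
    static size_t
    count_selected(const char *const *patterns, size_t nb_patterns,
                   const char *const *names, size_t nb_names)
    {
        size_t selected = 0;

        for (size_t i = 0; i < nb_names; i++) {
            for (size_t j = 0; j < nb_patterns; j++) {
                /* fnmatch() returns 0 when the name matches the pattern. */
                if (fnmatch(patterns[j], names[i], 0) == 0) {
                    selected++;
                    break;
                }
            }
        }
        return selected;
    }

With the three patterns from the example, names such as ``ip4_lookup`` and
``ethdev_tx-1`` would be selected, while ``ethdev_rx-1-0`` and ``pkt_drop``
would not.
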
Graph models
~~~~~~~~~~~~
There are two different kinds of graph walking models. The user can select
the model using the ``rte_graph_worker_model_set()`` API. If the application
decides to use only one model, the fast path check can be avoided by defining
the model with ``RTE_GRAPH_MODEL_SELECT``.
For example:

.. code-block:: c

  #define RTE_GRAPH_MODEL_SELECT RTE_GRAPH_MODEL_RTC
  #include "rte_graph_worker.h"

RTC (Run-To-Completion)
^^^^^^^^^^^^^^^^^^^^^^^
This is the default graph walking model. Specifically, the
``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions are
designed to work on a single core for better performance. The fast path API
works on a graph object, so the multi-core graph processing strategy is to
create one graph object per worker.

Example:

Graph: node-0 -> node-1 -> node-2 @Core0.

.. code-block:: diff

    + - - - - - - - - - - - - - - - - - - - - - +
    '                  Core #0                  '
    '                                           '
    ' +--------+     +---------+     +--------+ '
    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
    ' +--------+     +---------+     +--------+ '
    '                                           '
    + - - - - - - - - - - - - - - - - - - - - - +

Dispatch model
^^^^^^^^^^^^^^
The dispatch model enables a cross-core dispatching mechanism that employs
a scheduling work-queue to dispatch streams to the other worker cores that
are associated with the destination node.

Use ``rte_graph_model_mcore_dispatch_lcore_affinity_set()`` to set the lcore
affinity of a node.
Each worker core has its own copy of the graph. Use ``rte_graph_clone()`` to
clone the graph for each worker and use
``rte_graph_model_mcore_dispatch_core_bind()`` to bind the graph to the worker
core.

Example:

Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.

Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


In fast path
~~~~~~~~~~~~
Typical fast-path code looks like the following, where the application
gets the fast-path graph object using ``rte_graph_lookup()``
on the worker thread and runs ``rte_graph_walk()`` in a tight loop.

.. code-block:: c

    /* Look up the fast-path graph object created for this worker. */
    struct rte_graph *graph = rte_graph_lookup("worker0");

    /* Walk the graph until the application asks the worker to stop. */
    while (!done) {
        rte_graph_walk(graph);
    }

Context update when graph walk in action
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The fast-path object for the node is ``struct rte_node``.

In the slow path, or while the graph walk is in action, the user may need to
update the context of a node, and hence needs access to the
``struct rte_node *`` memory.

``rte_graph_foreach_node()``, ``rte_graph_node_get()``, and
``rte_graph_node_get_by_name()`` can be used to get the
``struct rte_node *``. The ``rte_graph_foreach_node()`` iterator works on the
``struct rte_graph *`` fast-path graph object, while the others work on the
graph ID or name.

Get the node statistics using graph cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The user may need to know the aggregate stats of a node across
multiple graph objects, especially when each graph object is bound
to a worker thread.

The graph cluster object is provided for statistics.
The ``rte_graph_cluster_stats_create()`` API shall be used to create a
graph cluster with multiple graph objects, and
``rte_graph_cluster_stats_get()`` to get the aggregate node statistics.

An example statistics output from ``rte_graph_cluster_stats_get()``:

.. code-block:: diff

    +---------+-----------+-------------+---------------+-----------+---------------+-----------+
    |Node     |calls      |objs         |realloc_count  |objs/call  |objs/sec(10E6) |cycles/call|
    +---------+-----------+-------------+---------------+-----------+---------------+-----------+
    |node0    |12977424   |3322220544   |5              |256.000    |3047.151872    |20.0000    |
    |node1    |12977653   |3322279168   |0              |256.000    |3047.210496    |17.0000    |
    |node2    |12977696   |3322290176   |0              |256.000    |3047.221504    |17.0000    |
    |node3    |12977734   |3322299904   |0              |256.000    |3047.231232    |17.0000    |
    |node4    |12977784   |3322312704   |1              |256.000    |3047.243776    |17.0000    |
    |node5    |12977825   |3322323200   |0              |256.000    |3047.254528    |17.0000    |
    +---------+-----------+-------------+---------------+-----------+---------------+-----------+

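The ``objs/call`` column can be reproduced from the ``calls`` and ``objs``
counters. The sketch below is a simplified model of that arithmetic and is not
the cluster stats API: ``struct toy_node_stat``, ``toy_aggregate()``, and
``toy_objs_per_call()`` are hypothetical names. It sums the counters of one
node across several graphs and then derives the ratio.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Per-graph counters of one node (hypothetical, simplified). */
    struct toy_node_stat {
        uint64_t calls; /* number of calls counted for the node */
        uint64_t objs;  /* number of objects counted for the node */
    };

    /* Sum the counters of the same node across several graphs. */
    static struct toy_node_stat
    toy_aggregate(const struct toy_node_stat *per_graph, size_t nb_graphs)
    {
        struct toy_node_stat total = {0, 0};

        for (size_t i = 0; i < nb_graphs; i++) {
            total.calls += per_graph[i].calls;
            total.objs += per_graph[i].objs;
        }
        return total;
    }

    /* Derive the objs/call ratio; 0.0 when there were no calls. */
    static double
    toy_objs_per_call(const struct toy_node_stat *stat)
    {
        if (stat->calls == 0)
            return 0.0;
        return (double)stat->objs / (double)stat->calls;
    }

With the ``node0`` row of the example output, 3322220544 objects over 12977424
calls gives 256 objects per call.
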
Node writing guidelines
~~~~~~~~~~~~~~~~~~~~~~~

The ``process()`` function of a node is the fast-path function and needs
to be written carefully to achieve maximum performance.

Broadly speaking, there are two different types of nodes.

Static nodes
~~~~~~~~~~~~
The first kind of node has a fixed ``next_nodes[]`` for the complete burst
(like ethdev_rx and ethdev_tx) and is simple to write.
The ``process()`` function can move the object burst to the next node using
either ``rte_node_next_stream_move()`` or ``rte_node_next_stream_get()`` and
``rte_node_next_stream_put()``.

Intermediate nodes
~~~~~~~~~~~~~~~~~~
The second kind is the intermediate node, which decides the ``next_nodes[]``
entry to send to on a per-packet basis. In these nodes:

* First, there has to be the best possible packet processing logic.

* Second, each packet needs to be queued to its next node.

This can be done using the ``rte_node_enqueue_[x1|x2|x4]()`` APIs if the
packets go to a single next node, or ``rte_node_enqueue_next()``, which takes
an array of next nodes.

In a scenario where multiple intermediate nodes are present but most of the
time each node uses the same next node for all its packets, the cost of moving
every pointer from the current node's stream to the next node's stream can be
avoided. This is called a home run, and ``rte_node_next_stream_move()`` can be
used to move the stream from the current node to the next node with the least
number of cycles. Since the copy can be avoided only when all the packets are
destined to the same next node, the node implementation should also handle the
worst case, where every packet could be going to a different next node.

Example of intermediate node implementation with home run:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Start with the speculation that next_node = node->ctx.
   This could be the next_node the application used in the previous call of
   this node.

#. Get the next_node stream array with the required space using
   ``rte_node_next_stream_get(next_node, space)``.

#. While n_left_from > 0 (i.e. packets are left to be sent), prefetch the
   next pkt_set and process the current pkt_set to find their next nodes.

#. If all the next nodes of the current pkt_set match the speculated next
   node, just count them as successfully speculated (``last_spec``) so far and
   continue the loop without actually moving them to the next node. Otherwise,
   if there is a mismatch, copy all the pkt_set pointers that were counted in
   ``last_spec`` and move the current pkt_set to their respective next nodes
   using ``rte_node_enqueue_x1()``. Also, one of the next nodes can become the
   new speculated next_node if it is more probable. Finally, reset
   ``last_spec`` to zero.

#. If n_left_from != 0, go to step 3 to process the remaining packets.

#. If last_spec == nb_objs, all the objects passed were successfully
   speculated to a single next node. So, the current stream can be moved to
   the next node using ``rte_node_next_stream_move(node, next_node)``.
   This is the ``home run``, where the memcpy of buffer pointers to the next
   node is avoided.

#. Update ``node->ctx`` with the more probable next node.

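The speculation steps can be modelled in a few lines of plain C. The sketch
below is a toy model under stated assumptions and not the DPDK implementation:
all ``toy_*`` names are hypothetical, each packet already carries its lookup
result, the next streams are empty on entry, and the zero-copy stream move is
modelled by handing over the burst pointer instead of swapping stream buffers.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define TOY_MAX_EDGES 4
    #define TOY_MAX_OBJS  64

    /* Stand-in for a packet whose lookup result is already known. */
    struct toy_pkt {
        uint16_t next_edge;
    };

    struct toy_stream {
        void *storage[TOY_MAX_OBJS]; /* used when pointers are copied */
        void **objs;                 /* storage, or a burst handed over whole */
        uint16_t count;
    };

    struct toy_node {
        uint16_t spec_edge;          /* speculated edge, kept across calls */
        unsigned int home_runs;      /* bursts handed over without copying */
        struct toy_stream next[TOY_MAX_EDGES];
    };

    static void
    toy_node_init(struct toy_node *node, uint16_t spec_edge)
    {
        memset(node, 0, sizeof(*node));
        node->spec_edge = spec_edge;
        for (int e = 0; e < TOY_MAX_EDGES; e++)
            node->next[e].objs = node->next[e].storage;
    }

    /* Append n pointers to a stream (the slow, copying path). */
    static void
    toy_copy(struct toy_stream *s, void **objs, uint16_t n)
    {
        if (n == 0)
            return;
        memcpy(&s->storage[s->count], objs, n * sizeof(*objs));
        s->count += n;
    }

    /* Assumes empty next streams, nb_objs <= TOY_MAX_OBJS, valid edges. */
    static uint16_t
    toy_process(struct toy_node *node, void **objs, uint16_t nb_objs)
    {
        uint16_t spec = node->spec_edge;            /* step 1 */
        struct toy_stream *to = &node->next[spec];  /* step 2 */
        uint16_t hits[TOY_MAX_EDGES] = {0};
        void **from = objs;     /* start of the run not yet copied */
        uint16_t last_spec = 0; /* length of that run */

        if (nb_objs == 0)
            return 0;

        for (uint16_t i = 0; i < nb_objs; i++) {    /* steps 3 to 5 */
            uint16_t edge = ((struct toy_pkt *)objs[i])->next_edge;

            hits[edge]++;
            if (edge == spec) {
                last_spec++;    /* speculation held, defer the copy */
                continue;
            }
            /* Misprediction: flush the pending run, enqueue this object. */
            toy_copy(to, from, last_spec);
            from = &objs[i + 1];
            last_spec = 0;
            toy_copy(&node->next[edge], &objs[i], 1);
        }

        if (last_spec == nb_objs) {                 /* step 6: home run */
            to->objs = objs;
            to->count = nb_objs;
            node->home_runs++;
        } else {
            toy_copy(to, from, last_spec);          /* trailing run */
        }

        for (uint16_t e = 0; e < TOY_MAX_EDGES; e++) /* step 7 */
            if (hits[e] > hits[node->spec_edge])
                node->spec_edge = e;
        return nb_objs;
    }

In this model a burst whose packets all resolve to the speculated edge never
touches ``memcpy()``, whereas a single mismatch forces the pending run to be
copied. The model updates the speculation once per burst, which is one
possible policy among several.
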
Graph object memory layout
--------------------------
.. _figure_graph_mem_layout:

.. figure:: img/graph_mem_layout.*

   Memory layout

Understanding the memory layout helps to debug the graph library and
improve the performance if needed.

A graph object consists of a header, a circular buffer to store the pending
streams when walking over the graph, variable-length memory to store the
``rte_node`` objects, and variable-length memory to store the xstats reported
by each ``rte_node``.

The ``graph_nodes_mem_create()`` function creates and populates this memory.
Functions such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this
memory to enable fast-path services.

Inbuilt Nodes
-------------

DPDK provides a set of nodes for data processing.
The following diagram depicts the data flow of the inbuilt nodes.

.. _figure_graph_inbuit_node_flow:

.. figure:: img/graph_inbuilt_node_flow.*

   Inbuilt nodes data flow

The following sections document each inbuilt node.

ethdev_rx
~~~~~~~~~
This node does ``rte_eth_rx_burst()`` into the stream buffer passed to it
(the source node stream) and does ``rte_node_next_stream_move()`` only when
packets are received. Each ``rte_node`` works on only one Rx port and
queue, which it gets from ``node->ctx``. For each (port X, rx_queue Y),
an ``rte_node`` is cloned from ``ethdev_rx_base_node`` as ``ethdev_rx-X-Y`` in
``rte_node_eth_config()``, which also updates ``node->ctx``.
Each graph needs to be associated with a unique ``rte_node`` for a
(port, rx_queue).

ethdev_tx
~~~~~~~~~
This node does ``rte_eth_tx_burst()`` for a burst of objects received by it.
It sends the burst to a fixed Tx port and queue, taken from
``node->ctx``. For each (port X), this ``rte_node`` is cloned from
``ethdev_tx_node_base`` as ``ethdev_tx-X`` in ``rte_node_eth_config()``,
which also updates ``node->ctx``.

Since each graph does not need more than one Txq per port, a Txq is assigned
to each ``rte_node`` instance based on the graph ID. Each graph needs to be
associated with an ``rte_node`` for each (port).

pkt_drop
~~~~~~~~
This node frees all the objects passed to it, treating them as
``rte_mbufs`` that need to be freed.

ip4_lookup
~~~~~~~~~~
This node is an intermediate node that does an LPM lookup for the received
IPv4 packets, and the result determines each packet's next node.

On successful LPM lookup, the result contains the ``next_node`` ID and
``next-hop`` ID with which the packet needs to be further processed.

On LPM lookup failure, objects are redirected to the ``pkt_drop`` node.
``rte_node_ip4_route_add()`` is the control path API to add IPv4 routes.
To achieve a home run, the node uses ``rte_node_next_stream_move()`` as
described in the sections above.

ip4_rewrite
~~~~~~~~~~~
This node gets packets from the ``ip4_lookup`` node with the next-hop ID for
each packet embedded in ``node_mbuf_priv1(mbuf)->nh``. This ID is used
to determine the L2 header to be written to the packet before sending
the packet out to a particular ``ethdev_tx`` node.
``rte_node_ip4_rewrite_add()`` is the control path API to add next-hop info.

ip4_reassembly
~~~~~~~~~~~~~~
This node is an intermediate node that reassembles fragmented IPv4 packets;
non-fragmented packets pass through the node unaffected.
The node rewrites its stream and moves it to the next node.
The fragment table and death row table should be set up via the
``rte_node_ip4_reassembly_configure`` API.

ip6_lookup
~~~~~~~~~~
This node is an intermediate node that does an LPM lookup for the received
IPv6 packets, and the result determines each packet's next node.

On successful LPM lookup, the result contains the ``next_node`` ID
and ``next-hop`` ID with which the packet needs to be further processed.

On LPM lookup failure, objects are redirected to the ``pkt_drop`` node.
``rte_node_ip6_route_add()`` is the control path API to add IPv6 routes.
To achieve a home run, the node uses ``rte_node_next_stream_move()``
as described in the sections above.

ip6_rewrite
~~~~~~~~~~~
This node gets packets from the ``ip6_lookup`` node with the next-hop ID
for each packet embedded in ``node_mbuf_priv1(mbuf)->nh``.
This ID is used to determine the L2 header to be written to the packet
before sending the packet out to a particular ``ethdev_tx`` node.
``rte_node_ip6_rewrite_add()`` is the control path API to add next-hop info.

null
~~~~
This node ignores the set of objects passed to it and reports that all are
processed.

kernel_tx
~~~~~~~~~
This node is an exit node that forwards packets to the kernel.
It is used to forward any control plane traffic from DPDK to the kernel stack.
It uses a raw socket interface to transmit the packets:
it places the packet's destination IP address in a ``sockaddr_in`` address
structure and uses the ``sendto`` function to send data on the raw socket.
After sending the burst of packets to the kernel,
this node frees the packet buffers.

kernel_rx
~~~~~~~~~
This node is a source node that receives packets from the kernel
and forwards them to any of the intermediate nodes.
It uses the raw socket interface to receive packets from the kernel.
It uses the ``poll`` function to poll the socket fd
for ``POLLIN`` events, reads the packets from the raw socket
into the stream buffer, and does ``rte_node_next_stream_move()``
when packets are received.

ip4_local
~~~~~~~~~
This node is an intermediate node that does a ``packet_type`` lookup for
the received IPv4 packets, and the result determines each packet's next node.

On successful ``packet_type`` lookup, for any IPv4 protocol the result
contains the ``next_node`` ID and ``next-hop`` ID with which the packet
needs to be further processed.

On ``packet_type`` lookup failure, objects are redirected to the ``pkt_drop``
node.
``rte_node_ip4_route_add()`` is the control path API to add an IPv4 address
with a depth of 32 bits for receiving packets.
To achieve a home run, the node uses ``rte_node_next_stream_move()`` as
described in the sections above.

udp4_input
~~~~~~~~~~
This node is an intermediate node that does a UDP destination port lookup for
the received IPv4 packets, and the result determines each packet's next node.

The user registers a new node ``udp4_input`` with the graph library during
initialization, attaches a user-specified node as an edge to this node using
``rte_node_udp4_usr_node_add()``, and creates an empty hash table with the
destination port and node ID as its fields.

After the user node is successfully added as an edge, the edge ID is returned
to the user.

The user registers the ``ip4_lookup`` table with the specified IP address and
a 32-bit mask for IP filtering using the ``rte_node_ip4_route_add()`` API.

After the graph is created, the user updates the hash table with the custom
port and the previously obtained edge ID using the
``rte_node_udp4_dst_port_add()`` API.

When a packet is received, an LPM lookup is performed. If the IP address
matches, the packet is handed over to the ``ip4_local`` node, where it is
checked for the UDP protocol; on success, the packet is enqueued to the
``udp4_input`` node.

A hash lookup is performed in the ``udp4_input`` node, matching the registered
destination port against the destination port in the UDP packet; on success,
the packet is handed to ``udp_user_node``.