.\" Copyright (c) 2011-2013 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD: head/share/man/man4/netmap.4 228017 2011-11-27 06:55:57Z gjb $
.\"
.Dd May 25, 2019
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for extremely fast and efficient packet I/O
(reaching 14.88 Mpps with a single core at less than 1 GHz)
for both userspace and kernel clients.
Userspace clients can use the
.Nm
API
to send and receive raw packets through physical interfaces
or ports of the
.Xr vale 4
switch.
.Pp
.Xr vale 4
is a very fast (reaching 20 Mpps per port)
and modular software switch,
implemented within the kernel, which can interconnect
virtual ports, physical devices, and the native host stack.
.Pp
.Nm
uses a memory mapped region to share packet buffers,
descriptors and queues with the kernel.
.Xr ioctl 2
is used to bind interfaces/ports to file descriptors and
implement non-blocking I/O, whereas blocking I/O uses
.Xr select 2
and
.Xr poll 2 .
.Nm
can exploit the parallelism in multiqueue devices and
multicore systems.
.Pp
For the best performance,
.Nm
requires explicit support in device drivers;
a generic emulation layer is available to implement the
.Nm
API on top of unmodified device drivers,
at the price of reduced performance
(but still better than what can be achieved with
.Xr socket 2 ,
.Xr bpf 4 ,
or
.Xr pcap 3 ) .
.Pp
For a list of devices with native
.Nm
support, see section
.Sx SUPPORTED INTERFACES
at the end of this manual page.
.Sh OPERATING THE API
.Nm
clients must first execute the following calls to open the device
node and to bind the file descriptor to a specific interface or port:
.Bd -literal -offset indent
fd = open("/dev/netmap", O_RDWR);
ioctl(fd, NIOCREGIF, (struct nmreq *)arg);
.Ed
.Pp
.Nm
has multiple modes of operation controlled by the
content of the
.Vt struct nmreq
passed to
.Xr ioctl 2 .
In particular, the
.Va nr_name
field specifies whether the client operates on a physical network
interface or on a port of a
.Xr vale 4
switch, as indicated below.
Additional fields in the
.Vt struct nmreq
control the details of operation.
.Bl -tag -width XXXX
.It Sy Interface name (e.g. 'em0', 'eth1', ...)
The data path of the interface is disconnected from the host stack.
Depending on additional arguments,
the file descriptor is bound to the NIC (one or all queues),
or to the host stack.
.It Sy valeXXX:YYY (arbitrary XXX and YYY)
The file descriptor is bound to port YYY of a
.Xr vale 4
switch called XXX,
where XXX and YYY are arbitrary alphanumeric strings.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
match the name of any existing interface.
.Pp
The switch and the port are created if they do not exist.
.It Sy valeXXX:ifname (ifname is an existing interface)
Flags in the argument control whether the physical interface
(and optionally the corresponding host stack endpoint)
are connected to or disconnected from the
.Xr vale 4
switch named XXX.
.Pp
In this case
.Xr ioctl 2
is used only for configuring the
.Xr vale 4
switch, typically through the
.Cm vale-ctl
command.
The file descriptor cannot be used for I/O, and should be passed to
.Xr close 2
after issuing
.Xr ioctl 2 .
.El
.Pp
The binding can be removed (and the interface returns to
regular operation, or the virtual port is destroyed) with a
.Xr close 2
on the file descriptor.
.Pp
The processes owning the file descriptor can then
.Xr mmap 2
the memory region that contains pre-allocated
buffers, descriptors and queues, and use them to
read/write raw packets.
Non-blocking I/O is done with special
.Xr ioctl 2
commands, whereas the file descriptor can be passed to
.Xr select 2
and
.Xr poll 2
to be notified about incoming packets or available transmit buffers.
.Ss DATA STRUCTURES
The data structures in the mmapped memory are described below
(see
.In net/netmap/netmap.h
for reference).
All physical devices operating in
.Nm
mode use the same memory region,
shared by the kernel and all processes that own
.Pa /dev/netmap
descriptors bound to those devices
(NOTE: visibility may be restricted in future implementations).
Virtual ports instead use separate memory regions,
shared only with the kernel.
.Pp
All references between the shared data structures
are relative (offsets or indexes).
Some macros help convert
them into actual pointers.
.Bl -tag -width XXXX
.It Sy struct netmap_if (one per interface)
indicates the number of rings supported by an interface, their
sizes, and the offsets of the
.Nm
rings associated to the interface.
.Pp
.Vt struct netmap_if
is located in the shared memory region at the offset given by the
.Va nr_offset
field of the structure returned by
.Dv NIOCREGIF .
.Bd -literal
struct netmap_if {
    char          ni_name[IFNAMSIZ]; /* name of the interface. */
    const u_int   ni_version;        /* API version */
    const u_int   ni_rx_rings;       /* number of rx ring pairs */
    const u_int   ni_tx_rings;       /* if 0, same as ni_rx_rings */
    const ssize_t ring_ofs[];        /* offset of tx and rx rings */
};
.Ed
.It Sy struct netmap_ring (one per ring)
Contains the positions in the transmit and receive rings to
synchronize the kernel and the application,
and an array of
.Nm
slots describing the buffers.
.Va reserved
is used in receive rings to tell the kernel how many slots before
.Va cur
are still in use and cannot be released yet.
.Pp
Each physical interface has one
.Vt struct netmap_ring
for each hardware transmit and receive ring,
plus one extra transmit and one receive structure
that connect to the host stack.
.Bd -literal
struct netmap_ring {
    const ssize_t  buf_ofs;      /* see details */
    const uint32_t num_slots;    /* number of slots in the ring */
    uint32_t       avail;        /* number of usable slots */
    uint32_t       cur;          /* 'current' read/write index */
    uint32_t       reserved;     /* not refilled before current */

    const uint16_t nr_buf_size;
    uint16_t       flags;
#define NR_TIMESTAMP 0x0002      /* set timestamp on *sync() */
#define NR_FORWARD   0x0004      /* enable NS_FORWARD for ring */
#define NR_RX_TSTMP  0x0008      /* set rx timestamp in slots */
    struct timeval ts;
    struct netmap_slot slot[0];  /* array of slots */
};
.Ed
.Pp
In transmit rings, after a system call
.Va cur
indicates the first slot that can be used for transmissions, and
.Va avail
reports how many of them are available.
Before the next
.Nm Ns -related
system call on the file
descriptor, the application should fill buffers and
slots with data, and update
.Va cur
and
.Va avail
accordingly, as shown in the figure below:
.Bd -literal
              cur
               |----- avail ---|   (after syscall)
               v
     TX  [*****aaaaaaaaaaaaaaaaa**]
     TX  [*****TTTTTaaaaaaaaaaaa**]
                    ^
                    |-- avail --|   (before syscall)
                   cur
.Ed
.Pp
In receive rings, after a system call
.Va cur
indicates the first slot that contains a valid packet, and
.Va avail
reports how many of them are available.
Before the next
.Nm Ns -related
system call on the file
descriptor, the application can process buffers and
release them to the kernel by updating
.Va cur
and
.Va avail
accordingly, as shown in the figure below.
Receive rings have an additional field called
.Va reserved
to indicate how many buffers before
.Va cur
cannot be released because they are still being processed.
.Bd -literal
                 cur
            |-res-|-- avail --|   (after syscall)
                  v
     RX  [**rrrrrrRRRRRRRRRRRR******]
     RX  [**...........rrrrRRR******]
                       |res|--|<avail (before syscall)
                           ^
                          cur
.Ed
.It Sy struct netmap_slot (one per packet)
contains the metadata for a packet:
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx;          /* buffer index */
    uint16_t len;              /* packet length */
    uint16_t flags;            /* buf changed, etc. */
#define NS_BUF_CHANGED 0x0001  /* must resync, buffer changed */
#define NS_REPORT      0x0002  /* tell hw to report results,
                                * e.g. by generating an interrupt
                                */
#define NS_FORWARD     0x0004  /* pass packet to the other endpoint
                                * (host stack or device)
                                */
#define NS_NO_LEARN    0x0008
#define NS_INDIRECT    0x0010
#define NS_MOREFRAG    0x0020
#define NS_PORT_SHIFT  8
#define NS_PORT_MASK   (0xff << NS_PORT_SHIFT)
#define NS_RFRAGS(_slot) (((_slot)->flags >> 8) & 0xff)
    uint64_t ptr;              /* buffer address (indirect buffers) */
};
.Ed
.Pp
The flags control how the buffer associated to the slot
should be managed.
.It Sy packet buffers
are normally fixed size (2 Kbyte) buffers allocated by the kernel
that contain packet data.
.El
.Pp
Addresses are computed through macros in order to
support access to objects in the shared memory region, e.g.:
.Bl -tag -width ".Fn NETMAP_BUF ring buf_idx"
.It Fn NETMAP_TXRING nifp i
Returns the address of the
.Va i Ns -th
transmit ring.
.It Fn NETMAP_RXRING nifp i
Returns the address of the
.Va i Ns -th
receive ring.
.It Fn NETMAP_BUF ring buf_idx
Returns the address of the buffer with index
.Va buf_idx
(which can be part of any ring for the given interface).
.El
.Ss FLAGS
Normally, buffers are associated to slots when interfaces are bound,
and one packet is fully contained in a single buffer.
Clients can, however, modify the mapping using the
following flags:
.Bl -tag -width ".Fn NS_RFRAGS slot"
.It Dv NS_BUF_CHANGED
indicates that the
.Va buf_idx
in the slot has changed.
This can be useful if the client wants to implement
some form of zero-copy forwarding (e.g. by passing buffers
from an input interface to an output interface), or
needs to process packets out of order.
.Pp
The flag MUST be used whenever the buffer index is changed.
.It Dv NS_REPORT
indicates that we want to be woken up when this buffer
has been transmitted.
This reduces performance but ensures
a prompt notification when a buffer has been sent.
Normally,
.Nm
notifies transmit completions in batches, hence signals
may be delayed indefinitely.
However, we need such notifications
before closing a descriptor.
.It Dv NS_FORWARD
When the device is opened in
.Sq transparent
mode, the client can mark slots in receive rings with this flag.
Packets in marked slots are forwarded to
the other endpoint at the next system call, thus restoring
(in a selective way) the connection between the NIC and the
host stack.
.It Dv NS_NO_LEARN
tells the forwarding code that the SRC MAC address for this
packet should not be used in the learning bridge.
.It Dv NS_INDIRECT
indicates that the packet's payload is not in the
.Nm Ns -supplied
buffer, but in a user-supplied buffer whose
user virtual address is in the
.Va ptr
field of the slot.
The size can reach 65535 bytes.
This is only supported on the transmit ring of virtual ports.
.It Dv NS_MOREFRAG
indicates that the packet continues with subsequent buffers;
the last buffer in a packet must have the flag cleared.
The maximum length of a chain is 64 buffers.
This is only supported on virtual ports.
.It Fn NS_RFRAGS slot
on receive rings, returns the number of remaining buffers
in a packet, including this one.
Slots with a value greater than 1 also have
.Dv NS_MOREFRAG
set.
The length refers to the individual buffer;
there is no field for the total length.
.Pp
On transmit rings, if
.Dv NS_DST
is set, it is passed to the lookup
function, which can use it e.g. as the index of the destination
port instead of doing an address lookup.
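.Pp
Because no slot field carries the total length of a fragmented packet,
a client has to add up the
.Va len
fields of the chain itself.
The following is a minimal sketch of that walk, not code taken from the
.Nm
sources: the structure and flag values are local copies of the
definitions shown in this page,
.Fn packet_total_len
is a hypothetical helper name, and the wrap-around at the end of the
ring is an assumption.
.Bd -literal
#include <stdint.h>

/* Local copies of definitions shown in this page (illustration only). */
struct netmap_slot {
	uint32_t buf_idx;
	uint16_t len;
	uint16_t flags;
	uint64_t ptr;
};
#define NS_MOREFRAG		0x0020
#define NS_RFRAGS(_slot)	(((_slot)->flags >> 8) & 0xff)

/*
 * Hypothetical helper: add up the lengths of the slots that form one
 * packet, starting at index first in a ring of num_slots slots.
 * Returns the total length; the number of slots used goes in *nfrags.
 */
static uint32_t
packet_total_len(const struct netmap_slot *slot, uint32_t num_slots,
    uint32_t first, uint32_t *nfrags)
{
	uint32_t i = first, total = 0, n = 0;

	for (;;) {
		total += slot[i].len;
		n++;
		/* Stop at the last fragment, or after a full lap. */
		if ((slot[i].flags & NS_MOREFRAG) == 0 || n == num_slots)
			break;
		i = (i + 1 == num_slots) ? 0 : i + 1;	/* wrap around */
	}
	*nfrags = n;
	return (total);
}
.Ed
.Pp
On a receive ring the slot count returned in
.Fa nfrags
should match
.Fn NS_RFRAGS
of the first slot of the chain.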
.El
.Sh SYSTEM CALLS
.Nm
supports
.Xr ioctl 2
commands to synchronize the state of the rings
between the kernel and the user processes, as well as
to query and configure the interface.
The former do not require any argument, whereas the latter use a
.Vt struct nmreq
defined as follows:
.Bd -literal
struct nmreq {
    char     nr_name[IFNAMSIZ];
    uint32_t nr_version;     /* API version */
#define NETMAP_API 4         /* current version */
    uint32_t nr_offset;      /* nifp offset in the shared region */
    uint32_t nr_memsize;     /* size of the shared region */
    uint32_t nr_tx_slots;    /* slots in tx rings */
    uint32_t nr_rx_slots;    /* slots in rx rings */
    uint16_t nr_tx_rings;    /* number of tx rings */
    uint16_t nr_rx_rings;    /* number of rx rings */
    uint16_t nr_ringid;      /* ring(s) we care about */
#define NETMAP_HW_RING    0x4000 /* low bits indicate one hw ring */
#define NETMAP_SW_RING    0x2000 /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000 /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK  0xfff  /* the actual ring number */
    uint16_t nr_cmd;
#define NETMAP_BDG_ATTACH     1  /* attach the NIC */
#define NETMAP_BDG_DETACH     2  /* detach the NIC */
#define NETMAP_BDG_LOOKUP_REG 3  /* register lookup function */
#define NETMAP_BDG_LIST       4  /* get bridge's info */
    uint16_t nr_arg1;
    uint16_t nr_arg2;
    uint32_t spare2[3];
};
.Ed
.Pp
A device descriptor obtained through
.Pa /dev/netmap
supports the
.Xr ioctl 2
command codes supported by network devices, as well as
specific command codes defined in
.In net/netmap/netmap.h .
These specific command codes are as follows:
.Bl -tag -width ".Dv NIOCTXSYNC"
.It Dv NIOCGINFO
returns
.Dv EINVAL
if the named device does not support
.Nm .
Otherwise, it returns zero and advisory information
about the interface.
Note that all the information below can change before the
interface is actually put into
.Nm
mode.
.Pp
.Va nr_memsize
indicates the size of the
.Nm
memory region.
Physical devices all share the same memory region, whereas
.Xr vale 4
ports may have independent regions for each port.
These sizes can be set through system-wide
.Xr sysctl 8
variables.
.Va nr_tx_slots
and
.Va nr_rx_slots
indicate the size of transmit and receive rings, respectively.
.Va nr_tx_rings
and
.Va nr_rx_rings
indicate the number of transmit and receive rings, respectively.
Both ring number and size may be configured at runtime
using interface-specific functions (e.g.\&
.Xr sysctl 8
on BSD, or
.Xr ethtool 8
on Linux).
.It Dv NIOCREGIF
puts the interface specified via
.Va nr_name
into
.Nm
mode, disconnecting it from the host stack, and/or defines which
rings are controlled through this file descriptor.
On return, it gives the same info as
.Dv NIOCGINFO ,
and
.Va nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
Possible values for
.Va nr_ringid
are as follows:
.Bl -tag -width "Dv NETMAP_HW_RING + i"
.It 0
default; all hardware rings
.It Dv NETMAP_SW_RING
.Dq host rings
connecting to the host stack
.It Dv NETMAP_HW_RING + i
i-th hardware ring
.El
.Pp
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events were specified.
The feature can be disabled by OR-ing the flag
.Dv NETMAP_NO_TX_POLL
into
.Va nr_ringid .
Normally, you should keep this feature unless you are using
separate file descriptors for the send and receive rings, because
otherwise packets are pushed out only if
.Dv NIOCTXSYNC
is called, or the send queue is full.
.Pp
.Dv NIOCREGIF
can be used multiple times to change the association of a
file descriptor to a ring pair, always within the same device.
.Pp
When registering a virtual interface that is dynamically created on a
.Xr vale 4
switch, we can specify the desired number of rings (1 by default,
and currently up to 16) by setting the
.Va nr_tx_rings
and
.Va nr_rx_rings
fields accordingly.
.It Dv NIOCTXSYNC
tells the hardware about new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware about consumed packets, and asks for newly available
packets.
.El
.Pp
.Nm
uses
.Xr select 2
and
.Xr poll 2
to wake up processes when significant events occur, and
.Xr mmap 2
to map memory.
.Pp
Applications may need to create threads and bind them to
specific cores to improve performance, using standard
OS primitives; see
.Xr pthread 3 .
In particular,
.Xr pthread_setaffinity_np 3
may be of use.
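.Pp
The
.Va nr_ringid
values accepted by
.Dv NIOCREGIF
are plain bit combinations.
As a minimal sketch (the constants are copied from the
.Vt struct nmreq
listing in this section, while
.Fn ringid_hw
is a hypothetical helper that is not part of the
.Nm
API), a request for a single hardware ring could be composed like this:
.Bd -literal
#include <stdint.h>

/* Values copied from the struct nmreq listing in this page. */
#define NETMAP_HW_RING		0x4000
#define NETMAP_SW_RING		0x2000
#define NETMAP_NO_TX_POLL	0x1000
#define NETMAP_RING_MASK	0xfff

/*
 * Hypothetical helper: build the nr_ringid value that selects the
 * i-th hardware ring, optionally disabling the implicit txsync that
 * poll() and select() perform.
 */
static uint16_t
ringid_hw(unsigned int i, int no_tx_poll)
{
	uint16_t id;

	id = NETMAP_HW_RING | (i & NETMAP_RING_MASK);
	if (no_tx_poll)
		id |= NETMAP_NO_TX_POLL;
	return (id);
}
.Ed
.Pp
The result would be stored in
.Va nr_ringid
before issuing
.Dv NIOCREGIF ;
masking it with
.Dv NETMAP_RING_MASK
recovers the ring number.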
.Sh EXAMPLES
The following code implements a traffic generator:
.Bd -literal
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <net/netmap/netmap_user.h>

#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>

int
main(void)
{
	struct netmap_if *nifp;
	struct netmap_ring *ring;
	struct pollfd fds;
	struct nmreq nmr;
	void *p;
	int fd;

	fd = open("/dev/netmap", O_RDWR);
	if (fd < 0)
		return (1);
	/* bind the descriptor to all hardware rings of ix0 */
	bzero(&nmr, sizeof(nmr));
	strcpy(nmr.nr_name, "ix0");
	nmr.nr_version = NETMAP_API;
	if (ioctl(fd, NIOCREGIF, &nmr) < 0)
		return (1);
	/* map the shared region holding rings and buffers */
	p = mmap(0, nmr.nr_memsize, PROT_WRITE | PROT_READ,
	    MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return (1);
	nifp = NETMAP_IF(p, nmr.nr_offset);
	ring = NETMAP_TXRING(nifp, 0);
	fds.fd = fd;
	fds.events = POLLOUT;

	for (;;) {
		/* wait until transmit slots are available */
		poll(&fds, 1, -1);
		for (; ring->avail > 0; ring->avail--) {
			uint32_t i;
			void *buf;

			i = ring->cur;
			buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
			/* prepare packet in buf */
			ring->slot[i].len = 0;	/* packet length */
			ring->cur = NETMAP_RING_NEXT(ring, i);
		}
	}
}
.Ed
.Sh SUPPORTED INTERFACES
.Nm
supports the following interfaces:
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr lem 4 ,
and
.Xr re 4 .
.Sh SEE ALSO
.Xr vale 4
.Rs
.%A Luigi Rizzo
.%T Revisiting network I/O APIs: the netmap framework
.%J Communications of the ACM
.%V 55 (3)
.%P 45-51
.%D March 2012
.Re
.Rs
.%A Luigi Rizzo
.%T netmap: a novel framework for fast packet I/O
.%D June 2012
.%O USENIX ATC '12, Boston
.Re
.Pp
.Lk http://info.iet.unipi.it/~luigi/netmap/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was originally designed and implemented at the
Universita` di Pisa in 2011 by
.An Luigi Rizzo ,
and further extended with help from
.An Matteo Landi ,
.An Gaetano Catalli ,
.An Giuseppe Lettieri ,
and
.An Vincenzo Maffione .
.Pp
.Nm
and
.Xr vale 4
have been funded by the European Commission within the FP7 Projects
CHANGE (257422) and OPENLAB (287581).