$NetBSD: TODO.smpnet,v 1.50 2024/08/12 10:46:40 nia Exp $

MP-safe components
==================

These components work without the big kernel lock (KERNEL_LOCK), i.e., with the
NET_MPSAFE kernel option.  Some of them scale up and some don't.

 - Device drivers
   - aq(4)
   - awge(4)
   - bcmgenet(4)
   - bge(4)
   - ena(4)
   - iavf(4)
   - ixg(4)
   - ixl(4)
   - ixv(4)
   - mcx(4)
   - rge(4)
   - se(4)
   - sunxi_emac(4)
   - vioif(4)
   - vmx(4)
   - wm(4)
   - xennet(4)
   - usbnet(4) based adapters:
     - axe(4)
     - axen(4)
     - cdce(4)
     - cue(4)
     - kue(4)
     - mos(4)
     - mue(4)
     - smsc(4)
     - udav(4)
     - upl(4)
     - ure(4)
     - url(4)
     - urndis(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
     - STP
   - Fast forward (ipflow)
 - Layer 3
   - All except the items in the section below
 - Interfaces
   - canloop(4)
   - gif(4)
   - ipsecif(4)
   - l2tp(4)
   - lagg(4)
   - pppoe(4)
     - if_spppsubr.c
   - tap(4)
   - tun(4)
   - vether(4)
   - vlan(4)
 - Packet filters
   - npf(7)
   - ipf(4)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non-MP-safe components and kernel options
=========================================

These components and options aren't MP-safe yet, i.e., they require the big
kernel lock.  Some of them can be used safely even if NET_MPSAFE is enabled
because they're still protected by the big kernel lock.  The others aren't
protected and so are unsafe, e.g., they may crash the kernel.

Protected ones
--------------

 - Device drivers
   - Most drivers other than the ones listed in the section above
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
 - Layer 3
   - IPSELSRC
   - MROUTING
   - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - if_srt
 - Packet filters
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a marker indicating that the code around it isn't MP-safe
yet.  We use it in comments and also as part of function names, for example
m_get_rcvif_NOMPSAFE.  Let's keep using "NOMPSAFE" so that non-MP-safe code is
easy to find with grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls are made in normal LWP
context or softint context, i.e., not in hardware interrupt context.  For Tx,
all bpf_mtap calls satisfy the requirement.  For Rx, most bpf_mtap calls are
made in softint context; unfortunately, some bpf_mtap calls on Rx are still
made in hardware interrupt context.

This is the list of functions that still have such bpf_mtap calls:

 - sca_frame_process() @ sys/dev/ic/hd64570.c

Ideally we should make these functions run in softint context somehow, but we
have no actual devices and no time (or interest/love) to work on the task, so
instead we provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap in
softint context.  This is a workaround; once the functions run in softint
context, we should use the original bpf_mtap again.
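
The deferred mechanism amounts to splitting the tap in two: the hardware
interrupt handler only queues the packet, and the tap itself runs later in
softint context.  The following is a minimal single-threaded userland sketch of
that idea, not the kernel implementation: struct pkt, struct defer_queue,
defer_mtap_hardintr(), defer_mtap_softint() and record_tap() are hypothetical
names, a fixed-size ring stands in for the kernel's queue, and the
interrupt-safe locking a real producer/consumer pair needs is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet record; stands in for an mbuf. */
struct pkt {
	int id;
};

/* Hypothetical fixed-size ring of packets awaiting a deferred tap. */
#define DEFER_QLEN 4

struct defer_queue {
	struct pkt *slots[DEFER_QLEN];
	size_t head;		/* next slot to drain */
	size_t count;		/* number of queued packets */
	unsigned long dropped;	/* packets that found the ring full */
};

/*
 * Called from "hardware interrupt context": only enqueue, never run the
 * tap here.  Returns 1 if queued, 0 if the ring was full.
 */
int
defer_mtap_hardintr(struct defer_queue *q, struct pkt *p)
{
	if (q->count == DEFER_QLEN) {
		q->dropped++;
		return 0;
	}
	q->slots[(q->head + q->count) % DEFER_QLEN] = p;
	q->count++;
	return 1;
}

/*
 * Called later from "softint context": run the real tap on every queued
 * packet in arrival order.  Returns the number of packets tapped.
 */
size_t
defer_mtap_softint(struct defer_queue *q, void (*tap)(struct pkt *))
{
	size_t n = 0;

	assert(tap != NULL);
	while (q->count > 0) {
		tap(q->slots[q->head]);
		q->head = (q->head + 1) % DEFER_QLEN;
		q->count--;
		n++;
	}
	return n;
}

/* Hypothetical example tap: remembers the last id and counts calls. */
int last_tapped_id = -1;
unsigned long tapped_count = 0;

void
record_tap(struct pkt *p)
{
	last_tapped_id = p->id;
	tapped_count++;
}
```

When the ring is full the packet is simply not tapped and is counted as
dropped, since an interrupt handler must not block waiting for the softint to
catch up.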

if_mcast_op() - SIOCADDMULTI/SIOCDELMULTI
-----------------------------------------

This helper function is called to add or remove multicast addresses for an
interface.  When called via ioctl, it takes IFNET_LOCK(); when called via
sosetopt(), it doesn't.

Because of this, various network drivers can't assert IFNET_LOCKED() in their
if_ioctl.  Generally, drivers still call splnet() before calling ether_ioctl()
even with NET_MPSAFE, but they do not take KERNEL_LOCK(), so this is actually
unsafe.
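
The inconsistency can be modelled in a few lines.  In this hypothetical
userland sketch, struct model_ifnet, its locked flag and the three functions
are illustrative names, not NetBSD's, and the flag only simulates what an
IFNET_LOCKED()-style check would report; it provides no real mutual exclusion.
The point is that one driver entry point is reached both with and without the
lock, so it cannot assert either state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of an interface with an IFNET_LOCK-like lock.  The
 * "lock" is only a flag so that lock-held checks can be simulated in a
 * single thread.
 */
struct model_ifnet {
	bool locked;
	int calls_locked;	/* driver calls made with the lock held */
	int calls_unlocked;	/* driver calls made without it */
};

/* Stand-in for the multicast update done in a driver's if_ioctl. */
void
driver_mcast_update(struct model_ifnet *ifp)
{
	/*
	 * A driver would like to assert(ifp->locked) here, but that fails
	 * for the setsockopt path below, so this sketch only records it.
	 */
	if (ifp->locked)
		ifp->calls_locked++;
	else
		ifp->calls_unlocked++;
}

/* ioctl path: takes the lock around the helper. */
void
mcast_op_via_ioctl(struct model_ifnet *ifp)
{
	assert(!ifp->locked);
	ifp->locked = true;
	driver_mcast_update(ifp);
	ifp->locked = false;
}

/* sosetopt() path: reaches the same driver code without the lock. */
void
mcast_op_via_setsockopt(struct model_ifnet *ifp)
{
	driver_mcast_update(ifp);
}
```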

Lingering obsolete variables
----------------------------

Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs that directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack:

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses the objects through the ifnet#if_list member
variable.  ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry, respectively.  netstat also accesses the IP address list
of an interface through ifnet#if_addrlist.  Because struct ifaddr, struct
in_ifaddr and struct in6_ifaddr are accessed, the following obsolete member
variables are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs.  Note that netstat already
implements alternative methods to fetch this information via sysctl(3).
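
A kvm(3) reader finds a member at a byte offset fixed when the reader was
compiled, which is why an obsolete member cannot simply be deleted.  The sketch
below uses two hypothetical, heavily simplified layouts (not the real struct
ifnet) to show that removing earlier members shifts the offset of every later
one, so an old binary would read the wrong bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified layouts; the real struct ifnet has many
 * more members.  The two old_list_* pointers play the role of an obsolete
 * list linkage kept only for kvm(3) readers.
 */
struct layout_with_obsolete {
	void *old_list_next;	/* obsolete, no longer maintained */
	void *old_list_prev;	/* obsolete, no longer maintained */
	char name[16];		/* read by old userland via kvm(3) */
};

struct layout_without_obsolete {
	char name[16];
};

/* Offset that a binary built against the old header expects. */
size_t
expected_name_offset(void)
{
	return offsetof(struct layout_with_obsolete, name);
}

/* Offset the member would have if the obsolete fields were removed. */
size_t
actual_name_offset_if_removed(void)
{
	return offsetof(struct layout_without_obsolete, name);
}
```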

vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
The statistics are retrieved via kvm(3).  The global variables in_ifaddrhash
and in_ifaddrhashtbl, which implement a hash table of IPv4 addresses and are
obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist, are kept for
this purpose.  We should provide a means to fetch statistics of hash tables via
sysctl(3).
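
As a rough illustration of the kind of figures such an interface would have to
export, the sketch below summarizes a chained hash table given the length of
each bucket's chain.  struct hashstat and hashstat_compute() are hypothetical
names; the fields are only loosely similar to what vmstat(1) prints, and no
particular sysctl(3) node is implied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a chained hash table. */
struct hashstat {
	size_t buckets;		/* number of buckets */
	size_t used;		/* buckets with at least one entry */
	size_t items;		/* total entries */
	size_t maxchain;	/* longest chain */
};

/*
 * Summarize a table given the chain length of each bucket.  A kernel-side
 * implementation would walk the actual lists instead.
 */
struct hashstat
hashstat_compute(const size_t *chainlen, size_t nbuckets)
{
	struct hashstat st = { nbuckets, 0, 0, 0 };

	assert(chainlen != NULL || nbuckets == 0);
	for (size_t i = 0; i < nbuckets; i++) {
		if (chainlen[i] > 0)
			st.used++;
		st.items += chainlen[i];
		if (chainlen[i] > st.maxchain)
			st.maxchain = chainlen[i];
	}
	return st;
}
```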

fstat(1) shows information about bpf instances.  Each bpf instance (struct bpf)
is obtained via kvm(3).  The bpf_d#_bd_next, bpf_d#_bd_filter and bpf_d#_bd_list
member variables are obsolete but remain.  ifnet#if_xname is also accessed via
struct bpf_if, and the obsolete ifnet#if_list must remain so that the offset of
ifnet#if_xname does not change.  The statistic counters (bpf#bd_rcount,
bpf#bd_dcount and bpf#bd_ccount) are also victims of this restriction: for
scalability they should be per-CPU and we should stop using atomic operations
on them, but we have to keep both the counters and the atomic operations.
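
The difference between the two counter designs can be sketched with C11
atomics.  This is a hypothetical userland model, not the bpf(4) code:
NCPU_MODEL, CACHELINE, struct shared_counter and struct percpu_counter are
invented names, and a CPU index passed by the caller stands in for the current
CPU.  The shared counter needs an atomic read-modify-write on one cache line
for every packet; the per-CPU variant lets each CPU update only its own slot
and pays for summing only when the value is read.  The per-CPU layout is also
exactly what an old kvm(3) reader expecting a single counter cannot parse.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU_MODEL 4	/* hypothetical CPU count */
#define CACHELINE 64	/* assumed cache line size */

/* Current design: one counter shared by all CPUs. */
struct shared_counter {
	atomic_ulong value;
};

void
shared_counter_inc(struct shared_counter *c)
{
	/* Atomic read-modify-write on a cache line every CPU touches. */
	atomic_fetch_add_explicit(&c->value, 1, memory_order_relaxed);
}

/* Per-CPU design: one slot per CPU, each on its own cache line. */
struct percpu_slot {
	_Alignas(CACHELINE) atomic_ulong value;
};

struct percpu_counter {
	struct percpu_slot slot[NCPU_MODEL];
};

void
percpu_counter_inc(struct percpu_counter *c, unsigned cpu)
{
	atomic_ulong *v;

	assert(cpu < NCPU_MODEL);
	v = &c->slot[cpu].value;
	/*
	 * Only the owning CPU writes its slot, so a plain load and store
	 * suffice (no locked read-modify-write); relaxed atomics just keep
	 * concurrent readers free of data races.
	 */
	atomic_store_explicit(v,
	    atomic_load_explicit(v, memory_order_relaxed) + 1,
	    memory_order_relaxed);
}

/* Readers sum all slots; the result may be slightly stale. */
unsigned long
percpu_counter_read(struct percpu_counter *c)
{
	unsigned long sum = 0;

	for (unsigned i = 0; i < NCPU_MODEL; i++)
		sum += atomic_load_explicit(&c->slot[i].value,
		    memory_order_relaxed);
	return sum;
}
```

Both designs count the same total; what changes is where the cost lands.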

Scalability
-----------

 - Per-CPU rtcaches (used in, e.g., IP forwarding) don't scale when there are
   multiple flows per CPU
 - ipsec(4) doesn't scale with the number of SAs/SPs; the cost of a look-up
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() don't scale as
   they are serialized by a single mutex
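
The first item can be illustrated assuming, for this sketch, that each CPU
caches exactly one route keyed by destination: while a CPU forwards a single
flow every lookup hits, but two flows interleaved on the same CPU evict each
other on every packet, so every lookup falls back to the full routing table.
struct rtcache_model and rtcache_lookup() are hypothetical names, not the
kernel's rtcache API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-entry route cache, one instance per CPU. */
struct rtcache_model {
	int valid;
	uint32_t dst;		/* destination the cached route is for */
	unsigned hits;
	unsigned misses;	/* each miss means a full routing table lookup */
};

/*
 * Look up dst in this CPU's cache.  On a miss the (omitted) full lookup
 * would run, and its result replaces the single cached entry.
 */
void
rtcache_lookup(struct rtcache_model *rc, uint32_t dst)
{
	if (rc->valid && rc->dst == dst) {
		rc->hits++;
		return;
	}
	rc->misses++;
	rc->dst = dst;
	rc->valid = 1;
}
```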

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmission, which serializes all Tx packet processing on
that queue.  We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems in the first place, rather than
making the existing ALTQ MP-safe, because that's just annoying.
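
The serialization shows up in how a Tx queue is chosen.  In this hypothetical
sketch, select_txq() and its parameters are illustrative names; it assumes a
driver that otherwise spreads transmissions over its hardware queues by CPU
index, whereas with ALTQ enabled every packet funnels into the single if_snd
queue (index 0 here) regardless of the sending CPU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical Tx queue selection for a packet sent from CPU `cpu` on an
 * interface with `nqueues` hardware queues.
 */
unsigned
select_txq(unsigned cpu, unsigned nqueues, bool altq_enabled)
{
	assert(nqueues > 0);
	if (altq_enabled)
		return 0;		/* everything goes through if_snd */
	return cpu % nqueues;		/* spread CPUs over the queues */
}
```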

Using kernel modules
--------------------

Please note that if you enable NET_MPSAFE in your kernel and you use any
loadable kernel modules (including compat_xx modules or individual network
interface if_xxx device driver modules), you will need to build custom
modules.  For each module you will need to add the following line to its
Makefile:

	CPPFLAGS+=	-DNET_MPSAFE

Failure to do this may result in unpredictable behavior.
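
The reason is that NET_MPSAFE is a preprocessor switch: whether a piece of code
takes a lock, for example, is decided when that code is compiled, so a module
built without the flag can disagree with a kernel built with it.  The sketch
below is hypothetical (uses_kernel_lock() is an invented name) and only shows
how such a switch behaves; compile it once as-is and once with -DNET_MPSAFE to
see the difference.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical illustration of a compile-time switch.  The answer is fixed
 * when this file is compiled, so objects built with and without
 * -DNET_MPSAFE disagree.
 */
bool
uses_kernel_lock(void)
{
#ifdef NET_MPSAFE
	return false;	/* MP-safe build: no big kernel lock here */
#else
	return true;	/* default build: take the big kernel lock */
#endif
}
```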

IPv4 address initialization atomicity
-------------------------------------

An IPv4 address is referenced by several data structures: an associated
interface, its local route, a connected route (if necessary), the global list,
the global hash table, etc.  These data structures are not updated atomically,
i.e., the kernel's view of an IPv4 address can be inconsistent while the
address is being initialized.

One known failure caused by this issue is that incoming packets destined to an
address being initialized can loop in the network stack for a short period of
time.  The address initialization first creates a local route and then
registers the address in the global hash table.  ip_input uses the hash table
for the to-self check: an incoming packet is destined to the host if its
destination address is registered in the table.  So, if the host allows
forwarding, an incoming packet can fail the to-self check in ip_input and then
match the local route of the initializing address in ip_output.  Because a
matched local route points to a loopback interface as its destination
interface, the packet is sent to the network stack (ip_input) again, which
results in a loop.  The loop stops once the address is registered in the hash
table.

One solution is to reorder the address initialization steps: first register
the address in the hash table, then create its routes.  Another solution is to
use the routing table for the to-self check instead of the global hash table,
as IPv6 does.
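
The window and the first proposed fix can be modelled with two booleans.  In
this hypothetical sketch, struct addr_state, classify() and the two init
functions are invented for illustration and ignore everything except the
to-self check and the local route.  With the current order there is an
intermediate step after which a packet to the new address loops, while
registering in the hash table first closes that window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one IPv4 address during initialization. */
struct addr_state {
	bool local_route;	/* local route (via loopback) installed */
	bool in_hash;		/* registered in the global address hash */
};

enum pkt_fate {
	PKT_DELIVER,	/* to-self check passed in ip_input */
	PKT_LOOP,	/* forwarded to the local route, re-enters ip_input */
	PKT_OTHER	/* forwarded elsewhere or dropped */
};

/* What happens to an incoming packet destined to the address. */
enum pkt_fate
classify(const struct addr_state *a, bool forwarding)
{
	if (a->in_hash)
		return PKT_DELIVER;	/* to-self check (hash lookup) */
	if (forwarding && a->local_route)
		return PKT_LOOP;	/* ip_output matches the local route */
	return PKT_OTHER;
}

/* Current order: create the local route, then register in the hash. */
void
init_route_first(struct addr_state *a, int step)
{
	assert(step == 1 || step == 2);
	if (step == 1)
		a->local_route = true;
	else
		a->in_hash = true;
}

/* Proposed order: register in the hash, then create the routes. */
void
init_hash_first(struct addr_state *a, int step)
{
	assert(step == 1 || step == 2);
	if (step == 1)
		a->in_hash = true;
	else
		a->local_route = true;
}
```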

if_flags
--------

To avoid data races, if_flags should be protected by a lock (currently
IFNET_LOCK).  Thus, if_flags should not be accessed during packet processing,
to avoid performance degradation from lock contention.  Traditionally, the
IFF_RUNNING, IFF_UP and IFF_OACTIVE flags of if_flags are checked during packet
processing.  If you make a driver MP-safe, you must remove such checks.
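
One possible replacement for those checks is a driver-private flag that the
packet path can read without IFNET_LOCK.  This is a hypothetical sketch using
C11 atomics, not the pattern of any particular driver: struct model_softc,
model_set_stopping() and model_transmit() are invented names.  The plain
tx_sent/tx_dropped counters are for brevity only; concurrent Tx paths would
need per-queue or per-CPU counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical driver softc.  Instead of testing IFF_RUNNING/IFF_UP in
 * if_flags (protected by IFNET_LOCK) for every packet, the driver keeps
 * its own flag that the Tx path reads without taking IFNET_LOCK.
 */
struct model_softc {
	atomic_bool stopping;		/* set by stop, cleared by init */
	unsigned long tx_sent;		/* plain counters, for brevity */
	unsigned long tx_dropped;
};

/* Called from the init/stop paths, which already run under IFNET_LOCK. */
void
model_set_stopping(struct model_softc *sc, bool stopping)
{
	atomic_store_explicit(&sc->stopping, stopping, memory_order_release);
}

/* Fast path: no if_flags access, only a load of a driver-private flag. */
bool
model_transmit(struct model_softc *sc)
{
	if (atomic_load_explicit(&sc->stopping, memory_order_acquire)) {
		sc->tx_dropped++;
		return false;
	}
	sc->tx_sent++;
	return true;
}
```

A flag alone does not wait for transmissions already in progress; a real stop
routine also has to synchronize with them, e.g., by taking the Tx queue lock,
which this sketch omits.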

Drivers should not touch IFF_ALLMULTI.  They are tempted to do so when updating
hardware multicast filters on SIOCADDMULTI/SIOCDELMULTI.  Instead, they should
use the ETHER_F_ALLMULTI bit in struct ethercom::ec_flags, under ETHER_LOCK.
ether_ioctl takes care of presenting IFF_ALLMULTI according to the current state
of ETHER_F_ALLMULTI when queried with SIOCGIFFLAGS.
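
The rule can be sketched as a filter-rebuild routine that records its
all-multicast fallback in the Ethernet-layer flags instead of if_flags.
Everything here is hypothetical: struct model_enm, struct model_ethercom,
MODEL_F_ALLMULTI (standing in for ETHER_F_ALLMULTI), HW_FILTER_SLOTS and
model_set_filter() are invented names, the hardware is assumed to have a small
exact-match filter, and the caller is assumed to hold the ETHER_LOCK
equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HW_FILTER_SLOTS 2	/* assumed exact-match filter size */
#define MODEL_F_ALLMULTI 0x1	/* stands in for ETHER_F_ALLMULTI */

/* One multicast range as kept by the Ethernet layer (lo == hi: single). */
struct model_enm {
	uint8_t lo[6];
	uint8_t hi[6];
};

/* Hypothetical stand-in for struct ethercom; only ec_flags matters. */
struct model_ethercom {
	unsigned ec_flags;	/* protected by ETHER_LOCK in the real code */
};

/*
 * Rebuild the hardware filter.  IFF_ALLMULTI in if_flags is never touched;
 * the fallback to "accept all multicast" is recorded in ec_flags.  Returns
 * the number of exact-match slots programmed (0 in all-multicast mode).
 */
size_t
model_set_filter(struct model_ethercom *ec, const struct model_enm *enm,
    size_t n)
{
	size_t used = 0;

	assert(enm != NULL || n == 0);
	ec->ec_flags &= ~MODEL_F_ALLMULTI;
	for (size_t i = 0; i < n; i++) {
		/* Ranges and filter overflow both force all-multicast. */
		if (memcmp(enm[i].lo, enm[i].hi, 6) != 0 ||
		    used == HW_FILTER_SLOTS) {
			ec->ec_flags |= MODEL_F_ALLMULTI;
			return 0;
		}
		used++;		/* here enm[i].lo would go into a slot */
	}
	return used;
}
```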

IFF_PROMISC is also checked in ether_input, and we should get rid of that check
somehow.