$NetBSD: TODO.smpnet,v 1.50 2024/08/12 10:46:40 nia Exp $

MP-safe components
==================

They work without the big kernel lock (KERNEL_LOCK), i.e., with the NET_MPSAFE
kernel option. Some components scale up and some don't.

 - Device drivers
   - aq(4)
   - awge(4)
   - bcmgenet(4)
   - bge(4)
   - ena(4)
   - iavf(4)
   - ixg(4)
   - ixl(4)
   - ixv(4)
   - mcx(4)
   - rge(4)
   - se(4)
   - sunxi_emac(4)
   - vioif(4)
   - vmx(4)
   - wm(4)
   - xennet(4)
   - usbnet(4) based adapters:
     - axe(4)
     - axen(4)
     - cdce(4)
     - cue(4)
     - kue(4)
     - mos(4)
     - mue(4)
     - smsc(4)
     - udav(4)
     - upl(4)
     - ure(4)
     - url(4)
     - urndis(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
     - STP
   - Fast forward (ipflow)
 - Layer 3
   - All except for items in the below section
 - Interfaces
   - canloop(4)
   - gif(4)
   - ipsecif(4)
   - l2tp(4)
   - lagg(4)
   - pppoe(4)
     - if_spppsubr.c
   - tap(4)
   - tun(4)
   - vether(4)
   - vlan(4)
 - Packet filters
   - npf(7)
   - ipf(4)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non MP-safe components and kernel options
=========================================

The components and options below aren't MP-safe yet, i.e., they require the big
kernel lock. Some of them can be used safely even if NET_MPSAFE is enabled
because they're still protected by the big kernel lock. The others aren't
protected and so are unsafe, e.g., they may crash the kernel.

Protected ones
--------------

 - Device drivers
   - Most drivers other than ones listed in the above section
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
 - Layer 3
   - IPSELSRC
   - MROUTING
     - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - if_srt
 - Packet filters
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a marker indicating that the code around it isn't MP-safe
yet. We use it in comments and also as part of function names, for example
m_get_rcvif_NOMPSAFE. Let's use "NOMPSAFE" consistently so that non-MP-safe
code is easy to find with grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls are made in normal LWP
context or softint context, i.e., not in hardware interrupt context. For Tx,
all bpf_mtap calls satisfy the requirement. For Rx, most bpf_mtap calls are
made in softint. Unfortunately some bpf_mtap calls on Rx are still made in
hardware interrupt context.

This is the list of the functions that have such bpf_mtap calls:

 - sca_frame_process() @ sys/dev/ic/hd64570.c

Ideally we should make the functions run in softint somehow, but we have
neither the actual devices nor the time (or interest/love) to work on the task,
so instead we provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap
in softint context. It's a workaround; once the functions run in softint, we
should use the original bpf_mtap again.
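
The deferral idea described above can be sketched as a small userland model.
This is not the kernel implementation: struct packet stands in for an mbuf,
and defer_tap_enqueue(), defer_tap_softint(), tap_deliver() and DEFER_QLEN are
hypothetical names invented for this illustration. The only point is the
split of work: the "hardware interrupt" side only queues the packet, and the
tap itself runs later from the "softint" side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_QLEN 8		/* hypothetical queue depth */

struct packet { int id; };	/* stand-in for struct mbuf */

struct defer_queue {
	struct packet *slot[DEFER_QLEN];
	size_t head;		/* next slot to drain */
	size_t count;		/* packets currently queued */
	bool softint_pending;	/* models a scheduled softint */
	int tapped;		/* packets delivered to the tap so far */
	int last_id;		/* id of the last delivered packet */
};

/* Stand-in for the real tap; in this model it only runs from softint. */
static void
tap_deliver(struct defer_queue *q, struct packet *p)
{
	q->tapped++;
	q->last_id = p->id;
}

/* "Hardware interrupt" side: enqueue only, never call the tap here. */
static bool
defer_tap_enqueue(struct defer_queue *q, struct packet *p)
{
	if (q->count == DEFER_QLEN)
		return false;	/* queue full: this tap copy is dropped */
	q->slot[(q->head + q->count) % DEFER_QLEN] = p;
	q->count++;
	q->softint_pending = true;
	return true;
}

/* "Softint" side: drain the queue and hand each packet to the tap. */
static void
defer_tap_softint(struct defer_queue *q)
{
	q->softint_pending = false;
	while (q->count > 0) {
		struct packet *p = q->slot[q->head];

		q->head = (q->head + 1) % DEFER_QLEN;
		q->count--;
		tap_deliver(q, p);
	}
}
```

In this model the tap observes packets in arrival order, but only after the
softint side has run, which is the cost of the workaround.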

if_mcast_op() - SIOCADDMULTI/SIOCDELMULTI
-----------------------------------------

This helper function is called to add or remove multicast addresses for an
interface. When called via ioctl it takes IFNET_LOCK(); when called via
sosetopt() it doesn't.

Various network drivers can't assert IFNET_LOCKED() in their if_ioctl
because of this. Generally drivers still take care to call splnet() even
with NET_MPSAFE before calling ether_ioctl(), but they do not take
KERNEL_LOCK(), so this is actually unsafe.

Lingering obsolete variables
----------------------------

Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs which directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack.

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through the ifnet#if_list member
variable. ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively. netstat also accesses the IP address list
of an interface through ifnet#if_addrlist. struct ifaddr, struct in_ifaddr
and struct in6_ifaddr are accessed, so the following obsolete member variables
are stuck with us: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).

vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
The statistics are retrieved via kvm(3). The global variables in_ifaddrhash and
in_ifaddrhashtbl, which are for a hash table of IPv4 addresses and obsoleted by
in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist, are kept for this purpose.
We should provide a means to fetch statistics of hash tables via sysctl(3).

fstat(1) shows information about bpf instances. Each bpf instance (struct bpf)
is obtained via kvm(3). The bpf_d#_bd_next, bpf_d#_bd_filter and bpf_d#_bd_list
member variables are obsolete but remain. ifnet#if_xname is also accessed
via struct bpf_if, and the obsolete ifnet#if_list must remain so that the
offset of ifnet#if_xname does not change. The statistic counters (bpf#bd_rcount,
bpf#bd_dcount and bpf#bd_ccount) are also victims of this restriction: for
scalability the counters should be per-CPU and we should stop using atomic
operations for them; however, we have to keep both the counters and the atomic
operations.

Scalability
-----------

 - Per-CPU rtcaches (used in, say, IP forwarding) aren't scalable on multiple
   flows per CPU
 - ipsec(4) isn't scalable on the number of SA/SP; the cost of a look-up
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
   as they are serialized by one mutex

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmission, which serializes all Tx packet processing on
that queue. We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems in the first place, rather than
making the existing ALTQ MP-safe, because it's just annoying.

Using kernel modules
--------------------

Please note that if you enable NET_MPSAFE in your kernel and you use any
loadable kernel modules (including compat_xx modules or individual network
interface if_xxx device driver modules), you will need to build custom
modules. For each module you will need to add the following line to its
Makefile:

	CPPFLAGS+=	-DNET_MPSAFE

Failure to do this may result in unpredictable behavior.

IPv4 address initialization atomicity
-------------------------------------

An IPv4 address is referenced by several data structures: an associated
interface, its local route, a connected route (if necessary), the global list,
the global hash table, etc. These data structures are not updated atomically,
i.e., the kernel can hold inconsistent state for an IPv4 address while that
address is being initialized.

One known failure caused by this issue is that incoming packets destined for an
initializing address can loop in the network stack for a short period of time.
The address initialization creates a local route first and then registers the
initializing address in the global hash table, which is used to decide whether
an incoming packet is destined for the host by checking whether the packet's
destination is registered in the hash table. So, if the host allows forwarding,
an incoming packet can match the local route of an initializing address at
ip_output while it fails the to-self check described above at ip_input. Because
the matched local route points to a loopback interface as its destination
interface, the incoming packet is sent to the network stack (ip_input) again,
which results in looping. The loop stops once the initializing address is
registered in the hash table.

One solution to the issue is to reorder the address initialization steps:
first register the address in the hash table, then create its routes. Another
solution is to use the routing table for the to-self check instead of the
global hash table, like IPv6.

if_flags
--------

To avoid data races on if_flags it should be protected by a lock (currently
it's IFNET_LOCK). Thus, if_flags should not be accessed during packet
processing, to avoid performance degradation from lock contention.
Traditionally the IFF_RUNNING, IFF_UP and IFF_OACTIVE flags of if_flags are
checked during packet processing.
If you make a driver MP-safe you must remove such checks.

Drivers should not touch IFF_ALLMULTI. They are tempted to do so when updating
hardware multicast filters on SIOCADDMULTI/SIOCDELMULTI. Instead, they should
use the ETHER_F_ALLMULTI bit in struct ethercom::ec_flags, under ETHER_LOCK.
ether_ioctl takes care of presenting IFF_ALLMULTI according to the current state
of ETHER_F_ALLMULTI when queried with SIOCGIFFLAGS.

Also, IFF_PROMISC is checked in ether_input and we should get rid of that check
somehow.
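
As an illustration of the IFF_ALLMULTI guidance above, here is a minimal
userland model of a driver's multicast filter update. It is a sketch under
stated assumptions, not kernel code: struct model_ethercom, MODEL_F_ALLMULTI,
HW_FILTER_SLOTS and the model_*() functions are hypothetical stand-ins for
struct ethercom, ETHER_F_ALLMULTI, a hardware filter table and ETHER_LOCK.
What it shows is that the all-multicast state lives in the Ethernet-level
flag word, is only changed while that lock is held, and if_flags is never
touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_F_ALLMULTI 0x01	/* stands in for ETHER_F_ALLMULTI */
#define HW_FILTER_SLOTS  4	/* hypothetical hardware filter size */

struct mcast_entry {
	unsigned char lo[6];	/* first address of the range */
	unsigned char hi[6];	/* last address of the range */
};

struct model_ethercom {
	bool lock_held;		/* toy stand-in for ETHER_LOCK */
	int flags;		/* stands in for ec_flags */
	const struct mcast_entry *multi;
	size_t nmulti;
};

static void
model_lock(struct model_ethercom *ec)
{
	assert(!ec->lock_held);
	ec->lock_held = true;
}

static void
model_unlock(struct model_ethercom *ec)
{
	assert(ec->lock_held);
	ec->lock_held = false;
}

/*
 * Recompute the all-multicast state.  Returns the number of exact-match
 * filter slots the hardware would be programmed with (0 when falling
 * back to all-multicast).
 */
static size_t
model_set_filter(struct model_ethercom *ec)
{
	size_t i, used = 0;
	bool allmulti = false;

	model_lock(ec);
	for (i = 0; i < ec->nmulti; i++) {
		/* An address range, or more entries than slots. */
		if (memcmp(ec->multi[i].lo, ec->multi[i].hi, 6) != 0 ||
		    used == HW_FILTER_SLOTS) {
			allmulti = true;
			break;
		}
		used++;
	}
	if (allmulti) {
		ec->flags |= MODEL_F_ALLMULTI;
		used = 0;
	} else
		ec->flags &= ~MODEL_F_ALLMULTI;
	model_unlock(ec);

	return used;
}
```

A real driver would walk the interface's multicast list and program its
hardware in place of the slot counter used here.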