# e835bce2 | 16-Jan-2025 | bluhm <bluhm@openbsd.org>
Remove net lock from TCP sysctl for keep alive.
Keep copies in seconds for the sysctl and update timer variables atomically when they change. tcp_maxidle was historically calculated in tcp_slowtimo() as the timers were called from there. Better calculate maxidle when needed. tcp_timer_init() is useless, just initialize data. While there make the names consistent.
input sthen@; OK mvs@
# 4e5e13a2 | 09-Jan-2025 | bluhm <bluhm@openbsd.org>
Run TCP sysctl ident and drop with shared net lock.
Convert exclusive net lock for TCPCTL_IDENT and TCPCTL_DROP to shared net lock and push it down into tcp_ident(). Grab the socket lock there with in_pcbsolock_ref(). Move socket release from in_pcbsolock() to in_pcbsounlock_rele() and add _ref and _rele suffix to the inpcb socket lock functions. They both lock and refcount now. in_pcbsounlock_rele() ignores NULL sockets to make the unlock path in the error case simpler. The socket lock also protects tcp_drop() and tcp_close() now, so the socket pointer from inpcb may be NULL during unlock. In tcp_ident() improve consistency check of address family.
OK mvs@
# 9b315513 | 05-Jan-2025 | bluhm <bluhm@openbsd.org>
TCP integer sysctl variables are all atomic. Remove net lock.
OK mvs@
# ab8da1a7 | 04-Jan-2025 | mvs <mvs@openbsd.org>
Relax sockets splicing locking.
Sockets splicing works around socket buffers, which have their own locks for all socket types, especially sblock() on `so_snd' which keeps sockets being spliced.
- sosplice() does read-only socket options and state checks; the only modification is the `so_sp' assignment. The SB_SPLICE bit modification and the `ssp_socket' and `ssp_soback' assignments are protected with the `sb_mtx' mutex(9). The PCB layer does corresponding checks with `sb_mtx' held, so shared solock() is sufficient in the sosplice() path. Introduce special sosplice_solock_pair() for that purpose.
- sounsplice() requires the shared socket lock only around the so{r,w}wakeup calls.
- Push exclusive solock() down to the tcp(4) case of somove(). Such sockets are not ready for unlocked somove() yet.
ok bluhm
# 66570633 | 01-Jan-2025 | bluhm <bluhm@openbsd.org>
Fix whitespace.
# 507b5b41 | 31-Dec-2024 | mvs <mvs@openbsd.org>
Use per-sockbuf mutex(9) to protect `so_snd' buffer of tcp(4) sockets.
Even in the tcp(4) case, sosend() only checks `so_snd' free space and sleeps if necessary; actual buffer handling happens in the solock()ed PCB layer.
Only unlock the sosend() path; somove() is still locked exclusively. The "if (dosolock)" dances are useless, but intentionally left as is.
Tested and ok by bluhm.
# 77957d73 | 30-Dec-2024 | bluhm <bluhm@openbsd.org>
Remove net lock from TCP syn cache sysctl.
The TCP syn cache is protected by a mutex. Make access to its sysctl variables either atomic or put them under this mutex. Then the net lock can be removed.
OK mvs@
# f1bf6f4e | 19-Dec-2024 | mvs <mvs@openbsd.org>
Use per-sockbuf mutex(9) to protect `so_rcv' buffer of tcp(4) sockets.
Only unlock the soreceive() path; the somove() path is still locked exclusively. Also the exclusive socket lock will be taken in the soreceive() path each time before the pru_rcvd() call.
Note, both the socket and `sb_mtx' locks are held while SS_CANTRCVMORE is modified, so the socket lock is enough to check it in the protocol input path.
ok bluhm
# 2162e93b | 08-Nov-2024 | bluhm <bluhm@openbsd.org>
TCP send and receive space updates are MP safe.
tcp_update_sndspace() and tcp_update_rcvspace() only read global variables that do not change after initialization. Mark them as such. Add braces around multi-line if blocks.
ok mvs@
# 93536db2 | 12-Apr-2024 | bluhm <bluhm@openbsd.org>
Split single TCP inpcb table into IPv4 and IPv6 parts.
With two separate TCP hash tables, each one becomes smaller. When we remove the exclusive net lock from TCP, contention on internet PCB table mutex will be reduced. UDP has been split earlier into IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with assertions.
OK mvs@
# 940d25ac | 11-Feb-2024 | bluhm <bluhm@openbsd.org>
Remove include netinet6/ip6_var.h from netinet/in_pcb.h.
OK mvs@
# a342f0b4 | 19-Jan-2024 | bluhm <bluhm@openbsd.org>
Unify inpcb API for inet and inet6.
Many functions for IPv4 call their IPv6 counterpart if INP_IPV6 is set at the socket's pcb. By using the generic API consistently, the logic is not in the caller and it gets more readable.
OK mvs@
# 6285ef23 | 11-Jan-2024 | bluhm <bluhm@openbsd.org>
Fix white spaces in TCP.
# ab485656 | 03-Dec-2023 | bluhm <bluhm@openbsd.org>
Use INP_IPV6 flag instead of sotopf().
During initialization in_pcballoc() sets INP_IPV6 once to avoid reaching through inp_socket->so_proto->pr_domain->dom_family. Use this flag consistently.
OK sashan@ mvs@
# cd28665a | 01-Dec-2023 | bluhm <bluhm@openbsd.org>
Set inp address, port and rtable together with inpcb hash.
The inpcb hash table is protected by table->inpt_mtx. The hash is based on addresses, ports, and routing table. These fields were not synchronized with the hash. Put writes and hash update into the same critical section. Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(), tcp_connect(), udp_disconnect() to dedicated inpcb set functions. There they use the same table mutex as in_pcbrehash(). in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work and are not included yet.
OK sashan@ mvs@
# cff23a6b | 01-Dec-2023 | bluhm <bluhm@openbsd.org>
Make internet PCB connect more consistent.
The public interface is in_pcbconnect(). It dispatches to in6_pcbconnect() if necessary. Call the former from tcp_connect() and udp_connect(). In in6_pcbconnect() initialization in6a = NULL is not necessary. in6_pcbselsrc() sets the pointer, but does not read the value. Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc(). It returns a reference to the address of some internal data structure. We want to be sure that in6_addr is not modified this way. IPv4 in_pcbselsrc() solves this by passing a copy of the address.
OK kn@ sashan@ mvs@
# 952c6363 | 28-Nov-2023 | bluhm <bluhm@openbsd.org>
Remove struct inpcb from in6_embedscope() parameters.
rip6_output() did modify inp_outputopts6 temporarily to provide different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6 and inp_moptions6 as separate arguments to in6_embedscope(). Simplify the code that deals with these options in in6_embedscope(). Document inp_moptions and inp_moptions6 as protected by net lock.
OK kn@
# 0bfbfbe7 | 16-Nov-2023 | bluhm <bluhm@openbsd.org>
Run TCP SYN cache timer logic without net lock.
Introduce a global TCP SYN cache mutex. Divide the timer function into parts protected by the mutex and sending with the net lock. Split the flags field into dynamic flags protected by the mutex and fixed flags set during initialization. Document whether fields of struct syn_cache are protected by net lock or mutex.
input and OK sashan@
# bf0d449c | 16-Sep-2023 | mpi <mpi@openbsd.org>
Allow counters_read(9) to take an optional scratch buffer.
Using a scratch buffer makes it possible to take a consistent snapshot of per-CPU counters without having to allocate memory.
Makes ddb(4) show uvmexp command work in OOM situations.
ok kn@, mvs@, cheloha@
# 9e96aff0 | 06-Jul-2023 | bluhm <bluhm@openbsd.org>
Convert tcp_now() time counter to 64 bit.
After changing the tcp_now() tick to milliseconds, 32 bits will wrap around after 49 days of uptime. That may be a problem in some places of our stack. Better use a 64 bit counter.
As the timestamp option is 32 bit in the TCP protocol, use the lower 32 bits there. There are casts to 32 bits that should behave correctly.
Start with a random 63 bit offset to avoid uptime leakage. 2^63 milliseconds result in 2.9*10^8 years of possible uptime.
OK yasuoka@
# a3c0391f | 02-Jul-2023 | bluhm <bluhm@openbsd.org>
Use TSO and LRO on the loopback interface to transfer TCP faster.
If tcplro is activated on lo(4), ignore the MTU with TCP packets. They are passed along with the information that they have to be chopped in case they are forwarded later. New netstat(1) counter shows that software LRO is in effect. The feature is currently turned off by default.
tested by jan@; OK claudio@ jan@
# a5a54c4a | 23-May-2023 | jan <jan@openbsd.org>
New counters for LRO packets from hardware TCP offloading.
With tweaks from patrick@ and bluhm@.
OK bluhm@
# c06845b1 | 10-May-2023 | bluhm <bluhm@openbsd.org>
Implement TCP send offloading, for now in software only.
This is meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec needs the software fallback in general. For performance comparison or to workaround possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counter with chopped and generated packets.
based on work from jan@; tested by jmc@ jan@ Hrvoje Popovski; OK jan@ claudio@
# b9587575 | 14-Mar-2023 | yasuoka <yasuoka@openbsd.org>
To avoid misunderstanding, keep variables for tcp keepalive in milliseconds, which is the same unit as tcp_now(). However, keep the unit of sysctl variables in seconds and convert their unit in tcp_sysctl(). Additionally revert TCPTV_SRTTDFLT back to 3 seconds, which was mistakenly changed to 1.5 seconds by tcp_timer.h 1.19.
ok claudio
# 4b9bfff3 | 22-Jan-2023 | mvs <mvs@openbsd.org>
Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' of the receive buffer. As it was done for the SS_CANTSENDMORE bit, the definition is kept as is, but now these bits belong to the `sb_state' of the receive buffer. `sb_state' is ORed with `so_state' when socket data is exported to userland.
ok bluhm@