History log of /openbsd-src/sys/netinet/tcp_usrreq.c (Results 1 – 25 of 240)
Revision Date Author Comments
# e835bce2 16-Jan-2025 bluhm <bluhm@openbsd.org>

Remove net lock from TCP sysctl for keep alive.

Keep copies in seconds for the sysctl and update timer variables
atomically when they change. tcp_maxidle was historically calculated
in tcp_slowtimo() as the timers were called from there. Better
calculate maxidle when needed. tcp_timer_init() is useless, just
initialize data. While there make the names consistent.

input sthen@; OK mvs@
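
A minimal userland sketch of the idea described above, not the kernel code itself: a seconds copy is kept for the sysctl side, the millisecond timer value is updated atomically, and the maximum idle time is computed on demand instead of in a periodic timer. All names (`sketch_keepintvl_sec`, `sketch_keepintvl_ms`, `sketch_set_keepintvl()`, `sketch_maxidle_ms()`) and the probe count of 8 are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical number of keepalive probes before giving up. */
#define SKETCH_KEEPCNT 8

/* Copy exposed to the sysctl side, in seconds. */
static _Atomic int sketch_keepintvl_sec = 75;
/* Value consumed by the timers, in milliseconds. */
static _Atomic uint64_t sketch_keepintvl_ms = 75 * 1000;

/* Called when the administrator changes the value (in seconds). */
void
sketch_set_keepintvl(int sec)
{
	atomic_store(&sketch_keepintvl_sec, sec);
	atomic_store(&sketch_keepintvl_ms, (uint64_t)sec * 1000);
}

/* Compute the maximum idle time when needed, not in a slow timer. */
uint64_t
sketch_maxidle_ms(void)
{
	return SKETCH_KEEPCNT * atomic_load(&sketch_keepintvl_ms);
}
```

Because each variable is read and written with a single atomic operation, readers in this sketch need no lock; the two copies may briefly disagree during an update, which is acceptable only if no reader relies on seeing both in step.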


# 4e5e13a2 09-Jan-2025 bluhm <bluhm@openbsd.org>

Run TCP sysctl ident and drop with shared net lock.

Convert exclusive net lock for TCPCTL_IDENT and TCPCTL_DROP to
shared net lock and push it down into tcp_ident(). Grab the socket
lock there with in_pcbsolock_ref(). Move socket release from
in_pcbsolock() to in_pcbsounlock_rele() and add _ref and _rele
suffix to the inpcb socket lock functions. They both lock and
refcount now. in_pcbsounlock_rele() ignores NULL sockets to make
the unlock path in error case simpler. Socket lock also protects
tcp_drop() and tcp_close() now, so the socket pointer from inpcb
may be NULL during unlock. In tcp_ident() improve consistency check
of address family.

OK mvs@


# 9b315513 05-Jan-2025 bluhm <bluhm@openbsd.org>

TCP integer sysctl variables are all atomic. Remove net lock.

OK mvs@


# ab8da1a7 04-Jan-2025 mvs <mvs@openbsd.org>

Relax sockets splicing locking.

Sockets splicing works around sockets buffers which have their own locks
for all socket types, especially sblock() on `so_snd' which keeps
sockets being spliced.

- sosplice() does read-only sockets options and state checks, the only
modification is `so_sp' assignment. The SB_SPLICE bit modification,
`ssp_socket' and `ssp_soback' assignment protected with `sb_mtx'
mutex(9). PCB layer does corresponding checks with `sb_mtx' held, so
shared solock() is pretty enough in sosplice() path. Introduce
special sosplice_solock_pair() for that purpose.

- sounsplice() requires shared socket lock only around so{r,w}wakeup
calls.

- Push exclusive solock() down to tcp(4) case of somove(). Such sockets
are not ready to do unlocked somove() yet.


ok bluhm


# 66570633 01-Jan-2025 bluhm <bluhm@openbsd.org>

Fix whitespace.


# 507b5b41 31-Dec-2024 mvs <mvs@openbsd.org>

Use per-sockbuf mutex(9) to protect `so_snd' buffer of tcp(4) sockets.

Even for tcp(4) case, sosend() only checks `so_snd' free space and
sleeps if necessary, actual buffer handling happens in solock()ed PCB
layer.

Only unlock sosend() path, the somove() is still locked exclusively. The
"if (dosolock)" dances are useless, but intentionally left as is.

Tested and ok by bluhm.


# 77957d73 30-Dec-2024 bluhm <bluhm@openbsd.org>

Remove net lock from TCP syn cache sysctl.

TCP syn cache is protected by mutex. Make access to its sysctl
variables either atomic or put them into this mutex. Then net lock
can be removed.

OK mvs@


# f1bf6f4e 19-Dec-2024 mvs <mvs@openbsd.org>

Use per-sockbuf mutex(9) to protect `so_rcv' buffer of tcp(4) sockets.

Only unlock soreceive() path, somove() path still locked exclusively. Also
exclusive socket lock will be taken in the soreceive() path each time
before pru_rcvd() call.

Note, both socket and `sb_mtx' locks are held while SS_CANTRCVMORE
modified, so socket lock is enough to check it in the protocol input
path.

ok bluhm


# 2162e93b 08-Nov-2024 bluhm <bluhm@openbsd.org>

TCP send and receive space update are MP safe.

tcp_update_sndspace() and tcp_update_rcvspace() only read global
variables that do not change after initialization. Mark them as
such. Add braces around multi-line if blocks.

ok mvs@


# 93536db2 12-Apr-2024 bluhm <bluhm@openbsd.org>

Split single TCP inpcb table into IPv4 and IPv6 parts.

With two separate TCP hash tables, each one becomes smaller. When
we remove the exclusive net lock from TCP, contention on internet
PCB table mutex will be reduced. UDP has been split earlier into
IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with
assertions.

OK mvs@


# 940d25ac 11-Feb-2024 bluhm <bluhm@openbsd.org>

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# a342f0b4 19-Jan-2024 bluhm <bluhm@openbsd.org>

Unify inpcb API for inet and inet6.

Many functions for IPv4 call their IPv6 counterpart if INP_IPV6 is
set at the socket's pcb. By using the generic API consistently,
the logic is not in the caller and it gets more readable.

OK mvs@


# 6285ef23 11-Jan-2024 bluhm <bluhm@openbsd.org>

Fix white spaces in TCP.


# ab485656 03-Dec-2023 bluhm <bluhm@openbsd.org>

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# cd28665a 01-Dec-2023 bluhm <bluhm@openbsd.org>

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# cff23a6b 01-Dec-2023 bluhm <bluhm@openbsd.org>

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 952c6363 28-Nov-2023 bluhm <bluhm@openbsd.org>

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 0bfbfbe7 16-Nov-2023 bluhm <bluhm@openbsd.org>

Run TCP SYN cache timer logic without net lock.

Introduce global TCP SYN cache mutex. Divide timer function in
parts protected by mutex and sending with netlock. Split the flags
field in dynamic flags protected by mutex and fixed flags set during
initialization. Document whether fields of struct syn_cache are
protected by net lock or mutex.

input and OK sashan@


# bf0d449c 16-Sep-2023 mpi <mpi@openbsd.org>

Allow counters_read(9) to take an optional scratch buffer.

Using a scratch buffer makes it possible to take a consistent snapshot of
per-CPU counters without having to allocate memory.

Makes ddb(4) show uvmexp command work in OOM situations.

ok kn@, mvs@, cheloha@


# 9e96aff0 06-Jul-2023 bluhm <bluhm@openbsd.org>

Convert tcp_now() time counter to 64 bit.

After changing tcp now tick to milliseconds, 32 bits will wrap
around after 49 days of uptime. That may be a problem in some
places of our stack. Better use a 64 bit counter.

As timestamp option is 32 bit in TCP protocol, use the lower 32 bit
there. There are casts to 32 bits that should behave correctly.

Start with random 63 bit offset to avoid uptime leakage. 2^63
milliseconds result in 2.9*10^8 years of possible uptime.

OK yasuoka@
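
The figures quoted above can be checked with a short calculation. The following is only an illustrative sketch with hypothetical helper names (`sketch_wrap_days()`, `sketch_ts_value()`), not code from the tree: it computes how long a millisecond counter of a given width lasts before wrapping, and shows taking the lower 32 bits for the TCP timestamp option.

```c
#include <stdint.h>

#define SKETCH_MS_PER_DAY (1000.0 * 60 * 60 * 24)

/* Days until a millisecond counter with the given bit width wraps. */
double
sketch_wrap_days(unsigned int bits)
{
	double ticks = 1.0;
	unsigned int i;

	for (i = 0; i < bits; i++)
		ticks *= 2.0;
	return ticks / SKETCH_MS_PER_DAY;
}

/* The timestamp option is 32 bit, so only the low half is sent. */
uint32_t
sketch_ts_value(uint64_t now_ms)
{
	return (uint32_t)now_ms;
}
```

With these helpers, `sketch_wrap_days(32)` comes out at roughly 49.7 days and `sketch_wrap_days(63) / 365.25` at roughly 2.9*10^8 years. Differences between two truncated timestamps remain correct across a 32 bit wrap as long as they are computed in unsigned 32 bit arithmetic.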


# a3c0391f 02-Jul-2023 bluhm <bluhm@openbsd.org>

Use TSO and LRO on the loopback interface to transfer TCP faster.

If tcplro is activated on lo(4), ignore the MTU with TCP packets.
They are passed along with the information that they have to be
chopped in case they are forwarded later. New netstat(1) counter
shows that software LRO is in effect. The feature is currently
turned off by default.

tested by jan@; OK claudio@ jan@


# a5a54c4a 23-May-2023 jan <jan@openbsd.org>

New counters for LRO packets from hardware TCP offloading.

With tweaks from patrick@ and bluhm@.

OK bluhm@


# c06845b1 10-May-2023 bluhm <bluhm@openbsd.org>

Implement TCP send offloading, for now in software only. This is
meant as a fallback if network hardware does not support TSO. Driver
support is still work in progress. TCP output generates large
packets. In IP output the packet is chopped to TCP maximum segment
size. This reduces the CPU cycles used by pf. The regular output
could be assisted by hardware later, but pf route-to and IPsec need
the software fallback in general.
For performance comparison or to work around possible bugs, sysctl
net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows
TSO counter with chopped and generated packets.
based on work from jan@
tested by jmc@ jan@ Hrvoje Popovski
OK jan@ claudio@
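
As a rough illustration of the chopping step described above, the sketch below computes how many segments a large TCP payload yields for a given maximum segment size and how long the final segment is. The function names and the example sizes are hypothetical and do not reflect the actual IP output code.

```c
#include <stddef.h>

/* Number of MSS-sized segments needed for payload_len bytes. */
size_t
sketch_tso_segments(size_t payload_len, size_t mss)
{
	if (mss == 0 || payload_len == 0)
		return 0;
	return (payload_len + mss - 1) / mss;	/* round up */
}

/* Length of the last segment, which may be shorter than mss. */
size_t
sketch_tso_last_len(size_t payload_len, size_t mss)
{
	size_t rem;

	if (mss == 0 || payload_len == 0)
		return 0;
	rem = payload_len % mss;
	return rem == 0 ? mss : rem;
}
```

For example, a 4000 byte payload with an MSS of 1460 bytes is chopped into three segments, the last one carrying 1080 bytes.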


# b9587575 14-Mar-2023 yasuoka <yasuoka@openbsd.org>

To avoid misunderstanding, keep variables for tcp keepalive in
milliseconds, which is the same unit of tcp_now(). However, keep the
unit of sysctl variables in seconds and convert their unit in
tcp_sysctl(). Additionally revert TCPTV_SRTTDFLT back to 3 seconds,
which was mistakenly changed to 1.5 seconds by tcp_timer.h 1.19.

ok claudio


# 4b9bfff3 22-Jan-2023 mvs <mvs@openbsd.org>

Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' of
receive buffer. As it was done for SS_CANTSENDMORE bit, the definition
kept as is, but now these bits belong to the `sb_state' of receive
buffer. `sb_state' is ored with `so_state' when socket data is exported to
userland.

ok bluhm@
