.\"	$OpenBSD: sosplice.9,v 1.8 2016/06/13 21:24:43 bluhm Exp $
.\"
.\" Copyright (c) 2011-2013 Alexander Bluhm <bluhm@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 13 2016 $
.Dt SOSPLICE 9
.Os
.Sh NAME
.Nm sosplice ,
.Nm somove
.Nd splice two sockets for zero-copy data transfer
.Sh SYNOPSIS
.Ft int
.Fn sosplice "struct socket *so" "int fd" "off_t max" "struct timeval *tv"
.Ft int
.Fn somove "struct socket *so" "int wait"
.Sh DESCRIPTION
The function
.Fn sosplice
is used to splice together a source and a drain socket.
The source socket is passed as the
.Fa so
argument;
the file descriptor of the drain is passed in
.Fa fd .
If
.Fa fd
is negative, an existing splicing gets dissolved.
If
.Fa max
is positive, at most that many bytes will get transferred.
If
.Fa tv
is not
.Dv NULL ,
a
.Xr timeout 9
is scheduled to dissolve splicing in the case when no data can be
transferred for the specified period of time.
Socket splicing can be invoked from userland via the
.Xr setsockopt 2
system call at the
.Dv SOL_SOCKET
level with the socket option
.Dv SO_SPLICE .
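.Pp
As an illustration of the userland entry point, the following sketch
splices a source descriptor into a drain descriptor with
.Xr setsockopt 2 .
It assumes the
.Vt struct splice
layout declared in
.In sys/socket.h
on
.Ox ;
the helper name
.Fn splice_sockets
is hypothetical, and on systems that lack
.Dv SO_SPLICE
the helper simply fails with
.Er ENOPROTOOPT .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splice src into drain.
 * max == 0 means no byte limit; idle == NULL means no idle timeout.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
splice_sockets(int src, int drain, off_t max, const struct timeval *idle)
{
#ifdef SO_SPLICE
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain;		/* drain socket file descriptor */
	sp.sp_max = max;		/* maximum bytes to splice, if set */
	if (idle != NULL)
		sp.sp_idle = *idle;	/* idle timeout, if set */
	return setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* Assumption: without SO_SPLICE there is nothing to call. */
	(void)src;
	(void)drain;
	(void)max;
	(void)idle;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```
.Ed
Passing a negative drain descriptor through the same helper requests
that an existing splicing be dissolved.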
.Pp
Before connecting both sockets, several checks are executed.
See the
.Sx ERRORS
section for possible failures.
The connection between both sockets is implemented by setting these
additional fields in
.Vt struct socket :
.Pp
.Bl -dash -compact -offset indent
.It
.Vt struct socket Fa *so_splice
links from the source to the drain socket.
.It
.Vt struct socket Fa *so_spliceback
links back from the drain to the source socket.
.It
.Vt off_t Fa so_splicelen
counts the number of bytes spliced so far from this socket.
.It
.Vt off_t Fa so_splicemax
specifies the maximum number of bytes to splice from this socket if
non-zero.
.It
.Vt struct timeval Fa so_idletv
specifies the maximum idle time if non-zero.
.It
.Vt struct timeout Fa so_idleto
provides storage for the kernel timeout if idle time is used.
.El
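.Pp
The linkage formed by these fields can be pictured with a small
userland model.
The sketch below is not kernel code:
.Vt struct model_socket ,
.Fn model_splice
and
.Fn model_unsplice
are hypothetical names, the timeout fields are left out, and resetting
the byte counter when linking is an assumption of the model.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Userland model only; the real struct socket has many more fields. */
struct model_socket {
	struct model_socket *so_splice;		/* source -> drain */
	struct model_socket *so_spliceback;	/* drain -> source */
	off_t so_splicelen;			/* bytes spliced so far */
	off_t so_splicemax;			/* byte limit, 0 = none */
};

/* Link src to drain; returns 0, or -1 if either side is already busy. */
int
model_splice(struct model_socket *src, struct model_socket *drain,
    off_t max)
{
	if (src->so_splice != NULL || drain->so_spliceback != NULL)
		return -1;	/* mirrors the EBUSY check */
	src->so_splice = drain;
	drain->so_spliceback = src;
	src->so_splicelen = 0;
	src->so_splicemax = max;
	return 0;
}

/* Dissolve the link again, clearing both directions. */
void
model_unsplice(struct model_socket *src)
{
	assert(src->so_splice != NULL);
	src->so_splice->so_spliceback = NULL;
	src->so_splice = NULL;
}
```
.Ed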
.Pp
After connecting both sockets,
.Fn sosplice
calls
.Fn somove
to transfer the mbufs already in the source receive buffer to the
drain send buffer.
Finally the socket buffer flag
.Dv SB_SPLICE
is set on both socket buffers, to indicate that the protocol layer
has to call
.Fn somove
whenever data or space is available.
.Pp
The function
.Fn somove
transfers data from the source's receive buffer to the drain's send
buffer.
It must be called at
.Xr splsoftnet 9
and
.Fa so
must be a spliced source socket.
It may be necessary to split an mbuf to handle out-of-band data
inline or when the maximum splice length has been reached.
If
.Fa wait
is
.Dv M_WAIT ,
splitting mbufs will always succeed.
For
.Dv M_DONTWAIT
the out-of-band property might get lost or a short splice might
happen.
In the latter case, less than the given maximum number of bytes are
transferred and userland has to cope with this.
Note that a short splice cannot happen if
.Fn somove
was called by
.Fn sosplice .
So a second
.Xr setsockopt 2
after a short splice pointing to the same maximum will always
succeed.
.Pp
Before transferring data,
.Fn somove
checks both sockets for errors and that the drain socket is connected.
If the drain cannot send anymore, an
.Er EPIPE
error is set on the source socket.
The data length to move is limited by the optional maximum splice
length and the space in the drain's send socket buffer.
Up to this amount of data is taken out of the source's receive
socket buffer.
To avoid splicing loops created by userland, the number of times
an mbuf may be moved between sockets is limited to 128.
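.Pp
The length limit described above amounts to taking a minimum over
three quantities.
The following sketch is a simplified model rather than the kernel
implementation; the function
.Fn model_move_len
and its parameter names are hypothetical, and details such as
low-water marks are deliberately ignored.
.Bd -literal -offset indent
```c
#include <sys/types.h>

/*
 * Model of the length limit: avail bytes are queued at the source,
 * space bytes are free in the drain's send buffer, and max (0 = no
 * limit) caps the total, of which done bytes were already moved.
 */
off_t
model_move_len(off_t avail, off_t space, off_t max, off_t done)
{
	off_t len = avail;

	if (max != 0 && len > max - done)
		len = max - done;	/* stay within the splice maximum */
	if (len > space)
		len = space;		/* do not overfill the drain */
	return len < 0 ? 0 : len;
}
```
.Ed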
.Pp
For atomic protocols, either one complete packet is taken out, or
nothing is taken at all if:
the packet is bigger than the drain's send buffer size, in which
case the splicing gets aborted with an
.Er EMSGSIZE
error;
the packet does not fit into the drain's current send buffer space,
in which case it is left in the source's receive buffer for later
processing;
or the maximum splice length is located within a packet, in which
case splicing gets dissolved like a short splice.
All address or control mbufs associated with the taken packet are
dropped.
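.Pp
The three cases for atomic protocols can be summarized as a decision
function.
This is a sketch under stated assumptions: the enum, the function
.Fn model_atomic_action
and its parameters are hypothetical, a negative
.Va remaining
stands for
.Dq no maximum ,
and the order in which overlapping conditions are tested is chosen
for the example and may differ from the kernel.
.Bd -literal -offset indent
```c
enum atomic_action {
	ATOMIC_MOVE,	/* take the whole packet */
	ATOMIC_ABORT,	/* abort splicing with EMSGSIZE */
	ATOMIC_DEFER,	/* leave the packet for later processing */
	ATOMIC_SHORT	/* dissolve like a short splice */
};

/*
 * pktlen:    length of the next packet in the source receive buffer
 * sndsize:   total size of the drain's send buffer
 * sndspace:  space currently free in the drain's send buffer
 * remaining: bytes left before the splice maximum, negative if none
 */
enum atomic_action
model_atomic_action(long pktlen, long sndsize, long sndspace,
    long remaining)
{
	if (pktlen > sndsize)
		return ATOMIC_ABORT;	/* can never fit */
	if (pktlen > sndspace)
		return ATOMIC_DEFER;	/* does not fit right now */
	if (remaining >= 0 && pktlen > remaining)
		return ATOMIC_SHORT;	/* maximum lies inside packet */
	return ATOMIC_MOVE;
}
```
.Ed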
.Pp
If the maximum splice length has been reached, an mbuf may get
split for non-atomic protocols.
Otherwise an mbuf is either moved completely to the send buffer or
left in the receive buffer for later processing.
If
.Dv SO_OOBINLINE
is set, out-of-band data will get moved as such
although this might not be reliable.
The data is sent out to the drain socket via the protocol function.
If that fails and the drain socket cannot send anymore, an
.Er EPIPE
error is set on the source socket.
.Pp
For packet oriented protocols
.Fn somove
iterates over the next packet queue.
.Pp
If a maximum splice length was specified and at least this amount
of data has been transferred to the drain socket, splicing gets
dissolved.
In this case, an
.Er EFBIG
error is set on the source socket.
Userland can process this error to distinguish the full splice from
a short splice or to react to the completed maximum splice immediately.
If an idle timeout was specified and no data has been transferred
for that period of time, the handler
.Fn soidle
dissolves splicing and sets an
.Er ETIMEDOUT
error on the source socket.
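.Pp
A userland program might tell these outcomes apart by reading the
pending error from the source socket with
.Xr getsockopt 2
and
.Dv SO_ERROR .
The sketch below shows one possible way to do so; the enum and the
helpers
.Fn pending_socket_error
and
.Fn classify_splice_end
are hypothetical names, and treating a zero error as a short splice
or end of data is an assumption of the example.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

enum splice_end {
	SPLICE_SHORT_OR_EOF,	/* no error: short splice or end of data */
	SPLICE_FULL,		/* EFBIG: maximum length was reached */
	SPLICE_IDLE,		/* ETIMEDOUT: idle timeout fired */
	SPLICE_FAILED		/* any other error, for example EPIPE */
};

/* Fetch and clear the pending socket error; returns -1 on failure. */
int
pending_socket_error(int src)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(src, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
		return -1;
	return err;
}

/* Map an error value from the source socket to a splice outcome. */
enum splice_end
classify_splice_end(int err)
{
	switch (err) {
	case 0:
		return SPLICE_SHORT_OR_EOF;
	case EFBIG:
		return SPLICE_FULL;
	case ETIMEDOUT:
		return SPLICE_IDLE;
	default:
		return SPLICE_FAILED;
	}
}
```
.Ed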
.Pp
The function
.Fn sounsplice
is called to dissolve the socket splicing if the source socket
cannot receive anymore and its receive buffer is empty; or if the
drain socket cannot send anymore; or if the maximum has been reached;
or if an error occurred; or if the idle timeout has fired.
.Pp
If the socket buffer flag
.Dv SB_SPLICE
is set, the functions
.Fn sorwakeup
and
.Fn sowwakeup
will call
.Fn somove
to trigger the transfer when new data or buffer space is available.
While socket splicing is active, any
.Xr read 2
from the source socket will block and the wakeup will not be delivered
to the file descriptor.
A read event or a socket error is signaled to userland after
dissolving.
.Sh RETURN VALUES
.Fn sosplice
returns 0 on success and otherwise the error number.
.Fn somove
returns 0 if socket splicing has been finished and 1 if it continues.
.Sh ERRORS
.Fn sosplice
will succeed unless:
.Bl -tag -width Er
.It Bq Er EBADF
The given file descriptor
.Fa fd
is not an active descriptor.
.It Bq Er EBUSY
The source or the drain socket is already spliced.
.It Bq Er EINVAL
The given maximum value
.Fa max
is negative.
.It Bq Er ENOTCONN
The source socket requires a connection and is neither connected
nor in the process of connecting to a peer.
.It Bq Er ENOTCONN
The drain socket is neither connected nor in the process of connecting
to a peer.
.It Bq Er ENOTSOCK
The given file descriptor
.Fa fd
is not a socket.
.It Bq Er EOPNOTSUPP
The source or the drain socket is a listen socket.
.It Bq Er EPROTONOSUPPORT
The source socket's protocol layer does not have the
.Dv PR_SPLICE
flag set.
Only TCP and UDP socket splicing is supported.
.It Bq Er EPROTONOSUPPORT
The drain socket's protocol does not have the same
.Fa pr_usrreq
function as the source.
.It Bq Er EWOULDBLOCK
The source socket is non-blocking and the receive buffer is already
locked.
.El
.Sh SEE ALSO
.Xr setsockopt 2 ,
.Xr options 4 ,
.Xr timeout 9
.Sh HISTORY
Socket splicing for TCP first appeared in
.Ox 4.9 ;
support for UDP was added in
.Ox 5.3 .
.Sh AUTHORS
.An -nosplit
The idea for socket splicing originally came from
.An Markus Friedl Aq Mt markus@openbsd.org ,
and
.An Alexander Bluhm Aq Mt bluhm@openbsd.org
implemented it.
.An Mike Belopuhov Aq Mt mikeb@openbsd.org
added the timeout feature.