.\"	$OpenBSD: sosplice.9,v 1.3 2011/07/04 00:34:43 mikeb Exp $
.\"
.\" Copyright (c) 2011 Alexander Bluhm <bluhm@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: July 4 2011 $
.Dt SOSPLICE 9
.Os
.Sh NAME
.Nm sosplice ,
.Nm somove
.Nd splice two sockets for zero-copy data transfer
.Sh SYNOPSIS
.Ft int
.Fn sosplice "struct socket *so" "int fd" "off_t max" "struct timeval *tv"
.Ft int
.Fn somove "struct socket *so" "int wait"
.Sh DESCRIPTION
The function
.Fn sosplice
is used to splice together a source and a drain socket.
The source socket is passed as the
.Fa so
argument;
the file descriptor of the drain is passed in
.Fa fd .
If
.Fa fd
is negative, an existing splicing gets dissolved.
If
.Fa max
is positive, at most that many bytes will get transferred.
If
.Fa tv
is not
.Dv NULL ,
a
.Xr timeout 9
is scheduled to dissolve splicing in the case when no data can be
transferred for the specified period of time.
Socket splicing can be invoked from user-land via the
.Xr setsockopt 2
system call at the
.Dv SOL_SOCKET
level with the socket option
.Dv SO_SPLICE .
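.Pp
The user-land entry point can be sketched as follows.
This is a minimal illustration, not code from the kernel sources.
The helper name
.Fn splice_sockets
is hypothetical, and the layout of
.Vt struct splice
(fields
.Fa sp_fd ,
.Fa sp_max
and
.Fa sp_idle )
is assumed from
.In sys/socket.h
on
.Ox
and is not described on this page.
On systems that do not define
.Dv SO_SPLICE
the sketch simply fails.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: ask the kernel to splice src into drain.
 * max == 0 is assumed to mean "no byte limit" and idle_sec == 0
 * is assumed to mean "no idle timeout".
 * Returns 0 on success, -1 with errno set on failure.
 */
int
splice_sockets(int src, int drain, off_t max, time_t idle_sec)
{
#ifdef SO_SPLICE
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain;		/* drain socket file descriptor */
	sp.sp_max = max;		/* maximum number of bytes */
	sp.sp_idle.tv_sec = idle_sec;	/* idle timeout */
	return setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* Assumption: without SO_SPLICE there is nothing to call. */
	(void)src;
	(void)drain;
	(void)max;
	(void)idle_sec;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

A caller would typically invoke the helper once per direction, since
a splice moves data from the source to the drain only.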
.Pp
Before connecting both sockets, several checks are executed.
See the
.Sx ERRORS
section for possible failures.
The connection between both sockets is implemented by setting these
additional fields in
.Vt struct socket :
.Pp
.Bl -dash -compact -offset indent
.It
.Vt struct socket Fa *so_splice
links from the source to the drain socket.
.It
.Vt struct socket Fa *so_spliceback
links back from the drain to the source socket.
.It
.Vt off_t Fa so_splicelen
counts the number of bytes spliced so far from this socket.
.It
.Vt off_t Fa so_splicemax
specifies the maximum number of bytes to splice from this socket if
non-zero.
.El
.Pp
After connecting both sockets,
.Fn sosplice
calls
.Fn somove
to transfer the mbufs already in the source receive buffer to the
drain send buffer.
Finally the socket buffer flag
.Dv SB_SPLICE
is set on both socket buffers, to indicate that the protocol layer
has to call
.Fn somove
whenever data or space is available.
.Pp
The function
.Fn somove
transfers data from the source's receive buffer to the drain's send
buffer.
It must be called at
.Xr splsoftnet 9
and
.Fa so
must be a spliced source socket.
It may be necessary to split an mbuf to handle out-of-band data
inline or when the maximum splice length has been reached.
If
.Fa wait
is
.Dv M_WAIT ,
splitting mbufs will always succeed.
For
.Dv M_DONTWAIT
the out-of-band property might get lost or a short splice might
happen.
In the latter case, fewer than the given maximum number of bytes are
transferred and user-land has to cope with this.
Note that a short splice cannot happen if
.Fn somove
was called by
.Fn sosplice .
So a second
.Xr setsockopt 2
after a short splice pointing to the same maximum will always
succeed.
.Pp
Before transferring data,
.Fn somove
checks both sockets for errors and that the drain socket is connected.
If the drain cannot send anymore, an
.Er EPIPE
error is set on the source socket.
The data length to move is limited by the optional maximum splice
length and the space in the drain's send socket buffer.
Up to this amount of data is taken out of the source's receive
socket buffer.
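.Pp
The length limit described above can be modelled in user space as
follows.
This is an illustrative sketch, not the kernel implementation.
The function name and all parameter names are hypothetical; only the
meaning of the maximum (zero means unlimited) is taken from the
description of
.Fa so_splicemax .

```c
#include <sys/types.h>

/*
 * User-space model of the length computation, not kernel code.
 * rcv_avail: bytes waiting in the source's receive buffer
 * snd_space: free space in the drain's send buffer
 * splicelen: bytes spliced so far
 * splicemax: maximum bytes to splice, 0 for no maximum
 * Returns the number of bytes one pass may move.
 */
off_t
splice_move_len(off_t rcv_avail, off_t snd_space, off_t splicelen,
    off_t splicemax)
{
	off_t len = rcv_avail;

	/* Never move more than the drain can take. */
	if (len > snd_space)
		len = snd_space;
	/* Respect the optional maximum splice length. */
	if (splicemax != 0 && len > splicemax - splicelen)
		len = splicemax - splicelen;
	return len < 0 ? 0 : len;
}
```

With 1000 bytes queued, 400 bytes of send space and no maximum, the
model moves 400 bytes; with a maximum of 1000 and 900 bytes already
spliced, it moves 100.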
.Pp
If the maximum splice length has been reached, an mbuf may get
split.
Otherwise an mbuf is either moved completely to the send buffer or
left in the receive buffer for later processing.
If
.Dv SO_OOBINLINE
is set, out-of-band data will get moved as such
although this might not be reliable.
The data is sent out to the drain socket via the protocol function.
If that fails and the drain socket cannot send anymore, an
.Er EPIPE
error is set on the source socket.
.Pp
If the idle timeout was specified and no data was transferred
for that period of time, splicing gets dissolved and an
.Er ETIMEDOUT
error is set on the source socket.
.Pp
Finally the socket splicing gets dissolved if the source socket
cannot receive anymore and its receive buffer is empty; or if the
drain socket cannot send anymore; or if the maximum has been reached;
or if an error occurred.
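.Pp
The dissolve conditions listed above can be written as a single
predicate.
The following is a user-space model for illustration only; the
structure
.Vt struct splice_state
and the function
.Fn splice_should_dissolve
are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical snapshot of the state relevant to dissolving. */
struct splice_state {
	bool	src_cantrcvmore;	/* source cannot receive anymore */
	off_t	src_rcv_bytes;		/* bytes left in source receive buffer */
	bool	drain_cantsendmore;	/* drain cannot send anymore */
	off_t	splicelen;		/* bytes spliced so far */
	off_t	splicemax;		/* maximum to splice, 0 for none */
	int	error;			/* pending error, 0 for none */
};

/* Returns true if the model says the splice has to be dissolved. */
bool
splice_should_dissolve(const struct splice_state *st)
{
	if (st->src_cantrcvmore && st->src_rcv_bytes == 0)
		return true;	/* source is finished and drained */
	if (st->drain_cantsendmore)
		return true;	/* drain can no longer send */
	if (st->splicemax != 0 && st->splicelen >= st->splicemax)
		return true;	/* maximum has been reached */
	if (st->error != 0)
		return true;	/* an error occurred */
	return false;
}
```

Note that a source which cannot receive anymore keeps the splice
alive in this model until its receive buffer has been emptied.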
.Pp
If the socket buffer flag
.Dv SB_SPLICE
is set, the functions
.Fn sorwakeup
and
.Fn sowwakeup
will call
.Fn somove
to trigger the transfer when new data or buffer space is available.
While socket splicing is active, any
.Xr read 2
from the source socket will block and the wakeup will not be delivered
to the file descriptor.
A read event is signaled to user-land after dissolving.
.Sh RETURN VALUES
.Fn sosplice
returns 0 on success and otherwise the error number.
.Fn somove
returns 0 if socket splicing has been finished and 1 if it continues.
.Sh ERRORS
.Fn sosplice
will succeed unless:
.Bl -tag -width Er
.It Bq Er EBADF
The given file descriptor
.Fa fd
is not an active descriptor.
.It Bq Er EBUSY
The source or the drain socket is already spliced.
.It Bq Er EINVAL
The given maximum value
.Fa max
is negative.
.It Bq Er ENOTCONN
The source or the drain socket is neither connected nor in the
process of connecting to a peer.
.It Bq Er ENOTSOCK
The given file descriptor
.Fa fd
is not a socket.
.It Bq Er EOPNOTSUPP
The source or the drain socket is a listen socket.
.It Bq Er EPROTONOSUPPORT
The source socket's protocol layer does not have the
.Dv PR_SPLICE
flag set.
At the moment only TCP supports socket splicing.
.It Bq Er EPROTONOSUPPORT
The drain socket's protocol does not have the same
.Fa pr_usrreq
function as the source.
.It Bq Er EWOULDBLOCK
The source socket is non-blocking and the receive buffer is already
locked.
.El
.Sh SEE ALSO
.Xr setsockopt 2 ,
.Xr options 4 ,
.Xr timeout 9
.Sh HISTORY
Socket splicing first appeared in
.Ox 4.9 .
.Sh AUTHORS
.An -nosplit
The idea for socket splicing originally came from
.An Markus Friedl Aq markus@openbsd.org ,
and
.An Alexander Bluhm Aq bluhm@openbsd.org
implemented it.