.\" $OpenBSD: sosplice.9,v 1.3 2011/07/04 00:34:43 mikeb Exp $
.\"
.\" Copyright (c) 2011 Alexander Bluhm <bluhm@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: July 4 2011 $
.Dt SOSPLICE 9
.Os
.Sh NAME
.Nm sosplice ,
.Nm somove
.Nd splice two sockets for zero-copy data transfer
.Sh SYNOPSIS
.Ft int
.Fn sosplice "struct socket *so" "int fd" "off_t max" "struct timeval *tv"
.Ft int
.Fn somove "struct socket *so" "int wait"
.Sh DESCRIPTION
The function
.Fn sosplice
is used to splice together a source and a drain socket.
The source socket is passed as the
.Fa so
argument;
the file descriptor of the drain is passed in
.Fa fd .
If
.Fa fd
is negative, an existing splicing gets dissolved.
If
.Fa max
is positive, at most that many bytes will get transferred.
If
.Fa tv
is not NULL, a
.Xr timeout 9
is scheduled to dissolve splicing in the case when no data can be
transferred for the specified period of time.
Socket splicing can be invoked from user-land via the
.Xr setsockopt 2
system-call at the
.Dv SOL_SOCKET
level with the socket option
.Dv SO_SPLICE .
.Pp
Before connecting both sockets, several checks are executed.
See the
.Sx ERRORS
section for possible failures.
The connection between both sockets is implemented by setting these
additional fields in
.Vt struct socket :
.Pp
.Bl -dash -compact -offset indent
.It
.Vt struct socket Fa *so_splice
links from the source to the drain socket.
.It
.Vt struct socket Fa *so_spliceback
links back from the drain to the source socket.
.It
.Vt off_t Fa so_splicelen
counts the number of bytes spliced so far from this socket.
.It
.Vt off_t Fa so_splicemax
specifies the maximum number of bytes to splice from this socket if
non-zero.
.El
.Pp
After connecting both sockets,
.Fn sosplice
calls
.Fn somove
to transfer the mbufs already in the source receive buffer to the
drain send buffer.
Finally the socket buffer flag
.Dv SB_SPLICE
is set on both socket buffers, to indicate that the protocol layer
has to call
.Fn somove
whenever data or space is available.
.Pp
The function
.Fn somove
transfers data from the source's receive buffer to the drain's send
buffer.
It must be called at
.Xr splsoftnet 9
and
.Fa so
must be a spliced source socket.
It may be necessary to split an mbuf to handle out-of-band data
inline or when the maximum splice length has been reached.
If
.Fa wait
is
.Dv M_WAIT ,
splitting mbufs will always succeed.
For
.Dv M_DONTWAIT
the out-of-band property might get lost or a short splice might
happen.
In the latter case, fewer than the given maximum number of bytes are
transferred and user-land has to cope with this.
Note that a short splice cannot happen if
.Fn somove
was called by
.Fn sosplice .
So a second
.Xr setsockopt 2
after a short splice pointing to the same maximum will always
succeed.
.Pp
Before transferring data,
.Fn somove
checks both sockets for errors and that the drain socket is connected.
If the drain cannot send anymore, an
.Er EPIPE
error is set on the source socket.
The data length to move is limited by the optional maximum splice
length and the space in the drain's send socket buffer.
Up to this amount of data is taken out of the source's receive
socket buffer.
.Pp
If the maximum splice length has been reached, an mbuf may get
split.
Otherwise an mbuf is either moved completely to the send buffer or
left in the receive buffer for later processing.
If
.Dv SO_OOBINLINE
is set, out-of-band data will get moved as such
although this might not be reliable.
The data is sent out to the drain socket via the protocol function.
If that fails and the drain socket cannot send anymore, an
.Er EPIPE
error is set on the source socket.
.Pp
If the idle timeout was specified and no data was transferred
for that period of time, splicing gets dissolved and an
.Er ETIMEDOUT
error is set on the source socket.
.Pp
Finally the socket splicing gets dissolved if the source socket
cannot receive anymore and its receive buffer is empty; or if the
drain socket cannot send anymore; or if the maximum has been reached;
or if an error occurred.
.Pp
If the socket buffer flag
.Dv SB_SPLICE
is set, the functions
.Fn sorwakeup
and
.Fn sowwakeup
will call
.Fn somove
to trigger the transfer when new data or buffer space is available.
While socket splicing is active, any
.Xr read 2
from the source socket will block and the wakeup will not be delivered
to the file descriptor.
A read event is signaled to user-land after dissolving.
.Sh RETURN VALUES
.Fn sosplice
returns 0 on success and otherwise the error number.
.Fn somove
returns 0 if socket splicing has been finished and 1 if it continues.
.Sh ERRORS
.Fn sosplice
will succeed unless:
.Bl -tag -width Er
.It Bq Er EBADF
The given file descriptor
.Fa fd
is not an active descriptor.
.It Bq Er EBUSY
The source or the drain socket is already spliced.
.It Bq Er EINVAL
The given maximum value
.Fa max
is negative.
.It Bq Er ENOTCONN
The source or the drain socket is neither connected nor in the
process of connecting to a peer.
.It Bq Er ENOTSOCK
The given file descriptor
.Fa fd
is not a socket.
.It Bq Er EOPNOTSUPP
The source or the drain socket is a listen socket.
.It Bq Er EPROTONOSUPPORT
The source socket's protocol layer does not have the
.Dv PR_SPLICE
flag set.
At the moment only TCP supports socket splicing.
.It Bq Er EPROTONOSUPPORT
The drain socket's protocol does not have the same
.Fa pr_usrreq
function as the source.
.It Bq Er EWOULDBLOCK
The source socket is non-blocking and the receive buffer is already
locked.
.El
.Sh SEE ALSO
.Xr setsockopt 2 ,
.Xr options 4 ,
.Xr timeout 9
.Sh HISTORY
Socket splicing first appeared in
.Ox 4.9 .
.Sh AUTHORS
.An -nosplit
The idea for socket splicing originally came from
.An Markus Friedl Aq markus@openbsd.org ,
and
.An Alexander Bluhm Aq bluhm@openbsd.org
implemented it.