.\" $OpenBSD: sosplice.9,v 1.8 2016/06/13 21:24:43 bluhm Exp $
.\"
.\" Copyright (c) 2011-2013 Alexander Bluhm <bluhm@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 13 2016 $
.Dt SOSPLICE 9
.Os
.Sh NAME
.Nm sosplice ,
.Nm somove
.Nd splice two sockets for zero-copy data transfer
.Sh SYNOPSIS
.Ft int
.Fn sosplice "struct socket *so" "int fd" "off_t max" "struct timeval *tv"
.Ft int
.Fn somove "struct socket *so" "int wait"
.Sh DESCRIPTION
The function
.Fn sosplice
is used to splice together a source and a drain socket.
The source socket is passed as the
.Fa so
argument;
the file descriptor of the drain is passed in
.Fa fd .
If
.Fa fd
is negative, an existing splice is dissolved.
If
.Fa max
is positive, at most that many bytes will be transferred.
If
.Fa tv
is not
.Dv NULL ,
a
.Xr timeout 9
is scheduled to dissolve splicing if no data can be
transferred for the specified period of time.
Socket splicing can be invoked from userland via the
.Xr setsockopt 2
system call at the
.Dv SOL_SOCKET
level with the socket option
.Dv SO_SPLICE .
.Pp
Before connecting both sockets, several checks are executed.
See the
.Sx ERRORS
section for possible failures.
The connection between both sockets is implemented by setting these
additional fields in
.Vt struct socket :
.Pp
.Bl -dash -compact -offset indent
.It
.Vt struct socket Fa *so_splice
links from the source to the drain socket.
.It
.Vt struct socket Fa *so_spliceback
links back from the drain to the source socket.
.It
.Vt off_t Fa so_splicelen
counts the number of bytes spliced so far from this socket.
.It
.Vt off_t Fa so_splicemax
specifies the maximum number of bytes to splice from this socket if
non-zero.
.It
.Vt struct timeval Fa so_idletv
specifies the maximum idle time if non-zero.
.It
.Vt struct timeout Fa so_idleto
provides storage for the kernel timeout if idle time is used.
.El
.Pp
After connecting both sockets,
.Fn sosplice
calls
.Fn somove
to transfer the mbufs already in the source receive buffer to the
drain send buffer.
Finally, the socket buffer flag
.Dv SB_SPLICE
is set on both socket buffers to indicate that the protocol layer
has to call
.Fn somove
whenever data or space is available.
.Pp
The function
.Fn somove
transfers data from the source's receive buffer to the drain's send
buffer.
It must be called at
.Xr splsoftnet 9
and
.Fa so
must be a spliced source socket.
It may be necessary to split an mbuf to handle out-of-band data
inline or when the maximum splice length has been reached.
If
.Fa wait
is
.Dv M_WAIT ,
splitting mbufs will always succeed.
For
.Dv M_DONTWAIT
the out-of-band property might get lost or a short splice might
happen.
In the latter case, fewer than the given maximum number of bytes are
transferred and userland has to cope with this.
Note that a short splice cannot happen if
.Fn somove
was called by
.Fn sosplice .
Thus a second
.Xr setsockopt 2
call after a short splice, pointing to the same maximum, will always
succeed.
.Pp
Before transferring data,
.Fn somove
checks both sockets for errors and that the drain socket is connected.
If the drain cannot send anymore, an
.Er EPIPE
error is set on the source socket.
The data length to move is limited by the optional maximum splice
length and the space in the drain's send socket buffer.
Up to this amount of data is taken out of the source's receive
socket buffer.
To avoid splicing loops created by userland, the number of times
an mbuf may be moved between sockets is limited to 128.
.Pp
For atomic protocols, either one complete packet is taken out, or
nothing is taken at all if:
.Bl -dash -compact -offset indent
.It
the packet is bigger than the drain's send buffer size, in which
case splicing is aborted with an
.Er EMSGSIZE
error;
.It
the packet does not fit into the drain's current send buffer space,
in which case it is left in the source's receive buffer for later
processing;
.It
the maximum splice length is located within the packet, in which
case splicing is dissolved like a short splice.
.El
All address or control mbufs associated with the taken packet are
dropped.
.Pp
If the maximum splice length has been reached, an mbuf may get
split for non-atomic protocols.
Otherwise an mbuf is either moved completely to the send buffer or
left in the receive buffer for later processing.
If
.Dv SO_OOBINLINE
is set, out-of-band data will get moved as such,
although this might not be reliable.
The data is sent out to the drain socket via the protocol function.
If that fails and the drain socket cannot send anymore, an
.Er EPIPE
error is set on the source socket.
.Pp
For packet-oriented protocols,
.Fn somove
iterates over the next packet queue.
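.Pp
The length and packet rules above can be illustrated with a small
userland model.
This is a hypothetical sketch, not kernel code: the names
.Vt struct splice_model ,
.Fn splice_len ,
.Fn splice_atomic
and
.Fn splice_loop_ok
are invented for illustration, and the order in which
.Fn splice_atomic
checks the atomic-protocol conditions is an assumption rather than
the order used by
.Fn somove .

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical bound on moves per mbuf, taken from the text above. */
#define SPLICE_MAXLOOP	128

/* Possible outcomes for one packet of an atomic protocol. */
enum splice_action {
	SPLICE_MOVE,	/* take the whole packet */
	SPLICE_WAIT,	/* leave it in the source receive buffer */
	SPLICE_ABORT,	/* dissolve splicing, error is set */
	SPLICE_SHORT	/* dissolve like a short splice */
};

/* Hypothetical snapshot of the state that somove consults. */
struct splice_model {
	off_t	splicelen;	/* bytes spliced so far, cf. so_splicelen */
	off_t	splicemax;	/* 0 means no maximum, cf. so_splicemax */
	long	sndhiwat;	/* total size of the drain's send buffer */
	long	sndspace;	/* space currently free in that buffer */
};

/*
 * Bytes that may be taken from a receive buffer holding rcvlen bytes:
 * limited by the remaining maximum (if any) and the drain's free space.
 */
static long
splice_len(const struct splice_model *sm, long rcvlen)
{
	long len = rcvlen;

	if (sm->splicemax > 0 && sm->splicemax - sm->splicelen < len)
		len = (long)(sm->splicemax - sm->splicelen);
	if (sm->sndspace < len)
		len = sm->sndspace;
	return len < 0 ? 0 : len;
}

/* Decide what happens to the next packet of an atomic protocol. */
static enum splice_action
splice_atomic(const struct splice_model *sm, long pktlen, int *error)
{
	assert(pktlen >= 0);
	*error = 0;
	if (pktlen > sm->sndhiwat) {
		/* Can never fit, even into an empty drain buffer. */
		*error = EMSGSIZE;
		return SPLICE_ABORT;
	}
	if (sm->splicemax > 0 && sm->splicelen + pktlen > sm->splicemax)
		return SPLICE_SHORT;	/* maximum lies inside the packet */
	if (pktlen > sm->sndspace)
		return SPLICE_WAIT;	/* retry when space is available */
	return SPLICE_MOVE;
}

/* True if an mbuf already moved this many times may be moved again. */
static bool
splice_loop_ok(int moves)
{
	return moves < SPLICE_MAXLOOP;
}
```

Checking the maximum before the free send space is a design choice of
this model: when the maximum falls inside the packet, waiting for more
drain space could never let the whole packet through, so the model
reports a short splice at once.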
.Pp
If a maximum splice length was specified and at least this amount
of data has been spliced from the source socket, splicing gets
dissolved.
In this case, an
.Er EFBIG
error is set on the source socket if the maximum amount of data has
been transferred.
Userland can process this error to distinguish the full splice from
a short splice or to react to the completed maximum splice immediately.
If an idle timeout was specified and no data has been transferred
for that period of time, the handler
.Fn soidle
dissolves splicing and sets an
.Er ETIMEDOUT
error on the source socket.
.Pp
The function
.Fn sounsplice
is called to dissolve the socket splicing if:
.Bl -dash -compact -offset indent
.It
the source socket cannot receive anymore and its receive buffer is empty;
.It
the drain socket cannot send anymore;
.It
the maximum has been reached;
.It
an error occurred;
.It
the idle timeout has fired.
.El
.Pp
If the socket buffer flag
.Dv SB_SPLICE
is set, the functions
.Fn sorwakeup
and
.Fn sowwakeup
will call
.Fn somove
to trigger the transfer when new data or buffer space is available.
While socket splicing is active, any
.Xr read 2
from the source socket will block and the wakeup will not be delivered
to the file descriptor.
A read event or a socket error is signaled to userland after
dissolving.
.Sh RETURN VALUES
.Fn sosplice
returns 0 on success and an error number otherwise.
.Fn somove
returns 0 if socket splicing has been finished and 1 if it continues.
.Sh ERRORS
.Fn sosplice
will succeed unless:
.Bl -tag -width Er
.It Bq Er EBADF
The given file descriptor
.Fa fd
is not an active descriptor.
.It Bq Er EBUSY
The source or the drain socket is already spliced.
.It Bq Er EINVAL
The given maximum value
.Fa max
is negative.
.It Bq Er ENOTCONN
The source socket requires a connection and is neither connected
nor in the process of connecting to a peer.
.It Bq Er ENOTCONN
The drain socket is neither connected nor in the process of connecting
to a peer.
.It Bq Er ENOTSOCK
The given file descriptor
.Fa fd
is not a socket.
.It Bq Er EOPNOTSUPP
The source or the drain socket is a listen socket.
.It Bq Er EPROTONOSUPPORT
The source socket's protocol layer does not have the
.Dv PR_SPLICE
flag set.
Only TCP and UDP socket splicing is supported.
.It Bq Er EPROTONOSUPPORT
The drain socket's protocol does not have the same
.Fa pr_usrreq
function as the source.
.It Bq Er EWOULDBLOCK
The source socket is non-blocking and the receive buffer is already
locked.
.El
.Sh SEE ALSO
.Xr setsockopt 2 ,
.Xr options 4 ,
.Xr timeout 9
.Sh HISTORY
Socket splicing for TCP first appeared in
.Ox 4.9 ;
support for UDP was added in
.Ox 5.3 .
.Sh AUTHORS
.An -nosplit
The idea for socket splicing originally came from
.An Markus Friedl Aq Mt markus@openbsd.org ,
and
.An Alexander Bluhm Aq Mt bluhm@openbsd.org
implemented it.
.An Mike Belopuhov Aq Mt mikeb@openbsd.org
added the timeout feature.