.\" $NetBSD: tcp.4,v 1.31 2015/02/14 13:02:38 wiz Exp $
.\" $FreeBSD: tcp.4,v 1.11.2.16 2004/02/16 22:21:47 bms Exp $
.\"
.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\"
.Dd February 14, 2015
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Ft int
.Fn socket AF_INET6 SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way transmission of data.
It is a byte-stream protocol used to support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard Internet address format and, in addition, provides
a per-host collection of
.Dq port addresses .
Thus, each address is composed of an Internet address specifying
the host and network, with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets using
.Tn TCP
are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default
.Tn TCP
sockets are created active; to create a passive socket the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match incoming connection requests from multiple networks.
This technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified at this time; if the port is not
specified the system will assign one.
Once a connection has been established the socket's address is
fixed by the peer entity's location.
The address assigned to the socket is the address associated with the
network interface through which packets are being transmitted and received.
Normally this address corresponds to the peer entity's network.
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_KEEPINTVL
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgment is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
Therefore,
.Tn TCP
provides a boolean option,
.Dv TCP_NODELAY
(from
.In netinet/tcp.h ) ,
to defeat this algorithm.
.It Dv TCP_MAXSEG
By default, the sending and receiving
.Tn TCP
endpoints negotiate between themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this option is to enable
.Nx Ns -based
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4 (AF_INET) sessions are supported.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPIDLE
.\" XXX: We always do it.
.\" When the
.\" .Dv SO_KEEPALIVE
.\" option is enabled,
TCP probes a connection that
has been idle for some amount of time.
The default value for this idle period is 4 hours.
The
.Dv TCP_KEEPIDLE
option can be used to affect this value for a given socket, and specifies
the number of seconds a connection must be idle before keepalive probes
are sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is
.\" the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepidle ).
.\" divided by
.\" .Dv PR_SLOWHZ
.\" which is defined in the
.\" .In sys/protosw.h
.\" header file).
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe after some
amount of time.
The default value for this retransmit interval is 150 seconds.
The
.Dv TCP_KEEPINTVL
option can be used to affect this value for
a given socket, and specifies the number of seconds to wait before
retransmitting a keepalive probe.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepintvl ).
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe a certain
number of times before a connection is considered to be broken.
The default value for this keepalive probe retransmit limit is 8.
The
.Dv TCP_KEEPCNT
option can be used to affect this value for a given socket,
and specifies the maximum number of keepalive probes to be sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepcnt ).
.It Dv TCP_KEEPINIT
If a TCP connection cannot be established within some amount of time,
TCP will time out the connect attempt.
The default value for this initial connection establishment timeout
is 150 seconds.
The
.Dv TCP_KEEPINIT
option can be used to affect this initial timeout period for a given
socket, and specifies the number of seconds to wait before the connect
attempt is timed out.
For passive connections, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Nx Ns -specific
additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepinit ).
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 .
.Pp
In the historical
.Bx
.Tn TCP
implementation, if the
.Dv TCP_NODELAY
option was set on a passive socket, the sockets returned by
.Xr accept 2
erroneously did not have the
.Dv TCP_NODELAY
option set; the behavior was corrected to inherit
.Dv TCP_NODELAY
in
.Nx 1.6 .
.Pp
Options at the
.Tn IP
network level may be used with
.Tn TCP ;
see
.Xr ip 4
or
.Xr ip6 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
There are many adjustable parameters that control various aspects
of the
.Nx
TCP behavior; these parameters are documented in
.Xr sysctl 7 ,
and they include:
.Bl -bullet -compact
.It
RFC 1323 extensions for high performance
.It
Send/receive buffer sizes
.It
Default maximum segment size (MSS)
.It
SYN cache parameters
.It
Hughes/Touch/Heidemann Congestion Window Monitoring algorithm
.It
Keepalive parameters
.It
newReno algorithm for congestion control
.It
Logging of connection refusals
.It
RST packet rate limits
.It
SACK (Selective Acknowledgment)
.It
ECN (Explicit Congestion Notification)
.It
Congestion window increase methods; the traditional packet counting or
RFC 3465 Appropriate Byte Counting
.It
RFC 3390: Increased initial window size
.El
.Sh DIAGNOSTICS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width [EADDRNOTAVAIL]
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data
structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr sysctl 7
.Rs
.%R RFC
.%N 793
.%D September 1981
.%T "Transmission Control Protocol"
.Re
.Rs
.%R RFC
.%N 1122
.%D October 1989
.%T "Requirements for Internet Hosts -- Communication Layers"
.Re
.Sh HISTORY
The
.Nm
protocol stack appeared in
.Bx 4.2 .
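.Sh EXAMPLES
The socket options listed in DESCRIPTION can be set and tested roughly as
in the following C sketch.
It is a minimal illustration under stated assumptions, not a definitive
implementation: the helper names tcp_set_nodelay, tcp_get_nodelay and
tcp_set_keepalive are hypothetical, and IPPROTO_TCP is assumed as the
option level in place of a getprotobyname(3) lookup.
The keepalive options are guarded on the assumption that some systems do
not define them.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helpers for illustration only; they are not part of any
 * system API.  IPPROTO_TCP is assumed to be the TCP protocol number that
 * a getprotobyname(3) lookup would otherwise provide as the option level.
 */

/* Enable or disable TCP_NODELAY on fd; returns 0 on success, -1 on error. */
int
tcp_set_nodelay(int fd, int on)
{
	int val = on ? 1 : 0;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val));
}

/* Test TCP_NODELAY; returns 1 if set, 0 if clear, -1 on error. */
int
tcp_get_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return -1;
	return val != 0;
}

/*
 * Enable SO_KEEPALIVE and, where the options are defined, set the idle
 * time and probe interval (seconds) and the probe count.  All three
 * values must be greater than 0.  Returns 0 on success, -1 on error.
 */
int
tcp_set_keepalive(int fd, unsigned int idle, unsigned int intvl,
    unsigned int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return -1;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return -1;
#endif
	return 0;
}
```

A caller would obtain fd from socket(AF_INET, SOCK_STREAM, 0) and could then
call, for example, tcp_set_nodelay(fd, 1) followed by
tcp_set_keepalive(fd, 60, 10, 5); the numeric values here are arbitrary
illustrations, not recommended settings.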