.\" $NetBSD: tcp.4,v 1.25 2010/03/22 18:58:31 joerg Exp $
.\" $FreeBSD: tcp.4,v 1.11.2.16 2004/02/16 22:21:47 bms Exp $
.\"
.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\"
.Dd June 19, 2007
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Ft int
.Fn socket AF_INET6 SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way transmission of data.
It is a byte-stream protocol used to support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard Internet address format and, in addition, provides
a per-host collection of
.Dq port addresses .
Thus, each address is composed of an Internet address specifying
the host and network, with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets using
.Tn TCP
are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match incoming connection requests from multiple networks.
This technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified at this time; if the port is not
specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
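.Pp
As an illustration of wildcard addressing, the following minimal C sketch
creates a passive socket bound to
.Dv INADDR_ANY
with port 0, so that the system assigns the port, and then recovers the
assigned port with
.Xr getsockname 2 .
The function name
.Fn open_wildcard_listener
and the
.Xr listen 2
backlog of 5 are illustrative assumptions, not part of any system interface.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a system interface: create a passive
 * TCP socket bound to the wildcard address INADDR_ANY.  If port
 * is 0, the system assigns one.  The port actually bound is stored
 * in *bound_port in host byte order.  Returns the listening
 * descriptor, or -1 on error.
 */
int
open_wildcard_listener(uint16_t port, uint16_t *bound_port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	assert(bound_port != NULL);

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* listen on all networks */
	sin.sin_port = htons(port);			/* 0: let the system choose */

	/* bind(2) first; listen(2) then makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1) {
		close(s);
		return -1;
	}

	/* Ask which port was actually bound (matters when port was 0). */
	if (getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return -1;
	}
	*bound_port = ntohs(sin.sin_port);
	return s;
}
```
.Ed
.Pp
A server would pass the returned descriptor to
.Xr accept 2
to obtain a connected socket for each incoming connection.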
The address assigned to the socket is the address associated with the
network interface through which packets are being transmitted and received.
Normally this address corresponds to the peer entity's network.
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and queried with
.Xr getsockopt 2 :
.Bl -tag -width TCP_KEEPINTVL
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
Therefore,
.Tn TCP
provides a boolean option,
.Dv TCP_NODELAY
(from
.In netinet/tcp.h ) ,
to defeat this algorithm (the Nagle algorithm).
.It Dv TCP_MAXSEG
By default, the sending and receiving TCPs negotiate
the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to read the result of this negotiation
and to reduce it if desired.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this option is to allow
.Nx Ns -based
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4
.Pq Dv AF_INET
sessions are supported.
.Pp
For this option to function correctly, the
administrator must add a tcp-md5 key entry to the system's security
association database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPIDLE
.\" XXX: We always do it.
.\" When the
.\" .Dv SO_KEEPALIVE
.\" option is enabled,
TCP probes a connection that
has been idle for some amount of time.
The default value for this idle period is 4 hours.
The
.Dv TCP_KEEPIDLE
option can be used to change this value for a given socket, and specifies
the number of seconds a connection must be idle before keepalive probes
are sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is
.\" the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepidle ).
.\" divided by
.\" .Dv PR_SLOWHZ
.\" which is defined in the
.\" .In sys/protosw.h
.\" header file).
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe after some
amount of time.
The default value for this retransmit interval is 150 seconds.
The
.Dv TCP_KEEPINTVL
option can be used to change this value for
a given socket, and specifies the number of seconds to wait before
retransmitting a keepalive probe.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepintvl ).
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe a certain
number of times before the connection is considered to be broken.
The default value for this keepalive probe retransmit limit is 8.
The
.Dv TCP_KEEPCNT
option can be used to change this value for a given socket,
and specifies the maximum number of keepalive probes to be sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepcnt ).
.It Dv TCP_KEEPINIT
If a TCP connection cannot be established within some amount of time,
TCP will time out the connect attempt.
The default value for this initial connection establishment timeout
is 150 seconds.
The
.Dv TCP_KEEPINIT
option can be used to change this initial timeout period for a given
socket, and specifies the number of seconds to wait before the connect
attempt is timed out.
For passive connections, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepinit ).
.El
.Pp
The option level for the
.Xr setsockopt 2
and
.Xr getsockopt 2
calls is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3
or as the constant
.Dv IPPROTO_TCP
defined in
.In netinet/in.h .
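.Pp
The following minimal C sketch shows the options above being set at the
.Tn TCP
option level.
It assumes a system that defines
.Dv TCP_KEEPIDLE ,
.Dv TCP_KEEPINTVL
and
.Dv TCP_KEEPCNT
in
.In netinet/tcp.h .
The level is looked up with
.Xr getprotobyname 3 ,
falling back to
.Dv IPPROTO_TCP
if the protocols database is unavailable.
The helper names
.Fn tcp_level
and
.Fn open_tuned_socket
are hypothetical, and the keepalive values passed to them are arbitrary
examples rather than recommended settings.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the option level for TCP.
 * getprotobyname(3) consults the protocols database; fall back
 * to IPPROTO_TCP if that lookup fails.
 */
static int
tcp_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return pe != NULL ? pe->p_proto : IPPROTO_TCP;
}

/*
 * Hypothetical helper: create a TCP socket with TCP_NODELAY set and
 * keepalive enabled.  idle, intvl and cnt are passed to TCP_KEEPIDLE,
 * TCP_KEEPINTVL and TCP_KEEPCNT; each must be greater than 0.
 * Returns the descriptor, or -1 on error.
 */
int
open_tuned_socket(unsigned int idle, unsigned int intvl, unsigned int cnt)
{
	int level = tcp_level();
	int on = 1;
	int s;

	assert(idle > 0 && intvl > 0 && cnt > 0);

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return -1;

	/*
	 * Disable small-packet coalescing, then turn on SO_KEEPALIVE
	 * (keepalive probes are sent only when it is enabled) and tune
	 * the per-socket keepalive timers.
	 */
	if (setsockopt(s, level, TCP_NODELAY, &on, sizeof(on)) == -1 ||
	    setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1 ||
	    setsockopt(s, level, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1 ||
	    setsockopt(s, level, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1 ||
	    setsockopt(s, level, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1) {
		close(s);
		return -1;
	}
	return s;
}
```
.Ed
.Pp
For example, a call with arguments 600, 30 and 4 would request a
10-minute idle period, a 30-second probe interval and at most 4 probes.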
.Pp
In the historical
.Bx
.Tn TCP
implementation, if the
.Dv TCP_NODELAY
option was set on a passive socket, the sockets returned by
.Xr accept 2
erroneously did not have the
.Dv TCP_NODELAY
option set; the behavior was corrected to inherit
.Dv TCP_NODELAY
in
.Nx 1.6 .
.Pp
Options at the
.Tn IP
network level may be used with
.Tn TCP ;
see
.Xr ip 4
or
.Xr ip6 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
There are many adjustable parameters that control various aspects
of the
.Nx
TCP behavior; these parameters are documented in
.Xr sysctl 7 ,
and they include:
.Bl -bullet -compact
.It
RFC 1323 extensions for high performance
.It
Send/receive buffer sizes
.It
Default maximum segment size (MSS)
.It
SYN cache parameters
.It
Initial window size
.It
Hughes/Touch/Heidemann Congestion Window Monitoring algorithm
.It
Keepalive parameters
.It
NewReno algorithm for congestion control
.It
Logging of connection refusals
.It
RST packet rate limits
.It
SACK (Selective Acknowledgment)
.It
ECN (Explicit Congestion Notification)
.It
Congestion window increase methods: traditional packet counting or
RFC 3465 Appropriate Byte Counting
.El
.Sh DIAGNOSTICS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width [EADDRNOTAVAIL]
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening on the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr sysctl 7 ,
.Xr setkey 8
.Rs
.%R RFC
.%N 793
.%D September 1981
.%T "Transmission Control Protocol"
.Re
.Rs
.%R RFC
.%N 1122
.%D October 1989
.%T "Requirements for Internet Hosts -- Communication Layers"
.Re
.Sh HISTORY
The
.Nm
protocol stack appeared in
.Bx 4.2 .