.\"	$NetBSD: tcp.4,v 1.25 2010/03/22 18:58:31 joerg Exp $
.\"	$FreeBSD: tcp.4,v 1.11.2.16 2004/02/16 22:21:47 bms Exp $
.\"
.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tcp.4	8.1 (Berkeley) 6/5/93
.\"
.Dd June 19, 2007
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Ft int
.Fn socket AF_INET6 SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way transmission of data.
It is a byte-stream protocol used to support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard Internet address format and, in addition, provides
a per-host collection of
.Dq port addresses .
Thus, each address is composed of an Internet address specifying
the host and network, with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets using
.Tn TCP
are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
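.Pp
As a minimal sketch of the active and passive roles described above,
the following C fragment creates a passive socket on the loopback address
and connects an active socket to it.
The helper names
.Fn make_passive
and
.Fn make_active ,
the use of the loopback address, and the backlog of 5 are assumptions
made for illustration only; error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a passive socket on 127.0.0.1.
 * Port 0 asks the system to pick a port, which is returned in *port.
 */
int
make_passive(in_port_t *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(0);
	/* bind(2) comes first; listen(2) then makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return -1;
	}
	*port = ntohs(sin.sin_port);
	return s;
}

/* Hypothetical helper: an active socket connects to the passive one. */
int
make_active(in_port_t port)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
		close(s);
		return -1;
	}
	return s;
}
```

A caller would invoke
.Fn make_passive
first, pass the returned port to
.Fn make_active ,
and then call
.Xr accept 2
on the passive socket to obtain the server end of the connection.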
.Pp
Passive sockets may
.Dq underspecify
their location to match incoming connection requests from multiple networks.
This technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified at this time; if the port is not
specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the socket is the address associated with the
network interface through which packets are being transmitted and received.
Normally this address corresponds to the peer entity's network.
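.Pp
The wildcard behavior above can be sketched in C as follows.
The helpers
.Fn listen_wildcard
and
.Fn local_address
are hypothetical names introduced for this illustration, and the backlog
of 5 is an arbitrary choice.
Before a connection exists,
.Fn local_address
on the listening socket reports
.Dv INADDR_ANY ;
on a socket returned by
.Xr accept 2
it reports the specific local address the connection arrived on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on every local address.
 * Passing port 0 lets the system assign the port.
 */
int
listen_wildcard(in_port_t port)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* wildcard address */
	sin.sin_port = htons(port);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1) {
		close(s);
		return -1;
	}
	return s;
}

/* Hypothetical helper: fetch the address currently assigned to a socket. */
int
local_address(int s, struct sockaddr_in *sin)
{
	socklen_t len = sizeof(*sin);

	memset(sin, 0, sizeof(*sin));
	return getsockname(s, (struct sockaddr *)sin, &len);
}
```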
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_KEEPINTVL
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
Therefore,
.Tn TCP
provides a boolean option,
.Dv TCP_NODELAY
(from
.In netinet/tcp.h ) ,
to defeat this algorithm.
.It Dv TCP_MAXSEG
By default, the sending and receiving
.Tn TCP
endpoints negotiate the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this option is to allow
.Nx
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4
.Pq Dv AF_INET
sessions are supported.
.Pp
For this option to function correctly, the
administrator must add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPIDLE
.\" XXX: We always do it.
.\" When the
.\" .Dv SO_KEEPALIVE
.\" option is enabled,
TCP probes a connection that
has been idle for some amount of time.
The default value for this idle period is 4 hours.
The
.Dv TCP_KEEPIDLE
option can be used to change this value for a given socket, and specifies
the number of seconds a connection must be idle before keepalive probing
begins.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is
.\" the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepidle ).
.\" divided by
.\" .Dv  PR_SLOWHZ
.\" which is defined in the
.\" .In sys/protosw.h
.\" header file).
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe after some
amount of time.
The default value for this retransmit interval is 150 seconds.
The
.Dv TCP_KEEPINTVL
option can be used to change this value for
a given socket, and specifies the number of seconds to wait before
retransmitting a keepalive probe.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepintvl ).
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe a certain
number of times before a connection is considered to be broken.
The default value for this keepalive probe retransmit limit is 8.
The
.Dv TCP_KEEPCNT
option can be used to change this value for a given socket,
and specifies the maximum number of keepalive probes to be sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepcnt ).
.It Dv TCP_KEEPINIT
If a TCP connection cannot be established within some amount of time,
TCP will time out the connect attempt.
The default value for this initial connection establishment timeout
is 150 seconds.
The
.Dv TCP_KEEPINIT
option can be used to change this initial timeout period for a given
socket, and specifies the number of seconds to wait before the connect
attempt is timed out.
For passive connections, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepinit ).
.El
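.Pp
As an illustration of the keepalive options in the list above,
the following C sketch enables
.Dv SO_KEEPALIVE
and then sets
.Dv TCP_KEEPIDLE ,
.Dv TCP_KEEPINTVL
and
.Dv TCP_KEEPCNT
on one socket.
The helper names
.Fn tune_keepalive
and
.Fn get_tcp_uint
are hypothetical, and the sketch assumes that
.Dv IPPROTO_TCP
is the option level and that
.In netinet/tcp.h
defines all three options on the target system.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keepalive probing on a TCP socket and
 * tune its timers.  idle and intvl are in seconds, cnt is a probe
 * count; each must be greater than 0.
 */
int
tune_keepalive(int s, unsigned int idle, unsigned int intvl, unsigned int cnt)
{
	int on = 1;

	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return -1;
	return 0;
}

/* Hypothetical helper: read one unsigned int TCP option back. */
int
get_tcp_uint(int s, int opt, unsigned int *val)
{
	socklen_t len = sizeof(*val);

	return getsockopt(s, IPPROTO_TCP, opt, val, &len);
}
```

For example, calling
.Fn tune_keepalive
with the values 600, 30 and 4 asks for probing to start after ten idle
minutes, with probes 30 seconds apart and at most four probes sent.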
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 .
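.Pp
A minimal C sketch of looking up the option level and using it to set
.Dv TCP_NODELAY
might look like this.
The helpers
.Fn tcp_option_level ,
.Fn set_nodelay
and
.Fn get_nodelay
are hypothetical names, and the fallback to
.Dv IPPROTO_TCP
when the lookup fails is an assumption of this sketch rather than
documented behavior.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: look up the option level for TCP.
 * Falls back to IPPROTO_TCP (an assumption of this sketch) when the
 * protocols database has no "tcp" entry.
 */
int
tcp_option_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return pe != NULL ? pe->p_proto : IPPROTO_TCP;
}

/* Hypothetical helper: set (on != 0) or clear (on == 0) TCP_NODELAY. */
int
set_nodelay(int s, int on)
{
	return setsockopt(s, tcp_option_level(), TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: returns 1 if set, 0 if clear, -1 on error. */
int
get_nodelay(int s)
{
	int on = 0;
	socklen_t len = sizeof(on);

	if (getsockopt(s, tcp_option_level(), TCP_NODELAY, &on, &len) == -1)
		return -1;
	return on != 0;
}
```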
.Pp
In the historical
.Bx
.Tn TCP
implementation, if the
.Dv TCP_NODELAY
option was set on a passive socket, the sockets returned by
.Xr accept 2
erroneously did not have the
.Dv TCP_NODELAY
option set; the behavior was corrected to inherit
.Dv TCP_NODELAY
in
.Nx 1.6 .
.Pp
Options at the
.Tn IP
network level may be used with
.Tn TCP ;
see
.Xr ip 4
or
.Xr ip6 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
Many adjustable parameters control various aspects of
.Nx
TCP behavior; these parameters are documented in
.Xr sysctl 7 ,
and they include:
.Bl -bullet -compact
.It
RFC 1323 extensions for high performance
.It
Send/receive buffer sizes
.It
Default maximum segment size (MSS)
.It
SYN cache parameters
.It
Initial window size
.It
Hughes/Touch/Heidemann Congestion Window Monitoring algorithm
.It
Keepalive parameters
.It
NewReno algorithm for congestion control
.It
Logging of connection refusals
.It
RST packet rate limits
.It
SACK (Selective Acknowledgment)
.It
ECN (Explicit Congestion Notification)
.It
Congestion window increase method: traditional packet counting or
RFC 3465 Appropriate Byte Counting
.El
.Sh DIAGNOSTICS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width [EADDRNOTAVAIL]
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening on the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.El
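.Pp
As a rough illustration of how a caller might react to some of the errors
listed above, the following C sketch attempts a connection and reports the
resulting error number.
The helpers
.Fn probe_loopback
and
.Fn tcp_error_hint ,
the use of the loopback address, and the wording of the hint strings are
all assumptions introduced for this example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to 127.0.0.1 on the given port.
 * Returns 0 on success, or the errno value reported by the failing call.
 */
int
probe_loopback(in_port_t port)
{
	struct sockaddr_in sin;
	int s, error = 0;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return errno;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);
	/* Save errno before close(2) has a chance to overwrite it. */
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		error = errno;
	close(s);
	return error;
}

/* Hypothetical helper: map a few of the errors above to a short hint. */
const char *
tcp_error_hint(int error)
{
	switch (error) {
	case 0:
		return "connected";
	case ECONNREFUSED:
		return "no process is listening on that port";
	case ETIMEDOUT:
		return "peer did not answer in time";
	case EADDRNOTAVAIL:
		return "no local interface has that address";
	default:
		return strerror(error);
	}
}
```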
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr sysctl 7
.Rs
.%R RFC
.%N 793
.%D September 1981
.%T "Transmission Control Protocol"
.Re
.Rs
.%R RFC
.%N 1122
.%D October 1989
.%T "Requirements for Internet Hosts -- Communication Layers"
.Re
.Sh HISTORY
The
.Nm
protocol stack appeared in
.Bx 4.2 .
372.Bx 4.2 .
373