xref: /netbsd-src/lib/libc/db/man/btree.3 (revision b6d38f1490762d34462cc54455270ba1ef27f010)
1*b6d38f14Snjoly.\"	$NetBSD: btree.3,v 1.13 2012/10/13 15:28:33 njoly Exp $
22c84ad3aScgd.\"
39f0aa214Scgd.\" Copyright (c) 1990, 1993
49f0aa214Scgd.\"	The Regents of the University of California.  All rights reserved.
59f0aa214Scgd.\"
69f0aa214Scgd.\" Redistribution and use in source and binary forms, with or without
79f0aa214Scgd.\" modification, are permitted provided that the following conditions
89f0aa214Scgd.\" are met:
99f0aa214Scgd.\" 1. Redistributions of source code must retain the above copyright
109f0aa214Scgd.\"    notice, this list of conditions and the following disclaimer.
119f0aa214Scgd.\" 2. Redistributions in binary form must reproduce the above copyright
129f0aa214Scgd.\"    notice, this list of conditions and the following disclaimer in the
139f0aa214Scgd.\"    documentation and/or other materials provided with the distribution.
14eb7c1594Sagc.\" 3. Neither the name of the University nor the names of its contributors
159f0aa214Scgd.\"    may be used to endorse or promote products derived from this software
169f0aa214Scgd.\"    without specific prior written permission.
179f0aa214Scgd.\"
189f0aa214Scgd.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
199f0aa214Scgd.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
209f0aa214Scgd.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
219f0aa214Scgd.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
229f0aa214Scgd.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
239f0aa214Scgd.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
249f0aa214Scgd.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
259f0aa214Scgd.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
269f0aa214Scgd.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
279f0aa214Scgd.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
289f0aa214Scgd.\" SUCH DAMAGE.
299f0aa214Scgd.\"
3017140cefScgd.\"	@(#)btree.3	8.4 (Berkeley) 8/18/94
319f0aa214Scgd.\"
32f77fff8cSwiz.Dd April 17, 2003
33f77fff8cSwiz.Dt BTREE 3
34f77fff8cSwiz.Os
35f77fff8cSwiz.Sh NAME
36f77fff8cSwiz.Nm btree
37f77fff8cSwiz.Nd btree database access method
38f77fff8cSwiz.Sh SYNOPSIS
39f77fff8cSwiz.In sys/types.h
40f77fff8cSwiz.In db.h
41f77fff8cSwiz.Sh DESCRIPTION
429f0aa214ScgdThe routine
43f77fff8cSwiz.Fn dbopen
449f0aa214Scgdis the library interface to database files.
459f0aa214ScgdOne of the supported file formats is btree files.
469f0aa214ScgdThe general description of the database access methods is in
47f77fff8cSwiz.Xr dbopen 3 ,
489f0aa214Scgdthis manual page describes only the btree specific information.
49f77fff8cSwiz.Pp
509f0aa214ScgdThe btree data structure is a sorted, balanced tree structure storing
519f0aa214Scgdassociated key/data pairs.
52f77fff8cSwiz.Pp
539f0aa214ScgdThe btree access method specific data structure provided to
54f77fff8cSwiz.Fn dbopen
55f77fff8cSwizis defined in the
561c3412faSjoerg.In db.h
57f77fff8cSwizinclude file as follows:
58f77fff8cSwiz.Bd -literal
599f0aa214Scgdtypedef struct {
609f0aa214Scgd	u_long flags;
619f0aa214Scgd	u_int cachesize;
62a6d14e36Scgd	int maxkeypage;
639f0aa214Scgd	int minkeypage;
64a6d14e36Scgd	u_int psize;
659f0aa214Scgd	int (*compare)(const DBT *key1, const DBT *key2);
66a6d14e36Scgd	size_t (*prefix)(const DBT *key1, const DBT *key2);
67a6d14e36Scgd	int lorder;
689f0aa214Scgd} BTREEINFO;
69f77fff8cSwiz.Ed
70f77fff8cSwiz.Pp
719f0aa214ScgdThe elements of this structure are as follows:
720a1b8c0eSwiz.Bl -tag -width maxkeypagex
73f77fff8cSwiz.It Fa flags
74f77fff8cSwizThe flag value is specified by or'ing any of the following values:
75f77fff8cSwiz.Bl -tag -width R_DUP -offset indent
76f77fff8cSwiz.It Dv R_DUP
77f77fff8cSwizPermit duplicate keys in the tree, i.e. permit insertion if the key to
78f77fff8cSwizbe inserted already exists in the tree.
799f0aa214ScgdThe default behavior, as described in
80f77fff8cSwiz.Xr dbopen 3 ,
819f0aa214Scgdis to overwrite a matching key when inserting a new key or to fail if
82f77fff8cSwizthe
83f77fff8cSwiz.Dv R_NOOVERWRITE
84f77fff8cSwizflag is specified.
85f77fff8cSwizThe
86f77fff8cSwiz.Dv R_DUP
87f77fff8cSwizflag is overridden by the
88f77fff8cSwiz.Dv R_NOOVERWRITE
89f77fff8cSwizflag, and if the
90f77fff8cSwiz.Dv R_NOOVERWRITE
91f77fff8cSwizflag is specified, attempts to insert duplicate keys into the tree
92f77fff8cSwizwill fail.
93f77fff8cSwiz.Pp
949f0aa214ScgdIf the database contains duplicate keys, the order of retrieval of
959f0aa214Scgdkey/data pairs is undefined if the
96f77fff8cSwiz.Em get
979f0aa214Scgdroutine is used, however,
98f77fff8cSwiz.Em seq
99f77fff8cSwizroutine calls with the
100f77fff8cSwiz.Dv R_CURSOR
101f77fff8cSwizflag set will always return the logical
102f77fff8cSwiz.Dq first
103f77fff8cSwizof any group of duplicate keys.
104f77fff8cSwiz.El
105f77fff8cSwiz.It Fa cachesize
1069f0aa214ScgdA suggested maximum size (in bytes) of the memory cache.
1079f0aa214ScgdThis value is
108f77fff8cSwiz.Em only
109f77fff8cSwizadvisory, and the access method will allocate more memory rather than
110f77fff8cSwizfail.
111f77fff8cSwizSince every search examines the root page of the tree, caching the
112f77fff8cSwizmost recently used pages substantially improves access time.
113f77fff8cSwizIn addition, physical writes are delayed as long as possible, so a
114f77fff8cSwizmoderate cache can reduce the number of I/O operations significantly.
115f77fff8cSwizObviously, using a cache increases (but only increases) the likelihood
116f77fff8cSwizof corruption or lost data if the system crashes while a tree is being
117f77fff8cSwizmodified.
1189f0aa214ScgdIf
119f77fff8cSwiz.Fa cachesize
1209f0aa214Scgdis 0 (no size is specified) a default cache is used.
121f77fff8cSwiz.It Fa maxkeypage
122a6d14e36ScgdThe maximum number of keys which will be stored on any single page.
123a6d14e36ScgdNot currently implemented.
1249f0aa214Scgd.\" The maximum number of keys which will be stored on any single page.
1259f0aa214Scgd.\" Because of the way the btree data structure works,
126f77fff8cSwiz.\" .Fa maxkeypage
1279f0aa214Scgd.\" must always be greater than or equal to 2.
1289f0aa214Scgd.\" If
129f77fff8cSwiz.\" .Fa maxkeypage
1309f0aa214Scgd.\" is 0 (no maximum number of keys is specified) the page fill factor is
1319f0aa214Scgd.\" made as large as possible (which is almost invariably what is wanted).
132f77fff8cSwiz.It Fa minkeypage
1339f0aa214ScgdThe minimum number of keys which will be stored on any single page.
1349f0aa214ScgdThis value is used to determine which keys will be stored on overflow
135f77fff8cSwizpages, i.e., if a key or data item is longer than the pagesize divided
136f77fff8cSwizby the
137f77fff8cSwiz.Fa minkeypage
138f77fff8cSwizvalue, it will be stored on overflow pages instead of in the page
139f77fff8cSwizitself.
1409f0aa214ScgdIf
141f77fff8cSwiz.Fa minkeypage
1429f0aa214Scgdis 0 (no minimum number of keys is specified) a value of 2 is used.
143f77fff8cSwiz.It Fa psize
144f77fff8cSwizPage size is the size (in bytes) of the pages used for nodes in the
145f77fff8cSwiztree.
146a6d14e36ScgdThe minimum page size is 512 bytes and the maximum page size is 64K.
147a6d14e36ScgdIf
148f77fff8cSwiz.Fa psize
149a6d14e36Scgdis 0 (no page size is specified) a page size is chosen based on the
150a6d14e36Scgdunderlying file system I/O block size.
151f77fff8cSwiz.It Fa compare
1529f0aa214ScgdCompare is the key comparison function.
153f77fff8cSwizIt must return an integer less than, equal to, or greater than zero if
154f77fff8cSwizthe first key argument is considered to be respectively less than,
155f77fff8cSwizequal to, or greater than the second key argument.
156f77fff8cSwizThe same comparison function must be used on a given tree every time
157f77fff8cSwizit is opened.
1589f0aa214ScgdIf
159f77fff8cSwiz.Fa compare
160f77fff8cSwizis
161f77fff8cSwiz.Dv NULL
162f77fff8cSwiz(no comparison function is specified), the keys are compared
1639f0aa214Scgdlexically, with shorter keys considered less than longer keys.
164f77fff8cSwiz.It Fa prefix
1659f0aa214ScgdPrefix is the prefix comparison function.
166f77fff8cSwizIf specified, this routine must return the number of bytes of the
167f77fff8cSwizsecond key argument which are necessary to determine that it is
168f77fff8cSwizgreater than the first key argument.
1699f0aa214ScgdIf the keys are equal, the key length should be returned.
170f77fff8cSwizNote, the usefulness of this routine is very data dependent, but, in
171f77fff8cSwizsome data sets can produce significantly reduced tree sizes and search
172f77fff8cSwiztimes.
1739f0aa214ScgdIf
174f77fff8cSwiz.Fa prefix
175f77fff8cSwizis
176f77fff8cSwiz.Dv NULL
177f77fff8cSwiz(no prefix function is specified),
178f77fff8cSwiz.Em and
179f77fff8cSwizno comparison function is specified, a default lexical comparison
180f77fff8cSwizroutine is used.
1819f0aa214ScgdIf
182f77fff8cSwiz.Fa prefix
183f77fff8cSwizis
184f77fff8cSwiz.Dv NULL
185f77fff8cSwizand a comparison routine is specified, no prefix comparison is done.
186f77fff8cSwiz.It Fa lorder
187a6d14e36ScgdThe byte order for integers in the stored database metadata.
188a6d14e36ScgdThe number should represent the order as an integer; for example,
189a6d14e36Scgdbig endian order would be the number 4,321.
190a6d14e36ScgdIf
191f77fff8cSwiz.Fa lorder
192a6d14e36Scgdis 0 (no order is specified) the current host order is used.
193f77fff8cSwiz.El
194f77fff8cSwiz.Pp
195f77fff8cSwizIf the file already exists (and the
196f77fff8cSwiz.Dv O_TRUNC
197f77fff8cSwizflag is not specified), the values specified for the parameters flags,
198f77fff8cSwizlorder and psize are ignored in favor of the values used when the tree
199f77fff8cSwizwas created.
200f77fff8cSwiz.Pp
201f77fff8cSwizForward sequential scans of a tree are from the least key to the
202f77fff8cSwizgreatest.
203f77fff8cSwiz.Pp
204f77fff8cSwizSpace freed up by deleting key/data pairs from the tree is never
205f77fff8cSwizreclaimed, although it is normally made available for reuse.
2069f0aa214ScgdThis means that the btree storage structure is grow-only.
207f77fff8cSwizThe only solutions are to avoid excessive deletions, or to create a
208f77fff8cSwizfresh tree periodically from a scan of an existing one.
209f77fff8cSwiz.Pp
2109f0aa214ScgdSearches, insertions, and deletions in a btree will all complete in
2119f0aa214ScgdO lg base N where base is the average fill factor.
212f77fff8cSwizOften, inserting ordered data into btrees results in a low fill
213f77fff8cSwizfactor.
214f77fff8cSwizThis implementation has been modified to make ordered insertion the
215f77fff8cSwizbest case, resulting in a much better than normal page fill factor.
216f77fff8cSwiz.Sh ERRORS
21717140cefScgdThe
218f77fff8cSwiz.Nm
21917140cefScgdaccess method routines may fail and set
220f77fff8cSwiz.Va errno
22117140cefScgdfor any of the errors specified for the library routine
222f77fff8cSwiz.Xr dbopen 3 .
223f77fff8cSwiz.Sh SEE ALSO
224f77fff8cSwiz.Xr dbopen 3 ,
225f77fff8cSwiz.Xr hash 3 ,
226f77fff8cSwiz.Xr mpool 3 ,
227f77fff8cSwiz.Xr recno 3
228f77fff8cSwiz.Rs
229f77fff8cSwiz.%T "The Ubiquitous B-tree"
230f77fff8cSwiz.%A "Douglas Comer"
231f77fff8cSwiz.%J "ACM Comput. Surv."
232f77fff8cSwiz.%V 2
233f77fff8cSwiz.%N 11
234f77fff8cSwiz.%D June 1979
235f77fff8cSwiz.%P 121-138
236f77fff8cSwiz.Re
237f77fff8cSwiz.Rs
238f77fff8cSwiz.%T "Prefix B-trees"
239f77fff8cSwiz.%A "Bayer"
240f77fff8cSwiz.%A "Unterauer"
241f77fff8cSwiz.%J "ACM Transactions on Database Systems"
242f77fff8cSwiz.%V Vol. 2
243f77fff8cSwiz.%N 1
244f77fff8cSwiz.%D March 1977
245f77fff8cSwiz.%P 11-26
246f77fff8cSwiz.Re
247f77fff8cSwiz.Rs
248f77fff8cSwiz.%B "The Art of Computer Programming Vol. 3: Sorting and Searching"
249f77fff8cSwiz.%A "D.E. Knuth"
250f77fff8cSwiz.%D 1968
251f77fff8cSwiz.%P 471-480
252f77fff8cSwiz.Re
253f77fff8cSwiz.Sh BUGS
2549f0aa214ScgdOnly big and little endian byte order is supported.
255