1*b6d38f14Snjoly.\" $NetBSD: btree.3,v 1.13 2012/10/13 15:28:33 njoly Exp $ 22c84ad3aScgd.\" 39f0aa214Scgd.\" Copyright (c) 1990, 1993 49f0aa214Scgd.\" The Regents of the University of California. All rights reserved. 59f0aa214Scgd.\" 69f0aa214Scgd.\" Redistribution and use in source and binary forms, with or without 79f0aa214Scgd.\" modification, are permitted provided that the following conditions 89f0aa214Scgd.\" are met: 99f0aa214Scgd.\" 1. Redistributions of source code must retain the above copyright 109f0aa214Scgd.\" notice, this list of conditions and the following disclaimer. 119f0aa214Scgd.\" 2. Redistributions in binary form must reproduce the above copyright 129f0aa214Scgd.\" notice, this list of conditions and the following disclaimer in the 139f0aa214Scgd.\" documentation and/or other materials provided with the distribution. 14eb7c1594Sagc.\" 3. Neither the name of the University nor the names of its contributors 159f0aa214Scgd.\" may be used to endorse or promote products derived from this software 169f0aa214Scgd.\" without specific prior written permission. 179f0aa214Scgd.\" 189f0aa214Scgd.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 199f0aa214Scgd.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 209f0aa214Scgd.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 219f0aa214Scgd.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 229f0aa214Scgd.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 239f0aa214Scgd.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 249f0aa214Scgd.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 259f0aa214Scgd.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 269f0aa214Scgd.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 279f0aa214Scgd.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 289f0aa214Scgd.\" SUCH DAMAGE. 299f0aa214Scgd.\" 3017140cefScgd.\" @(#)btree.3 8.4 (Berkeley) 8/18/94 319f0aa214Scgd.\" 32f77fff8cSwiz.Dd April 17, 2003 33f77fff8cSwiz.Dt BTREE 3 34f77fff8cSwiz.Os 35f77fff8cSwiz.Sh NAME 36f77fff8cSwiz.Nm btree 37f77fff8cSwiz.Nd btree database access method 38f77fff8cSwiz.Sh SYNOPSIS 39f77fff8cSwiz.In sys/types.h 40f77fff8cSwiz.In db.h 41f77fff8cSwiz.Sh DESCRIPTION 429f0aa214ScgdThe routine 43f77fff8cSwiz.Fn dbopen 449f0aa214Scgdis the library interface to database files. 459f0aa214ScgdOne of the supported file formats is btree files. 469f0aa214ScgdThe general description of the database access methods is in 47f77fff8cSwiz.Xr dbopen 3 , 489f0aa214Scgdthis manual page describes only the btree specific information. 49f77fff8cSwiz.Pp 509f0aa214ScgdThe btree data structure is a sorted, balanced tree structure storing 519f0aa214Scgdassociated key/data pairs. 52f77fff8cSwiz.Pp 539f0aa214ScgdThe btree access method specific data structure provided to 54f77fff8cSwiz.Fn dbopen 55f77fff8cSwizis defined in the 561c3412faSjoerg.In db.h 57f77fff8cSwizinclude file as follows: 58f77fff8cSwiz.Bd -literal 599f0aa214Scgdtypedef struct { 609f0aa214Scgd u_long flags; 619f0aa214Scgd u_int cachesize; 62a6d14e36Scgd int maxkeypage; 639f0aa214Scgd int minkeypage; 64a6d14e36Scgd u_int psize; 659f0aa214Scgd int (*compare)(const DBT *key1, const DBT *key2); 66a6d14e36Scgd size_t (*prefix)(const DBT *key1, const DBT *key2); 67a6d14e36Scgd int lorder; 689f0aa214Scgd} BTREEINFO; 69f77fff8cSwiz.Ed 70f77fff8cSwiz.Pp 719f0aa214ScgdThe elements of this structure are as follows: 720a1b8c0eSwiz.Bl -tag -width maxkeypagex 73f77fff8cSwiz.It Fa flags 74f77fff8cSwizThe flag value is specified by or'ing any of the following values: 75f77fff8cSwiz.Bl -tag -width R_DUP -offset indent 76f77fff8cSwiz.It Dv R_DUP 77f77fff8cSwizPermit duplicate keys in the tree, i.e. permit insertion if the key to 78f77fff8cSwizbe inserted already exists in the tree. 799f0aa214ScgdThe default behavior, as described in 80f77fff8cSwiz.Xr dbopen 3 , 819f0aa214Scgdis to overwrite a matching key when inserting a new key or to fail if 82f77fff8cSwizthe 83f77fff8cSwiz.Dv R_NOOVERWRITE 84f77fff8cSwizflag is specified. 85f77fff8cSwizThe 86f77fff8cSwiz.Dv R_DUP 87f77fff8cSwizflag is overridden by the 88f77fff8cSwiz.Dv R_NOOVERWRITE 89f77fff8cSwizflag, and if the 90f77fff8cSwiz.Dv R_NOOVERWRITE 91f77fff8cSwizflag is specified, attempts to insert duplicate keys into the tree 92f77fff8cSwizwill fail. 93f77fff8cSwiz.Pp 949f0aa214ScgdIf the database contains duplicate keys, the order of retrieval of 959f0aa214Scgdkey/data pairs is undefined if the 96f77fff8cSwiz.Em get 979f0aa214Scgdroutine is used, however, 98f77fff8cSwiz.Em seq 99f77fff8cSwizroutine calls with the 100f77fff8cSwiz.Dv R_CURSOR 101f77fff8cSwizflag set will always return the logical 102f77fff8cSwiz.Dq first 103f77fff8cSwizof any group of duplicate keys. 104f77fff8cSwiz.El 105f77fff8cSwiz.It Fa cachesize 1069f0aa214ScgdA suggested maximum size (in bytes) of the memory cache. 1079f0aa214ScgdThis value is 108f77fff8cSwiz.Em only 109f77fff8cSwizadvisory, and the access method will allocate more memory rather than 110f77fff8cSwizfail. 111f77fff8cSwizSince every search examines the root page of the tree, caching the 112f77fff8cSwizmost recently used pages substantially improves access time. 113f77fff8cSwizIn addition, physical writes are delayed as long as possible, so a 114f77fff8cSwizmoderate cache can reduce the number of I/O operations significantly. 115f77fff8cSwizObviously, using a cache increases (but only increases) the likelihood 116f77fff8cSwizof corruption or lost data if the system crashes while a tree is being 117f77fff8cSwizmodified. 1189f0aa214ScgdIf 119f77fff8cSwiz.Fa cachesize 1209f0aa214Scgdis 0 (no size is specified) a default cache is used. 121f77fff8cSwiz.It Fa maxkeypage 122a6d14e36ScgdThe maximum number of keys which will be stored on any single page. 123a6d14e36ScgdNot currently implemented. 1249f0aa214Scgd.\" The maximum number of keys which will be stored on any single page. 1259f0aa214Scgd.\" Because of the way the btree data structure works, 126f77fff8cSwiz.\" .Fa maxkeypage 1279f0aa214Scgd.\" must always be greater than or equal to 2. 1289f0aa214Scgd.\" If 129f77fff8cSwiz.\" .Fa maxkeypage 1309f0aa214Scgd.\" is 0 (no maximum number of keys is specified) the page fill factor is 1319f0aa214Scgd.\" made as large as possible (which is almost invariably what is wanted). 132f77fff8cSwiz.It Fa minkeypage 1339f0aa214ScgdThe minimum number of keys which will be stored on any single page. 1349f0aa214ScgdThis value is used to determine which keys will be stored on overflow 135f77fff8cSwizpages, i.e., if a key or data item is longer than the pagesize divided 136f77fff8cSwizby the 137f77fff8cSwiz.Fa minkeypage 138f77fff8cSwizvalue, it will be stored on overflow pages instead of in the page 139f77fff8cSwizitself. 1409f0aa214ScgdIf 141f77fff8cSwiz.Fa minkeypage 1429f0aa214Scgdis 0 (no minimum number of keys is specified) a value of 2 is used. 143f77fff8cSwiz.It Fa psize 144f77fff8cSwizPage size is the size (in bytes) of the pages used for nodes in the 145f77fff8cSwiztree. 146a6d14e36ScgdThe minimum page size is 512 bytes and the maximum page size is 64K. 147a6d14e36ScgdIf 148f77fff8cSwiz.Fa psize 149a6d14e36Scgdis 0 (no page size is specified) a page size is chosen based on the 150a6d14e36Scgdunderlying file system I/O block size. 151f77fff8cSwiz.It Fa compare 1529f0aa214ScgdCompare is the key comparison function. 153f77fff8cSwizIt must return an integer less than, equal to, or greater than zero if 154f77fff8cSwizthe first key argument is considered to be respectively less than, 155f77fff8cSwizequal to, or greater than the second key argument. 156f77fff8cSwizThe same comparison function must be used on a given tree every time 157f77fff8cSwizit is opened. 1589f0aa214ScgdIf 159f77fff8cSwiz.Fa compare 160f77fff8cSwizis 161f77fff8cSwiz.Dv NULL 162f77fff8cSwiz(no comparison function is specified), the keys are compared 1639f0aa214Scgdlexically, with shorter keys considered less than longer keys. 164f77fff8cSwiz.It Fa prefix 1659f0aa214ScgdPrefix is the prefix comparison function. 166f77fff8cSwizIf specified, this routine must return the number of bytes of the 167f77fff8cSwizsecond key argument which are necessary to determine that it is 168f77fff8cSwizgreater than the first key argument. 1699f0aa214ScgdIf the keys are equal, the key length should be returned. 170f77fff8cSwizNote, the usefulness of this routine is very data dependent, but, in 171f77fff8cSwizsome data sets can produce significantly reduced tree sizes and search 172f77fff8cSwiztimes. 1739f0aa214ScgdIf 174f77fff8cSwiz.Fa prefix 175f77fff8cSwizis 176f77fff8cSwiz.Dv NULL 177f77fff8cSwiz(no prefix function is specified), 178f77fff8cSwiz.Em and 179f77fff8cSwizno comparison function is specified, a default lexical comparison 180f77fff8cSwizroutine is used. 1819f0aa214ScgdIf 182f77fff8cSwiz.Fa prefix 183f77fff8cSwizis 184f77fff8cSwiz.Dv NULL 185f77fff8cSwizand a comparison routine is specified, no prefix comparison is done. 186f77fff8cSwiz.It Fa lorder 187a6d14e36ScgdThe byte order for integers in the stored database metadata. 188a6d14e36ScgdThe number should represent the order as an integer; for example, 189a6d14e36Scgdbig endian order would be the number 4,321. 190a6d14e36ScgdIf 191f77fff8cSwiz.Fa lorder 192a6d14e36Scgdis 0 (no order is specified) the current host order is used. 193f77fff8cSwiz.El 194f77fff8cSwiz.Pp 195f77fff8cSwizIf the file already exists (and the 196f77fff8cSwiz.Dv O_TRUNC 197f77fff8cSwizflag is not specified), the values specified for the parameters flags, 198f77fff8cSwizlorder and psize are ignored in favor of the values used when the tree 199f77fff8cSwizwas created. 200f77fff8cSwiz.Pp 201f77fff8cSwizForward sequential scans of a tree are from the least key to the 202f77fff8cSwizgreatest. 203f77fff8cSwiz.Pp 204f77fff8cSwizSpace freed up by deleting key/data pairs from the tree is never 205f77fff8cSwizreclaimed, although it is normally made available for reuse. 2069f0aa214ScgdThis means that the btree storage structure is grow-only. 207f77fff8cSwizThe only solutions are to avoid excessive deletions, or to create a 208f77fff8cSwizfresh tree periodically from a scan of an existing one. 209f77fff8cSwiz.Pp 2109f0aa214ScgdSearches, insertions, and deletions in a btree will all complete in 2119f0aa214ScgdO lg base N where base is the average fill factor. 212f77fff8cSwizOften, inserting ordered data into btrees results in a low fill 213f77fff8cSwizfactor. 214f77fff8cSwizThis implementation has been modified to make ordered insertion the 215f77fff8cSwizbest case, resulting in a much better than normal page fill factor. 216f77fff8cSwiz.Sh ERRORS 21717140cefScgdThe 218f77fff8cSwiz.Nm 21917140cefScgdaccess method routines may fail and set 220f77fff8cSwiz.Va errno 22117140cefScgdfor any of the errors specified for the library routine 222f77fff8cSwiz.Xr dbopen 3 . 223f77fff8cSwiz.Sh SEE ALSO 224f77fff8cSwiz.Xr dbopen 3 , 225f77fff8cSwiz.Xr hash 3 , 226f77fff8cSwiz.Xr mpool 3 , 227f77fff8cSwiz.Xr recno 3 228f77fff8cSwiz.Rs 229f77fff8cSwiz.%T "The Ubiquitous B-tree" 230f77fff8cSwiz.%A "Douglas Comer" 231f77fff8cSwiz.%J "ACM Comput. Surv." 232f77fff8cSwiz.%V 2 233f77fff8cSwiz.%N 11 234f77fff8cSwiz.%D June 1979 235f77fff8cSwiz.%P 121-138 236f77fff8cSwiz.Re 237f77fff8cSwiz.Rs 238f77fff8cSwiz.%T "Prefix B-trees" 239f77fff8cSwiz.%A "Bayer" 240f77fff8cSwiz.%A "Unterauer" 241f77fff8cSwiz.%J "ACM Transactions on Database Systems" 242f77fff8cSwiz.%V Vol. 2 243f77fff8cSwiz.%N 1 244f77fff8cSwiz.%D March 1977 245f77fff8cSwiz.%P 11-26 246f77fff8cSwiz.Re 247f77fff8cSwiz.Rs 248f77fff8cSwiz.%B "The Art of Computer Programming Vol. 3: Sorting and Searching" 249f77fff8cSwiz.%A "D.E. Knuth" 250f77fff8cSwiz.%D 1968 251f77fff8cSwiz.%P 471-480 252f77fff8cSwiz.Re 253f77fff8cSwiz.Sh BUGS 2549f0aa214ScgdOnly big and little endian byte order is supported. 255