All rights reserved.
%sccs.include.redist.man%
@(#)dbopen.3 5.7 (Berkeley) 11/08/90
#include <sys/types.h> #include <db.h> DB * btree_open(const char *file, int flags, int mode, const BTREEINFO * private); DB * hash_open(const char *file, int flags, int mode, const HASHINFO * private); DB * recno_open(const char *file, int flags, int mode, const RECNOINFO * private);
All access methods store any database metadata in architecture-independent ormats. (Obviously, portability of the data forming the key/data pairs is the concern of the application program.)
Each routine opens file for reading and/or writing. Databases never intended to be preserved on disk may be created by setting the file parameter to NULL. The flags and mode arguments are as specified to the open (2) routine, however, only the O_CREAT, O_EXCL, O_RDONLY, O_RDWR, O_TRUNC and O_WRONLY flags are meaningful. If entries are to be inserted into or deleted from a database file, it must be opened with the O_RDWR flag. The argument private is a pointer to a private, access-method specific structure described below.
The open routines return a pointer to a DB structure on success and NULL on error. The DB contains at least the following fields:
typedef struct {
void *private;
int (*close)(), (*delete)(), (*get)(), (*put)(), (*seq)(), (*sync)();
The elements of this structure consist of a pointer to a private, access method dependent structure and a set of routines which perform various functions. All of these routines take a pointer to a structure as returned by one of the open routines, one or more pointers to key/data structures, and, optionally, a flag value.
private A pointer to an internal structure private to the access method.
close A pointer to a routine to flush any cached information to disk, free any allocated resources, and close the database file. Its function prototype is: close(const DB *db); Since key/data pairs may be cached in memory, failing to close the file with the close routine may result in inconsistent or lost information. The close routine returns 0 on error and 1 on success.
delete A pointer to a routine to remove key/data pairs from the database. Its function prototype is: delete(const DB *db, const DBT *key); The delete routine returns 0 on error, 1 on success, and -1 if the specified key was not in the file.
get A pointer to a routine which is the interface for keyed retrieval from the database. Its function prototype is: get(const DB *db, DBT *key, DBT *data); The address and length of the data associated with the specified key are returned in the structure referenced by data . The get routine returns 0 on error, 1 on success, and -1 if the key was not in the file.
put A pointer to a routine to store key/data pairs in the database. Its function prototype is: put(const DB *db, const DBT *key, const DBT *data, u_long flag); By default, if the key already exists in the file, it is replaced.
The parameter flag may be set to one of the following values:
R_APPEND Append the data immediately after the data referenced by key , creating a new record. (This implies that the access method is able to create new keys itself, i.e. the keys are ordered and independent, for example, record numbers. Applicable only to the RECNO access method.)
R_DUP If the key already exists in the file, create a duplicate key/data pair. (This implies that keys are not required to be unique. Applicable only to the BTREE access method.)
R_INSERT Insert the data immediately before the data referenced by key , creating a new record. (This implies that the access method is able to create new keys itself, i.e. the keys are ordered and independent, for example, record numbers. Applicable only to the RECNO access method.)
R_NOOVERWRITE Enter the new key/data pair only if the key does not previously exist.
seq A pointer to a routine which is the interface for sequential retrieval from the database. Its function prototype is: seq(const DB *db, DBT *key, DBT *data, u_long flag); The address and length of the key are returned in the structure referenced by key , and the address and length of the data are returned in the structure referenced by data .
The flag value must be set to one of the following values:
R_CURSOR Set the ``cursor'' to the location of the specified key. (This implies that the access method has a implicit order which does not change. Applicable only to the BTREE and RECNO access methods.)
R_FIRST The first key of the hash table is returned.
R_LAST The last key of the hash table is returned. (This implies that the access method has a implicit order which does not change. Applicable only to the BTREE and RECNO access methods.)
R_NEXT Retrieve the record immediately after the most recently requested record. The order of retrieval of duplicate records is undefined.
R_PREV Retrieve the record immediately before the most recently requested record. The order of retrieval of duplicate records is undefined. (This implies that the access method has a implicit order which does not change. Applicable only to the BTREE and RECNO access methods.)
sync A pointer to a routine to flush any cached information to disk. Its function prototype is: sync(const DB *db); If the database is in memory only, the sync routine is a no-op. The sync routine returns 0 on error and 1 on success.
Both keys and data are represented by the following data structure:typedef struct {
u_char *data;
size_t size;
The elements of the DBT structure are defined as follows:
data A pointer to a byte string.
size The length of the byte string.
Key/data strings must fit into available memory.
The private data structure provided to btree_open is as follows:
typedef struct {
u_int psize;
u_int cachesize;
int (*compare)(const void *, const void *);
The elements of this structure are defined as follows:
psize Page size is the size in bytes of the pages used for nodes in the tree. If the file already exists, the specified value is ignored and the value specified when the tree was created is used. If psize is zero, an appropriate page size is chosen (based on the system memory and/or file system constraints), but will never be less than 512 bytes.
cachesize A suggested maximum size, in bytes, of the memory cache. Setting this value to zero specifies that an appropriate amount of memory should be used. Since every search examines the root page of the tree, caching the most recently used pages substantially improves access time. In addition, physical writes are delayed as long as possible, so a moderate cache can reduce the number of I/O operations significantly. Obviously, using a cache increases the likelihood of corruption or lost data if the system crashes while a tree is being modified, however, caching 10 pages decreases by between two and three orders of magnitude the creation time of a large tree.
compare Compare is a user defined comparison function. It must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second. The same comparison function must be used on a given tree every time it is opened. If no comparison function is specified, strcmp (3) is used.
If the pointer to the private data structure is NULL, the btree_open routine will use appropriate values.
If the database file already exists, and the O_TRUNC flag is not specified to btree_open , the parameter psize ignored.
Key structures may reference byte strings of slightly less than one-half the tree's page size only (see psize ). Data structures may reference byte strings of essentially unlimited length.
Searches, insertions, and deletions in a btree are all guaranteed to complete in logarithmic time.
Forward sequential scans of a tree are from the least key to the greatest.
Space freed up by deleting records from a btree is never reclaimed, although it is made available for reuse. The only exception to this is that space occupied by large data items (those greater than one quarter the size of a page) is neither reclaimed nor reused. This means that the btree storage structure is grow-only. The only solutions are to avoid excessive deletions, or to create a fresh tree periodically from a scan of an existing one.
int bsize;
u_int cachesize;
int ffactor;
int nelem;
u_long (*hash)(const void *, const size_t);
The elements of this structure are defined as follows:
bsize Bsize defines the hash table bucket size, and is, by default, 1024 bytes. For tables with large data items, it may be preferable to increase the page size, and, conversely, applications doing exclusively in-memory hashing may want to use a very small bucket size, for example, 256, to minimize hash chain collisions.
cachesize A suggested maximum size, in bytes, of the memory cache. Setting this value to zero specifies that an appropriate amount of memory should be used.
ffactor Ffactor indicates a desired density within the hash table. It is an approximation of the number of keys allowed to accumulate in any one bucket, determining when the hash table grows or shrinks. The default value is 5.
hash Hash is a user defined hash function. Since no hash function performs equally well on all possible data, the user may find that the built-in hash function does poorly on a particular data set. Any user specified hash function should take two arguments, a pointer to a byte string and a length, and return an u_long to be used as the hash value.
nelem Nelem is an estimate of the final size of the hash table. If not set, the default value is 1. If not set or set too low, hash tables will expand gracefully as keys are entered, although a slight performance degradation may be noticed.
If the pointer to the private data structure is NULL, the hash_open routine will use appropriate values.
If the hash table already exists, and the O_TRUNC flag is not specified to hash_open , the parameters bsize , ffactor , and nelem are ignored.
If a hash function is specified, hash_open will attempt to determine if the hash function specified is the same as the one with which the database was created, and will fail if it is not.
Both key and data structures may reference byte strings of essentially unlimited length.
Backward compatible interfaces to the routines described in dbm (3), hsearch (3), and ndbm (3) are provided, however, these interfaces are not compatible with previous file formats.
u_long flags;
u_int cachesize;
size_t reclen;
u_char bval;
The elements of this structure are defined as follows:
flags The flag value is specified by or 'ing any of the following values:
R_FIXEDLEN The records are fixed-length, not byte delimited. The structure element reclen specifies the length of the record, and the structure element bval is used as the pad character.
cachesize A suggested maximum size, in bytes, of the memory cache. Setting this value to zero specifies that an appropriate amount of memory should be used.
reclen The length of a fixed-length record.
bval The delimiting byte to be used to mark the end of a record for variable-length records, and the pad character for fixed-length records.
Variable-length and fixed-length data files require retrieval key structures to reference a byte followed by three u_long numbers. The numbers are used as a record number, a byte offset and a record length, respectively, and the byte is a flag value which indicates the validity of the other fields. This access method does no validity checking as to the correctness of any of these values, nor is it constrained to use the values provided. If any of the record number, byte offset or record length are not specified by the calling routine, and the record retrieval is successful, the correct values are copied into the caller's key structure. The flag value is specified by or 'ing the following values:
R_LENGTH The record length is valid.
R_OFFSET The byte offset is valid.
R_RECNO The record number is valid.
Data structures may reference byte strings of essentially unlimited length, however, the strings must fit into available memory.
[EINVAL] A parameter has been specified (hash function, pad byte etc.) that is incompatible with the current file specification or there is a mismatch between the version number of file and the software.
[EBADFORMAT] A file used by an open routine is incorrectly formatted, corrupted, or otherwise unusable.
<Note, this errno does not currently exist.>
The get routines may fail and set errno for any of the errors specified for the library routine malloc (3).
The close routines may fail and set errno for any of the errors specified for the library routines close (2), free (3), or fsync (2).
The sync routines may fail and set errno for any of the errors specified for the library routine fsync (2).
Btrees should reclaim unused pages automatically, and key lengths should be unbounded.
Currently, none of the access methods provide any control for concurrent access, locking, or transactions.