.\"     $OpenBSD: vnode.9,v 1.18 2003/06/06 20:56:32 jmc Exp $
.\"
.\" Copyright (c) 2001 Constantine Sapuntzakis
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
.\" AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
.\" THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
.\" EXEMPLARY, OR CONSEQUENTIAL  DAMAGES (INCLUDING, BUT NOT LIMITED TO,
.\" PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
.\" OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
.\" OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
.\" ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 22, 2001
.Dt VNODE 9
.Os
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
A vnode is an object in kernel memory that speaks the UNIX file
interface (open, read, write, close, readdir, etc.).
Vnodes can represent files, directories, FIFOs, domain sockets, block devices,
and character devices.
.Pp
Each vnode has a set of methods whose names begin with the prefix VOP_.
These methods include VOP_OPEN, VOP_READ, VOP_WRITE, VOP_RENAME, VOP_CLOSE,
and VOP_MKDIR.
Many of these methods correspond closely to the equivalent
system calls: open, read, write, rename, and so on.
Each file system (FFS, NFS, etc.) provides its own implementations of these
methods.
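.Pp
The method dispatch described above can be pictured with the following
user-space C sketch.
It is a simplified model, not kernel code: struct model_vops,
struct model_vnode, model_vop_read, and the two toy file systems are
hypothetical names, and the real VOP_ interfaces differ in detail.
The point is only that generic code calls through the vnode's own
method table, so each file system supplies its own implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_vnode;

/* Hypothetical per-file-system method table (one pointer per VOP_). */
struct model_vops {
	const char *fs_name;
	int (*vop_read)(struct model_vnode *vp, char *buf, size_t len);
};

/* Hypothetical, heavily reduced vnode: its method table and private data. */
struct model_vnode {
	const struct model_vops *v_op;
	void *v_data;
};

/* Generic "VOP_READ" wrapper: dispatch through the vnode's own table. */
int
model_vop_read(struct model_vnode *vp, char *buf, size_t len)
{
	return vp->v_op->vop_read(vp, buf, len);
}

/* Toy file system A: copies the string stored in v_data. */
static int
fsa_read(struct model_vnode *vp, char *buf, size_t len)
{
	const char *src = vp->v_data;
	size_t n = strlen(src);

	if (n > len)
		n = len;
	memcpy(buf, src, n);
	return (int)n;
}

/* Toy file system B: always reads zero bytes, like an empty file. */
static int
fsb_read(struct model_vnode *vp, char *buf, size_t len)
{
	(void)vp; (void)buf; (void)len;
	return 0;
}

const struct model_vops fsa_vops = { "fsa", fsa_read };
const struct model_vops fsb_vops = { "fsb", fsb_read };
```

Adding a file system in this model means supplying another table; the
generic wrapper does not change.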
.Pp
The Virtual File System (VFS) library maintains a pool of vnodes.
File systems cannot allocate their own vnodes; they must use the functions
provided by the VFS to create and manage vnodes.
.Ss Vnode life cycle
When a client of the VFS requests a new vnode, the vnode allocation
code can reuse an old vnode object that is no longer in use.
Whether a vnode is in use is tracked by the vnode reference count
(v_usecount).
By convention, each open file handle holds a reference,
as do VM objects backed by files.
A vnode with a reference count of 1 or more will not be deallocated or
reused to point to a different file.
So, to ensure that a vnode does not become a different
file underneath you, you must hold a reference to it.
A vnode that points to a valid file and has a reference count of 1 or more
is called "active".
.Pp
When a vnode's reference count drops to zero, it becomes "inactive",
that is, a candidate for reuse.
An "inactive" vnode still refers to a valid file, and one can try to
reactivate it using
.Xr vget 9
(caches do this frequently).
.Pp
Before the VFS can reuse an inactive vnode to refer to another file,
it must clean out all information pertaining to the old file.
A cleaned-out vnode is called a "reclaimed" vnode.
.Pp
To support forcible unmounts and the
.Xr revoke 2
system call, the VFS may "reclaim" a vnode with a positive reference
count.
The "reclaimed" vnode is handed to the dead file system, which
returns errors for most operations.
The reclaimed vnode will not be
reused for another file until its reference count drops to zero.
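.Pp
The states described above can be summarized by a small classification
function.
The C sketch below is a user-space model only; struct model_vnode, its
reclaimed flag, and the MODEL_ state names are hypothetical.
It encodes the two rules stated in this section: a vnode is active or
inactive depending on its reference count, and even a reclaimed vnode
may not be reused until that count drops to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fields that determine a vnode's state. */
struct model_vnode {
	unsigned int v_usecount;  /* references held by clients */
	bool reclaimed;           /* old file's information cleaned out */
};

enum model_vstate {
	MODEL_ACTIVE,             /* valid file, usecount >= 1 */
	MODEL_INACTIVE,           /* valid file, usecount == 0 */
	MODEL_RECLAIMED_IN_USE,   /* forcibly reclaimed, still referenced */
	MODEL_RECLAIMED_FREE      /* reclaimed and unreferenced */
};

enum model_vstate
model_vstate(const struct model_vnode *vp)
{
	if (vp->reclaimed)
		return vp->v_usecount > 0 ?
		    MODEL_RECLAIMED_IN_USE : MODEL_RECLAIMED_FREE;
	return vp->v_usecount > 0 ? MODEL_ACTIVE : MODEL_INACTIVE;
}

/* A vnode may be reused for a different file only with no references. */
bool
model_can_reuse(const struct model_vnode *vp)
{
	return vp->v_usecount == 0;
}
```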
.Ss Vnode pool
The
.Xr getnewvnode 9
function allocates a vnode from the pool, possibly reusing an
"inactive" vnode, and returns it to the caller.
The vnode returned has a reference count (v_usecount) of 1.
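.Pp
A minimal sketch of the reuse policy, assuming a simple singly linked
free list of inactive vnodes, is shown below.
The names model_getnewvnode and model_pool_release are hypothetical.
The real
.Xr getnewvnode 9
takes different arguments and also deals with locking, reclamation
through the owning file system, and vnodes reactivated by
.Xr vget 9 ,
none of which this user-space model attempts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced vnode with a free-list link. */
struct model_vnode {
	unsigned int v_usecount;
	void *v_data;                    /* per-file data of the old file */
	struct model_vnode *v_freenext;  /* next inactive vnode */
};

/* Hypothetical pool: a singly linked list of inactive vnodes. */
struct model_pool {
	struct model_vnode *freelist;
	unsigned int nallocated;
};

/* Put an unreferenced vnode on the free list as a candidate for reuse. */
void
model_pool_release(struct model_pool *pool, struct model_vnode *vp)
{
	assert(vp->v_usecount == 0);
	vp->v_freenext = pool->freelist;
	pool->freelist = vp;
}

/*
 * Rough analogue of getnewvnode(): reuse an inactive vnode if one is
 * available, otherwise allocate a new one.  The result has usecount 1.
 * Returns NULL only if allocation fails.
 */
struct model_vnode *
model_getnewvnode(struct model_pool *pool)
{
	struct model_vnode *vp = pool->freelist;

	if (vp != NULL) {
		pool->freelist = vp->v_freenext;
		vp->v_data = NULL;       /* stand-in for reclaiming the old file */
	} else {
		vp = calloc(1, sizeof(*vp));
		if (vp == NULL)
			return NULL;
		pool->nallocated++;
	}
	vp->v_freenext = NULL;
	vp->v_usecount = 1;
	return vp;
}
```

The sketch never frees vnode memory; vnodes only move between the free
list and their users.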
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
It may only be used on a vnode with a reference count of 1 or greater.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
In addition, the
.Xr vput 9
call releases the vnode lock.
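.Pp
The reference-counting rules can be modeled in a few lines of C.
In this hypothetical sketch, the boolean locked field stands in for the
vnode lock, and assertions check the preconditions stated above:
model_vref requires an existing reference, and model_vput requires the
lock to be held.
The model only reports, through its return value, when a vnode becomes
inactive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced vnode: reference count plus a vnode-lock flag. */
struct model_vnode {
	unsigned int v_usecount;
	bool locked;
};

/* Like vref(): only legal when the caller already holds a reference. */
void
model_vref(struct model_vnode *vp)
{
	assert(vp->v_usecount >= 1);
	vp->v_usecount++;
}

/* Like vrele(): drop one reference; returns true if now inactive. */
bool
model_vrele(struct model_vnode *vp)
{
	assert(vp->v_usecount >= 1);
	vp->v_usecount--;
	return vp->v_usecount == 0;
}

/* Like vput(): release the vnode lock, then drop one reference. */
bool
model_vput(struct model_vnode *vp)
{
	assert(vp->locked);
	vp->locked = false;
	return model_vrele(vp);
}
```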
.Pp
The
.Xr vget 9
call, when used on an inactive vnode, makes the vnode "active"
by raising its reference count to one.
When called on an active vnode, vget increases the reference count by one.
However, if the vnode is being reclaimed concurrently, vget fails
and returns an error.
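.Pp
The following hypothetical C model shows the decision
.Xr vget 9
makes.
The reclaiming flag stands in for the reclamation-in-progress indication
(VXLOCK) described under
.Sx Other Vnode synchronization ,
and ENOENT is an assumed error value; the real function may wait for
reclamation to finish, and its exact error code is not specified in this page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduced vnode. */
struct model_vnode {
	unsigned int v_usecount;
	bool reclaiming;   /* stand-in for "reclamation in progress" */
};

/*
 * vget()-like helper: gain a reference on an active or inactive vnode,
 * but refuse if the vnode is concurrently being reclaimed.
 * ENOENT is an assumption of this sketch, not a documented value.
 */
int
model_vget(struct model_vnode *vp)
{
	if (vp->reclaiming)
		return ENOENT;
	vp->v_usecount++;  /* 0 -> 1 reactivates an inactive vnode */
	return 0;
}
```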
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
functions orchestrate the reclamation of a vnode.
They can be called on both active and inactive vnodes.
.Pp
When transitioning a vnode to the "reclaimed" state, the VFS calls the
.Xr VOP_RECLAIM 9
method.
File systems use this method to free any file-system-specific data
they attached to the vnode.
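.Pp
A simplified model of reclamation follows.
It assumes a method table containing only a read and a reclaim method;
toyfs_vops, deadfs_vops, and model_vgone are hypothetical names.
The helper lets the owning file system free its v_data through the
reclaim method and then switches the vnode to a dead method table whose
read fails, mirroring the hand-off to the dead file system described
earlier.
The EIO value returned by the dead read is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_vnode;

/* Hypothetical method table with just the two methods needed here. */
struct model_vops {
	int (*vop_read)(struct model_vnode *vp);
	int (*vop_reclaim)(struct model_vnode *vp);
};

struct model_vnode {
	const struct model_vops *v_op;
	void *v_data;      /* file-system-specific data */
	bool reclaiming;   /* stand-in for the VXLOCK flag */
};

/* Toy file system: reads succeed; reclaim frees the private data. */
static int
toyfs_read(struct model_vnode *vp)
{
	(void)vp;
	return 0;
}

static int
toyfs_reclaim(struct model_vnode *vp)
{
	free(vp->v_data);
	vp->v_data = NULL;
	return 0;
}

const struct model_vops toyfs_vops = { toyfs_read, toyfs_reclaim };

/* Toy "dead" file system: reads fail (EIO is an assumption). */
static int
dead_read(struct model_vnode *vp)
{
	(void)vp;
	return EIO;
}

static int
dead_reclaim(struct model_vnode *vp)
{
	(void)vp;
	return 0;
}

const struct model_vops deadfs_vops = { dead_read, dead_reclaim };

/* vgone()-like helper: free the file system's data, then go "dead". */
void
model_vgone(struct model_vnode *vp)
{
	vp->reclaiming = true;
	vp->v_op->vop_reclaim(vp);
	vp->v_op = &deadfs_vops;
	vp->reclaiming = false;
}
```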
.Ss Vnode locks
A vnode has three different types of lock: the vnode lock,
the vnode interlock, and the vnode reclamation lock (VXLOCK).
.Ss The vnode lock
The vnode lock and its consistent use accomplish the following:
.Bl -bullet
.It
It keeps a locked vnode from changing across certain pairs of VOP_ calls,
thus preserving cached data.
For example, it keeps the directory from
changing between a VOP_LOOKUP call and a VOP_CREATE call.
The VOP_LOOKUP call makes sure the name does not already exist in the
directory and finds free room in the directory for the new entry.
The VOP_CREATE call can then go ahead and create the file without checking
whether it already exists or looking for free space.
.It
Some file systems rely on it to ensure that only one "thread" at a time
is calling VOP_ vnode operations on a given file or directory.
Otherwise, the file system's behavior is undefined.
.It
On rare occasions, code will hold the vnode lock so that a series of
VOP_ operations occurs as an atomic unit.
(Of course, this does not work with network file systems like NFSv2, which
have no notion of bundling several operations into an atomic unit.)
.It
While the vnode lock is held, the vnode will not be reclaimed.
.El
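.Pp
The lookup-then-create pattern in the first item above can be illustrated
with a single-threaded C model.
struct model_dir, model_lookup_for_create, and model_create are
hypothetical; the boolean locked field merely stands in for holding the
vnode lock, and assertions check that both steps run with it held.
A real VOP_LOOKUP/VOP_CREATE pair passes its state through kernel
structures that this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NSLOTS   4
#define MODEL_NAMELEN 16

/* Hypothetical directory vnode: a lock flag and a fixed array of names. */
struct model_dir {
	bool locked;
	char names[MODEL_NSLOTS][MODEL_NAMELEN];  /* "" marks a free slot */
};

/*
 * Lookup-for-create step: with the directory locked, check that the name
 * is absent and remember a free slot.  Returns 0, EEXIST or ENOSPC.
 */
int
model_lookup_for_create(struct model_dir *dp, const char *name, int *slotp)
{
	int i, free_slot = -1;

	assert(dp->locked);
	assert(name[0] != 0);
	for (i = 0; i < MODEL_NSLOTS; i++) {
		if (strcmp(dp->names[i], name) == 0)
			return EEXIST;
		if (free_slot < 0 && dp->names[i][0] == 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return ENOSPC;
	*slotp = free_slot;
	return 0;
}

/*
 * Create step: because the lock has been held since the lookup, the
 * remembered slot is still free and the name still absent, so no
 * re-check is done here.
 */
void
model_create(struct model_dir *dp, const char *name, int slot)
{
	assert(dp->locked);
	strncpy(dp->names[slot], name, MODEL_NAMELEN - 1);
	dp->names[slot][MODEL_NAMELEN - 1] = 0;
}
```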
.Pp
There is a discipline to using the vnode lock.
Some VOP_ operations require that the vnode lock be held before they are
called.
A description of this rather arcane locking discipline is in
.Pa sys/kern/vnode_if.src .
.Pp
The vnode lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr VOP_UNLOCK 9 .
.Pp
A process is allowed to sleep while holding the vnode lock.
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child
directory.
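.Pp
The ordering rule can be sketched as follows, assuming each directory
vnode records its parent.
The hypothetical model_lock_pair helper generalizes parent-before-child to
ancestor-before-descendant and records the acquisition order so it can be
checked.
Directories that are not related by ancestry are simply locked in argument
order; this sketch does not try to define a rule for that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory vnode with a parent pointer and a lock flag. */
struct model_vnode {
	struct model_vnode *parent;  /* NULL for the root */
	bool locked;
};

/* Hypothetical log recording the order in which locks were taken. */
struct model_lock_log {
	struct model_vnode *order[8];
	int n;
};

static void
model_lock(struct model_vnode *vp, struct model_lock_log *lk)
{
	assert(!vp->locked);
	vp->locked = true;
	if (lk->n < 8)
		lk->order[lk->n++] = vp;
}

/* True if anc is a strict ancestor of vp in the directory tree. */
static bool
model_is_ancestor(const struct model_vnode *anc, const struct model_vnode *vp)
{
	for (vp = vp->parent; vp != NULL; vp = vp->parent)
		if (vp == anc)
			return true;
	return false;
}

/*
 * Lock two directories, always taking the ancestor's lock first so that
 * every caller acquires them in the same parent-before-child order.
 */
void
model_lock_pair(struct model_vnode *a, struct model_vnode *b,
    struct model_lock_log *lk)
{
	if (model_is_ancestor(b, a)) {
		struct model_vnode *tmp = a;

		a = b;
		b = tmp;
	}
	model_lock(a, lk);
	model_lock(b, lk);
}
```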
.Ss Vnode interlock
The vnode interlock (vp->v_interlock) is a spinlock.
It is useful on multiprocessor systems for acquiring a quick exclusive
lock on the contents of the vnode.
It MUST NOT be held while sleeping.
(What fields does it cover? What about splbio/interrupt issues?)
.Pp
Operations on this lock are a no-op on uniprocessor systems.
.Ss Other Vnode synchronization
The vnode reclamation lock (VXLOCK) is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The VXWANT flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability, v_id, is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but does not want
to hold a reference (e.g., caches).
Such code keeps both a pointer to the vnode (struct vnode *) and a copy of
the capability.
It can later compare the vnode's current capability with its copy to see
whether the vnode still points to the same file.
.Pp
Note: for this to work, memory assigned to hold a struct vnode can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
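.Pp
The capability check used by caches can be sketched as follows.
The names are hypothetical, and the incrementing of v_id in model_reassign
is the sketch's own choice; this page only states that the capability
changes.
Dereferencing a possibly stale pointer in model_cache_lookup is safe only
because, as noted above, vnode memory is never freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced vnode carrying only its capability. */
struct model_vnode {
	uint32_t v_id;  /* changes whenever the vnode gets a new file */
	int file;       /* which file it currently represents (toy) */
};

/* A cache entry holds a pointer and a copy of v_id, but no reference. */
struct model_cache_entry {
	struct model_vnode *vp;
	uint32_t id;
};

void
model_cache_enter(struct model_cache_entry *ce, struct model_vnode *vp)
{
	ce->vp = vp;
	ce->id = vp->v_id;
}

/* Return the cached vnode only if it still refers to the same file. */
struct model_vnode *
model_cache_lookup(const struct model_cache_entry *ce)
{
	if (ce->vp != NULL && ce->vp->v_id == ce->id)
		return ce->vp;
	return NULL;
}

/* Reassigning the vnode to another file invalidates old capabilities. */
void
model_reassign(struct model_vnode *vp, int newfile)
{
	vp->file = newfile;
	vp->v_id++;     /* this sketch simply increments; wraparound ignored */
}
```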
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The v_flag attribute contains miscellaneous flags related to various functions.
They are summarized in table ...
.Pp
The v_tag attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute, and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g., function tables).
There is no safe way of defining new v_tags for loadable file systems.
The v_tag attribute is read-only.
.Pp
The v_type attribute indicates what type of file (e.g., directory,
regular file, FIFO) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2
system call returns an error when a read is attempted on a directory.
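.Pp
A hypothetical sketch of such a generic type check is shown below.
The MODEL_ type names and the model_generic_read function are illustrative
only, and EISDIR is the sketch's choice of error code rather than a
statement about what
.Xr read 2
returns on every system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of vnode types, loosely mirroring v_type. */
enum model_vtype { MODEL_VREG, MODEL_VDIR, MODEL_VFIFO };

struct model_vnode {
	enum model_vtype v_type;
};

/*
 * Generic read path: reject directories before dispatching to the file
 * system.  EISDIR is this sketch's choice of error code.
 */
int
model_generic_read(const struct model_vnode *vp, size_t len, size_t *nread)
{
	if (vp->v_type == MODEL_VDIR)
		return EISDIR;
	*nread = len;   /* placeholder for the file system's VOP_READ */
	return 0;
}
```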
.Pp
The v_data attribute allows a file system to attach a piece of
file-system-specific memory to the vnode.
This contains information about the file that is specific to
the file system.
.Pp
The v_numoutput attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code such as fsync to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at splbio().
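.Pp
The following single-threaded C model illustrates how a counter like
v_numoutput lets a caller wait for outstanding writes, in the spirit of
.Xr vwaitforio 9 .
The function names are hypothetical.
Because a user-space program cannot take real disk interrupts, the wait
loop calls a supplied completion function instead of sleeping; in the
kernel, completions arrive asynchronously and the counter is protected
by splbio().

```c
#include <assert.h>

/* Hypothetical reduced vnode: only the pending-write counter. */
struct model_vnode {
	int v_numoutput;
};

/* Called when a write is issued (the kernel requires splbio() here). */
void
model_write_start(struct model_vnode *vp)
{
	vp->v_numoutput++;
}

/* Called when a write completes (in the kernel, at interrupt level). */
void
model_write_done(struct model_vnode *vp)
{
	assert(vp->v_numoutput > 0);
	vp->v_numoutput--;
}

/*
 * vwaitforio()-like loop.  Instead of sleeping, this single-threaded
 * sketch calls complete_one() to simulate one I/O completion per
 * iteration.  Returns how many completions it waited for.
 */
int
model_waitforio(struct model_vnode *vp,
    void (*complete_one)(struct model_vnode *))
{
	int waited = 0;

	while (vp->v_numoutput > 0) {
		complete_one(vp);
		waited++;
	}
	return waited;
}
```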
.Pp
The v_writecount attribute tracks the number of references to the vnode
that are open for writing.
.Ss RULES
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are bgetvp and brelvp.
The following fields of the vnode are manipulated at interrupt level:
v_numoutput, v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist,
and v_synclist.
Any access to these fields should be protected by splbio().
.Sh HISTORY
This document first appeared in
.Ox 2.9 .
246.Ox 2.9 .
247