.\" $OpenBSD: vnode.9,v 1.29 2018/06/04 19:42:54 kn Exp $
.\"
.\" Copyright (c) 2001 Constantine Sapuntzakis
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
.\" AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
.\" THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
.\" EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
.\" PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
.\" OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
.\" OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
.\" ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd $Mdocdate: June 4 2018 $
.Dt VNODE 9
.Os
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
A
.Em vnode
is an object in kernel memory that speaks the
.Ux
file interface (open, read, write, close, readdir, etc.).
Vnodes can represent files, directories, FIFOs, domain sockets, block devices,
and character devices.
.Pp
Each vnode has a set of methods which start with the string
.Dq VOP_ .
These methods include
.Fn VOP_OPEN ,
.Fn VOP_READ ,
.Fn VOP_WRITE ,
.Fn VOP_RENAME ,
.Fn VOP_CLOSE ,
and
.Fn VOP_MKDIR .
Many of these methods correspond closely to the equivalent
file system call \-
.Xr open 2 ,
.Xr read 2 ,
.Xr write 2 ,
.Xr rename 2 ,
etc.
Each file system (FFS, NFS, etc.) provides implementations for these methods.
.Pp
The Virtual File System library (see
.Xr vfs 9 )
maintains a pool of vnodes.
File systems cannot allocate their own vnodes; they must use the functions
provided by the VFS to create and manage vnodes.
.Pp
The definition of a vnode is as follows:
.Bd -literal
struct vnode {
	struct uvm_vnode *v_uvm;	/* uvm data */
	struct vops *v_op;		/* vnode operations vector */
	enum vtype v_type;		/* vnode type */
	enum vtagtype v_tag;		/* type of underlying data */
	u_int v_flag;			/* vnode flags (see below) */
	u_int v_usecount;		/* reference count of users */
	/* reference count of writers */
	u_int v_writecount;
	/* Flags that can be read/written in interrupts */
	u_int v_bioflag;
	u_int v_holdcnt;		/* buffer references */
	u_int v_id;			/* capability identifier */
	u_int v_inflight;
	struct mount *v_mount;		/* ptr to vfs we are in */
	TAILQ_ENTRY(vnode) v_freelist;	/* vnode freelist */
	LIST_ENTRY(vnode) v_mntvnodes;	/* vnodes for mount point */
	struct buf_rb_bufs v_bufs_tree;	/* lookup of all bufs */
	struct buflists v_cleanblkhd;	/* clean blocklist head */
	struct buflists v_dirtyblkhd;	/* dirty blocklist head */
	u_int v_numoutput;		/* num of writes in progress */
	LIST_ENTRY(vnode) v_synclist;	/* vnode with dirty buffers */
	union {
		struct mount *vu_mountedhere;/* ptr to mounted vfs (VDIR) */
		struct socket *vu_socket;	/* unix ipc (VSOCK) */
		struct specinfo *vu_specinfo;	/* device (VCHR, VBLK) */
		struct fifoinfo *vu_fifoinfo;	/* fifo (VFIFO) */
	} v_un;

	/* VFS namecache */
	struct namecache_rb_cache v_nc_tree;
	TAILQ_HEAD(, namecache) v_cache_dst;	/* cache entries to us */

	void *v_data;			/* private data for fs */
	struct selinfo v_selectinfo;	/*
					   identity of poller(s) */
};
#define	v_mountedhere	v_un.vu_mountedhere
#define	v_socket	v_un.vu_socket
#define	v_specinfo	v_un.vu_specinfo
#define	v_fifoinfo	v_un.vu_fifoinfo
.Ed
.Ss Vnode life cycle
When a client of the VFS requests a new vnode, the vnode allocation
code can reuse an old vnode object that is no longer in use.
Whether a vnode is in use is tracked by the vnode reference count
.Pq Va v_usecount .
By convention, each open file handle holds a reference,
as do VM objects backed by files.
A vnode with a reference count of 1 or more will not be deallocated or
reused to point to a different file.
Code that needs a vnode to keep referring to the same file
must therefore hold a reference to it.
A vnode that points to a valid file and has a reference count of 1 or more
is called
.Em active .
.Pp
When a vnode's reference count drops to zero, it becomes
.Em inactive ,
that is, a candidate for reuse.
An inactive vnode still refers to a valid file and one can try to
reactivate it using
.Xr vget 9
(caches do this frequently).
.Pp
Before the VFS can reuse an inactive vnode to refer to another file,
it must clean all information pertaining to the old file.
A cleaned out vnode is called a
.Em reclaimed
vnode.
.Pp
To support forcible unmounts and the
.Xr revoke 2
system call, the VFS may reclaim a vnode with a positive reference
count.
The reclaimed vnode is given to the dead file system, which
returns errors for most operations.
The reclaimed vnode will not be
reused for another file until its reference count hits zero.
.Ss Vnode pool
The
.Xr getnewvnode 9
call allocates a vnode from the pool, possibly reusing an
inactive vnode, and returns it to the caller.
The vnode returned has a reference count
.Pq Va v_usecount
of 1.
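.Pp
The life cycle described above can be illustrated with a small user-space
model.
This is only a sketch under stated assumptions: the
.Vt struct toy_vnode
type and every
.Fn toy_*
function are hypothetical names invented for this illustration.
They are not kernel interfaces and do not reflect how the VFS is
actually implemented.

```c
#include <assert.h>

/* Toy user-space model of the life cycle; NOT the kernel's struct vnode. */
enum toy_vstate { VS_ACTIVE, VS_INACTIVE, VS_RECLAIMED };

struct toy_vnode {
	unsigned int usecount;	/* models v_usecount */
	int has_file;		/* nonzero while the vnode refers to a valid file */
};

/* Classify a vnode using the terms defined in the text. */
enum toy_vstate
toy_state(const struct toy_vnode *vp)
{
	if (!vp->has_file)
		return VS_RECLAIMED;
	return vp->usecount > 0 ? VS_ACTIVE : VS_INACTIVE;
}

/* Drop one reference; a valid vnode becomes inactive when the count hits 0. */
void
toy_release(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	vp->usecount--;
}

/* Clean out the old file's state; allowed even while references remain. */
void
toy_reclaim(struct toy_vnode *vp)
{
	vp->has_file = 0;
}

/*
 * Model of allocation from the pool.
 * Returns 0 and an active vnode with one reference, or -1 if the
 * vnode is still referenced and therefore must not be reused.
 */
int
toy_getnew(struct toy_vnode *vp)
{
	if (vp->usecount > 0)
		return -1;
	if (vp->has_file)
		toy_reclaim(vp);	/* an inactive vnode is cleaned first */
	vp->has_file = 1;
	vp->usecount = 1;
	return 0;
}
```

In this model a vnode that was reclaimed while still referenced stays
unavailable to
.Fn toy_getnew
until the last reference is released, which mirrors the rule stated for
forced unmounts and
.Xr revoke 2 .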
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
It may only be called on a vnode with a reference count of 1 or greater.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
The
.Xr vput 9
call also releases the vnode lock.
.Pp
The
.Xr vget 9
call, when used on an inactive vnode, will make the vnode active
by bumping the reference count to one.
When called on an active vnode,
.Fn vget
increases the reference count by one.
However, if the vnode is being reclaimed concurrently, then
.Fn vget
will fail and return an error.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls
orchestrate the reclamation of a vnode.
They can be called on both active and inactive vnodes.
.Pp
When transitioning a vnode to the reclaimed state, the VFS will call the
.Xr VOP_RECLAIM 9
method.
File systems use this method to free any file-system-specific data
they attached to the vnode.
.Ss Vnode locks
The vnode actually has two different types of locks: the vnode lock
and the vnode reclamation lock
.Pq Dv VXLOCK .
.Ss The vnode lock
The vnode lock and its consistent use accomplish the following:
.Bl -bullet
.It
It keeps a locked vnode from changing across certain pairs of VOP_ calls,
thus preserving cached data.
For example, it keeps the directory from
changing between a
.Xr VOP_LOOKUP 9
call and a
.Xr VOP_CREATE 9
call.
The
.Fn VOP_LOOKUP
call makes sure the name doesn't already exist in the
directory and finds free room in the directory for the new entry.
The
.Fn VOP_CREATE
call can then go ahead and create the file without checking if
it already exists or looking for free space.
.It
Some file systems rely on it to ensure that only one
.Dq thread
at a time
is calling VOP_ vnode operations on a given file or directory.
Otherwise, the file system's behavior is undefined.
.It
On rare occasions, code will hold the vnode lock so that a series of
VOP_ operations occurs as an atomic unit.
(Of course, this doesn't work with network file systems like NFSv2 that don't
have any notion of bundling a bunch of operations into an atomic unit.)
.It
While the vnode lock is held, the vnode will not be reclaimed.
.El
.Pp
There is a discipline to using the vnode lock.
Some VOP_ operations require that the vnode lock be held before being called.
.Pp
The vnode lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr VOP_UNLOCK 9 .
.Pp
A process is allowed to sleep while holding the vnode lock.
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child directory.
.Ss Other vnode synchronization
The vnode reclamation lock
.Pq Dv VXLOCK
is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The
.Dv VXWANT
flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability,
.Va v_id ,
is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but doesn't want
to hold a reference (e.g., caches).
The code keeps both a vnode pointer and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
.Pp
Note: for this to work, memory assigned to hold a
.Vt struct vnode
can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The
.Va v_flag
attribute contains miscellaneous flags related to various functions.
They are summarized in the following table:
.Pp
.Bl -tag -width 10n -compact -offset indent
.It Dv VROOT
This vnode is the root of its file system.
.It Dv VTEXT
This vnode is a pure text prototype.
.It Dv VSYSTEM
This vnode is being used by the kernel.
.It Dv VISTTY
This vnode represents a
.Xr tty 4 .
.It Dv VXLOCK
This vnode is locked to change its underlying type.
.It Dv VXWANT
A process is waiting for this vnode.
.It Dv VALIASED
This vnode has an alias.
.It Dv VLOCKSWORK
This vnode's underlying file system supports locking discipline.
.El
.Pp
The
.Va v_tag
attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new
.Va v_tag
values for loadable file systems.
The
.Va v_tag
attribute is read-only.
.Pp
The
.Va v_type
attribute indicates what type of file (e.g. directory,
regular, FIFO) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2
system call returns zero when a read is attempted on a directory.
.Pp
Possible types are:
.Pp
.Bl -tag -width 10n -offset indent -compact
.It Dv VNON
This vnode has no type.
.It Dv VREG
This vnode represents a regular file.
.It Dv VDIR
This vnode represents a directory.
.It Dv VBLK
This vnode represents a block device.
.It Dv VCHR
This vnode represents a character device.
.It Dv VLNK
This vnode represents a symbolic link.
.It Dv VSOCK
This vnode represents a socket.
.It Dv VFIFO
This vnode represents a named pipe.
.It Dv VBAD
This vnode represents a bad or dead file.
.El
.Pp
The
.Va v_data
attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
the file system (such as an inode pointer in the case of FFS).
.Pp
The
.Va v_numoutput
attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code like
.Xr fsync 2
to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at
.Xr splbio 9 .
.Pp
The
.Va v_writecount
attribute tracks the number of write calls pending
on the vnode.
.Ss Rules
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are
.Fn bgetvp
and
.Fn brelvp .
The following fields of the vnode are manipulated at interrupt level:
.Va v_numoutput , v_holdcnt , v_dirtyblkhd ,
.Va v_cleanblkhd , v_bioflag , v_freelist ,
and
.Va v_synclist .
Any access to these fields should be protected by
.Xr splbio 9 .
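.Pp
The rule above amounts to a save, modify, restore pattern around each
access.
The following user-space sketch shows the shape of that pattern.
It rests on several assumptions:
.Fn toy_splbio
and
.Fn toy_splx
are hypothetical stand-ins for the kernel's
.Xr splbio 9
and
.Xr splx 9 ,
and
.Vt struct toy_vnode_io ,
.Va toy_ipl
and
.Fn toy_start_write
are invented for illustration only.

```c
#include <assert.h>

/* User-space stand-ins so the sketch compiles outside the kernel. */
enum { TOY_IPL_NONE = 0, TOY_IPL_BIO = 1 };

int toy_ipl = TOY_IPL_NONE;	/* models the current interrupt priority */

/* Stand-in for splbio(9): raise the level and return the previous one. */
int
toy_splbio(void)
{
	int s = toy_ipl;

	if (toy_ipl < TOY_IPL_BIO)
		toy_ipl = TOY_IPL_BIO;
	return s;
}

/* Stand-in for splx(9): restore a previously saved level. */
void
toy_splx(int s)
{
	toy_ipl = s;
}

/* Only the field needed here; NOT the kernel's struct vnode. */
struct toy_vnode_io {
	unsigned int v_numoutput;	/* writes in progress */
};

/* Account for a newly started write with block I/O interrupts blocked. */
void
toy_start_write(struct toy_vnode_io *vp)
{
	int s;

	s = toy_splbio();
	assert(toy_ipl >= TOY_IPL_BIO);	/* the field is protected here */
	vp->v_numoutput++;
	toy_splx(s);			/* always restore the saved level */
}
```

Saving the previous level and restoring exactly that value keeps the
pattern safe to nest, since an inner section does not lower the level
that an outer section raised.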
.Sh SEE ALSO
.Xr uvm 9 ,
.Xr vaccess 9 ,
.Xr vclean 9 ,
.Xr vcount 9 ,
.Xr vdevgone 9 ,
.Xr vfinddev 9 ,
.Xr vflush 9 ,
.Xr vflushbuf 9 ,
.Xr vfs 9 ,
.Xr vget 9 ,
.Xr vgone 9 ,
.Xr vhold 9 ,
.Xr vinvalbuf 9 ,
.Xr vn_lock 9 ,
.Xr VOP_LOOKUP 9 ,
.Xr vput 9 ,
.Xr vrecycle 9 ,
.Xr vref 9 ,
.Xr vrele 9 ,
.Xr vwaitforio 9 ,
.Xr vwakeup 9
.Sh HISTORY
This document first appeared in
.Ox 2.9 .