.Dd February 22, 2001
.Dt vnode 9
.Os OpenBSD 2.9
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
The vnode is the kernel object that corresponds to a file (actually,
a file, a directory, a fifo, a domain socket, a symlink, or a device).
.Pp
Each vnode has a set of methods corresponding to file operations
(vop_open, vop_read, vop_write, vop_rename, vop_mkdir, vop_close).
These methods are implemented by the individual file systems and
are dispatched through function pointers.
.Pp
In addition, the VFS has functions for maintaining a pool of vnodes,
associating vnodes with mount points, and associating vnodes with buffers.
The individual file systems cannot override these functions.
As such, individual file systems cannot allocate their own vnodes.
.Pp
In general, the contents of a struct vnode should not be examined or
modified by the users of vnode methods.
There are some rather common exceptions detailed later in this document.
.Pp
The vast majority of the vnode functions CANNOT be called from interrupt
context.
.Ss Vnode pool
All the vnodes in the kernel are allocated out of a shared pool.
The
.Xr getnewvnode 9
function returns a fresh vnode from the vnode pool.
The vnode returned has a reference count (v_usecount) of 1.
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
The
.Xr vput 9
call also releases the vnode lock.
.Pp
When a vnode's reference count becomes zero, the vnode pool places it
in a pool of free vnodes, eligible to be assigned to a different file.
The vnode pool calls the
.Xr vop_inactive 9
method to inform the file system that the reference count has reached zero.
.Pp
When placed in the pool of free vnodes, the vnode is not otherwise altered.
In fact, it can often be retrieved before it is reassigned to a different file.
This is useful when the system closes a file and opens it again in rapid
succession.
The
.Xr vget 9
call is used to revive the vnode.
Note that callers should ensure the vnode
they get back has not been reassigned to a different file.
.Pp
When the vnode pool decides to reclaim the vnode to satisfy a getnewvnode
request, it calls the
.Xr vop_reclaim 9
method.
File systems often use this method to free any file-system specific data
they attach to the vnode.
.Pp
A file system can force a vnode with a reference count of zero
to be reclaimed earlier by calling
.Xr vrecycle 9 .
The
.Xr vrecycle 9
call is a null operation if the reference count is greater than zero.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls will force the pool to reclaim
the vnode even if it has a non-zero reference count.
If the vnode had a non-zero reference count, the vnode is then assigned
an operations vector corresponding to the "dead" file system.
In this operations vector, most operations return errors.
.Ss Vnode locks
Note to beginners: locks don't actually prevent memory from being read
or overwritten.
Instead, a lock is an object that, where used, allows
only one piece of code to proceed through the locked section.
If you do not surround a stretch of code with a lock, it can and probably
will eventually be executed simultaneously with other stretches of code
(including stretches ).
Chances are the results will be unexpected and
disappointing to both the user and you.
.Pp
The vnode actually has three different types of locks: the vnode lock,
the vnode interlock, and the vnode reclamation lock (VXLOCK).
.Ss The vnode lock
The most general lock is the vnode lock.
This lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr vn_unlock 9 .
The vnode lock is used to serialize operations through the file system for
a given file when there are multiple concurrent requests on the same file.
Many file system functions require that you hold the vnode lock on entry.
The vnode lock may be held when sleeping.
.Pp
The
.Xr revoke 2
and forcible unmount features in BSD UNIX allow a
user to invalidate files and their associated vnodes at almost any
time, even if there are active open files on them.
While in a region of code protected by the vnode lock, the process is
guaranteed that the vnode will not be reclaimed or invalidated.
.Pp
The vnode lock is a multiple-reader or single-writer lock.
An exclusive vnode lock may be acquired multiple times by the same
process.
.Pp
The vnode lock is somewhat messy because it is used for many purposes.
Some clients of the vnode interface use it to try to bundle a series
of VOP_ method calls into an atomic group.
Many file systems rely on
it to prevent race conditions in updating file system specific data
structures (as opposed to having their own locks).
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child
directory.
.Pp
Interrupt handlers must not acquire vnode locks.
.Ss Vnode interlock
The vnode interlock (vp->v_interlock) is a spinlock.
It is useful on
multi-processor systems for acquiring a quick exclusive lock on the
contents of the vnode.
It MUST NOT be held while sleeping.
(What fields does it cover?
What about splbio/interrupt issues?)
.Pp
Operations on this lock are no-ops on uniprocessor systems.
.Ss Other Vnode synchronization
The vnode reclamation lock (VXLOCK) is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as
a flag to indicate that reclamation is in progress.
The VXWANT flag is
set by processes that wish to be woken up when reclamation is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability, v_id, is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes
but does not want to hold a reference (e.g. caches).
The code keeps both a vnode * and a copy of the capability.
The code can later compare
the vnode's capability to its copy and see if the vnode still
points to the same file.
.Pp
Note: for this to work, memory assigned to hold a struct vnode can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The v_flag attribute contains miscellaneous flags related to various
functions.
They are summarized in table ...
.Pp
The v_tag attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new v_tags for loadable file systems.
The v_tag attribute is read-only.
.Pp
The v_type attribute indicates what type of file (e.g. directory,
regular, fifo) this vnode is.
This is used by the generic code to perform various checks.
For example, the
.Xr read 2
system call returns an error when a read is attempted on a directory.
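.Pp
The v_type check described above can be sketched as follows.
This is a minimal userland model, not kernel code: the names model_vtype,
model_vnode, and model_read_check are hypothetical, and the choice of
EISDIR as the error returned for a directory is an assumption made for
illustration only.

```c
/*
 * Userland model of a generic-layer v_type check; not kernel code.
 * All names (model_vtype, model_vnode, model_read_check) are
 * hypothetical and exist only for this illustration.
 */
#include <errno.h>
#include <stddef.h>

enum model_vtype { MVREG, MVDIR, MVFIFO };

struct model_vnode {
	enum model_vtype v_type;	/* what kind of file this vnode is */
};

/*
 * Return 0 if a read may proceed, or an errno value if the generic
 * code should reject the request before calling into the file system.
 */
int
model_read_check(const struct model_vnode *vp)
{
	if (vp == NULL)
		return EINVAL;
	if (vp->v_type == MVDIR)
		return EISDIR;	/* assumed error code for directory reads */
	return 0;
}
```

A caller in this model would run the check first and invoke the file
system's read method only when the check returns 0.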
.Pp
The v_data attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about
the file that is specific to the file system.
.Pp
The v_numoutput attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code
like fsync to wait for all writes to complete before returning to the
user.
This attribute must be manipulated at splbio().
.Pp
The v_writecount attribute tracks the number of write calls pending
on the vnode.
.Ss RULES
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are bgetvp and brelvp.
The following
fields of the vnode are manipulated at interrupt level: v_numoutput,
v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist, and
v_synclist.
Any accesses to these fields should be protected by splbio,
unless you are certain that there is no chance an interrupt handler
will modify them.
.Pp
A vnode will only be reassigned to another file when its reference count
reaches zero and the vnode lock is released.
.Pp
A vnode will not be reclaimed as long as the vnode lock is held.
If the vnode reference count drops to zero while a process is holding
the vnode lock, the vnode MAY be queued for reclamation.
Increasing
the reference count from 0 to 1 while holding the lock will most likely
cause intermittent kernel panics.
.Sh SEE ALSO
.Sh HISTORY
This document first appeared in
.Ox 2.9 .