hpfs_vfsops.c - OpenGrok history log for /dflybsd-src/sys/vfs/hpfs/hpfs

Revision	Date	Author	Comments
# 28623bf9	27-Oct-2006	Matthew Dillon <dillon@dragonflybsd.org>	Major namecache work primarily to support NULLFS. * Move the nc_mount field out of the namecache{} record and use a new namecache handle structure called nchandle { mount, ncp } for all API accesses to the namecache. * Remove all mount point linkages from the namecache topology. Each mount now has its own namecache topology rooted at the root of the mount point. Mount points are flagged in their underlying filesystem's namecache topology but instead of linking the mount into the topology, the flag simply triggers a mountlist scan to locate the mount. ".." is handled the same way... when the root of a topology is encountered the scan can traverse to the underlying filesystem via a field stored in the mount structure. * Ref the mount structure based on the number of nchandle structures referencing it, and do not kfree() the mount structure during a forced unmount if refs remain. These changes have the following effects: * Traversal across mount points no longer requires locking of any sort, preventing process blockages occurring in one mount from leaking across a mount point to another mount. * Aliased namespaces such as occur with NULLFS no longer duplicate the namecache topology of the underlying filesystem. Instead, a NULLFS mount simply shares the underlying topology (differentiating between it and the underlying topology by the fact that the name cache handles { mount, ncp } contain NULLFS's mount pointer). This saves an immense amount of memory and allows NULLFS to be used heavily within a system without creating any adverse impact on kernel memory or performance.
* Since the namecache topology for a NULLFS mount is shared with the underlying mount, the namecache records are in fact the same records and thus full coherency between the NULLFS mount and the underlying filesystem is maintained by design. * Future efforts, such as a unionfs or shadow fs implementation, now have a mount structure to work with. The new API is a lot more flexible than the old one.
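The { mount, ncp } handle described in 28623bf9 can be pictured with a small userland sketch. The structures below are hypothetical, heavily simplified stand-ins (the field `mnt_refs` and the helpers `nch_make()`, `nch_drop()`, `nch_same_record()` and `nch_equal()` are invented for illustration and are not the kernel's definitions); they only show how two mounts can share one namecache record while remaining distinguishable by the mount pointer in the handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mount {
    const char *mnt_name;
    int         mnt_refs;    /* number of handles referencing this mount */
};

struct namecache {
    const char *nc_name;     /* no nc_mount field: the mount lives in the handle */
};

struct nchandle {
    struct mount     *mount;
    struct namecache *ncp;
};

/* Build a handle and account for its reference on the mount. */
static struct nchandle
nch_make(struct mount *mp, struct namecache *ncp)
{
    struct nchandle nch = { mp, ncp };

    mp->mnt_refs++;
    return nch;
}

/* Drop a handle; a mount could only be freed once mnt_refs reaches 0. */
static void
nch_drop(struct nchandle *nch)
{
    assert(nch->mount->mnt_refs > 0);
    nch->mount->mnt_refs--;
    nch->mount = NULL;
    nch->ncp = NULL;
}

/* Two handles share the underlying topology if the record is the same. */
static int
nch_same_record(const struct nchandle *a, const struct nchandle *b)
{
    return a->ncp == b->ncp;
}

/* They are the same handle only if the mount pointer matches as well. */
static int
nch_equal(const struct nchandle *a, const struct nchandle *b)
{
    return a->ncp == b->ncp && a->mount == b->mount;
}
```

In this sketch a NULLFS-style alias is simply a second handle built from the same record with a different mount pointer, so no record is duplicated.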
# b13267a5	10-Sep-2006	Matthew Dillon <dillon@dragonflybsd.org>	Change the kernel dev_t, representing a pointer to a specinfo structure, to cdev_t. Change struct specinfo to struct cdev. The name 'cdev' was taken from FreeBSD. Remove the dev_t shim for the kernel. This commit generally removes the overloading of 'dev_t' between userland and the kernel. Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being properly converted to a userland dev_t.
# efda3bd0	05-Sep-2006	Matthew Dillon <dillon@dragonflybsd.org>	Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
# 66a1ddf5	18-Jul-2006	Matthew Dillon <dillon@dragonflybsd.org>	Remove several layers in the vnode operations vector init code. Declare the operations vector directly instead of via a descriptor array. Remove most of the recalculation code, it stopped being needed over a year ago. This work is similar to what FreeBSD now does, but was developed along a different line. Ultimately our vop_ops will become SYSLINK ops for userland VFS and clustering support.
# acde96db	06-May-2006	Matthew Dillon <dillon@dragonflybsd.org>	Remove the thread argument from all mount->vfs_* function vectors, replacing it with a ucred pointer when applicable. This cleans up a considerable amount of VFS function code that previously delved into the process structure to get the cred, though some code remains. Get rid of the compatibility thread argument for hpfs and nwfs. Our lockmgr calls are now mostly compatible with NetBSD (which doesn't use a thread argument either). Get rid of some complex junk in fdesc_statfs() that nobody uses. Remove the thread argument from dounmount() as well as various other filesystem specific procedures (quota calls primarily) which no longer need it due to the lockmgr, VOP, and VFS cleanups. These cleanups also have the effect of making the VFS code slightly less dependent on the calling thread's context.
# 87de5057	06-May-2006	Matthew Dillon <dillon@dragonflybsd.org>	The thread/proc pointer argument in the VFS subsystem originally existed for... well, I'm not sure WHY it originally existed when most of the time the pointer couldn't be anything other than curthread or curproc or the code wouldn't work. This is particularly true of lockmgr locks. Remove the pointer argument from all VOP_*() functions, all fileops functions, and most ioctl functions.
# 056f4388	23-Apr-2006	Matthew Dillon <dillon@dragonflybsd.org>	Remove the now unused interlock argument to the lockmgr() procedure. This argument has been abused over the years by kernel programmers attempting to optimize certain locking and data modification sequences, resulting in virtually unreadable code in some cases. The interlock also made porting between BSDs difficult as each BSD implemented its interlock differently. DragonFly has slowly removed use of the interlock argument and we can now finally be rid of it entirely.
# 54078292	24-Mar-2006	Matthew Dillon <dillon@dragonflybsd.org>	Major BUF/BIO work commit. Make I/O BIO-centric and specify the disk or file location with a 64 bit offset instead of a 32 bit block number. * All I/O is now BIO-centric instead of BUF-centric. * File/Disk addresses universally use a 64 bit bio_offset now. bio_blkno no longer exists. * Stackable BIO's hold disk offset translations. Translations are no longer overloaded onto a single structure (BUF or BIO). * bio_offset == NOOFFSET is now universally used to indicate that a translation has not been made. The old (blkno == lblkno) junk has all been removed. * There is no longer a distinction between logical I/O and physical I/O. * All driver BUFQs have been converted to BIOQs. * BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*, and findblk all now take and/or return 64 bit byte offsets instead of block numbers. Note that BMAP now returns a byte range for the before and after variables.
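The move from 32 bit block numbers to 64 bit byte offsets in 54078292 can be illustrated with a minimal userland sketch. Everything here is an assumption made for illustration: the 512-byte block size, the sentinel value chosen for `SKETCH_NOOFFSET`, the `sketch_bio` structure and the helper names are not taken from the kernel sources. The sketch only shows the idea of an "untranslated" sentinel and of a stacked BIO carrying its own translated offset.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BSIZE    512             /* assumed block size in bytes */
#define SKETCH_NOOFFSET ((int64_t)-1)   /* assumed "no translation yet" marker */

/* Hypothetical stand-in for one layer of a stacked BIO. */
struct sketch_bio {
    int64_t bio_offset;   /* byte offset, or SKETCH_NOOFFSET */
};

/* Widen before multiplying so large block numbers do not overflow. */
static int64_t
blkno_to_offset(uint32_t blkno)
{
    return (int64_t)blkno * SKETCH_BSIZE;
}

/* A layer is translated once its offset is no longer the sentinel. */
static int
bio_is_translated(const struct sketch_bio *bio)
{
    return bio->bio_offset != SKETCH_NOOFFSET;
}

/*
 * Fill in the lower layer from the upper one, e.g. by adding the byte
 * offset at which a partition starts. The upper layer is left untouched,
 * so each layer keeps its own view of the location.
 */
static void
bio_translate(const struct sketch_bio *upper, struct sketch_bio *lower,
              int64_t base)
{
    assert(bio_is_translated(upper));
    lower->bio_offset = upper->bio_offset + base;
}
```

Keeping one offset per layer is what lets a translation be recorded without overwriting the location the layer above asked for.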
# b1ce5639	13-Jan-2006	Sascha Wildner <swildner@dragonflybsd.org>	* Remove (void) casts for discarded return values. * Put function types on separate lines. * Ansify function definitions. In-collaboration-with: Alexey Slynko <slynko@tronet.ru>
# dc1be39c	17-Sep-2005	Matthew Dillon <dillon@dragonflybsd.org>	Add an argument to vfs_add_vnodeops() to specify VVF_* flags for the vop_ops structure. Add a new flag called VVF_SUPPORTS_FSMID to indicate filesystems which support persistent storage of FSMIDs. Rework the FSMID code a bit to reduce overhead. Use the spare field in the UFS inode structure to implement a persistent FSMID. The FSMID is recursively marked in the namecache but not adjusted until the next getattr() call on the related inode(s), or when the vnode is reclaimed.
# f91a71dd	02-Aug-2005	Joerg Sonnenberger <joerg@dragonflybsd.org>	Make nlink_t 32bit and ino_t 64bit. Implement the old syscall numbers for stat by wrapping the new syscalls and truncating the values. Add a hack for boot2 to keep ino_t 32bit, otherwise we would have to link the 64bit math code in and that would most likely overflow boot2. Bump libc major to annotate changed ABI and work around a problem with strip during installworld. strip is dynamically linked and doesn't play well with the new libc otherwise. Support for 64bit inode numbers is still incomplete, because the dirent is limited to 32bit. The checks for nlink_t have to be redone too.
# 43c45e8f	26-Jul-2005	Hiten Pandya <hmp@dragonflybsd.org>	Clean the VFS operations vector and related code: * take advantage of C99 sparse structure initialisation, this allows us to initialise left out vfsops entries cleanly when vfs_register() is called; any vfsop entries that are not specified will be assigned vfs_std* functions. the only exception to this rule is VFS_SYNC which is assigned vfs_stdnosync() since a file system may not have support for it. file systems can simply assign vfs_stdsync if they do not have their own sync operation. * add KKASSERTS to make sure that the VFS_ROOT, VFS_MOUNT and VFS_UNMOUNT vfs operations are provided by a file system being registered. all of the above are necessary to ensure a minimally working file system. * remove scattered no-op definitions of VFS_START() vfsop vector entry and take advantage of sparse vfsop initialisation. VFS_START is only used by MFS to ensure the calling process is not swapped out when I/O is initialised. The entry point is called from the mount path, before the file system is marked ready. * remove scattered no-op definitions of VFS_QUOTACTL() vfsop vector entry and take advantage of sparse vfsop initialisation. * give UFS a VFS_UNINIT vfsop entry and make use of it in ext2fs when ripping down the hash tables. * many file systems in the kernel seem to not implement the complementing VFS_UNINIT() vfsop entry, this is not so much of a problem when the file system is compiled into the kernel, but it can leave leakage when compiled as KLD modules. add uninitialisation code and entry points for ext2fs, ufs, fdescfs. grab the ufs_ihash_token when free'ing the inode hash table at ripping time.
* add typedefs for all the vfsop entry points, make use of it in definition of struct vfsops; this results in clean and consolidated code. use the typedefs for vfs_std* function prototypes.
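The sparse-initialisation scheme from 43c45e8f can be sketched in userland C as follows. This is an illustration under stated assumptions, not the kernel code: the `sketch_vfsops` structure, the `sketch_register()` function, the `std_*` defaults and the `demo_*` operations are all hypothetical, and the mandatory-entry check returns -1 here where the commit describes KKASSERTs.

```c
#include <assert.h>
#include <stddef.h>

/* One typedef shared by every entry point, to keep the sketch short. */
typedef int sketch_op_t(void);

/* Hypothetical, reduced operations vector. */
struct sketch_vfsops {
    sketch_op_t *vfs_mount;
    sketch_op_t *vfs_unmount;
    sketch_op_t *vfs_root;
    sketch_op_t *vfs_start;
    sketch_op_t *vfs_sync;
};

/* Stand-ins for the vfs_std* defaults. */
static int std_start(void)  { return 0; }
static int std_nosync(void) { return 0; }

/* Operations of an imaginary filesystem. */
static int demo_mount(void)   { return 0; }
static int demo_unmount(void) { return 0; }
static int demo_root(void)    { return 0; }

/*
 * Entries left out of a C99 designated initialiser are NULL, so the
 * registration step can tell "not provided" apart from "provided".
 * Mandatory entries are rejected when missing; optional ones get defaults.
 */
static int
sketch_register(struct sketch_vfsops *ops)
{
    if (ops->vfs_mount == NULL || ops->vfs_unmount == NULL ||
        ops->vfs_root == NULL)
        return -1;
    if (ops->vfs_start == NULL)
        ops->vfs_start = std_start;
    if (ops->vfs_sync == NULL)
        ops->vfs_sync = std_nosync;   /* sync defaults to the "no sync" stub */
    return 0;
}
```

A filesystem in this sketch would declare only what it implements, for example `struct sketch_vfsops ops = { .vfs_mount = demo_mount, .vfs_unmount = demo_unmount, .vfs_root = demo_root };`, and rely on registration to fill in the rest.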
# 5bd39f81	25-Jul-2005	Hiten Pandya <hmp@dragonflybsd.org>	Remove conditional bits about other operating systems, they are not required and just get in the way.
# 75ffff0d	02-Feb-2005	Joerg Sonnenberger <joerg@dragonflybsd.org>	Don't use the statfs field f_mntonname in filesystems. For the userland export code, it can be synthesized from mnt_ncp. For debugging code, use f_mntfromname, it should be enough to find the culprit. The vfs_unmountall doesn't use code_fullpath to avoid problems with resource allocation and to make it more likely that a call from ddb succeeds. Change getfsstat and fhstatfs to not show directories outside a chroot path, with the exception of the filesystem containing the chroot root itself.
# 64b4604e	29-Dec-2004	Matthew Dillon <dillon@dragonflybsd.org>	Get rid of dead non-DragonFly code.
# 6ddb7618	17-Dec-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 10/99: Start adding the journaling, range locking, and (very slightly) cache coherency infrastructure. Continue cleaning up the VOP operations vector. Expand on past commits that gave each mount structure its own set of VOP operations vectors by adding additional vector sets for journaling or cache coherency operations. Remove the vv_jops and vv_cops fields from the vnode operations vector in favor of placing those vop_ops directly in the mount structure. Reorganize the VOP calls as a double-indirect and add a field to the mount structure which represents the current vnode operations set (which will change when e.g. journaling is turned on or off). This creates the infrastructure necessary to allow us to stack a generic journaling implementation on top of a filesystem. Introduce a hard range-locking API for vnodes. This API will be used by high level system/vfs calls in order to handle atomicity guarantees. It is a prerequisite for: (1) being able to break I/O's up into smaller pieces for the vm_page list/direct-to-DMA-without-mapping goal, (2) to support the parallel write operations on a vnode goal, (3) to support the clustered (remote) cache coherency goal, and (4) to support massive parallelism in dispatching operations for the upcoming threaded VFS work. This commit represents only infrastructure and skeleton/API work.
# fad57d0e	12-Nov-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK. NOTE: unionfs and nullfs are temporarily broken by this commit. * Remove the old namecache API. vfs_cache_lookup(), cache_lookup(), cache_enter(), namei() and lookup() are all gone. VOP_LOOKUP() and VOP_CACHEDLOOKUP() have been collapsed into a single non-caching VOP_LOOKUP(). * Complete the new VFS CACHE (namecache) API. The new API is able to supply topological guarantees and is able to reserve namespaces, including negative cache spaces (whether the target name exists or not), which the new API uses to reserve namespace for things like NRENAME and NCREATE (and others). * Complete the new namecache API. VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE, NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE. These new calls take (typically locked) namecache pointers rather than combinations of directory vnodes, file vnodes, and name components. The new calls are MUCH simpler in concept and implementation. For example, VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments. The new namecache API uses the namecache to lock namespaces without having to lock the underlying vnodes. For example, this allows the kernel to reserve the target name of a create function trivially. Namecache records are maintained BY THE KERNEL for both positive and negative hits. Generally speaking, the kernel layer is now responsible for resolving path elements. NRESOLVE is called when an unresolved namecache record needs to be resolved.
Unlike the old VOP_LOOKUP, NRESOLVE is simply responsible for associating a vnode to a namecache record (positive hit) or telling the system that it's a negative hit, and not responsible for handling symlinks or other special cases or doing any of the other path lookup work. It should be particularly noted that the new namecache topology does not allow disconnected namecache records. In rare cases where a vnode must be converted to a namecache pointer for new API operation via a file handle (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP, VOP_NLOOKUPDOTDOT(), is provided to allow the namecache to resolve the topology leading up to the requested vnode. These and other topological guarantees greatly reduce the complexity of the new namecache API. The new namei() is called nlookup(). This function uses a combination of cache_n() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the supplied path, deal with symlinks, and so forth, in a nice small compact compartmentalized procedure. The old VFS code is no longer responsible for maintaining namecache records, a function which was mostly ad hoc cache_purge()s occurring before the VFS actually knows whether an operation will succeed or not. The new VFS code is typically responsible for adjusting the state of locked namecache records passed into it. For example, if NCREATE succeeds it must call cache_setvp() to associate the passed namecache record with the vnode representing the successfully created file. The new requirements are much less complex than the old requirements. * Most VFSs still implement the old API calls, albeit somewhat modified and in particular the VOP_LOOKUP function is now MUCH simpler. However, the kernel now uses the new API calls almost exclusively and relies on compatibility code installed in the default ops (vop_compat_()) to convert the new calls to the old calls.
All kernel system calls and related support functions which used to do complex and confusing namei() operations now do far less complex and far less confusing nlookup() operations. * SPECOPS shortcutting has been implemented. User reads and writes now go directly to supporting functions which talk to the device via fileops rather than having to be routed through VOP_READ or VOP_WRITE, saving significant overhead. Note, however, that these only really affect /dev/null and /dev/zero. Implementing this was fairly easy, we now simply pass an optional struct file pointer to VOP_OPEN() and let spec_open() handle the override. SPECIAL NOTES: It should be noted that we must still lock a directory vnode LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because a number of VFS's (including UFS) store active directory scanning information in the directory vnode. The legacy NAMEI_LOOKUP cases can be changed to use LK_SHARED once these VFS cases are fixed. In particular, we are now organized well enough to actually be able to do record locking within a directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't been done yet. Many thanks to all of the testers and in particular David Rhodus for finding a large number of panics and other issues.
# 5fd012e0	12-Oct-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 8/99: Major reworking of the vnode interlock and other miscellaneous things. This patch also fixes FS corruption due to prior vfs work in head. In particular, prior to this patch the namecache locking could introduce blocking conditions that confuse the old vnode deactivation and reclamation code paths. With this patch there appear to be no serious problems even after two days of continuous testing. * VX lock all VOP_CLOSE operations. * Fix two NFS issues. There was an incorrect assertion (found by David Rhodus), and the nfs_rename() code was not properly purging the target file from the cache, resulting in Stale file handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj. * Fix a TTY session issue. Programs which open("/dev/tty" ,...) and then run the TIOCNOTTY ioctl were causing the system to lose track of the open count, preventing the tty from properly detaching. This is actually a very old BSD bug, but it came out of the woodwork in DragonFly because I am now attempting to track device opens explicitly. * Gets rid of the vnode interlock. The lockmgr interlock remains. * Introduced VX locks, which are mandatory vp->v_lock based locks. * Rewrites the locking semantics for deactivation and reclamation. (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE, and VOP_RECLAIM). New guarantees emplaced with regard to vnode ripouts. * Recodes the mountlist scanning routines to close timing races. * Recodes getnewvnode to close timing races (it now returns a VX locked and refd vnode rather than a refd but unlocked vnode). * Recodes VOP_REVOKE- a locked vnode is now mandatory. * Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate directories are now held so the leaf test should no longer be necessary. * Splits the over-large vfs_subr.c into three additional source files, broken down by major function (locking, mount related, filesystem syncer). * Changes splvm() protection to a critical-section in a number of places (bleedover from another patch set which is also about to be committed). Known issues not yet resolved: * Possible vnode/namecache deadlocks. * While most filesystems now use vp->v_lock, I haven't done a final pass to make vp->v_lock mandatory and to clean up the few remaining inode based locks (nwfs I think and other obscure filesystems). * NullFS gets confused when you hit a mount point in the underlying filesystem. * Only UFS and NFS have been well tested * NFS is not properly timing out namecache entries, causing changes made on the server to not be properly detected on the client if the client already has a negative-cache hit for the filename in question. Testing-by: David Rhodus <sdrhodus@gmail.com>, Peter Kadau <peter.kadau@tuebingen.mpg.de>, walt <wa1ter@myrealbox.com>, others
# 21739618	30-Sep-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 7/99. BEGIN DESTABILIZATION! Implement the infrastructure required to allow us to begin switching to the new nlookup() VFS API. filedesc->fd_ncdir, fd_nrdir, fd_njdir: File descriptors (associated with processes) now record the namecache pointer related to the current directory, root directory, and jail directory, in addition to the vnode pointers. These pointers are used as the basis for the new path lookup code (nlookup() and friends). file->f_ncp: File pointers may now have a referenced+unlocked namecache pointer associated with them. All fp's representing directories have this attached. This allows fchdir() to properly record the ncp in fdp->fd_ncdir and friends. mount->mnt_ncp: The namecache topology for crossing a mount point works as follows: when looking up a path element which is a mount point, cache_nlookup() will locate the ncp for the vnode-under the mount point. mount->mnt_ncp represents the root of the mount, that is the vnode-over. nlookup() detects the mount point and accesses mount->mnt_ncp to skip past the vnode-under. When going backwards (..), nlookup() detects the case and skips backwards. The ncp linkages are: ncp->ncp->ncp[vnode_under]->ncp[vnode_over]. That is, when going forwards or backwards nlookup must explicitly skip over the double-ncp when crossing a mount point. This allows us to keep the namecache topology intact across mount points. NEW CACHE level API functions: cache_get(): Reference and lock a namecache entry. cache_put(): Dereference and unlock a namecache entry. cache_lock(): Lock an already-referenced namecache entry. cache_unlock(): Unlock a locked namecache entry. NOTE: namecache locks are exclusive and recursive.
These are the 'namespace' locks that we will be using to guarantee namespace operations such as in a CREATE, RENAME, or REMOVE. vfs_cache_setroot(): Set the new system-wide root directory. cache_allocroot(): System bootstrap helper function to allocate the root namecache node. cache_resolve(): Resolve a NCF_UNRESOLVED namecache node. The namecache node should be locked on call. cache_setvp(): (resolver) Associate a VP or create a negative cache entry representation for a namecache pointer and clear NCF_UNRESOLVED. The namecache node should be locked on call. cache_setunresolved(): Revert a resolved namecache entry back to an unresolved state, disassociating any vnode but leaving the topology intact. The namecache node should be locked on call. cache_vget(): Obtain the locked+refd vnode related to a namecache entry, resolving the entry if necessary. Return ENOENT if the entry represents a negative cache hit. cache_vref(): Obtain a refd (not locked) vnode related to a namecache entry, as above. cache_nlookup(): The new namecache lookup routine. This routine does a lookup and allocates a new namecache node (into an unresolved state) if necessary. Returns a namecache record whether or not the item can be found and whether or not it represents a positive or negative hit. cache_lookup(): OLD API CODE DEPRECATED, but must be maintained until everything has been converted over. cache_enter(): OLD API CODE DEPRECATED, but must be maintained until everything has been converted over. NEW default VOPs: vop_noresolve(): Implements a namecache resolver for VFSs which are still using the old VOP_LOOKUP/VOP_CACHEDLOOKUP API (which is all of them still). VOP_LOOKUP: OLD API CODE DEPRECATED, but must be maintained until everything has been converted over. VOP_CACHEDLOOKUP: OLD API CODE DEPRECATED, but must be maintained until everything has been converted over. NEW PATHNAME LOOKUP CODE: nlookup_init(): Similar to NDINIT, initialize a nlookupdata structure for nlookup() and nlookup_done().
nlookup(): Look up a path. Unlike the old namei/lookup code the new lookup code does not do any fancy pre-disposition of the cache for create/delete, it simply looks up the requested path and returns the appropriate locked namecache pointer. The caller can obtain the vnode and directory vnode, as applicable, from the one namecache structure that is returned. Access checks are done on directories leading up to the result but not done on the returned namecache node. nlookup_done(): Mandatory routine to clean up a nlookupdata structure after it has been initialized and all operations have been completed on it. nlookup_simple(): (in progress) All-in-one wrapped new lookup. nlookup_mp(): Helper call for resolving a mount point's glue NCP. Hackish, will be cleaned up later. nreadsymlink(): Helper call to resolve a symlink. Note that the namecache does not yet cache symlink data but the intention is to eventually do so to avoid having to do VFS ops to get the data. naccess(): Perform access checks on a namecache node given a mode and cred. naccess_va(): Perform access checks on a vattr given a mode and cred. Begin switching VFS operations from using namei to using nlookup. In this batch: * mount (install mnt_ncp for cross-mount-point handling in nlookup, simplify the vfs_mount() API to no longer pass a nameidata structure) * [l]stat (use nlookup) * [f]chdir (use nlookup, use recorded f_ncp) * [f]chroot (use nlookup, use recorded f_ncp)
# 3446c007	28-Aug-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 4/99. This stage goes a long ways towards allowing us to move the vnode locking into a kernel layer. It gets rid of a lot of cruft from FreeBSD-4. FreeBSD-5 has done some of this stuff too (such as changing the default locking to stdlock from nolock), but DragonFly is going further. * Consolidate vnode locks into the vnode structure, add an embedded v_lock, and get rid of both v_vnlock and v_data based head-of-structure locks. * Change the default vops to use a standard vnode lock rather than a fake non-lock. * Get rid of vop_nolock() and friends, we no longer support non-locking vnodes. * Get rid of vop_sharedlock(), we no longer support non standard shared-only locks (only NFS was using it and the mount-crossing lookup code should now prevent races to root from dead NFS volumes). * Integrate lock initialization into getnewvnode(). We do not yet incorporate automatic locking into getnewvnode(). getnewvnode() now has two additional arguments, lktimeout and lkflags, for lock structure initialization. * Change the sync vnode lock from nolock to stdlock. This may require more tuning down the line. Fix various sync_inactive() to properly unlock the lock as per the VOP API. * Properly flag the 'rename' vop operation regarding required tdvp and tvp unlocks (the flags are only used by nullfs). * Get rid of all inode-embedded vnode locks * Remove manual lockinit and use new getnewvnode() args instead. Lock the vnode prior to doing anything that might block in order to avoid synclist access before the vnode has been properly initialized.
* Generally change inode hash insertion to also check for a hash collision and return failure if it occurs, rather than doing (often non-atomic) relookups and other checks. These sorts of collisions can occur if a vnode is being destroyed at the same time a new vnode is being created from an inode. A new vnode is not generally accessible, except by the sync code (from the mountlist) until its underlying inode has been hashed so dealing with a hash collision should be as simple as throwing away the vnode with a vput(). * Do not initialize a new vnode's v_data until after the associated inode has been successfully added to the hash, and make the xxx_inactive() and xxx_reclaim() code friendly towards vnodes with a NULL v_data. * NFS now uses standard locks rather than shared-only locks. * PROCFS now uses standard locks rather than non-locks, and PROCFS's lookup code now understands VOP lookup semantics. PROCFS now uses a real hash table for its node search rather than a single singly-linked list (which should better scale to systems with thousands of processes). * NULLFS should now properly handle lookup() and rename() locks. NULLFS's node handling code has been rewritten. NULLFS's bypass code now understands vnode unlocks (rename case). * UFS no longer needs the ffs_inode_hash_lock hacks. It now uses the new collision-on-hash-add methodology. This will speed up UFS when operating on lots of small files (reported by David Rhodus).
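The collision-on-hash-add approach described in 3446c007 can be sketched as an insert routine that checks for an existing entry and reports failure instead of leaving the caller to re-look-up afterwards. The table, the `sketch_inode` structure and `sketch_ihash_ins()` below are hypothetical userland stand-ins; in a kernel the check and the insert would have to happen under one lock or token to be atomic, which this single-threaded sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NBUCKETS 16

/* Hypothetical, minimal inode: just a number and a hash chain link. */
struct sketch_inode {
    unsigned long        ino;
    struct sketch_inode *next;
};

static struct sketch_inode *sketch_hash[SKETCH_NBUCKETS];

/*
 * Insert ip into the hash. Returns 0 on success, or -1 if an inode with
 * the same number is already hashed; the caller is then expected to
 * discard its half-built vnode and retry the lookup.
 */
static int
sketch_ihash_ins(struct sketch_inode *ip)
{
    struct sketch_inode **head = &sketch_hash[ip->ino % SKETCH_NBUCKETS];
    struct sketch_inode *p;

    for (p = *head; p != NULL; p = p->next) {
        if (p->ino == ip->ino)
            return -1;          /* collision: someone else got there first */
    }
    ip->next = *head;
    *head = ip;
    return 0;
}
```

Because failure is reported by the insert itself, the caller needs no separate pre-check, which is where races in a check-then-insert sequence would otherwise creep in.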
# 0961aa92	17-Aug-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 2/99. This stage retools the vnode ops vector dispatch, making the vop_ops a per-mount structure rather than a per-filesystem structure. Filesystem mount code, typically in blah_vfsops.c, must now register various vop_ops pointers in the struct mount to compile its VOP operations set. This change will allow us to begin adding per-mount hooks to VFSes to support things like kernel-level journaling, various forms of cache coherency management, and so forth. In addition, the vop_() calls now require a struct vop_ops pointer as the first argument instead of a vnode pointer (note: in this commit the VOP_() macros currently just pull the vop_ops pointer from the vnode in order to call the vop_*() procedures). This change is intended to allow us to divorce ourselves from the requirement that a vnode pointer always be part of a VOP call. In particular, this will allow namespace based routines such as remove(), mkdir(), stat(), and so forth to pass namecache pointers rather than locked vnodes and is a very important precursor to the goal of using the namecache for namespace locking.
# 2d3e977e	13-Aug-2004	Matthew Dillon <dillon@dragonflybsd.org>	VFS messaging/interfacing work stage 1/99. This stage replaces the old dynamic VFS descriptor and inlined wrapper mess with a fixed structure and fixed procedural wrappers. Most of the work is straightforward except for vfs_init, which was basically rewritten (and greatly simplified). It is my intention to make the vop_*() call wrappers eventually handle range locking and cache coherency issues as well as implementing the direct call -> messaging interface layer. The call wrappers will also handle API translation as we shift the APIs over to new, more powerful mechanisms in order to allow the work to be incrementally committed. This is the first stage of what is likely to be a huge number of stages to modernize the VFS subsystem.
# 715f92b6	26-May-2004	Matthew Dillon <dillon@dragonflybsd.org>	count_udev() was being called with the wrong argument. Submitted-by: Hiten Pandya <hmp@backplane.com>
# e4c9c0c8	19-May-2004	Matthew Dillon <dillon@dragonflybsd.org>	Device layer rollup commit.
* cdevsw_add() is now required. cdevsw_add() and cdevsw_remove() may specify a mask/match indicating the range of supported minor numbers. Multiple cdevsw_add()'s using the same major number, but distinctly different ranges, may be issued. All devices that failed to call cdevsw_add() before now do.
* cdevsw_remove() now automatically marks all devices within its supported range as being destroyed.
* vnode->v_rdev is no longer resolved when the vnode is created. Instead, only v_udev (a newly added field) is resolved. v_rdev is resolved when the vnode is opened and cleared on the last close.
* A great deal of code was making rather dubious assumptions with regard to the validity of devices associated with vnodes, primarily due to the persistence of a device structure due to being indexed by (major, minor) instead of by (cdevsw, major, minor). In particular, if you run a program which connects to a USB device and then you pull the USB device and plug it back in, the vnode subsystem will continue to believe that the device is open when, in fact, it isn't (because it was destroyed and recreated). In particular, note that all the VFS mount procedures now check devices via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev is NULL prior to the first open.
* The disk layer's device interaction has been rewritten. The disk layer (i.e. the slice and disklabel management layer) no longer overloads its data onto the device structure representing the underlying physical disk. Instead, the disk layer uses the new cdevsw_add() functionality to register its own cdevsw using the underlying device's major number, and simply does NOT register the underlying device's cdevsw. No confusion is created because the device hash is now based on (cdevsw, major, minor) rather than (major, minor). NOTE: This also means that underlying raw disk devices may use the entire device minor number instead of having to reserve the bits used by the disk layer, and also means that we can (theoretically) stack a fully disklabel-supported 'disk' on top of any block device.
* The new reference counting scheme prevents this by associating a device with a cdevsw and disconnecting the device from its cdevsw when the cdevsw is removed. Additionally, all udev2dev() lookups run through the cdevsw mask/match and only successfully find devices still associated with an active cdevsw.
* Major work on MFS: MFS no longer shortcuts vnode and device creation. It now creates a real vnode and a real device and implements real open and close VOPs. Additionally, due to the disk layer changes, MFS is no longer limited to 255 mounts. The new limit is 16 million. Since MFS creates a real device node, mount_mfs will now create a real /dev/mfs<PID> device that can be read from userland (e.g. so you can dump an MFS filesystem).
* BUF AND DEVICE STRATEGY changes. The struct buf contains a b_dev field. In order to properly handle stacked devices we now require that the b_dev field be initialized before the device strategy routine is called. This required some additional work in various VFS implementations. To enforce this requirement, biodone() now sets b_dev to NODEV. The new disk layer will adjust b_dev before forwarding a request to the actual physical device.
* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.
Testing by: lots of people, but David Rhodus found the most egregious bugs.
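One plausible reading of the mask/match registration mentioned in this entry is that a switch entry covers a minor number when `(minor & mask) == match`. The sketch below illustrates that reading only; it is an assumption, the `sketch_` names are hypothetical, and the real cdevsw_add() signature is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration record: one per cdevsw_add()-style call. */
struct sketch_devsw {
    const char *name;
    unsigned int major;
    unsigned int mask;                 /* which minor bits are compared */
    unsigned int match;                /* required value of those bits */
};

/* Assumed rule: an entry covers (major, minor) if the masked minor equals match. */
int sketch_covers(const struct sketch_devsw *sw,
                  unsigned int major, unsigned int minor)
{
    return sw->major == major && (minor & sw->mask) == sw->match;
}

/*
 * Find the entry responsible for (major, minor). Several entries may share
 * a major number as long as their mask/match ranges are distinct. Returns
 * NULL when no active entry covers the device.
 */
const struct sketch_devsw *sketch_lookup(const struct sketch_devsw *tab,
                                         size_t n,
                                         unsigned int major,
                                         unsigned int minor)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sketch_covers(&tab[i], major, minor))
            return &tab[i];
    }
    return NULL;
}
```

Under this assumption, two entries with the same major and mask 0x80 but match values 0x00 and 0x80 split the minor space in half, and a lookup that finds no covering entry returns NULL, mirroring the statement that lookups only find devices still associated with an active cdevsw.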
# 597aea93	24-Apr-2004	David Rhodus <drhodus@dragonflybsd.org>	Remove the VREF() macro and uses of it. Remove uses of 0x20 before ^I inside vnode.h