Development notes regarding VND. Original document by David van Moolenbroek.


DESIGN DECISIONS

As simple as the VND driver implementation looks, several important decisions
had to be made in the design process. These decisions are listed here.

Multiple instances instead of a single instance: The decision to spawn a
separate driver instance for each VND unit was not ideologically inspired, but
rather based on a practical issue. Namely, users may reasonably expect to be
able to set up a VND using a backing file that resides on a file system hosted
on another VND. If one single driver instance were to host both VND units, its
implementation would have to perform all its backcalls to VFS asynchronously,
so as to be able to process another incoming request that was initiated as part
of such an ongoing backcall. At the time of writing, MINIX3 does not support
any form of asynchronous I/O, but even that would not be sufficient: the
asynchrony would have to extend to the close(2) call that takes place during
device unconfiguration, as this call could spark I/O to another VND device.
Ultimately, using one driver instance per VND unit avoids these complications
altogether, thus making nesting possible up to a maximum depth equal to the
number of VFS threads.
Of course, this comes at the cost of having more VND driver
processes; in order to avoid this cost in the common case, driver instances are
dynamically started and stopped by vndconfig(8).

copyfd(2) instead of openas(2): Compared to the NetBSD interface, the MINIX3
VND API requires that the user program configuring a device pass in a file
descriptor in the vnd_ioctl structure instead of a pointer to a path name.
While binary compatibility with NetBSD would be impossible anyway (MINIX3
cannot support pointers in IOCTL data structures), providing a path name buffer
would be closer to what NetBSD does. There are two reasons behind the choice to
pass in a file descriptor instead. First, performing an open(2)-like call as
a driver backcall is tricky in terms of avoiding deadlocks in VFS, since it
would by nature violate the VFS locking order. On top of that, special
provisions would have to be added to support opening a file in the context of
another process, for example so that chrooted processes would be supported.
In contrast, copying a file descriptor to a remote process is relatively easy
because there is only one potential deadlock case to cover (namely, the given
file descriptor identifying the VFS filp object used to control the very same
device), and VFS need only implement a procedure that very much resembles
sending a file descriptor across a UNIX domain socket.
Second, since passing a
file descriptor is effectively passing an object capability, it is easier to
improve the isolation of the VND drivers in the future, as described below.

No separate control device: The driver uses the same minor (block) device for
configuration and for actual (whole-disk) I/O, instead of exposing a separate
device that exists only for the purpose of configuring the device. The reason
for this is that such a control device simply does not fit the NetBSD
opendisk(3) API. While MINIX3 may at some point implement support for NetBSD's
notion of raw devices, such raw devices are still expected to support I/O, and
that means they cannot be control-only. In this regard, it should be mentioned
that the entire VND infrastructure relies on block caches being invalidated
properly upon (un)configuration of VND units, and that such invalidation
(through the REQ_FLUSH file system request) is currently initiated only by
closing block devices. Support for configuration or I/O through character
devices would thus require more work on that side first. In any case, the
primary downside of not having a separate control device is that handling
access permissions on device open is a bit of a hack in order to keep the
MINIX3 userland happy.


FUTURE IMPROVEMENTS

Currently, the VND driver instances are run as root solely because the
copyfd(2) call requires root.
Obviously, non-root user processes should never
be able to copy file descriptors from arbitrary processes, and thus, some
security check is required there. However, an access control list for VFS calls
would be a much better solution: in that case, VND driver processes could be
given exclusive rights to the use of the copyfd(2) call, while being given a
normal driver UID at the same time.

In MINIX3's dependability model, drivers are generally not considered to be
malicious. However, the VND case is interesting because it is possible to
isolate individual driver instances to the point of actual "least authority".
The copyfd(2) call currently allows any file descriptor to be copied, but it
would be possible to extend the scheme to let user processes (and vndconfig(8)
in particular) mark the file descriptors that may be the target of a copyfd(2)
call. One of several schemes may be implemented in VFS for this purpose. For
example, each process could be allowed to mark one of its file descriptors as
"copyable" using a new VFS call, and VFS would then allow copyfd(2) only on a
"copyable" file descriptor from a process blocked on a call to the driver that
invoked copyfd(2). This approach precludes hiding a VND driver behind a RAID
or FBD (etc.) driver, but more sophisticated approaches could solve that as
well. Regardless of the scheme, the end result would be a situation where the
VND drivers are strictly limited to operating on the resources given to them.
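The "copyable" scheme described in the previous paragraph boils down to a
predicate that VFS would evaluate on every copyfd(2) call. The sketch below is
a hypothetical model only, assuming one mark per process as in the example
scheme: struct proc_state, mark_copyable() and copyfd_allowed() do not exist
in MINIX3 VFS, and driver endpoints are represented as plain ints.

```c
#include <stdbool.h>

#define NO_FD     (-1)
#define NO_DRIVER (-1)

/* HYPOTHETICAL per-process state VFS might keep under the "copyable"
 * scheme; these fields are illustrative, not actual MINIX3 VFS fields. */
struct proc_state {
    int copyable_fd; /* descriptor marked as copyable, or NO_FD */
    int blocked_on;  /* endpoint of the driver the process is blocked on,
                        or NO_DRIVER when not blocked on any driver call */
};

/* HYPOTHETICAL new VFS call: mark exactly one descriptor as copyable.
 * Marking another descriptor replaces the previous mark. */
static void mark_copyable(struct proc_state *p, int fd)
{
    p->copyable_fd = fd;
}

/* Policy check VFS could apply when the driver with endpoint `caller`
 * invokes copyfd(2) to copy descriptor `fd` out of process `target`:
 * the descriptor must be the one marked copyable, and the target must
 * currently be blocked on a call to that very driver. */
static bool copyfd_allowed(const struct proc_state *target, int caller,
                           int fd)
{
    return fd != NO_FD &&
           target->copyable_fd == fd &&
           target->blocked_on == caller;
}
```

In this model, hiding a VND driver behind a RAID or FBD driver fails the
check: vndconfig(8) would be blocked on the RAID or FBD driver rather than on
the VND driver calling copyfd(2), so blocked_on would not match caller. This
mirrors the limitation of the simple scheme noted in the text.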

Note that copyfd(2) was originally called dupfrom(2), and then extended to copy
file descriptors *to* remote processes as well. The latter is not as security
sensitive, but may have to be restricted in a similar way. If this is not
possible, copyfd(2) can always be split into multiple calls.
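The copyfd(2) rationale compares the required VFS procedure to sending a file
descriptor across a UNIX domain socket. For reference, the following sketch
shows that standard POSIX mechanism (an SCM_RIGHTS control message). It is not
MINIX3's copyfd(2) interface, whose signature is not given in these notes, and
send_fd() and recv_fd() are hypothetical helper names. The point it
illustrates is the capability property: the receiver obtains a new descriptor
for exactly the open file the sender chose, without any path lookup (and thus
without chroot or locking-order concerns on the receiving side).

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over the UNIX domain socket `sock`.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F'; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { /* union guarantees proper alignment of the control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* kernel installs a copy in the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from `sock`. Returns the new descriptor, valid
 * in the calling process, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    /* Accept only a single SCM_RIGHTS message carrying one descriptor. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a real setup the two ends of the socket belong to different processes; the
descriptor number the receiver gets is generally different from the sender's,
but both refer to the same open file.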