* Need backend synchronization / serialization when the frontend detaches
  an XOP.  modify_tid tests won't be enough; the backend may wind up
  executing the XOP out of order after the detach.

* xop_start - only start synchronized elements

* See if we can remove hammer2_inode_repoint()

* FIXME - logical buffer associated with a write-in-progress on the backend
  disappears once the cluster validates, even if more backend nodes are
  still in progress.

* FIXME - backend ops need per-node transactions using spmp to protect
  against flush.

* FIXME - modifying backend ops are not currently validating the cluster.
  That probably needs to be done by the frontend in hammer2_xop_start().

* modify_tid handling is probably broken with the XOP code for the moment.

* embedded transactions in XOPs - interlock early completion

* remove current incarnation of EAGAIN

* mtx locks should not track the td_locks count?  They can be acquired by
  one thread and released by another.  Need an API function for exclusive
  locks.

* Convert xops and hammer2_update_spans() from cluster calls back into
  chain calls.

* syncthr leaves inode locks held for the entire sync, which is wrong.

* recovery scan vs unmount.  At the moment an unmount does its flushes,
  and if successful the freemap will be fully up-to-date, but the mount
  code doesn't know that and the last flush batch will probably match
  the PFS root mirror_tid.  If it was a large cpdup the (unnecessary)
  recovery pass at mount time can be extensive.  Add a CLEAN flag to the
  volume header to optimize out the unnecessary recovery pass.

* More complex transaction sequencing and flush merging.  Right now it is
  all serialized against flushes.

* adding a new pfs - freeze and force remaster

* removing a pfs - freeze and force remaster

* bulkfree - sync between passes and enforce serialization of operation

* bulkfree - signal check, allow interrupt

* bulkfree - sub-passes when the kernel memory block isn't large enough
  (see the sub-pass sketch below)

* bulkfree - limit kernel memory allocation for bmap space

* bulkfree - must include any detached vnodes in the scan so open unlinked
  files are not ripped out from under the system.
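
  A hypothetical sketch of the bulkfree sub-pass / memory-budget idea above
  (not the in-tree code; bulkfree_run(), scan_topology_and_mark(),
  MEM_BUDGET and BYTES_PER_BIT are made-up names used only to illustrate
  splitting the scan when the bitmap would not fit in the memory budget):

	#include <stdint.h>
	#include <stdlib.h>
	#include <string.h>

	#define MEM_BUDGET	(16 * 1024 * 1024)  /* bitmap bytes per sub-pass */
	#define BYTES_PER_BIT	(64 * 1024)	    /* storage covered by one bit */

	/* Walk the block topology and set a bit for every in-use block
	 * whose data offset falls inside [base, base + len).  Stub here. */
	static void
	scan_topology_and_mark(uint8_t *bmap, uint64_t base, uint64_t len)
	{
		(void)bmap; (void)base; (void)len;
	}

	/* Cover the full storage range in sub-passes so the in-kernel
	 * bitmap never exceeds MEM_BUDGET bytes. */
	static void
	bulkfree_run(uint64_t total_storage_bytes)
	{
		uint64_t span = (uint64_t)MEM_BUDGET * 8 * BYTES_PER_BIT;
		uint8_t *bmap = calloc(1, MEM_BUDGET);
		uint64_t base;

		if (bmap == NULL)
			return;
		for (base = 0; base < total_storage_bytes; base += span) {
			uint64_t len = total_storage_bytes - base;

			if (len > span)
				len = span;
			scan_topology_and_mark(bmap, base, len);
			/* compare bmap against the on-media freemap for
			 * this range and stage frees, then reset it for
			 * the next sub-pass */
			memset(bmap, 0, MEM_BUDGET);
		}
		free(bmap);
	}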

* bulkfree - must include all volume headers in the scan so they can be
  used for recovery or automatic snapshot retrieval.

* bulkfree - snapshot duplicate sub-tree cache and tests needed to reduce
  unnecessary re-scans.

* Currently the check code (bref.methods: crc, sha, etc.) is verified every
  single time a chain is locked, even if the underlying buffer was
  previously checked for that chain.  This needs an optimization to
  (significantly) improve performance (see the sketch further below).

* flush synchronization boundary crossing check and current flush chain
  interlock needed.

* snapshot creation must allocate and separately pass a new pmp for the
  degenerate PFS 'cluster' representing the snapshot.  This theoretically
  will also allow a snapshot to be generated inside a cluster of more
  than one node.

* snapshot copy currently also copies uuids and can confuse cluster code

* hidden dir or other dirs/files/modifications made to a PFS before
  additional cluster entries are added.

* transaction on cluster - multiple trans structures, subtrans

* inode always contains target cluster/chain, not hardlink

* chain refs in cluster, cluster refs

* check inode shared lock ... can end up in an endless loop when following
  a hardlink because ip->chain is not updated in the exclusive lock cycle
  when following the hardlink.

cpdup /build/boomdata/jails/bleeding-edge/usr/share/man/man4 /mnt/x3


 * The block freeing code.  At the very least a bulk scan is needed
   to implement freeing blocks.

 * Crash stability.  Right now the allocation table on-media is not
   properly synchronized with the flush.  This needs to be adjusted
   such that H2 can do an incremental scan on mount to fix up
   allocations as part of its crash recovery mechanism.

 * We actually have to start checking and acting upon the CRCs being
   generated.

 * Remaining known hardlink issues need to be addressed.

 * Core 'copies' mechanism needs to be implemented to support multiple
   copies on the same media.
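
  A hypothetical sketch of the check-code caching optimization from the
  bref.methods item earlier in this file (not the in-tree hammer2 code;
  the flag, struct layout, and helpers are made up, and calc_check_code()
  is only a stand-in for the real crc/sha routines selected by
  bref.methods):

	#include <stddef.h>
	#include <stdint.h>

	#define CHAIN_CHECKED	0x0001	/* data verified since last buffer load */

	struct chain {
		uint32_t	flags;
		const uint8_t	*data;
		size_t		bytes;
		uint32_t	check_code;	/* expected value from the blockref */
	};

	/* Stand-in for the real check-code calculation. */
	static uint32_t
	calc_check_code(const uint8_t *data, size_t bytes)
	{
		uint32_t v = 0;

		while (bytes--)
			v = (v << 1) ^ *data++;
		return (v);
	}

	/*
	 * Called when locking a chain with data present: skip the
	 * (expensive) verification if it already passed for this load of
	 * the underlying buffer.  The flag must be cleared whenever the
	 * buffer is re-read or the chain is modified.
	 */
	static int
	chain_lock_verify(struct chain *chain)
	{
		if (chain->flags & CHAIN_CHECKED)
			return (0);
		if (calc_check_code(chain->data, chain->bytes) != chain->check_code)
			return (-1);	/* check code mismatch */
		chain->flags |= CHAIN_CHECKED;
		return (0);
	}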

 * Core clustering mechanism needs to be implemented to support
   mirroring and basic multi-master operation from a single host
   (multi-host requires additional network protocols and won't
   be as easy).

* make sure we aren't using a shared lock during RB_SCANs?

* overwrite in the write_file case w/compression - if the device block size
  changes the block has to be deleted and reallocated.  See
  hammer2_assign_physical() in vnops.

* freemap / clustering.  Set the block size on a 2MB boundary so the
  cluster code can be used for reading.

* need an API layer for shared buffers (unfortunately).

* add a magic number to the inode header, and add the parent inode number
  too, to help with brute-force recovery.

* modifications past our flush point do not adjust vchain.  Need to make
  vchain dynamic so we can (see flush_scan2)?

* MINIOSIZE/RADIX set to 1KB for now to avoid buffer cache deadlocks on
  multiple locked inodes.  Fix so we can use LBUFSIZE!  Or, alternatively,
  allow a smaller I/O size based on the sector size (not optimal though).

* When making a snapshot, do not allow the snapshot to be mounted until
  the in-memory chain has been freed in order to break the shared core.

* Snapshotting a sub-directory does not snapshot any
  parent-directory-spanning hardlinks.

* Snapshot / flush-synchronization point.  Remodified data that crosses
  the synchronization boundary is not currently reallocated.  See
  hammer2_chain_modify(), explicit check (requires logical buffer cache
  buffer handling).

* On a fresh mount with multiple hardlinks present, separate lookups will
  result in separate vnodes pointing to separate inodes pointing to a
  common chain (the hardlink target).

  When the hardlink target consolidates upward only one vp/ip will be
  adjusted.  We need code to fix up the other chains (probably put in
  inode_lock_*()), which will be pointing to an older deleted hardlink
  target.

* The filesystem must ensure that modify_tid is not too large relative to
  the iterator in the volume header on load, or flush sequencing will not
  work properly.  We should be able to just override it, but we should
  complain if it happens (see the sketch below).
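
  A hypothetical sketch of the "override it, but complain" behavior for the
  modify_tid item above (not the in-tree mount code; the struct and function
  names are made up):

	#include <stdint.h>
	#include <stdio.h>

	struct volume_hdr {
		uint64_t mirror_tid;	/* flush iterator in the volume header */
	};

	/*
	 * Called while loading an element at mount time: if its modify_tid
	 * runs ahead of the volume header's iterator, clamp it and complain
	 * so flush sequencing stays consistent.
	 */
	static uint64_t
	load_clamp_modify_tid(const struct volume_hdr *voldata, uint64_t modify_tid)
	{
		if (modify_tid > voldata->mirror_tid) {
			fprintf(stderr,
			    "hammer2: modify_tid %016jx exceeds volume "
			    "iterator %016jx, overriding\n",
			    (uintmax_t)modify_tid,
			    (uintmax_t)voldata->mirror_tid);
			modify_tid = voldata->mirror_tid;
		}
		return (modify_tid);
	}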

* The kernel side needs to clean up transaction queues and make
  appropriate callbacks.

* The userland side needs to do the same for any initiated transactions.

* Nesting problems in the flusher.

* Inefficient vfsync due to thousands of file buffers, one per vnode.
  (need to aggregate using a device buffer?)

* Use bp->b_dep to interlock the buffer with the chain structure so the
  strategy code can calculate the crc and assert that the chain is marked
  modified (not yet flushed).

* A deleted inode is not reachable via the tree for the volume flush but
  is still reachable via fsync/inactive/reclaim.  Its tree can be
  destroyed at that point.

* The direct write code needs to invalidate any underlying physical
  buffers.  Direct write needs to be implemented.

* Make sure a resized block (hammer2_chain_resize()) calculates a new
  hash code in the parent bref.

* The freemap allocator needs to getblk/clrbuf/bdwrite any partial
  block allocations (less than 64KB) that allocate out of a new 64K
  block, to avoid causing a read-before-write I/O.

* Check the flush race between the upward recursion setting SUBMODIFIED
  and the downward recursion checking SUBMODIFIED then locking (must
  clear before the recursion and might need additional synchronization).

* There is definitely a flush race in the hardlink implementation between
  the forwarding entries and the actual (hidden) hardlink inode.

  This will require us to associate a small hard-link-adjust structure
  with the chain whenever we create or delete hardlinks, on top of
  adjusting the hardlink inode itself.  Any actual flush to the media
  has to synchronize the correct nlinks value based on whether related
  created or deleted hardlinks were also flushed.

* When a directory entry is created, and also when an indirect block is
  created and entries are moved into it, the directory seek position can
  potentially become incorrect during a scan.

* When a directory entry is deleted, a directory seek position depending
  on that key can cause readdir to skip entries (see the cookie sketch
  below).

* TWO PHASE COMMIT - store two data offsets in the chain, and
  hammer2_chain_delete() needs to leave the chain intact if MODIFIED2 is
  set on its buffer until the flusher gets to it?
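
  A hypothetical illustration of one common mitigation for the two directory
  seek-position items above: use a key-based cookie instead of an ordinal
  position, so creations, deletions, and entries migrating into an indirect
  block do not shift the resume point.  The names and layout are made up and
  this is not necessarily how hammer2 will solve it:

	#include <stddef.h>
	#include <stdint.h>

	struct dirent_rec {
		uint64_t	key;	/* stable hash key of the entry */
		const char	*name;
	};

	/*
	 * Resume a directory scan at the first entry whose key is greater
	 * than the saved cookie.  'entries' is assumed sorted by key; the
	 * cookie is updated for the next call.
	 */
	static size_t
	readdir_resume(const struct dirent_rec *entries, size_t nentries,
		       uint64_t *cookiep, const struct dirent_rec **out,
		       size_t outmax)
	{
		size_t i, n = 0;

		for (i = 0; i < nentries && n < outmax; ++i) {
			if (entries[i].key <= *cookiep)
				continue;	/* already returned earlier */
			out[n++] = &entries[i];
			*cookiep = entries[i].key;
		}
		return (n);
	}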


				OPTIMIZATIONS

* If a file is unlinked but its descriptors are left open and used, we
  should allow data blocks on-media to be reused since there is no
  topology left to point at them.