xref: /dflybsd-src/sys/vfs/hammer2/TODO (revision 65cacacfb9005fe8dd541b0e830c2466c0ab8453)

* Need backend synchronization / serialization when the frontend detaches
  an XOP.  modify_tid tests won't be enough; the backend may wind up executing
  the XOP out of order after the detach.

* xop_start - only start synchronized elements

* See if we can remove hammer2_inode_repoint()

* FIXME - logical buffer associated with write-in-progress on backend
  disappears once the cluster validates, even if more backend nodes
  are in progress.

* FIXME - backend ops need per-node transactions using spmp to protect
  against flush.

* FIXME - modifying backend ops are not currently validating the cluster.
  That probably needs to be done by the frontend in hammer2_xop_start()

* modify_tid handling probably broken w/ the XOP code for the moment.

* embedded transactions in XOPs - interlock early completion

* remove current incarnation of EAGAIN

* mtx locks should not track td_locks count?  They can be acquired by one
  thread and released by another.  Need API function for exclusive locks.

* Convert xops and hammer2_update_spans() from cluster back into chain calls

* syncthr leaves inode locks for entire sync, which is wrong.

* recovery scan vs unmount.  At the moment an unmount does its flushes,
  and if successful the freemap will be fully up-to-date, but the mount
  code doesn't know that and the last flush batch will probably match
  the PFS root mirror_tid.  If it was a large cpdup the (unnecessary)
  recovery pass at mount time can be extensive.  Add a CLEAN flag to the
  volume header to optimize out the unnecessary recovery pass.

* More complex transaction sequencing and flush merging.  Right now it is
  all serialized against flushes.

* adding new pfs - freeze and force remaster

* removing a pfs - freeze and force remaster

* bulkfree - sync between passes and enforce serialization of operation

* bulkfree - signal check, allow interrupt

* bulkfree - sub-passes when kernel memory block isn't large enough

* bulkfree - limit kernel memory allocation for bmap space

* bulkfree - must include any detached vnodes in scan so open unlinked files
	     are not ripped out from under the system.

* bulkfree - must include all volume headers in scan so they can be used
	     for recovery or automatic snapshot retrieval.

* bulkfree - snapshot duplicate sub-tree cache and tests needed to reduce
	     unnecessary re-scans.

* Currently the check code (bref.methods / crc, sha, etc) is being checked
  every single blasted time a chain is locked, even if the underlying buffer
  was previously checked for that chain.  This needs an optimization to
  (significantly) improve performance.

* flush synchronization boundary crossing check and current flush chain
  interlock needed.

* snapshot creation must allocate and separately pass a new pmp for the pfs
  degenerate 'cluster' representing the snapshot.  This theoretically will
  also allow a snapshot to be generated inside a cluster of more than one
  node.

* snapshot copy currently also copies uuids and can confuse cluster code

* hidden dir or other dirs/files/modifications made to PFS before
  additional cluster entries added.

* transaction on cluster - multiple trans structures, subtrans

* inode always contains target cluster/chain, not hardlink

* chain refs in cluster, cluster refs

* check inode shared lock ... can end up in endless loop if following
  hardlink because ip->chain is not updated in the exclusive lock cycle
  when following hardlink.

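The "recovery scan vs unmount" item above proposes a CLEAN flag in the
volume header.  A minimal sketch of that idea follows.  The structure, the
flag bit, and both function names are hypothetical stand-ins and not the
actual HAMMER2 volume header layout or API.  The sketch also assumes the
mount path clears the flag (and would write the header back) before any
modification reaches the media.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real volume header layout is not shown here. */
#define VOLHDR_FLAG_CLEAN	0x00000001u

/* Hypothetical stand-in holding only the fields this sketch needs. */
struct volhdr_sketch {
	uint32_t flags;		/* assumed flags word */
	uint64_t mirror_tid;	/* last flushed transaction id */
};

/*
 * Unmount path: record CLEAN only when every flush succeeded, meaning
 * the freemap is known to be fully up-to-date.
 */
static void
volhdr_mark_unmount(struct volhdr_sketch *vh, bool flushes_ok)
{
	assert(vh != NULL);
	if (flushes_ok)
		vh->flags |= VOLHDR_FLAG_CLEAN;
	else
		vh->flags &= ~VOLHDR_FLAG_CLEAN;
}

/*
 * Mount path: returns true when the recovery scan is still required.
 * CLEAN is cleared unconditionally so that a crash while mounted forces
 * the scan on the next mount.
 */
static bool
volhdr_mount_needs_recovery(struct volhdr_sketch *vh)
{
	bool clean;

	assert(vh != NULL);
	clean = (vh->flags & VOLHDR_FLAG_CLEAN) != 0;
	vh->flags &= ~VOLHDR_FLAG_CLEAN;
	return !clean;
}
```

A read-only mount would presumably leave the flag alone since nothing can
invalidate the freemap; the sketch does not model that case.
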
cpdup /build/boomdata/jails/bleeding-edge/usr/share/man/man4 /mnt/x3


        * The block freeing code.  At the very least a bulk scan is needed
          to implement freeing blocks.

        * Crash stability.  Right now the allocation table on-media is not
          properly synchronized with the flush.  This needs to be adjusted
          such that H2 can do an incremental scan on mount to fix up
          allocations as part of its crash recovery mechanism.

        * We actually have to start checking and acting upon the CRCs being
          generated.

        * Remaining known hardlink issues need to be addressed.

        * Core 'copies' mechanism needs to be implemented to support multiple
          copies on the same media.

        * Core clustering mechanism needs to be implemented to support
          mirroring and basic multi-master operation from a single host
          (multi-host requires additional network protocols and won't
          be as easy).

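The CRC item in the list above asks for check codes to be acted upon, and
an earlier item notes that re-checking on every chain lock is too costly.
The sketch below combines the two under stated assumptions: a mismatch
returns EIO, and a successful check is remembered in a per-chain flag until
the data is modified.  The structure, the flag, and the function names are
hypothetical, and a plain bitwise CRC-32 stands in for whatever check method
the bref actually selects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_CHECK_VERIFIED	0x0001u	/* hypothetical per-chain flag */

/* Hypothetical stand-in for the parts of a chain this sketch needs. */
struct chain_sketch {
	const void *data;	/* media data backing the chain */
	size_t	    bytes;	/* size of that data */
	uint32_t    stored_check; /* check code recorded in the parent */
	unsigned    flags;
};

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320), used as a stand-in. */
static uint32_t
crc32_sketch(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Verify once per chain; return 0 on success or EIO on a mismatch. */
static int
chain_verify(struct chain_sketch *chain)
{
	assert(chain != NULL);
	if (chain->flags & CHAIN_CHECK_VERIFIED)
		return 0;	/* already checked, skip the recomputation */
	if (crc32_sketch(chain->data, chain->bytes) != chain->stored_check)
		return EIO;
	chain->flags |= CHAIN_CHECK_VERIFIED;
	return 0;
}

/* Any modification or buffer replacement must drop the cached result. */
static void
chain_invalidate_check(struct chain_sketch *chain)
{
	assert(chain != NULL);
	chain->flags &= ~CHAIN_CHECK_VERIFIED;
}
```

The cached flag is only safe if every path that changes the underlying
buffer calls the invalidate function; that coupling is the part a real
implementation would need to get right.
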
* make sure we aren't using a shared lock during RB_SCANs?

* overwrite in write_file case w/compression - if device block size changes
  the block has to be deleted and reallocated.  See hammer2_assign_physical()
  in vnops.

* freemap / clustering.  Set block size on 2MB boundary so the cluster code
  can be used for reading.

* need API layer for shared buffers (unfortunately).

* add magic number to inode header, add parent inode number too, to
  help with brute-force recovery.

* modifications past our flush point do not adjust vchain.
  need to make vchain dynamic so we can (see flush_scan2).??

* MINIOSIZE/RADIX set to 1KB for now to avoid buffer cache deadlocks
  on multiple locked inodes.  Fix so we can use LBUFSIZE!  Or,
  alternatively, allow a smaller I/O size based on the sector size
  (not optimal though).

* When making a snapshot, do not allow the snapshot to be mounted until
  the in-memory chain has been freed in order to break the shared core.

* Snapshotting a sub-directory does not snapshot any
  parent-directory-spanning hardlinks.

* Snapshot / flush-synchronization point.  remodified data that crosses
  the synchronization boundary is not currently reallocated.  see
  hammer2_chain_modify(), explicit check (requires logical buffer cache
  buffer handling).

* on fresh mount with multiple hardlinks present separate lookups will
  result in separate vnodes pointing to separate inodes pointing to a
  common chain (the hardlink target).

  When the hardlink target consolidates upward only one vp/ip will be
  adjusted.  We need code to fix up the other chains (probably put in
  inode_lock_*()) which will be pointing to an older deleted hardlink
  target.

* Filesystem must ensure that modify_tid is not too large relative to
  the iterator in the volume header, on load, or flush sequencing will
  not work properly.  We should be able to just override it, but we
  should complain if it happens.

* Kernel-side needs to clean up transaction queues and make appropriate
  callbacks.

* Userland side needs to do the same for any initiated transactions.

* Nesting problems in the flusher.

* Inefficient vfsync due to thousands of file buffers, one per vnode.
  (need to aggregate using a device buffer?)

* Use bp->b_dep to interlock the buffer with the chain structure so the
  strategy code can calculate the crc and assert that the chain is marked
  modified (not yet flushed).

* Deleted inode not reachable via tree for volume flush but still reachable
  via fsync/inactive/reclaim.  Its tree can be destroyed at that point.

* The direct write code needs to invalidate any underlying physical buffers.
  Direct write needs to be implemented.

* Make sure a resized block (hammer2_chain_resize()) calculates a new
  hash code in the parent bref

* The freemap allocator needs to getblk/clrbuf/bdwrite any partial
  block allocations (less than 64KB) that allocate out of a new 64KB
  block, to avoid causing a read-before-write I/O.

* Check flush race upward recursion setting SUBMODIFIED vs downward
  recursion checking SUBMODIFIED then locking (must clear before the
  recursion and might need additional synchronization)

* There is definitely a flush race in the hardlink implementation between
  the forwarding entries and the actual (hidden) hardlink inode.

  This will require us to associate a small hard-link-adjust structure
  with the chain whenever we create or delete hardlinks, on top of
  adjusting the hardlink inode itself.  Any actual flush to the media
  has to synchronize the correct nlinks value based on whether related
  created or deleted hardlinks were also flushed.

* When a directory entry is created and also if an indirect block is
  created and entries moved into it, the directory seek position can
  potentially become incorrect during a scan.

* When a directory entry is deleted a directory seek position depending
  on that key can cause readdir to skip entries.

* TWO PHASE COMMIT - store two data offsets in the chain, and
  hammer2_chain_delete() needs to leave the chain intact if MODIFIED2 is
  set on its buffer until the flusher gets to it?

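The modify_tid item earlier in this list says a too-large modify_tid found
on load should be overridden, with a complaint.  A minimal sketch of that
check follows.  It assumes "too large" means strictly greater than the
volume header iterator, which the item does not spell out.  The function
and parameter names are hypothetical, and fprintf() stands in for a kernel
printf.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Clamp a loaded modify_tid against the volume header's transaction id
 * iterator.  Returns true if the value had to be overridden.
 *
 * Assumption: any modify_tid above the iterator is invalid.
 */
static bool
clamp_modify_tid(uint64_t *modify_tid, uint64_t vol_tid_iter)
{
	assert(modify_tid != NULL);
	if (*modify_tid > vol_tid_iter) {
		/* Complain, then override as the TODO item suggests. */
		fprintf(stderr,
		    "warning: modify_tid %016" PRIx64
		    " exceeds volume iterator %016" PRIx64 ", overriding\n",
		    *modify_tid, vol_tid_iter);
		*modify_tid = vol_tid_iter;
		return true;
	}
	return false;
}
```

Returning whether an override happened lets the caller decide if the
corrected value also needs to be written back to the media.
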

				OPTIMIZATIONS

* If a file is unlinked but its descriptor is left open and used, we
  should allow data blocks on-media to be reused since there is no
  topology left to point at them.