| #
396f9f45 |
| 19-May-2008 |
oster <oster@NetBSD.org> |
Re-work some of the guts of the reconstruction code.
Reconmap used to have one pointer for every reconstruction unit. This does not scale well in the land of 1TB disks, where some 100MB+ of "status pointers" are required for typical configurations. Convert the reconstruction code to use a "sliding status window" which will scale nicely regardless of the number of stripes/reconstruction units in the RAID set. Convert the main reconstruction loop to rebuild the array in chunks rather than in one big lump.
As part of these changes, introduce a function to kick any waiters on the head separation callback list, and use that in the main reconstruction event queue to wake up the waiters if things have stalled. (I believe this may fix a race condition that could occur at least at the very end of a disk during reconstruction under heavy IO load.)
Thanks to Brian Buhrow for all his help, support, and patience in testing these changes.
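The "sliding status window" idea in this commit can be illustrated with a minimal user-space sketch. This is not the RAIDframe implementation: the names `status_window`, `sw_init`, `sw_contains`, `sw_mark_done`, and the window size `WINDOW_SLOTS` are all hypothetical. The sketch only shows why memory stays fixed no matter how many reconstruction units (RUs) the set has: state is kept for a small range of RUs starting at `head`, and slots are recycled as the window slides forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SLOTS 8  /* hypothetical window size, not RAIDframe's */

enum ru_state { RU_PENDING = 0, RU_DONE };

/* Fixed-size status for RUs head .. head + WINDOW_SLOTS - 1 only. */
struct status_window {
    uint64_t head;                     /* lowest RU not yet complete */
    uint64_t total;                    /* total number of RUs in the set */
    enum ru_state slot[WINDOW_SLOTS];  /* ring buffer indexed by ru % WINDOW_SLOTS */
};

void sw_init(struct status_window *w, uint64_t total)
{
    assert(w != NULL);
    w->head = 0;
    w->total = total;
    for (size_t i = 0; i < WINDOW_SLOTS; i++)
        w->slot[i] = RU_PENDING;
}

/* True if RU `ru` exists and currently falls inside the window. */
bool sw_contains(const struct status_window *w, uint64_t ru)
{
    return ru < w->total && ru >= w->head && ru - w->head < WINDOW_SLOTS;
}

/*
 * Mark `ru` done, then slide the window past any leading completed RUs.
 * Returns false (and changes nothing) if `ru` is outside the window.
 */
bool sw_mark_done(struct status_window *w, uint64_t ru)
{
    if (!sw_contains(w, ru))
        return false;
    w->slot[ru % WINDOW_SLOTS] = RU_DONE;
    while (w->head < w->total &&
           w->slot[w->head % WINDOW_SLOTS] == RU_DONE) {
        /* Recycle the slot; it now tracks RU head + WINDOW_SLOTS. */
        w->slot[w->head % WINDOW_SLOTS] = RU_PENDING;
        w->head++;
    }
    return true;
}
```

In this sketch a caller would only issue work for RUs that `sw_contains()` accepts, which also matches the "rebuild in chunks" wording above; how the real code sizes and advances its window is not stated in the commit message.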
|
| #
8fb49f6f |
| 15-Apr-2008 |
oster <oster@NetBSD.org> |
A forced recon read should not default to indicating that the reads for that disk have stopped, since this will bump us out of the normal reconstruction loop prematurely.
Fixes the (mostly cosmetic) bug where the reconstruction status values stop updating, and from raidctl it appears that reconstruction has totally stalled (which it actually hasn't -- the reconstruction does complete properly, but not in the normal way).
|
| #
25c8cdfd |
| 14-Apr-2008 |
oster <oster@NetBSD.org> |
Print out the status value if a reconstruction read fails. Don't print out write promotions during reconstruct unless we are debugging reconstructs.
|
| #
287ee4e9 |
| 26-Jan-2008 |
oster <oster@NetBSD.org> |
In a land before time, when kernel processes roamed the system, we needed to keep track of the kernel process that opened a device in order to close it with the right credentials. Flash forward to today where curlwp is now quite sufficient.
|
| #
61e8303e |
| 26-Nov-2007 |
pooka <pooka@NetBSD.org> |
Remove the "struct lwp *" argument from all VFS and VOP interfaces. The general trend is to remove it from all kernel interfaces and this is a start. In case the calling lwp is desired, curlwp should be used.
quick consensus on tech-kern
|
| #
6384685d |
| 21-Sep-2007 |
oster <oster@NetBSD.org> |
Fix wording in a comment and correct a debug line. From Olivier Cherrier (via private mail). Thanks!
|
| #
1c0f1b25 |
| 18-Jul-2007 |
ad <ad@NetBSD.org> |
Fix fallout from recent kthread changes.
|
| #
88ab7da9 |
| 09-Jul-2007 |
ad <ad@NetBSD.org> |
Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
|
| #
954bc134 |
| 26-Jun-2007 |
cube <cube@NetBSD.org> |
Change dk_lookup() to accept an additional argument of the type enum uio_seg that tells whether the given path is in user space or kernel space, so it can tell NDINIT().
While the raidframe calls were ok, both ccd(4) and cgd(4) were passing pointers to user space data, which leads to strange errors on i386, as reported by Jukka Salmi on current-users.
The issue has been there since last August; I'm actually a bit surprised that no one in the meantime has used ccd(4) or cgd(4) on an arch where it would have simply faulted.
|
| #
168cd830 |
| 16-Nov-2006 |
christos <christos@NetBSD.org> |
__unused removal on arguments; approved by core.
|
| #
4d595fd7 |
| 12-Oct-2006 |
christos <christos@NetBSD.org> |
- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
|
| #
ecdff16f |
| 27-Aug-2006 |
christos <christos@NetBSD.org> |
- use dk_lookup instead of our home-spun version.
- allow raid to be configured in a wedge
- allow wedges to be configured in a raid
- add autoconfiguration of wedges in a raid
|
| #
3029ac48 |
| 21-Jul-2006 |
ad <ad@NetBSD.org> |
- Use the LWP cached credentials where sane.
- Minor cosmetic changes.
|
| #
2867b68b |
| 14-May-2006 |
elad <elad@NetBSD.org> |
integrate kauth.
|
| #
95e1ffb1 |
| 11-Dec-2005 |
christos <christos@NetBSD.org> |
merge ktrace-lwp.
|
| #
97682553 |
| 18-Jul-2005 |
oster <oster@NetBSD.org> |
If rf_SubmitReconBuffer indicates the submission was blocked (for whatever reason), return 0 instead of the default RF_RECON_READ_STOPPED. Returning RF_RECON_READ_STOPPED would result in rf_ContinueReconstructFailedDisk() thinking that the given component was "done" and breaking out of the main reconstruction loop far too early. Reconstruction still worked correctly as long as there were no errors, but RAIDframe wouldn't be in a position to properly handle read/write errors during reconstruction.
This fixes the "raidctl's progress bar spins at 0% until reconstruction finishes" problem.
|
| #
77708271 |
| 08-Jun-2005 |
oster <oster@NetBSD.org> |
- initialize numRUsTotal before we indicate that we are doing a reconstruct.
- make numRUsComplete and numRUsTotal 64-bit quantities like everything else that records this information.
|
| #
f31bd063 |
| 27-Feb-2005 |
perry <perry@NetBSD.org> |
nuke trailing whitespace
|
| #
be864067 |
| 12-Feb-2005 |
oster <oster@NetBSD.org> |
The 'next' argument to rf_CreateDiskQueueData is always NULL. Since there is no particular reason to pass an extra NULL argument, turf it, and initialize p->next to NULL within the function.
|
| #
0b154709 |
| 12-Feb-2005 |
oster <oster@NetBSD.org> |
Add a 'waitflag' argument to rf_CreateDiskQueueData() and use it to determine if we are willing to wait for memory to come from the diskqueuedata (dqd) and bufpool pools. Clean up the mess related to code calling rf_CreateDiskQueueData() with different expectations (and/or blatant disregard) of what might happen if there were insufficient pool resources.
|
| #
04a30b5e |
| 06-Feb-2005 |
oster <oster@NetBSD.org> |
It's not a bad idea to update the component labels whether or not the reconstruction was successful.
|
| #
339f61b7 |
| 05-Feb-2005 |
oster <oster@NetBSD.org> |
rf_GetNextReconEvent() *will* return a valid event, so no need for the assert. (we'd have panic'ed in there long before this assert if that wasn't the case).
Minor whitespace changes.
|
| #
c38bce14 |
| 05-Feb-2005 |
oster <oster@NetBSD.org> |
Vastly improve the error handling in the case of a read/write error that occurs during a reconstruction. We go from zero error handling and likely panicking if something goes amiss, to gracefully bailing and leaving the system in the best, usable state possible.
- introduce rf_DrainReconEventQueue() to allow easy cleaning of the reconstruction event queue
- change how we clean up the floating recon buffers in rf_FreeReconControl(). Detect the end of the list rather than traversing according to a count.
- keep track of the number of pending reconstruction writes. In the event of a read error, use this to wait long enough for the pending writes to (hopefully) drain.
- more cleanup is still needed on this code, but I didn't want to start mixing major functional changes with minor cleanups.
XXX: There is a known issue with pool items left outstanding due to the IO failure, and this can show up in the form of a panic at the tail end of a shutdown. This problem is much less severe than before these changes, and the hope/plan is that this problem will go away once this code gets overhauled again.
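The "detect the end of the list rather than traversing according to a count" change can be illustrated with a small user-space sketch. The `recon_buf` type and the `push_buf` and `free_buf_list` helpers are hypothetical stand-ins, not the structures used by rf_FreeReconControl(). The point is only that walking until the `next` pointer is NULL stays correct even when an error path has left the list shorter or longer than a separately maintained count would suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a floating reconstruction buffer. */
struct recon_buf {
    struct recon_buf *next;
};

/* Prepend a freshly allocated buffer; returns the new list head. */
struct recon_buf *push_buf(struct recon_buf *head)
{
    struct recon_buf *b = malloc(sizeof(*b));
    assert(b != NULL);
    b->next = head;
    return b;
}

/*
 * Free every buffer on the list by walking until the terminating NULL,
 * without trusting any external element count. Leaves *headp NULL and
 * returns how many buffers were actually freed.
 */
size_t free_buf_list(struct recon_buf **headp)
{
    size_t freed = 0;

    assert(headp != NULL);
    while (*headp != NULL) {
        struct recon_buf *b = *headp;
        *headp = b->next;   /* unlink before freeing */
        free(b);
        freed++;
    }
    return freed;
}
```

Calling `free_buf_list()` a second time on the same head is harmless in this sketch, since the head is already NULL and the loop body never runs.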
|
| #
c18a2427 |
| 22-Jan-2005 |
oster <oster@NetBSD.org> |
Torch some #define's missed in last commit.
|
| #
31409478 |
| 22-Jan-2005 |
oster <oster@NetBSD.org> |
Reconstruction Descriptors are only allocated once per reconstruction, and don't need their own pool or freelist or anything fancier than a malloc/free.
|