aboutsummaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)Author
2006-09-01[PATCH] md: Fix issues with referencing rdev in md/raid1NeilBrown
We need to be careful when referencing mirrors[i].rdev. It can disappear under us at various times. So: fix a couple of problem places. comment a couple of non-problem places move an 'atomic_add' which deferences rdev down a little way to some where where it is sure to not be NULL. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] md: fix recent breakage of md/raid1 array checkingNeilBrown
A recent patch broke the ability to do a user-request check of a raid1. This patch fixes the breakage and also moves a comment that was dislocated by the same patch. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] md: avoid backward event updates in md superblock when degraded.NeilBrown
If we - shut down a clean array, - restart with one (or more) drive(s) missing - make some changes - pause, so that they array gets marked 'clean', the event count on the superblock of included drives will be the same as that of the removed drives. So adding the removed drive back in will cause it to be included with no resync. To avoid this, we only update the eventcount backwards when the array is not degraded. In this case there can (should) be no non-connected drives that we can get confused with, and this is the particular case where updating-backwards is valuable. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] dm: Fix deadlock under high i/o load in raid1 setup.Daniel Kobras
On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid, we experienced frequent deadlock of the system under high i/o load. 'cat /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly after a few GB, 'cp' would be left in 'D' state along with kjournald and kmirrord. The functions cp and kjournald were blocked in did vary, but kmirrord's wchan always pointed to 'mempool_alloc()'. We've seen this pattern on 2.6.15 and 2.6.17 kernels. http://lkml.org/lkml/2005/4/20/142 indicates that this problem has been around even before. So much for the facts, here's my interpretation: mempool_alloc() first tries to atomically allocate the requested memory, or falls back to hand out preallocated chunks from the mempool. If both fail, it puts the calling process (kmirrord in this case) on a private waitqueue until somebody refills the pool. Where the only 'somebody' is kmirrord itself, so we have a deadlock. I worked around this problem by falling back to a (blocking) kmalloc when before kmirrord would have ended up on the waitqueue. This defeats part of the benefits of using the mempool, but at least keeps the system running. And it could be done with a two-line change. Note that mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to decide whether to wait or return an error if immediate allocation fails, so the attached patch doesn't change behaviour in the non-deadlocking case. Path is against current git (2.6.18-rc4), but should apply to earlier versions as well. I've tested on 2.6.15, where this patch makes the difference between random lockup and a stable system. Signed-off-by: Daniel Kobras <kobras@linux.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-14[PATCH] dm: BUG/OOPS fixMichal Miroslaw
Fix BUG I tripped on while testing failover and multipathing. BUG shows up on error path in multipath_ctr() when parse_priority_group() fails after returning at least once without error. The fix is to initialize m->ti early - just after alloc()ing it. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: c027c3d2 *pde = 00000000 Oops: 0000 [#3] Modules linked in: qla2xxx ext3 jbd mbcache sg ide_cd cdrom floppy CPU: 0 EIP: 0060:[<c027c3d2>] Not tainted VLI EFLAGS: 00010202 (2.6.17.3 #1) EIP is at dm_put_device+0xf/0x3b eax: 00000001 ebx: ee4fcac0 ecx: 00000000 edx: ee4fcac0 esi: ee4fc4e0 edi: ee4fc4e0 ebp: 00000000 esp: c5db3e78 ds: 007b es: 007b ss: 0068 Process multipathd (pid: 15912, threadinfo=c5db2000 task=ef485a90) Stack: ec4eda40 c02816bd ee4fc4c0 00000000 f7e89498 f883e0bc c02816f6 f7e89480 f7e8948c c0281801 ffffffea f7e89480 f883e080 c0281ffe 00000001 00000000 00000004 dfe9cab8 f7a693c0 f883e080 f883e0c0 ca4b99c0 c027c6ee 01400000 Call Trace: <c02816bd> free_pgpaths+0x31/0x45 <c02816f6> free_priority_group+0x25/0x2e <c0281801> free_multipath+0x35/0x67 <c0281ffe> multipath_ctr+0x123/0x12d <c027c6ee> dm_table_add_target+0x11e/0x18b <c027e5b4> populate_table+0x8a/0xaf <c027e62b> table_load+0x52/0xf9 <c027ec23> ctl_ioctl+0xca/0xfc <c027e5d9> table_load+0x0/0xf9 <c0152146> do_ioctl+0x3e/0x43 <c0152360> vfs_ioctl+0x16c/0x178 <c01523b4> sys_ioctl+0x48/0x60 <c01029b3> syscall_call+0x7/0xb Code: 97 f0 00 00 00 89 c1 83 c9 01 80 e2 01 0f 44 c1 88 43 14 8b 04 24 59 5b 5e 5f 5d c3 53 89 c1 89 d3 ff 4a 08 0f 94 c0 84 c0 74 2a <8b> 01 8b 10 89 d8 e8 f6 fb ff ff 8b 03 8b 53 04 89 50 04 89 02 EIP: [<c027c3d2>] dm_put_device+0xf/0x3b SS:ESP 0068:c5db3e78 Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-06[PATCH] md: Fix a bug that recently crept into md/linearNeilBrown
A recent patch that allowed linear arrays to be reconfigured on-line allowed in a bug which results in divide by zero - not all mddev->array_size were converted to conf->array_size. This patch finished the conversion and fixed the bug. The offending patch was commit 7c7546ccf6463edbeee8d9aac6de7be1cd80d08a. Thanks to Simon Kirby <sim@netnation.com> for the bug report. Cc: Simon Kirby <sim@netnation.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: fix oops in error-handlingAndrew Morton
During early MD setup (superblock reading), we don't have a personality yet. But the error-handling code tries to dereference mddev->pers. Fix. Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: include sector number in messages about corrected read errorsNeilBrown
This is generally useful, but particularly helps see if it is the same sector that always needs correcting, or different ones. [akpm@osdl.org: fix printk warnings] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: require CAP_SYS_ADMIN for (re-)configuring md devices via sysfsNeilBrown
The ioctl requires CAP_SYS_ADMIN, so sysfs should too. Note that we don't require CAP_SYS_ADMIN for reading attributes even though the ioctl does. There is no reason to limit the read access, and much of the information is already available via /proc/mdstat Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: unify usage of symbolic names for permsNeilBrown
Some places we use number (0660) someplaces names (S_IRUGO). Change all numbers to be names, and change 0655 to be what it should be. Also make some formatting more consistent. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: fix usage of wrong variable in raid1NeilBrown
Though it rarely matters, we should be using 's' rather than r1_bio->sector here. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: fix some small races in bitmap plugging in raid5NeilBrown
The comment gives more details, but I didn't quite have the sequencing write, so there was room for races to leave bits unset in the on-disk bitmap for short periods of time. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: fix a plug/unplug race in raid5NeilBrown
When a device is unplugged, requests are moved from one or two (depending on whether a bitmap is in use) queues to the main request queue. So whenever requests are put on either of those queues, we should make sure the raid5 array is 'plugged'. However we don't. We currently plug the raid5 queue just before putting requests on queues, so there is room for a race. If something unplugs the queue at just the wrong time, requests will be left on the queue and nothing will want to unplug them. Normally something else will plug and unplug the queue fairly soon, but there is a risk that nothing will. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: fix resync speed calculation for restarted resyncsNeilBrown
We introduced 'io_sectors' recently so we could count the sectors that causes io during resync separate from sectors which didn't cause IO - there can be a difference if a bitmap is being used to accelerate resync. However when a speed is reported, we find the number of sectors processed recently by subtracting an oldish io_sectors count from a current 'curr_resync' count. This is wrong because curr_resync counts all sectors, not just io sectors. So, add a field to mddev to store the curren io_sectors separately from curr_resync, and use that in the calculations. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: delay starting md threads until array is completely setupNeilBrown
When an array is started we start one or two threads (two if there is a reshape or recovery that needs to be completed). We currently start these *before* the array is completely set up and in particular before queue->queuedata is set. If the thread actually starts very quickly on another CPU, we can end up dereferencing queue->queuedata and oops. This patch also makes sure we don't try to start a recovery if a reshape is being restarted. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: set desc_nr correctly for version-1 superblocksNeilBrown
This has to be done in ->load_super, not ->validate_super Without this, hot-adding devices to an array doesn't always work right - though there is a work around in mdadm-2.5.2 to make this less of an issue. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: possible fix for unplug problemNeilBrown
I have reports of a problem with raid5 which turns out to be because the raid5 device gets stuck in a 'plugged' state. This shouldn't be able to happen as 3msec after it gets plugged it should get unplugged. However it happens none-the-less. This patch fixes the problem and is a reasonable thing to do, though it might hurt performance slightly in some cases. Until I can find the real problem, we should probably have this workaround in place. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: annotate blkdev nestingIngo Molnar
Teach special (recursive) locking code to the lock validator. Effects on non-lockdep kernels: - the introduction of the following function variants: extern struct block_device *open_partition_by_devnum(dev_t, unsigned); extern int blkdev_put_partition(struct block_device *); static int blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags); which on non-lockdep are the same as open_by_devnum(), blkdev_put() and blkdev_get(). - a subclass parameter to do_open(). [unused on non-lockdep] - a subclass parameter to __blkdev_put(), which is a new internal function for the main blkdev_put*() functions. [parameter unused on non-lockdep kernels, except for two sanity check WARN_ON()s] these functions carry no semantical difference - they only express object dependencies towards the lockdep subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits) [PATCH] devfs: Remove it from the feature_removal.txt file [PATCH] devfs: Last little devfs cleanups throughout the kernel tree. [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree [PATCH] devfs: Remove devfs_remove() function from the kernel tree [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree [PATCH] devfs: Remove devfs support from the sound subsystem [PATCH] devfs: Remove devfs support from the ide subsystem. [PATCH] devfs: Remove devfs support from the serial subsystem [PATCH] devfs: Remove devfs from the init code [PATCH] devfs: Remove devfs from the partition code ...
2006-06-29[PATCH] drivers/md/raid5.c: remove an unused variableAdrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] devfs: Last little devfs cleanups throughout the kernel tree.Greg Kroah-Hartman
Just removes a few unused #defines and fixes some comments due to devfs now being gone. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] devfs: Remove the gendisk devfs_name field as it's no longer neededGreg Kroah-Hartman
And remove the now unneeded number field. Also fixes all drivers that set these fields. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer neededGreg Kroah-Hartman
Also fixes all drivers that set this field. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] devfs: Remove the devfs_fs_kernel.h file from the treeGreg Kroah-Hartman
Also fixes up all files that #include it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] devfs: Remove devfs_remove() function from the kernel treeGreg Kroah-Hartman
Removes the devfs_remove() function and all callers of it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] devfs: Remove devfs_mk_bdev() function from the kernel treeGreg Kroah-Hartman
Removes the devfs_mk_bdev() function and all callers of it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] devfs: Remove devfs_mk_dir() function from the kernel treeGreg Kroah-Hartman
Removes the devfs_mk_dir() function and all callers of it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26[PATCH] drivers/md/md.c: make code staticAdrian Bunk
Make needlessly global code static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Allow the write_mostly flag to be set via sysfsNeilBrown
It appears in /sys/mdX/md/dev-YYY/state and can be set or cleared by writing 'writemostly' or '-writemostly' respectively. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Allow resync_start to be set and queried via sysfsNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Allow raid 'layout' to be read and set via sysfsNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Allow rdev state to be set via sysfsNeilBrown
The md/dev-XXX/state file can now be written: "faulty" simulates an error on the device "remove" removes the device from the array (if it is not busy) Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Set/get state of array via sysfsNeilBrown
This allows the state of an md/array to be directly controlled via sysfs and adds the ability to stop and array without tearing it down. Array states/settings: clear No devices, no size, no level Equivalent to STOP_ARRAY ioctl inactive May have some settings, but array is not active all IO results in error When written, doesn't tear down array, but just stops it suspended (not supported yet) All IO requests will block. The array can be reconfigured. Writing this, if accepted, will block until array is quiescent readonly no resync can happen. no superblocks get written. write requests fail read-auto like readonly, but behaves like 'clean' on a write request. clean - no pending writes, but otherwise active. When written to inactive array, starts without resync If a write request arrives then if metadata is known, mark 'dirty' and switch to 'active'. if not known, block and switch to write-pending If written to an active array that has pending writes, then fails. active fully active: IO and resync can be happening. When written to inactive array, starts with resync write-pending (not supported yet) clean, but writes are blocked waiting for 'active' to be written. active-idle like active, but no writes have been seen for a while (100msec). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Don't write dirty/clean update to spares - leave them aloneNeilBrown
- record the 'event' count on each individual device (they might sometimes be slightly different now) - add a new value for 'sb_dirty': '3' means that the super block only needs to be updated to record a clean<->dirty transition. - Prefer odd event numbers for dirty states and even numbers for clean states - Using all the above, don't update the superblock on a spare device if the update is just doing a clean-dirty transition. To accomodate this, a transition from dirty back to clean might now decrement the events counter if nothing else has changed. The net effect of this is that spare drives will not see any IO requests during normal running of the array, so they can go to sleep if that is what they want to do. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Allow re-add to work on array without bitmapsNeilBrown
When an array has a bitmap, a device can be removed and re-added and only blocks changes since the removal (as recorded in the bitmap) will be resynced. It should be possible to do a similar thing to arrays without bitmaps. i.e. if a device is removed and re-added and *no* changes have been made in the interim, then the add should not require a resync. This patch allows that option. This means that when assembling an array one device at a time (e.g. during device discovery) the array can be enabled read-only as soon as enough devices are available, but extra devices can still be added without causing a resync. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Fix bug that stops raid5 resync from happeningNeilBrown
As data_disks is *less* than raid_disks, the current test here is obviously wrong. And as the difference is already available in conf->max_degraded, it makes much more sense to use that. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Fix Kconfig errorakpm@osdl.org
RAID5 recently changed to RAID456 Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: md Kconfig speeling feexJustin Piszcz
I was experimenting with Linux SW raid today and found a spelling error when reading the help menus... (and fly spell found more). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Calculate correct array size for raid10 in new offset modeNeilBrown
The size calculation made assumtion which the new offset mode didn't follow. This gets the size right in all cases. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Change md/bitmap file handling to use bmap to file blocks-fixNeilBrown
Fix problems with new bmap based access to bitmap files. 1/ When not using a file based bitmap, attach a NULL list of buffers to each page so the common free_buffer routine can cope. 2/ Use submit_bh to read as well as write, rather than vfs_read. This makes read and write more symetric. 3/ sync the file before reading, to ensure that the page cache has no dirty pages that might get written out later. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: change md/bitmap file handling to use bmap to file blocksNeilBrown
If md is asked to store a bitmap in a file, it tries to hold onto the page cache pages for that file, manipulate them directly, and call a cocktail of operations to write the file out. I don't believe this is a supportable approach. This patch changes the approach to use the same approach as swap files. i.e. bmap is used to enumerate all the block address of parts of the file and we write directly to those blocks of the device. swapfile only uses parts of the file that provide a full pages at contiguous addresses. We don't have that luxury so we have to cope with pages that are non-contiguous in storage. To handle this we attach buffers to each page, and store the addresses in those buffers. With this approach the pagecache may contain data which is inconsistent with what is on disk. To alleviate the problems this can cause, md invalidates the pagecache when releasing the file. If the file is to be examined while the array is active (a non-critical but occasionally useful function), O_DIRECT io must be used. And new version of mdadm will have support for this. This approach simplifies a lot of code: - we no longer need to keep a list of pages which we need to wait for, as the b_endio function can keep track of how many outstanding writes there are. This saves a mempool. - -EAGAIN returns from write_page are no longer possible (not sure if they ever were actually). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: tidy up i_writecount handling in md/bitmapNeilBrown
md/bitmap modifies i_writecount of a bitmap file to make sure that no-one else writes to it. The reverting of the change is sometimes done twice, and there is one error path where it is omitted. This patch tidies that up. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: remove dead code from md/bitmapNeilBrown
bitmap_active is never called, and the BITMAP_ACTIVE flag is never users or tested, so discard them both. Also remove some out-of-date 'todo' comments. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: remove unnecessary page reference manipulations from ↵NeilBrown
md/bitmap code md/bitmap gets a collection of pages representing the bitmap when it initialises the bitmap, and puts all the references when discarding the bitmap. It also occasionally takes extra references without any good reason, and sometimes drops them ... though it doesn't always drop them, which can result in a memory leak. This patch removes the unnecessary 'get_page' calls, and the corresponding 'put_page' calls. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: use set_bit etc for bitmap page attributesNeilBrown
In particular, this means that we use 4 bits per page instead of a whole unsigned long. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: cleaner separation of page attribute handlers in md/bitmapNeilBrown
md/bitmap has some attributes per-page. Handling of these attributes in largely abstracted in set_page_attr and clear_page_attr. However get_page_attr exposes the format used to store them. So prior to changing that format, introduce test_page_attr instead of get_page_attr, and make appropriate usage changes. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: remove bitmap writeback daemonNeilBrown
md/bitmap currently has a separate thread to wait for writes to the bitmap file to complete (as we cannot get a callback on that action). However this isn't needed as bitmap_unplug is called from process context and waits for the writeback thread to do it's work. The same result can be achieved by doing the waiting directly in bitmap_unplug. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md/bitmap: fix online removal of file-backed bitmapsNeilBrown
When "mdadm --grow /dev/mdX --bitmap=none" is used to remove a filebacked bitmap, the bitmap was disconnected from the array, but the file wasn't closed (until the array was stopped). The file also wasn't closed if adding the bitmap file failed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: split reshape portion of raid5 sync_request into a separate functionNeilBrown
... as raid5 sync_request is WAY too big. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>