aboutsummaryrefslogtreecommitdiff
path: root/drivers/md/dm-raid1.c
AgeCommit message (Collapse)Author
2007-07-12dm raid1: handle log failureJonathan Brassow
When writing to a mirror, the log must be updated first. Failure to update the log could result in the log not properly reflecting the state of the mirror if the machine should crash. We change the return type of the rh_flush function to give us the ability to check if a log write was successful. If the log write was unsuccessful, we fail the writes to avoid the case where the log does not properly reflect the state of the mirror. A follow-up patch - which is dependent on the ability to requeue I/O's to core device-mapper - will requeue the I/O's for retry (allowing the mirror to be reconfigured.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12dm raid1: handle resync failuresJonathan Brassow
Device-mapper mirroring currently takes a best effort approach to recovery - failures during mirror synchronization are completely ignored. This means that regions are marked 'in-sync' and 'clean' and removed from the hash list. Future reads and writes that query the region will incorrectly interpret the region as in-sync. This patch handles failures during the recovery process. If a failure occurs, the region is marked as 'not-in-sync' (aka RH_NOSYNC) and added to a new list 'failed_recovered_regions'. Regions on the 'failed_recovered_regions' list are not marked as 'clean' upon removal from the list. Furthermore, if the DM_RAID1_HANDLE_ERRORS flag is set, the region is marked as 'not-in-sync'. This action prevents any future read-balancing from choosing an invalid device because of the 'not-in-sync' status. If "handle_errors" is not specified when creating a mirror (leaving the DM_RAID1_HANDLE_ERRORS flag unset), failures will be ignored exactly as they would be without this patch. This is to preserve backwards compatibility with user-space tools, such as 'pvmove'. However, since future read-balancing policies will rely on the correct sync status of a region, a user must choose "handle_errors" when using read-balancing. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12dm raid1: clear region outside spinlockJonathan Brassow
A clear_region function is permitted to block (in practice, rare) but gets called in rh_update_states() with a spinlock held. The bits being marked and cleared by the above functions are used to update the on-disk log, but are never read directly. We can perform these operations outside the spinlock since the bits are only changed within one thread viz. - mark_region in rh_inc() - clear_region in rh_update_states(). So, we grab the clean_regions list items via list_splice() within the spinlock and defer clear_region() until we iterate over the list for deletion - similar to how the recovered_regions list is already handled. We then move the flush() call down to ensure it encapsulates the changes which are done by the later calls to clear_region(). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12dm raid1: fix statusMilan Broz
Fix mirror status line broken in dm-log-report-fault-status.patch: - space missing between two words - placeholder ("0") required for compatibility with a subsequent patch - incorrect offset parameter Cc: stable@kernel.org Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12dm: remove duplicate module name from error msgsAlasdair G Kergon
Remove explicit module name from messages as the macro now includes it automatically. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm raid1: switch rh_in_sync to blocking in do_readsJonathan Brassow
The call to rh_in_sync() in do_reads() should be allowed to block. It is in the mirror worker thread which already permits blocking operations. This will be needed to support clustered mirroring which will perform network operations. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm raid1: fix to commit pending clear region requestsJonathan Brassow
With the code as it is, it is possible for oustanding clear region requests never to get flushed when a mirror is deactivated or suspended. This means there will always be some resync work required when a mirror is activated, even though it may very well be in-sync. Always requesting the flush doesn't hurt us. This is because the log tracks whether any changes occurred and, if not, no flush is performed. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm raid1: update dm io interfaceMilan Broz
This patch ports dm-raid1.c to the new dm-io interface. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm raid1: add handle_errors feature flagJonathan E Brassow
This patch adds the ability to specify desired features in the mirror constructor/mapping table. The first feature of interest is "handle_errors". Currently, mirroring will ignore any I/O errors from the devices. Subsequent patches will check for this flag and handle the errors. If flag/feature is not present, mirror will do nothing - maintaining backwards compatibility. Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm log: report fault statusJonathan E Brassow
This patch reports the status of the log device so that userspace can detect the error and take appropriate action. Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm raid1: one kmirrord per mirrorHolger Smolinski
This patch replaces the single instance of kmirrord by one instance per mirror set. This change is required to avoid a deadlock in kmirrord when the persistent dirty log of a mirror itself resides on a mirror. The single instance of kmirrord then issues a sync write to the dirty log in write_bits which gets deferred to kmirrord itself later in the call chain. But kmirrord never does the deferred work because it is still waiting for the sync write_bits. _mirror_sets is removed as it no longer needed, and we always flush the workqueue before destroying it to ensure all work is complete before destroying it. Signed-off-by: Holger Smolinski <smolinski@de.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-08[PATCH] dm: log: rename complete_resync_workJonathan E Brassow
The complete_resync_work function only provides the ability to change an out-of-sync region to in-sync. This patch enhances the function to allow us to change the status from in-sync to out-of-sync as well, something that is needed when a mirror write to one of the devices or an initial resync on a given region fails. Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: map and endio symbolic return codesKiyoshi Ueda
Update existing targets to use the new symbols for return values from target map and end_io functions. There is no effect on behaviour. Test results: Done build test without errors. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-22WorkStruct: make allyesconfigDavid Howells
Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-08[PATCH] dm: raid1: fix waiting for io on suspendJonathan E Brassow
All device-mapper targets must complete outstanding I/O before suspending. The mirror target generates I/O in its recovery phase and fails to wait for it. It needs to be tracked so we can ensure that it has completed before we suspend. [akpm@osdl.org: cleanup] Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <dm-devel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] dm mirror: remove trailing space from tableJonathan Brassow
Remove trailing space from 'dmsetup table' output. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] dm: Fix deadlock under high i/o load in raid1 setup.Daniel Kobras
On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid, we experienced frequent deadlock of the system under high i/o load. 'cat /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly after a few GB, 'cp' would be left in 'D' state along with kjournald and kmirrord. The functions cp and kjournald were blocked in did vary, but kmirrord's wchan always pointed to 'mempool_alloc()'. We've seen this pattern on 2.6.15 and 2.6.17 kernels. http://lkml.org/lkml/2005/4/20/142 indicates that this problem has been around even before. So much for the facts, here's my interpretation: mempool_alloc() first tries to atomically allocate the requested memory, or falls back to hand out preallocated chunks from the mempool. If both fail, it puts the calling process (kmirrord in this case) on a private waitqueue until somebody refills the pool. Where the only 'somebody' is kmirrord itself, so we have a deadlock. I worked around this problem by falling back to a (blocking) kmalloc when before kmirrord would have ended up on the waitqueue. This defeats part of the benefits of using the mempool, but at least keeps the system running. And it could be done with a two-line change. Note that mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to decide whether to wait or return an error if immediate allocation fails, so the attached patch doesn't change behaviour in the non-deadlocking case. Path is against current git (2.6.18-rc4), but should apply to earlier versions as well. I've tested on 2.6.15, where this patch makes the difference between random lockup and a stable system. Signed-off-by: Daniel Kobras <kobras@linux.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] dm: improve error message consistencyAlasdair G Kergon
Tidy device-mapper error messages to include context information automatically. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] dm kcopyd: error accumulation fixJonathan Brassow
kcopyd should accumulate errors - otherwise I/O failures may be ignored unintentionally. And invert 'success' (used in a future patch), using a more intuitive !(read_err || write_err). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] dm mirror log: sector size fixKevin Corry
On-disk logs for dm-mirror devices are currently hard-coded to use 512 byte hard-sector-sizes. This patch fixes dm-log so it will work with devices with non-512-byte hard-sector-sizes. To maintain full compatibility, instead of moving the clean-bits bitset to a bitset, and enlarges the disk-header buffer to encompass both the header and the bitset. The I/O routines for the bitset are removed, and the I/O routines for the disk-header now also read/write the bitset. Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] dm: mirror sector offset fixNeil Brown
The device-mapper core does not perform any remapping of bios before passing them to the targets. If a particular mapping begins part-way into a device, targets obtain the sector relative to the start of the mapping by subtracting ti->begin. The dm-raid1 target didn't do this everywhere: this patch fixes it, taking care to subtract ti->begin exactly once for each bio. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] drivers: use list_move()Akinobu Mita
This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B) under drivers/. Acked-by: Corey Minyard <minyard@mvista.com> Cc: Ben Collins <bcollins@debian.org> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Alasdair Kergon <dm-devel@redhat.com> Cc: Gerd Knorr <kraxel@bytesex.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Pavlic <fpavlic@de.ibm.com> Acked-by: Matthew Wilcox <matthew@wil.cx> Cc: Andrew Vasquez <linux-driver@qlogic.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] dm: remove SECTOR_FORMATAndrew Morton
We don't know what type sector_t has. Sometimes it's unsigned long, sometimes it's unsigned long long. For example on ppc64 it's unsigned long with CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n. The way to handle all of this is to always use unsigned long long and to always typecast the sector_t when printing it. Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] drivers/md/dm-raid1.c: Fix inconsistent mirroring after interrupted ↵Jun'ichi Nomura
recovery dm-mirror has potential data corruption problem: while on-disk log shows that all disk contents are in-sync, actual contents of the disks are not synchronized. This problem occurs if initial recovery (synching) is interrupted and resumed. Attached patch fixes this problem. Background: rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN (in-sync), which results in the corresponding bit of clean_bits being set. This is harmful if on-disk log is used and the map is removed/suspended before the initial sync is completed. The clean_bits is written down to the on-disk log at the map removal, and, upon resume, it's read and copied to sync_bits. Since the recovery process refers to the sync_bits to find a region to be recovered, the region whose state was changed from RH_NOSYNC to RH_CLEAN is no longer recovered. If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel sometimes ago, the contents of the mirrored disk just corrupt silently. If you have, balanced read may get bogus data from out-of-sync disks. The patch keeps RH_NOSYNC state unchanged. It will be changed to RH_RECOVERING when recovery starts and get reclaimed when the recovery completes. So it doesn't leak the region hash entry. Description: Keep RH_NOSYNC state unchanged when I/O on the region completes. rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN (in-sync), which results in the corresponding bit of clean_bits being set. This is harmful if on-disk log is used and the map is removed/suspended before the initial sync is completed. The clean_bits is written down to the on-disk log at the map removal, and, upon resume, it's read and copied to sync_bits. Since the recovery process refers to the sync_bits to find a region to be recovered, the region whose state was changed from RH_NOSYNC to RH_CLEAN is no longer recovered. If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel sometimes ago, the contents of the mirrored disk just corrupt silently. If you have, balanced read may get bogus data from out-of-sync disks. The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts on the region and get reclaimed when the recovery completes. So it doesn't leak the region hash entry. Alasdair said: I've analysed the relevant part of the state machine and I believe that the patch is correct. (Further work on this code is still needed - this patch has the side-effect of holding onto memory unnecessarily for long periods of time under certain workloads - but better that than corrupting data.) Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] mempool: use common mempool kmalloc allocatorMatthew Dobson
This patch changes several mempool users, all of which are basically just wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather than their own wrapper function, removing a bunch of duplicated code. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] device-mapper raid1: add default mirrorJonathan E Brassow
This patch introduces a new field to the mirror_set (default_mirror) to store the default mirror. (A subsequent patch will allow us to change the default mirror in the event of a failure.) Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper raid1: drop mark_region spinlock fixJonathan E Brassow
The spinlock region_lock is held while calling mark_region which can sleep. Drop the spinlock before calling that function. A region's state and inclusion in the clean list are altered by rh_inc and rh_dec. The state variable is set to RH_CLEAN in rh_dec, but only if 'pending' is zero. It is set to RH_DIRTY in rh_inc, but not if it is already so. The changes to 'pending', the state, and the region's inclusion in the clean list need to be atomicly. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08[PATCH] gfp flags annotations - part 1Al Viro
- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] dm: fix rh_dec()/rh_inc() race in dm-raid1.cJun'ichi Nomura
Fix another bug in dm-raid1.c that the dirty region may stay in or be moved to clean list and freed while in use. It happens as follows: CPU0 CPU1 ------------------------------------------------------------------------------ rh_dec() if (atomic_dec_and_test(pending)) <the region is still marked dirty> rh_inc() if the region is clean mark the region dirty and remove from clean list mark the region clean and move to clean list atomic_inc(pending) At this stage, the region is in clean list and will be mistakenly reclaimed by rh_update_states() later. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04[PATCH] dm-raid locking fixAlasdair G Kergon
This code was never designed to handle more than one instance of do_work() running at once. Signed-Off-By: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07[PATCH] device-mapper: dm-raid1: Limit bios to size of mirror regionAlasdair G Kergon
Set the target's split_io field when building a dm-mirror device so incoming bios won't span the mirror's internal regions. Without this, regions can be accessed while not holding correct locks and data corruption is possible. Reported-By: "Zhao Qian" <zhaoqian@aaastor.com> From: Kevin Corry <kevcorry@us.ibm.com> Signed-Off-By: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!