From d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 Mon Sep 17 00:00:00 2001
From: Neil Brown
Date: Tue, 1 May 2007 09:53:42 +0200
Subject: When stacked block devices are in-use (e.g. md or dm), the recursive calls to generic_make_request can use up a lot of space, and we would rather they didn't.

As generic_make_request is a void function, and as it is generally not expected that it will have any effect immediately, it is safe to delay any call to generic_make_request until there is sufficient stack space available.

As ->bi_next is reserved for the driver to use, it can have no valid value when generic_make_request is called, and as __make_request implicitly assumes it will be NULL (the ELEVATOR_BACK_MERGE branch of the switch), we can be certain that all callers set it to NULL. We can therefore safely use bi_next to link pending requests together, provided we clear it before making the real call.

So, we choose to allow each thread to be active in only one generic_make_request at a time. If a subsequent (recursive) call is made, the bio is linked into a per-thread list and is handled when the active call completes. As the list of pending bios is per-thread, there are no locking issues to worry about.

I say above that it is "safe to delay any call...". There are, however, some behaviours of a make_request_fn which would make it unsafe. These include any behaviour that assumes anything will have changed after a recursive call to generic_make_request. These could include:

 - waiting for that call to finish and call its bi_end_io function. md used to sometimes do this (marking the superblock dirty before completing a write) but doesn't any more.
 - inspecting the bio for fields that generic_make_request might change, such as bi_sector or bi_bdev. It is hard to see a good reason for this, and I don't think anyone actually does it.
 - inspecting the queue to see if, e.g., it is 'full' yet. Again, I think this is very unlikely to be useful, or to be done.
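The deferral scheme described above can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the kernel's generic_make_request: struct task, current_task, stacked_make_request and the depth/submitted/nesting bookkeeping are hypothetical stand-ins; only bio_list, bio_tail and bi_next mirror names used in this patch. A stacked device that remaps a bio and resubmits it only appends it to the per-thread list, so the outer loop, rather than the C stack, drives each layer, and pending bios are handled in FIFO order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct bio. */
struct bio {
	struct bio *bi_next;   /* used to link pending bios while queued */
	int sector;            /* stand-in for bi_sector */
	int depth;             /* hypothetical: layers left to remap through */
};

/* Models the two fields this patch adds to task_struct. */
struct task {
	struct bio *bio_list;
	struct bio **bio_tail; /* NULL when no generic_make_request is active */
};

/* Hypothetical: one global stands in for the per-thread "current". */
static struct task current_task;

/* Hypothetical bookkeeping: what reached the bottom device, and how deep
 * make_request_fn calls were nested at most. */
static int submitted[16];
static int nsubmitted;
static int nesting, max_nesting;

static void generic_make_request(struct bio *bio);

/* Hypothetical make_request_fn of a stacked device: remap and resubmit. */
static void stacked_make_request(struct bio *bio)
{
	if (bio->depth > 0) {
		bio->depth--;
		bio->sector += 100;          /* pretend remapping */
		generic_make_request(bio);   /* "recursive" call: only queues */
	} else if (nsubmitted < 16) {
		submitted[nsubmitted++] = bio->sector;  /* reached real hardware */
	}
}

static void generic_make_request(struct bio *bio)
{
	struct task *cur = &current_task;

	if (cur->bio_tail) {
		/* An outer call is active on this thread: append and return. */
		bio->bi_next = NULL;
		*cur->bio_tail = bio;
		cur->bio_tail = &bio->bi_next;
		return;
	}

	assert(bio->bi_next == NULL);        /* callers leave bi_next NULL */
	cur->bio_list = NULL;
	cur->bio_tail = &cur->bio_list;      /* mark this thread as active */
	do {
		nesting++;
		if (nesting > max_nesting)
			max_nesting = nesting;
		stacked_make_request(bio);
		nesting--;

		/* Pop the oldest pending bio, if any (FIFO order). */
		bio = cur->bio_list;
		if (bio) {
			cur->bio_list = bio->bi_next;
			if (cur->bio_list == NULL)
				cur->bio_tail = &cur->bio_list;  /* list now empty */
			bio->bi_next = NULL;     /* clear before the real call */
		}
	} while (bio);
	cur->bio_tail = NULL;                /* deactivate */
}
```

Submitting a bio with depth 3 remaps it three times before it reaches the bottom device, yet stacked_make_request is never entered more than one level deep (max_nesting stays 1); with plain recursion the nesting would grow with the number of stacked layers. Appending at the tail also keeps bios in the order they were passed to generic_make_request, which is the property Alasdair raises for device-mapper below.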
Signed-off-by: Neil Brown
Cc: Jens Axboe
Cc: Alasdair G Kergon

Alasdair G Kergon said:

I can see nothing wrong with this in principle.

For device-mapper at the moment though it's essential that, while the bio mappings may now get delayed, they still get processed in exactly the same order as they were passed to generic_make_request().

My main concern is whether the timing changes implicit in this patch will make the rare data-corrupting races in the existing snapshot code more likely. (I'm working on a fix for these races, but the unfinished patch is already several hundred lines long.)

It would be helpful if some people on this mailing list would test this patch in various scenarios and report back.

Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe
---
 include/linux/sched.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 17b72d88c4c..e38c436ee12 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -88,6 +88,7 @@ struct sched_param {
 
 struct exec_domain;
 struct futex_pi_state;
+struct bio;
 
 /*
  * List of flags we want to share for kernel threads,
@@ -1014,6 +1015,9 @@ struct task_struct {
 /* journalling filesystem info */
 	void *journal_info;
 
+/* stacked block device info */
+	struct bio *bio_list, **bio_tail;
+
 /* VM state */
 	struct reclaim_state *reclaim_state;
 
-- 
cgit v1.2.3

From 87c1efbfeac49849b981a7eac8cba42d4a49b2e9 Mon Sep 17 00:00:00 2001
From: Jens Axboe
Date: Fri, 11 May 2007 13:29:54 +0200
Subject: Fix compile/link of init/do_mounts.c with !CONFIG_BLOCK

We need a stub function for when CONFIG_BLOCK isn't set.
Signed-off-by: Jens Axboe
---
 include/linux/genhd.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index f589559cf07..4c03ee353e7 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -434,6 +434,10 @@ static inline struct block_device *bdget_disk(struct gendisk *disk, int index)
 
 #endif
 
-#endif
+#else /* CONFIG_BLOCK */
+
+static inline void printk_all_partitions(void) { }
+
+#endif /* CONFIG_BLOCK */
 
 #endif
-- 
cgit v1.2.3
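The stub pattern in this fix can be shown in isolation. The sketch below is a minimal, self-contained illustration, assuming CONFIG_BLOCK is simply a preprocessor macro that may or may not be defined at build time; partitions_printed and mount_root_failed are hypothetical names, and only printk_all_partitions comes from the patch. Because the !CONFIG_BLOCK branch supplies an empty static inline function, the caller needs no #ifdef of its own and the call compiles away.

```c
#include <assert.h>

/* Hypothetical: in the kernel this macro comes from Kconfig.
 * Uncomment to build the "block layer enabled" variant. */
/* #define CONFIG_BLOCK 1 */

/* Hypothetical counter, standing in for real partition output. */
static int partitions_printed;

#ifdef CONFIG_BLOCK
static void printk_all_partitions(void)
{
	partitions_printed++;   /* stand-in for walking and printing partitions */
}
#else /* CONFIG_BLOCK */
/* No-op stub so callers still compile and link without block support. */
static inline void printk_all_partitions(void) { }
#endif /* CONFIG_BLOCK */

/* Hypothetical caller, playing the role of init/do_mounts.c. */
static int mount_root_failed(void)
{
	printk_all_partitions();   /* no #ifdef needed at the call site */
	return partitions_printed;
}
```

Built without CONFIG_BLOCK, mount_root_failed() returns 0 because the stub does nothing; defining CONFIG_BLOCK selects the real (here, counting) version instead, with the call site unchanged.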