From d0deef5b14af7d5bbd0003a0a2a1a32326e20a6d Mon Sep 17 00:00:00 2001 From: Shawn Du Date: Tue, 14 Apr 2009 13:58:56 +0800 Subject: blktrace: support per-partition tracing Though one can specify '-d /dev/sda1' when using blktrace, it still traces the whole sda. To support per-partition tracing, when we start tracing, we initialize bt->start_lba and bt->end_lba to the start and end sector of that partition. Note some actions are per device, thus we don't filter 0-sector events. The original patch and discussion can be found here: http://marc.info/?l=linux-btrace&m=122949374214540&w=2 Signed-off-by: Shawn Du Signed-off-by: Li Zefan Acked-by: "Theodore Ts'o" Cc: Arnaldo Carvalho de Melo Cc: Jens Axboe LKML-Reference: <49E42620.4050701@cn.fujitsu.com> Signed-off-by: Ingo Molnar --- block/compat_ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c index f87615dea46..f8c218cd08e 100644 --- a/block/compat_ioctl.c +++ b/block/compat_ioctl.c @@ -568,7 +568,7 @@ static int compat_blk_trace_setup(struct block_device *bdev, char __user *arg) memcpy(&buts.name, &cbuts.name, 32); mutex_lock(&bdev->bd_mutex); - ret = do_blk_trace_setup(q, b, bdev->bd_dev, &buts); + ret = do_blk_trace_setup(q, b, bdev->bd_dev, bdev, &buts); mutex_unlock(&bdev->bd_mutex); if (ret) return ret; -- cgit v1.2.3 From 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Tue, 14 Apr 2009 14:00:05 +0800 Subject: blktrace: add trace/ to /sys/block/sda Impact: allow ftrace-plugin blktrace to trace device-mapper devices To trace a single partition: # echo 1 > /sys/block/sda/sda1/enable To trace the whole sda instead: # echo 1 > /sys/block/sda/enable Thus we also fix an issue reported by Ted, that ftrace-plugin blktrace can't be used to trace device-mapper devices. Now: # echo 1 > /sys/block/dm-0/trace/enable echo: write error: No such device or address # mount -t ext4 /dev/dm-0 /mnt # echo 1 > /sys/block/dm-0/trace/enable # echo blk > /debug/tracing/current_tracer Reported-by: Theodore Tso Signed-off-by: Li Zefan Acked-by: "Theodore Ts'o" Cc: Arnaldo Carvalho de Melo Cc: Shawn Du Cc: Jens Axboe LKML-Reference: <49E42665.6020506@cn.fujitsu.com> Signed-off-by: Ingo Molnar --- block/blk-sysfs.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 73f36beff5c..8653d710b39 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -387,16 +387,21 @@ struct kobj_type blk_queue_ktype = { int blk_register_queue(struct gendisk *disk) { int ret; + struct device *dev = disk_to_dev(disk); struct request_queue *q = disk->queue; if (WARN_ON(!q)) return -ENXIO; + ret = blk_trace_init_sysfs(dev); + if (ret) + return ret; + if (!q->request_fn) return 0; - ret = kobject_add(&q->kobj, kobject_get(&disk_to_dev(disk)->kobj), + ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue"); if (ret < 0) return ret; -- cgit v1.2.3 From 6f41469c627fe118121dfce2a8134ad24da3df28 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sun, 19 Apr 2009 07:00:41 +0900 Subject: block: clear req->errors on bio completion only for fs requests Impact: subtle behavior change For fs requests, rq is only carrier of bios and rq error status as a whole doesn't mean much. This is the reason why rq->errors is being cleared on each partial completion of a request as on each partial completion the error status is transferred to the respective bios. For pc requests, rq->errors is used to carry error status to the issuer and thus __end_that_request_first() doesn't clear it on such cases. The condition was fine till now as only fs and pc requests have used bio and thus the bio completion path. However, future changes will unify data accesses to bio and all non fs users care about rq error status. Clear rq->errors on bio completion only for fs requests. In general, the implicit clearing is a bit too subtle especially as the meaning of rq->errors is completely dependent on low level drivers. Unifying / cleaning up rq->errors usage and letting llds manage it would be better. TODO comment added. Signed-off-by: Tejun Heo Acked-by: Jens Axboe --- block/blk-core.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 07ab75403e1..cce7a88dc61 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1739,10 +1739,14 @@ static int __end_that_request_first(struct request *req, int error, trace_block_rq_complete(req->q, req); /* - * for a REQ_TYPE_BLOCK_PC request, we want to carry any eventual - * sense key with us all the way through + * For fs requests, rq is just carrier of independent bio's + * and each partial completion should be handled separately. + * Reset per-request error on each partial completion. + * + * TODO: tj: This is too subtle. It would be better to let + * low level drivers do what they see fit. */ - if (!blk_pc_request(req)) + if (blk_fs_request(req)) req->errors = 0; if (error && (blk_fs_request(req) && !(req->cmd_flags & REQ_QUIET))) { -- cgit v1.2.3 From 924cec7789f65ab7f022256f6533ecba0747b5f3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sun, 19 Apr 2009 07:00:41 +0900 Subject: block: clear req->errors on bio completion only for fs requests Impact: subtle behavior change For fs requests, rq is only carrier of bios and rq error status as a whole doesn't mean much. This is the reason why rq->errors is being cleared on each partial completion of a request as on each partial completion the error status is transferred to the respective bios. For pc requests, rq->errors is used to carry error status to the issuer and thus __end_that_request_first() doesn't clear it on such cases. The condition was fine till now as only fs and pc requests have used bio and thus the bio completion path. However, future changes will unify data accesses to bio and all non fs users care about rq error status. Clear rq->errors on bio completion only for fs requests. In general, the implicit clearing is a bit too subtle especially as the meaning of rq->errors is completely dependent on low level drivers. Unifying / cleaning up rq->errors usage and letting llds manage it would be better. TODO comment added. Signed-off-by: Tejun Heo Acked-by: Jens Axboe --- block/blk-core.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 2998fe3a237..41bc0ff75e2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1741,10 +1741,14 @@ static int __end_that_request_first(struct request *req, int error, trace_block_rq_complete(req->q, req); /* - * for a REQ_TYPE_BLOCK_PC request, we want to carry any eventual - * sense key with us all the way through + * For fs requests, rq is just carrier of independent bio's + * and each partial completion should be handled separately. + * Reset per-request error on each partial completion. + * + * TODO: tj: This is too subtle. It would be better to let + * low level drivers do what they see fit. */ - if (!blk_pc_request(req)) + if (blk_fs_request(req)) req->errors = 0; if (error && (blk_fs_request(req) && !(req->cmd_flags & REQ_QUIET))) { -- cgit v1.2.3 From db29a6b49674085f136331014ba0eee249c16a2c Mon Sep 17 00:00:00 2001 From: Bartlomiej Zolnierkiewicz Date: Tue, 21 Apr 2009 09:27:03 +0200 Subject: block: enable by default support for large devices and files on 32-bit archs Enable by default support for large devices and files (CONFIG_LBD): - With 1TB disks being a commodity hardware it is quite easy to hit 2TB limitation while building RAIDs etc. and many distros have been using CONFIG_LBD=y by default already (at least Fedora 10 and openSUSE 11.1). - This should also prevent a subtle ext4 filesystem compatibility issue: mke2fs.ext4 defaults to creating filesystems with huge_files feature enabled and such filesystems cannot be later mounted read-write on machines with CONFIG_LBD=n (it should be quite easy to hit this issue when trying to use filesystem created using distro kernel on system running the self-build kernel, think about USB disk enclosures & co.). While at it: - Clarify config option help text w.r.t. mounting ext4 filesystems (they can be mounted with CONFIG_LBD=n but in the read-only mode). Cc: "Theodore Ts'o" Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Jens Axboe --- block/Kconfig | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'block') diff --git a/block/Kconfig b/block/Kconfig index e7d12782bcf..2c39527aa7d 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -26,6 +26,7 @@ if BLOCK config LBD bool "Support for large block devices and files" depends on !64BIT + default y help Enable block devices or files of size 2TB and larger. @@ -38,11 +39,13 @@ config LBD The ext4 filesystem requires that this feature be enabled in order to support filesystems that have the huge_file feature - enabled. Otherwise, it will refuse to mount any filesystems - that use the huge_file feature, which is enabled by default - by mke2fs.ext4. The GFS2 filesystem also requires this feature. + enabled. Otherwise, it will refuse to mount in the read-write + mode any filesystems that use the huge_file feature, which is + enabled by default by mke2fs.ext4. - If unsure, say N. + The GFS2 filesystem also requires this feature. + + If unsure, say Y. config BLK_DEV_BSG bool "Block layer SG support v4 (EXPERIMENTAL)" -- cgit v1.2.3 From a538cd03be6f363d039daa94199c28cfbd508455 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:17 +0900 Subject: block: merge blk_invoke_request_fn() into __blk_run_queue() __blk_run_queue wraps blk_invoke_request_fn() such that it additionally removes plug and bails out early if the queue is empty. Both extra operations have their own pending mechanisms and don't cause any harm correctness-wise when they are done superflously. The only user of blk_invoke_request_fn() being blk_start_queue(), there isn't much reason to keep both functions around. Merge blk_invoke_request_fn() into __blk_run_queue() and make blk_start_queue() use __blk_run_queue() instead. [ Impact: merge two subtly different internal functions ] Signed-off-by: Tejun Heo --- block/blk-core.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 41bc0ff75e2..02f53bc00e4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -333,24 +333,6 @@ void blk_unplug(struct request_queue *q) } EXPORT_SYMBOL(blk_unplug); -static void blk_invoke_request_fn(struct request_queue *q) -{ - if (unlikely(blk_queue_stopped(q))) - return; - - /* - * one level of recursion is ok and is much faster than kicking - * the unplug handling - */ - if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) { - q->request_fn(q); - queue_flag_clear(QUEUE_FLAG_REENTER, q); - } else { - queue_flag_set(QUEUE_FLAG_PLUGGED, q); - kblockd_schedule_work(q, &q->unplug_work); - } -} - /** * blk_start_queue - restart a previously stopped queue * @q: The &struct request_queue in question @@ -365,7 +347,7 @@ void blk_start_queue(struct request_queue *q) WARN_ON(!irqs_disabled()); queue_flag_clear(QUEUE_FLAG_STOPPED, q); - blk_invoke_request_fn(q); + __blk_run_queue(q); } EXPORT_SYMBOL(blk_start_queue); @@ -425,12 +407,23 @@ void __blk_run_queue(struct request_queue *q) { blk_remove_plug(q); + if (unlikely(blk_queue_stopped(q))) + return; + + if (elv_queue_empty(q)) + return; + /* * Only recurse once to avoid overrunning the stack, let the unplug * handling reinvoke the handler shortly if we already got there. */ - if (!elv_queue_empty(q)) - blk_invoke_request_fn(q); + if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) { + q->request_fn(q); + queue_flag_clear(QUEUE_FLAG_REENTER, q); + } else { + queue_flag_set(QUEUE_FLAG_PLUGGED, q); + kblockd_schedule_work(q, &q->unplug_work); + } } EXPORT_SYMBOL(__blk_run_queue); -- cgit v1.2.3 From a7f557923441186a3cdbabc54f1bcacf42b63bf5 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:17 +0900 Subject: block: kill blk_start_queueing() blk_start_queueing() is identical to __blk_run_queue() except that it doesn't check for recursion. None of the current users depends on blk_start_queueing() running request_fn directly. Replace usages of blk_start_queueing() with [__]blk_run_queue() and kill it. [ Impact: removal of mostly duplicate interface function ] Signed-off-by: Tejun Heo --- block/as-iosched.c | 6 +----- block/blk-core.c | 28 ++-------------------------- block/cfq-iosched.c | 6 +++--- block/elevator.c | 7 +++---- 4 files changed, 9 insertions(+), 38 deletions(-) (limited to 'block') diff --git a/block/as-iosched.c b/block/as-iosched.c index c48fa670d22..45bd07059c2 100644 --- a/block/as-iosched.c +++ b/block/as-iosched.c @@ -1312,12 +1312,8 @@ static void as_merged_requests(struct request_queue *q, struct request *req, static void as_work_handler(struct work_struct *work) { struct as_data *ad = container_of(work, struct as_data, antic_work); - struct request_queue *q = ad->q; - unsigned long flags; - spin_lock_irqsave(q->queue_lock, flags); - blk_start_queueing(q); - spin_unlock_irqrestore(q->queue_lock, flags); + blk_run_queue(ad->q); } static int as_may_queue(struct request_queue *q, int rw) diff --git a/block/blk-core.c b/block/blk-core.c index 02f53bc00e4..8b4a0af7d69 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -433,9 +433,7 @@ EXPORT_SYMBOL(__blk_run_queue); * * Description: * Invoke request handling on this queue, if it has pending work to do. - * May be used to restart queueing when a request has completed. Also - * See @blk_start_queueing. - * + * May be used to restart queueing when a request has completed. */ void blk_run_queue(struct request_queue *q) { @@ -894,28 +892,6 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask) } EXPORT_SYMBOL(blk_get_request); -/** - * blk_start_queueing - initiate dispatch of requests to device - * @q: request queue to kick into gear - * - * This is basically a helper to remove the need to know whether a queue - * is plugged or not if someone just wants to initiate dispatch of requests - * for this queue. Should be used to start queueing on a device outside - * of ->request_fn() context. Also see @blk_run_queue. - * - * The queue lock must be held with interrupts disabled. - */ -void blk_start_queueing(struct request_queue *q) -{ - if (!blk_queue_plugged(q)) { - if (unlikely(blk_queue_stopped(q))) - return; - q->request_fn(q); - } else - __generic_unplug_device(q); -} -EXPORT_SYMBOL(blk_start_queueing); - /** * blk_requeue_request - put a request back on queue * @q: request queue where request should be inserted @@ -984,7 +960,7 @@ void blk_insert_request(struct request_queue *q, struct request *rq, drive_stat_acct(rq, 1); __elv_add_request(q, rq, where, 0); - blk_start_queueing(q); + __blk_run_queue(q); spin_unlock_irqrestore(q->queue_lock, flags); } EXPORT_SYMBOL(blk_insert_request); diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index a55a9bd75bd..def0c698a4b 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -2088,7 +2088,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq, if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE || cfqd->busy_queues > 1) { del_timer(&cfqd->idle_slice_timer); - blk_start_queueing(cfqd->queue); + __blk_run_queue(cfqd->queue); } cfq_mark_cfqq_must_dispatch(cfqq); } @@ -2100,7 +2100,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq, * this new queue is RT and the current one is BE */ cfq_preempt_queue(cfqd, cfqq); - blk_start_queueing(cfqd->queue); + __blk_run_queue(cfqd->queue); } } @@ -2345,7 +2345,7 @@ static void cfq_kick_queue(struct work_struct *work) struct request_queue *q = cfqd->queue; spin_lock_irq(q->queue_lock); - blk_start_queueing(q); + __blk_run_queue(cfqd->queue); spin_unlock_irq(q->queue_lock); } diff --git a/block/elevator.c b/block/elevator.c index 7073a907257..2e0fb21485b 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -599,7 +599,7 @@ void elv_quiesce_start(struct request_queue *q) */ elv_drain_elevator(q); while (q->rq.elvpriv) { - blk_start_queueing(q); + __blk_run_queue(q); spin_unlock_irq(q->queue_lock); msleep(10); spin_lock_irq(q->queue_lock); @@ -643,8 +643,7 @@ void elv_insert(struct request_queue *q, struct request *rq, int where) * with anything. There's no point in delaying queue * processing. */ - blk_remove_plug(q); - blk_start_queueing(q); + __blk_run_queue(q); break; case ELEVATOR_INSERT_SORT: @@ -971,7 +970,7 @@ void elv_completed_request(struct request_queue *q, struct request *rq) blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN && (!next || blk_ordered_req_seq(next) > QUEUE_ORDSEQ_DRAIN)) { blk_ordered_complete_seq(q, QUEUE_ORDSEQ_DRAIN, 0); - blk_start_queueing(q); + __blk_run_queue(q); } } } -- cgit v1.2.3 From e4025f6c21f1389696c069be2dc647f364925c45 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:17 +0900 Subject: block: don't set REQ_NOMERGE unnecessarily RQ_NOMERGE_FLAGS already clears defines which REQ flags aren't mergeable. There is no reason to specify it superflously. It only adds to confusion. Don't set REQ_NOMERGE for barriers and requests with specific queueing directive. REQ_NOMERGE is now exclusively used by the merging code. [ Impact: cleanup ] Signed-off-by: Tejun Heo --- block/blk-core.c | 5 +---- block/blk-exec.c | 1 - 2 files changed, 1 insertion(+), 5 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 8b4a0af7d69..7e0fab53e93 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1082,16 +1082,13 @@ void init_request_from_bio(struct request *req, struct bio *bio) if (bio_failfast_driver(bio)) req->cmd_flags |= REQ_FAILFAST_DRIVER; - /* - * REQ_BARRIER implies no merging, but lets make it explicit - */ if (unlikely(bio_discard(bio))) { req->cmd_flags |= REQ_DISCARD; if (bio_barrier(bio)) req->cmd_flags |= REQ_SOFTBARRIER; req->q->prepare_discard_fn(req->q, req); } else if (unlikely(bio_barrier(bio))) - req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE); + req->cmd_flags |= REQ_HARDBARRIER; if (bio_sync(bio)) req->cmd_flags |= REQ_RW_SYNC; diff --git a/block/blk-exec.c b/block/blk-exec.c index 6af716d1e54..49557e91f0d 100644 --- a/block/blk-exec.c +++ b/block/blk-exec.c @@ -51,7 +51,6 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk, int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK; rq->rq_disk = bd_disk; - rq->cmd_flags |= REQ_NOMERGE; rq->end_io = done; WARN_ON(irqs_disabled()); spin_lock_irq(q->queue_lock); -- cgit v1.2.3 From 10732f5661fb7adf62e20733b0030fc0fc93b0c4 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: cleanup REQ_SOFTBARRIER usages blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER. Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is now only used in elevator proper and for discard requests. [ Impact: cleanup ] Signed-off-by: Tejun Heo --- block/blk-core.c | 1 - 1 file changed, 1 deletion(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 7e0fab53e93..cf10dfcda99 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -946,7 +946,6 @@ void blk_insert_request(struct request_queue *q, struct request *rq, * barrier */ rq->cmd_type = REQ_TYPE_SPECIAL; - rq->cmd_flags |= REQ_SOFTBARRIER; rq->special = data; -- cgit v1.2.3 From 2eef33e439ba9ae387cdc3f1abcef2f3f6c4e7a8 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: clean up misc stuff after block layer timeout conversion * In blk_rq_timed_out_timer(), else { if } to else if * In blk_add_timer(), simplify if/else block [ Impact: cleanup ] Signed-off-by: Tejun Heo --- block/blk-timeout.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) (limited to 'block') diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 1ec0d503cac..1ba7e0aca87 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -122,10 +122,8 @@ void blk_rq_timed_out_timer(unsigned long data) if (blk_mark_rq_complete(rq)) continue; blk_rq_timed_out(rq); - } else { - if (!next || time_after(next, rq->deadline)) - next = rq->deadline; - } + } else if (!next || time_after(next, rq->deadline)) + next = rq->deadline; } /* @@ -176,16 +174,14 @@ void blk_add_timer(struct request *req) BUG_ON(!list_empty(&req->timeout_list)); BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags)); - if (req->timeout) - req->deadline = jiffies + req->timeout; - else { - req->deadline = jiffies + q->rq_timeout; - /* - * Some LLDs, like scsi, peek at the timeout to prevent - * a command from being retried forever. - */ + /* + * Some LLDs, like scsi, peek at the timeout to prevent a + * command from being retried forever. + */ + if (!req->timeout) req->timeout = q->rq_timeout; - } + + req->deadline = jiffies + req->timeout; list_add_tail(&req->timeout_list, &q->timeout_list); /* -- cgit v1.2.3 From 5efccd17ceb0fc43837a331297c2c407969d7201 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: reorder request completion functions Reorder request completion functions such that * All request completion functions are located together. * Functions which are used by only one caller is put right above the caller. * end_request() is put after other completion functions but before blk_update_request(). This change is for completion function cleanup which will follow. [ Impact: cleanup, code reorganization ] Signed-off-by: Tejun Heo --- block/blk-core.c | 144 +++++++++++++++++++++++++++---------------------------- 1 file changed, 72 insertions(+), 72 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index cf10dfcda99..406a93e526b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1683,6 +1683,35 @@ static void blk_account_io_done(struct request *req) } } +/** + * blk_rq_bytes - Returns bytes left to complete in the entire request + * @rq: the request being processed + **/ +unsigned int blk_rq_bytes(struct request *rq) +{ + if (blk_fs_request(rq)) + return rq->hard_nr_sectors << 9; + + return rq->data_len; +} +EXPORT_SYMBOL_GPL(blk_rq_bytes); + +/** + * blk_rq_cur_bytes - Returns bytes left to complete in the current segment + * @rq: the request being processed + **/ +unsigned int blk_rq_cur_bytes(struct request *rq) +{ + if (blk_fs_request(rq)) + return rq->current_nr_sectors << 9; + + if (rq->bio) + return rq->bio->bi_size; + + return rq->data_len; +} +EXPORT_SYMBOL_GPL(blk_rq_cur_bytes); + /** * __end_that_request_first - end I/O on a request * @req: the request being processed @@ -1797,6 +1826,22 @@ static int __end_that_request_first(struct request *req, int error, return 1; } +static int end_that_request_data(struct request *rq, int error, + unsigned int nr_bytes, unsigned int bidi_bytes) +{ + if (rq->bio) { + if (__end_that_request_first(rq, error, nr_bytes)) + return 1; + + /* Bidi request must be completed as a whole */ + if (blk_bidi_rq(rq) && + __end_that_request_first(rq->next_rq, error, bidi_bytes)) + return 1; + } + + return 0; +} + /* * queue lock must be held */ @@ -1825,78 +1870,6 @@ static void end_that_request_last(struct request *req, int error) } } -/** - * blk_rq_bytes - Returns bytes left to complete in the entire request - * @rq: the request being processed - **/ -unsigned int blk_rq_bytes(struct request *rq) -{ - if (blk_fs_request(rq)) - return rq->hard_nr_sectors << 9; - - return rq->data_len; -} -EXPORT_SYMBOL_GPL(blk_rq_bytes); - -/** - * blk_rq_cur_bytes - Returns bytes left to complete in the current segment - * @rq: the request being processed - **/ -unsigned int blk_rq_cur_bytes(struct request *rq) -{ - if (blk_fs_request(rq)) - return rq->current_nr_sectors << 9; - - if (rq->bio) - return rq->bio->bi_size; - - return rq->data_len; -} -EXPORT_SYMBOL_GPL(blk_rq_cur_bytes); - -/** - * end_request - end I/O on the current segment of the request - * @req: the request being processed - * @uptodate: error value or %0/%1 uptodate flag - * - * Description: - * Ends I/O on the current segment of a request. If that is the only - * remaining segment, the request is also completed and freed. - * - * This is a remnant of how older block drivers handled I/O completions. - * Modern drivers typically end I/O on the full request in one go, unless - * they have a residual value to account for. For that case this function - * isn't really useful, unless the residual just happens to be the - * full current segment. In other words, don't use this function in new - * code. Use blk_end_request() or __blk_end_request() to end a request. - **/ -void end_request(struct request *req, int uptodate) -{ - int error = 0; - - if (uptodate <= 0) - error = uptodate ? uptodate : -EIO; - - __blk_end_request(req, error, req->hard_cur_sectors << 9); -} -EXPORT_SYMBOL(end_request); - -static int end_that_request_data(struct request *rq, int error, - unsigned int nr_bytes, unsigned int bidi_bytes) -{ - if (rq->bio) { - if (__end_that_request_first(rq, error, nr_bytes)) - return 1; - - /* Bidi request must be completed as a whole */ - if (blk_bidi_rq(rq) && - __end_that_request_first(rq->next_rq, error, bidi_bytes)) - return 1; - } - - return 0; -} - /** * blk_end_io - Generic end_io function to complete a request. * @rq: the request being processed @@ -2006,6 +1979,33 @@ int blk_end_bidi_request(struct request *rq, int error, unsigned int nr_bytes, } EXPORT_SYMBOL_GPL(blk_end_bidi_request); +/** + * end_request - end I/O on the current segment of the request + * @req: the request being processed + * @uptodate: error value or %0/%1 uptodate flag + * + * Description: + * Ends I/O on the current segment of a request. If that is the only + * remaining segment, the request is also completed and freed. + * + * This is a remnant of how older block drivers handled I/O completions. + * Modern drivers typically end I/O on the full request in one go, unless + * they have a residual value to account for. For that case this function + * isn't really useful, unless the residual just happens to be the + * full current segment. In other words, don't use this function in new + * code. Use blk_end_request() or __blk_end_request() to end a request. + **/ +void end_request(struct request *req, int uptodate) +{ + int error = 0; + + if (uptodate <= 0) + error = uptodate ? uptodate : -EIO; + + __blk_end_request(req, error, req->hard_cur_sectors << 9); +} +EXPORT_SYMBOL(end_request); + /** * blk_update_request - Special helper function for request stacking drivers * @rq: the request being processed -- cgit v1.2.3 From 158dbda0068e63c7cce7bd47c123bd1dfa5a902c Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: reorganize request fetching functions Impact: code reorganization elv_next_request() and elv_dequeue_request() are public block layer interface than actual elevator implementation. They mostly deal with how requests interact with block layer and low level drivers at the beginning of rqeuest processing whereas __elv_next_request() is the actual eleveator request fetching interface. Move the two functions to blk-core.c. This prepares for further interface cleanup. Signed-off-by: Tejun Heo --- block/blk-core.c | 95 +++++++++++++++++++++++++++++++++++++++++ block/blk.h | 37 ++++++++++++++++ block/elevator.c | 128 ------------------------------------------------------- 3 files changed, 132 insertions(+), 128 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 406a93e526b..678ede23ed0 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1712,6 +1712,101 @@ unsigned int blk_rq_cur_bytes(struct request *rq) } EXPORT_SYMBOL_GPL(blk_rq_cur_bytes); +struct request *elv_next_request(struct request_queue *q) +{ + struct request *rq; + int ret; + + while ((rq = __elv_next_request(q)) != NULL) { + if (!(rq->cmd_flags & REQ_STARTED)) { + /* + * This is the first time the device driver + * sees this request (possibly after + * requeueing). Notify IO scheduler. + */ + if (blk_sorted_rq(rq)) + elv_activate_rq(q, rq); + + /* + * just mark as started even if we don't start + * it, a request that has been delayed should + * not be passed by new incoming requests + */ + rq->cmd_flags |= REQ_STARTED; + trace_block_rq_issue(q, rq); + } + + if (!q->boundary_rq || q->boundary_rq == rq) { + q->end_sector = rq_end_sector(rq); + q->boundary_rq = NULL; + } + + if (rq->cmd_flags & REQ_DONTPREP) + break; + + if (q->dma_drain_size && rq->data_len) { + /* + * make sure space for the drain appears we + * know we can do this because max_hw_segments + * has been adjusted to be one fewer than the + * device can handle + */ + rq->nr_phys_segments++; + } + + if (!q->prep_rq_fn) + break; + + ret = q->prep_rq_fn(q, rq); + if (ret == BLKPREP_OK) { + break; + } else if (ret == BLKPREP_DEFER) { + /* + * the request may have been (partially) prepped. + * we need to keep this request in the front to + * avoid resource deadlock. REQ_STARTED will + * prevent other fs requests from passing this one. + */ + if (q->dma_drain_size && rq->data_len && + !(rq->cmd_flags & REQ_DONTPREP)) { + /* + * remove the space for the drain we added + * so that we don't add it again + */ + --rq->nr_phys_segments; + } + + rq = NULL; + break; + } else if (ret == BLKPREP_KILL) { + rq->cmd_flags |= REQ_QUIET; + __blk_end_request(rq, -EIO, blk_rq_bytes(rq)); + } else { + printk(KERN_ERR "%s: bad return=%d\n", __func__, ret); + break; + } + } + + return rq; +} +EXPORT_SYMBOL(elv_next_request); + +void elv_dequeue_request(struct request_queue *q, struct request *rq) +{ + BUG_ON(list_empty(&rq->queuelist)); + BUG_ON(ELV_ON_HASH(rq)); + + list_del_init(&rq->queuelist); + + /* + * the time frame between a request being removed from the lists + * and to it is freed is accounted as io that is in progress at + * the driver side. + */ + if (blk_account_rq(rq)) + q->in_flight++; +} + /** * __end_that_request_first - end I/O on a request * @req: the request being processed diff --git a/block/blk.h b/block/blk.h index 79c85f7c9ff..9b2c324e440 100644 --- a/block/blk.h +++ b/block/blk.h @@ -43,6 +43,43 @@ static inline void blk_clear_rq_complete(struct request *rq) clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); } +/* + * Internal elevator interface + */ +#define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash)) + +static inline struct request *__elv_next_request(struct request_queue *q) +{ + struct request *rq; + + while (1) { + while (!list_empty(&q->queue_head)) { + rq = list_entry_rq(q->queue_head.next); + if (blk_do_ordered(q, &rq)) + return rq; + } + + if (!q->elevator->ops->elevator_dispatch_fn(q, 0)) + return NULL; + } +} + +static inline void elv_activate_rq(struct request_queue *q, struct request *rq) +{ + struct elevator_queue *e = q->elevator; + + if (e->ops->elevator_activate_req_fn) + e->ops->elevator_activate_req_fn(q, rq); +} + +static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq) +{ + struct elevator_queue *e = q->elevator; + + if (e->ops->elevator_deactivate_req_fn) + e->ops->elevator_deactivate_req_fn(q, rq); +} + #ifdef CONFIG_FAIL_IO_TIMEOUT int blk_should_fake_timeout(struct request_queue *); ssize_t part_timeout_show(struct device *, struct device_attribute *, char *); diff --git a/block/elevator.c b/block/elevator.c index 2e0fb21485b..b03b8752e18 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -53,7 +53,6 @@ static const int elv_hash_shift = 6; (hash_long(ELV_HASH_BLOCK((sec)), elv_hash_shift)) #define ELV_HASH_ENTRIES (1 << elv_hash_shift) #define rq_hash_key(rq) ((rq)->sector + (rq)->nr_sectors) -#define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash)) DEFINE_TRACE(block_rq_insert); DEFINE_TRACE(block_rq_issue); @@ -310,22 +309,6 @@ void elevator_exit(struct elevator_queue *e) } EXPORT_SYMBOL(elevator_exit); -static void elv_activate_rq(struct request_queue *q, struct request *rq) -{ - struct elevator_queue *e = q->elevator; - - if (e->ops->elevator_activate_req_fn) - e->ops->elevator_activate_req_fn(q, rq); -} - -static void elv_deactivate_rq(struct request_queue *q, struct request *rq) -{ - struct elevator_queue *e = q->elevator; - - if (e->ops->elevator_deactivate_req_fn) - e->ops->elevator_deactivate_req_fn(q, rq); -} - static inline void __elv_rqhash_del(struct request *rq) { hlist_del_init(&rq->hash); @@ -758,117 +741,6 @@ void elv_add_request(struct request_queue *q, struct request *rq, int where, } EXPORT_SYMBOL(elv_add_request); -static inline struct request *__elv_next_request(struct request_queue *q) -{ - struct request *rq; - - while (1) { - while (!list_empty(&q->queue_head)) { - rq = list_entry_rq(q->queue_head.next); - if (blk_do_ordered(q, &rq)) - return rq; - } - - if (!q->elevator->ops->elevator_dispatch_fn(q, 0)) - return NULL; - } -} - -struct request *elv_next_request(struct request_queue *q) -{ - struct request *rq; - int ret; - - while ((rq = __elv_next_request(q)) != NULL) { - if (!(rq->cmd_flags & REQ_STARTED)) { - /* - * This is the first time the device driver - * sees this request (possibly after - * requeueing). Notify IO scheduler. - */ - if (blk_sorted_rq(rq)) - elv_activate_rq(q, rq); - - /* - * just mark as started even if we don't start - * it, a request that has been delayed should - * not be passed by new incoming requests - */ - rq->cmd_flags |= REQ_STARTED; - trace_block_rq_issue(q, rq); - } - - if (!q->boundary_rq || q->boundary_rq == rq) { - q->end_sector = rq_end_sector(rq); - q->boundary_rq = NULL; - } - - if (rq->cmd_flags & REQ_DONTPREP) - break; - - if (q->dma_drain_size && rq->data_len) { - /* - * make sure space for the drain appears we - * know we can do this because max_hw_segments - * has been adjusted to be one fewer than the - * device can handle - */ - rq->nr_phys_segments++; - } - - if (!q->prep_rq_fn) - break; - - ret = q->prep_rq_fn(q, rq); - if (ret == BLKPREP_OK) { - break; - } else if (ret == BLKPREP_DEFER) { - /* - * the request may have been (partially) prepped. - * we need to keep this request in the front to - * avoid resource deadlock. REQ_STARTED will - * prevent other fs requests from passing this one. - */ - if (q->dma_drain_size && rq->data_len && - !(rq->cmd_flags & REQ_DONTPREP)) { - /* - * remove the space for the drain we added - * so that we don't add it again - */ - --rq->nr_phys_segments; - } - - rq = NULL; - break; - } else if (ret == BLKPREP_KILL) { - rq->cmd_flags |= REQ_QUIET; - __blk_end_request(rq, -EIO, blk_rq_bytes(rq)); - } else { - printk(KERN_ERR "%s: bad return=%d\n", __func__, ret); - break; - } - } - - return rq; -} -EXPORT_SYMBOL(elv_next_request); - -void elv_dequeue_request(struct request_queue *q, struct request *rq) -{ - BUG_ON(list_empty(&rq->queuelist)); - BUG_ON(ELV_ON_HASH(rq)); - - list_del_init(&rq->queuelist); - - /* - * the time frame between a request being removed from the lists - * and to it is freed is accounted as io that is in progress at - * the driver side. - */ - if (blk_account_rq(rq)) - q->in_flight++; -} - int elv_queue_empty(struct request_queue *q) { struct elevator_queue *e = q->elevator; -- cgit v1.2.3 From 0b302d5aa7975006fa2ec3d66386610b9b36c669 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: kill blk_end_request_callback() With recent IDE updates, blk_end_request_callback() doesn't have any user now. Kill it. [ Impact: removal of unused convoluted interface ] Signed-off-by: Tejun Heo --- block/blk-core.c | 48 +++--------------------------------------------- 1 file changed, 3 insertions(+), 45 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 678ede23ed0..2f277ea0e59 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1971,10 +1971,6 @@ static void end_that_request_last(struct request *req, int error) * @error: %0 for success, < %0 for error * @nr_bytes: number of bytes to complete @rq * @bidi_bytes: number of bytes to complete @rq->next_rq - * @drv_callback: function called between completion of bios in the request - * and completion of the request. - * If the callback returns non %0, this helper returns without - * completion of the request. * * Description: * Ends I/O on a number of bytes attached to @rq and @rq->next_rq. @@ -1985,8 +1981,7 @@ static void end_that_request_last(struct request *req, int error) * %1 - this request is not freed yet, it still has pending buffers. **/ static int blk_end_io(struct request *rq, int error, unsigned int nr_bytes, - unsigned int bidi_bytes, - int (drv_callback)(struct request *)) + unsigned int bidi_bytes) { struct request_queue *q = rq->q; unsigned long flags = 0UL; @@ -1994,10 +1989,6 @@ static int blk_end_io(struct request *rq, int error, unsigned int nr_bytes, if (end_that_request_data(rq, error, nr_bytes, bidi_bytes)) return 1; - /* Special feature for tricky drivers */ - if (drv_callback && drv_callback(rq)) - return 1; - add_disk_randomness(rq->rq_disk); spin_lock_irqsave(q->queue_lock, flags); @@ -2023,7 +2014,7 @@ static int blk_end_io(struct request *rq, int error, unsigned int nr_bytes, **/ int blk_end_request(struct request *rq, int error, unsigned int nr_bytes) { - return blk_end_io(rq, error, nr_bytes, 0, NULL); + return blk_end_io(rq, error, nr_bytes, 0); } EXPORT_SYMBOL_GPL(blk_end_request); @@ -2070,7 +2061,7 @@ EXPORT_SYMBOL_GPL(__blk_end_request); int blk_end_bidi_request(struct request *rq, int error, unsigned int nr_bytes, unsigned int bidi_bytes) { - return blk_end_io(rq, error, nr_bytes, bidi_bytes, NULL); + return blk_end_io(rq, error, nr_bytes, bidi_bytes); } EXPORT_SYMBOL_GPL(blk_end_bidi_request); @@ -2131,39 +2122,6 @@ void blk_update_request(struct request *rq, int error, unsigned int nr_bytes) } EXPORT_SYMBOL_GPL(blk_update_request); -/** - * blk_end_request_callback - Special helper function for tricky drivers - * @rq: the request being processed - * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete - * @drv_callback: function called between completion of bios in the request - * and completion of the request. - * If the callback returns non %0, this helper returns without - * completion of the request. - * - * Description: - * Ends I/O on a number of bytes attached to @rq. - * If @rq has leftover, sets it up for the next range of segments. - * - * This special helper function is used only for existing tricky drivers. - * (e.g. cdrom_newpc_intr() of ide-cd) - * This interface will be removed when such drivers are rewritten. - * Don't use this interface in other places anymore. - * - * Return: - * %0 - we are done with this request - * %1 - this request is not freed yet. - * this request still has pending buffers or - * the driver doesn't want to finish this request yet. - **/ -int blk_end_request_callback(struct request *rq, int error, - unsigned int nr_bytes, - int (drv_callback)(struct request *)) -{ - return blk_end_io(rq, error, nr_bytes, 0, drv_callback); -} -EXPORT_SYMBOL_GPL(blk_end_request_callback); - void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio) { -- cgit v1.2.3 From 2e60e02297cf54e367567f2d85b2ca56b1c4a906 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: clean up request completion API Request completion has gone through several changes and became a bit messy over the time. Clean it up. 1. end_that_request_data() is a thin wrapper around end_that_request_data_first() which checks whether bio is NULL before doing anything and handles bidi completion. blk_update_request() is a thin wrapper around end_that_request_data() which clears nr_sectors on the last iteration but doesn't use the bidi completion. Clean it up by moving the initial bio NULL check and nr_sectors clearing on the last iteration into end_that_request_data() and renaming it to blk_update_request(), which makes blk_end_io() the only user of end_that_request_data(). Collapse end_that_request_data() into blk_end_io(). 2. There are four visible completion variants - blk_end_request(), __blk_end_request(), blk_end_bidi_request() and end_request(). blk_end_request() and blk_end_bidi_request() uses blk_end_request() as the backend but __blk_end_request() and end_request() use separate implementation in __blk_end_request() due to different locking rules. blk_end_bidi_request() is identical to blk_end_io(). Collapse blk_end_io() into blk_end_bidi_request(), separate out request update into internal helper blk_update_bidi_request() and add __blk_end_bidi_request(). Redefine [__]blk_end_request() as thin inline wrappers around [__]blk_end_bidi_request(). 3. As the whole request issue/completion usages are about to be modified and audited, it's a good chance to convert completion functions return bool which better indicates the intended meaning of return values. 4. The function name end_that_request_last() is from the days when it was a public interface and slighly confusing. Give it a proper internal name - blk_finish_request(). 5. Add description explaning that blk_end_bidi_request() can be safely used for uni requests as suggested by Boaz Harrosh. The only visible behavior change is from #1. nr_sectors counts are cleared after the final iteration no matter which function is used to complete the request. I couldn't find any place where the code assumes those nr_sectors counters contain the values for the last segment and this change is good as it makes the API much more consistent as the end result is now same whether a request is completed using [__]blk_end_request() alone or in combination with blk_update_request(). API further cleaned up per Christoph's suggestion. [ Impact: cleanup, rq->*nr_sectors always updated after req completion ] Signed-off-by: Tejun Heo Reviewed-by: Boaz Harrosh Cc: Christoph Hellwig --- block/blk-core.c | 226 ++++++++++++++++++------------------------------------- 1 file changed, 75 insertions(+), 151 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 2f277ea0e59..89cc05d9a7a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1808,25 +1808,35 @@ void elv_dequeue_request(struct request_queue *q, struct request *rq) } /** - * __end_that_request_first - end I/O on a request - * @req: the request being processed + * blk_update_request - Special helper function for request stacking drivers + * @rq: the request being processed * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete + * @nr_bytes: number of bytes to complete @rq * * Description: - * Ends I/O on a number of bytes attached to @req, and sets it up - * for the next range of segments (if any) in the cluster. + * Ends I/O on a number of bytes attached to @rq, but doesn't complete + * the request structure even if @rq doesn't have leftover. + * If @rq has leftover, sets it up for the next range of segments. + * + * This special helper function is only for request stacking drivers + * (e.g. request-based dm) so that they can handle partial completion. + * Actual device drivers should use blk_end_request instead. + * + * Passing the result of blk_rq_bytes() as @nr_bytes guarantees + * %false return from this function. * * Return: - * %0 - we are done with this request, call end_that_request_last() - * %1 - still buffers pending for this request + * %false - this request doesn't have any more data + * %true - this request has more data **/ -static int __end_that_request_first(struct request *req, int error, - int nr_bytes) +bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) { int total_bytes, bio_nbytes, next_idx = 0; struct bio *bio; + if (!req->bio) + return false; + trace_block_rq_complete(req->q, req); /* @@ -1903,8 +1913,16 @@ static int __end_that_request_first(struct request *req, int error, /* * completely done */ - if (!req->bio) - return 0; + if (!req->bio) { + /* + * Reset counters so that the request stacking driver + * can find how many bytes remain in the request + * later. + */ + req->nr_sectors = req->hard_nr_sectors = 0; + req->current_nr_sectors = req->hard_cur_sectors = 0; + return false; + } /* * if the request wasn't completed, update state @@ -1918,29 +1936,31 @@ static int __end_that_request_first(struct request *req, int error, blk_recalc_rq_sectors(req, total_bytes >> 9); blk_recalc_rq_segments(req); - return 1; + return true; } +EXPORT_SYMBOL_GPL(blk_update_request); -static int end_that_request_data(struct request *rq, int error, - unsigned int nr_bytes, unsigned int bidi_bytes) +static bool blk_update_bidi_request(struct request *rq, int error, + unsigned int nr_bytes, + unsigned int bidi_bytes) { - if (rq->bio) { - if (__end_that_request_first(rq, error, nr_bytes)) - return 1; + if (blk_update_request(rq, error, nr_bytes)) + return true; - /* Bidi request must be completed as a whole */ - if (blk_bidi_rq(rq) && - __end_that_request_first(rq->next_rq, error, bidi_bytes)) - return 1; - } + /* Bidi request must be completed as a whole */ + if (unlikely(blk_bidi_rq(rq)) && + blk_update_request(rq->next_rq, error, bidi_bytes)) + return true; - return 0; + add_disk_randomness(rq->rq_disk); + + return false; } /* * queue lock must be held */ -static void end_that_request_last(struct request *req, int error) +static void blk_finish_request(struct request *req, int error) { if (blk_rq_tagged(req)) blk_queue_end_tag(req->q, req); @@ -1966,161 +1986,65 @@ static void end_that_request_last(struct request *req, int error) } /** - * blk_end_io - Generic end_io function to complete a request. - * @rq: the request being processed - * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete @rq - * @bidi_bytes: number of bytes to complete @rq->next_rq + * blk_end_bidi_request - Complete a bidi request + * @rq: the request to complete + * @error: %0 for success, < %0 for error + * @nr_bytes: number of bytes to complete @rq + * @bidi_bytes: number of bytes to complete @rq->next_rq * * Description: * Ends I/O on a number of bytes attached to @rq and @rq->next_rq. - * If @rq has leftover, sets it up for the next range of segments. + * Drivers that supports bidi can safely call this member for any + * type of request, bidi or uni. In the later case @bidi_bytes is + * just ignored. * * Return: - * %0 - we are done with this request - * %1 - this request is not freed yet, it still has pending buffers. + * %false - we are done with this request + * %true - still buffers pending for this request **/ -static int blk_end_io(struct request *rq, int error, unsigned int nr_bytes, - unsigned int bidi_bytes) +bool blk_end_bidi_request(struct request *rq, int error, + unsigned int nr_bytes, unsigned int bidi_bytes) { struct request_queue *q = rq->q; - unsigned long flags = 0UL; - - if (end_that_request_data(rq, error, nr_bytes, bidi_bytes)) - return 1; + unsigned long flags; - add_disk_randomness(rq->rq_disk); + if (blk_update_bidi_request(rq, error, nr_bytes, bidi_bytes)) + return true; spin_lock_irqsave(q->queue_lock, flags); - end_that_request_last(rq, error); + blk_finish_request(rq, error); spin_unlock_irqrestore(q->queue_lock, flags); - return 0; -} - -/** - * blk_end_request - Helper function for drivers to complete the request. - * @rq: the request being processed - * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete - * - * Description: - * Ends I/O on a number of bytes attached to @rq. - * If @rq has leftover, sets it up for the next range of segments. - * - * Return: - * %0 - we are done with this request - * %1 - still buffers pending for this request - **/ -int blk_end_request(struct request *rq, int error, unsigned int nr_bytes) -{ - return blk_end_io(rq, error, nr_bytes, 0); -} -EXPORT_SYMBOL_GPL(blk_end_request); - -/** - * __blk_end_request - Helper function for drivers to complete the request. - * @rq: the request being processed - * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete - * - * Description: - * Must be called with queue lock held unlike blk_end_request(). - * - * Return: - * %0 - we are done with this request - * %1 - still buffers pending for this request - **/ -int __blk_end_request(struct request *rq, int error, unsigned int nr_bytes) -{ - if (rq->bio && __end_that_request_first(rq, error, nr_bytes)) - return 1; - - add_disk_randomness(rq->rq_disk); - - end_that_request_last(rq, error); - - return 0; + return false; } -EXPORT_SYMBOL_GPL(__blk_end_request); +EXPORT_SYMBOL_GPL(blk_end_bidi_request); /** - * blk_end_bidi_request - Helper function for drivers to complete bidi request. - * @rq: the bidi request being processed + * __blk_end_bidi_request - Complete a bidi request with queue lock held + * @rq: the request to complete * @error: %0 for success, < %0 for error * @nr_bytes: number of bytes to complete @rq * @bidi_bytes: number of bytes to complete @rq->next_rq * * Description: - * Ends I/O on a number of bytes attached to @rq and @rq->next_rq. + * Identical to blk_end_bidi_request() except that queue lock is + * assumed to be locked on entry and remains so on return. * * Return: - * %0 - we are done with this request - * %1 - still buffers pending for this request - **/ -int blk_end_bidi_request(struct request *rq, int error, unsigned int nr_bytes, - unsigned int bidi_bytes) -{ - return blk_end_io(rq, error, nr_bytes, bidi_bytes); -} -EXPORT_SYMBOL_GPL(blk_end_bidi_request); - -/** - * end_request - end I/O on the current segment of the request - * @req: the request being processed - * @uptodate: error value or %0/%1 uptodate flag - * - * Description: - * Ends I/O on the current segment of a request. If that is the only - * remaining segment, the request is also completed and freed. - * - * This is a remnant of how older block drivers handled I/O completions. - * Modern drivers typically end I/O on the full request in one go, unless - * they have a residual value to account for. For that case this function - * isn't really useful, unless the residual just happens to be the - * full current segment. In other words, don't use this function in new - * code. Use blk_end_request() or __blk_end_request() to end a request. + * %false - we are done with this request + * %true - still buffers pending for this request **/ -void end_request(struct request *req, int uptodate) +bool __blk_end_bidi_request(struct request *rq, int error, + unsigned int nr_bytes, unsigned int bidi_bytes) { - int error = 0; + if (blk_update_bidi_request(rq, error, nr_bytes, bidi_bytes)) + return true; - if (uptodate <= 0) - error = uptodate ? uptodate : -EIO; + blk_finish_request(rq, error); - __blk_end_request(req, error, req->hard_cur_sectors << 9); -} -EXPORT_SYMBOL(end_request); - -/** - * blk_update_request - Special helper function for request stacking drivers - * @rq: the request being processed - * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete @rq - * - * Description: - * Ends I/O on a number of bytes attached to @rq, but doesn't complete - * the request structure even if @rq doesn't have leftover. - * If @rq has leftover, sets it up for the next range of segments. - * - * This special helper function is only for request stacking drivers - * (e.g. request-based dm) so that they can handle partial completion. - * Actual device drivers should use blk_end_request instead. - */ -void blk_update_request(struct request *rq, int error, unsigned int nr_bytes) -{ - if (!end_that_request_data(rq, error, nr_bytes, 0)) { - /* - * These members are not updated in end_that_request_data() - * when all bios are completed. - * Update them so that the request stacking driver can find - * how many bytes remain in the request later. - */ - rq->nr_sectors = rq->hard_nr_sectors = 0; - rq->current_nr_sectors = rq->hard_cur_sectors = 0; - } + return false; } -EXPORT_SYMBOL_GPL(blk_update_request); +EXPORT_SYMBOL_GPL(__blk_end_bidi_request); void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio) -- cgit v1.2.3 From b243ddcbe9be146172baa544dadecebf156eda0e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:18 +0900 Subject: block: move rq->start_time initialization to blk_rq_init() rq->start_time was initialized in init_request_from_bio() so special requests didn't have start_time set. This has been okay as start_time has been used only for fs requests; however, there is no indication of this actually is the case or not. Set rq->start_time in blk_rq_init() and guarantee that all initialized rq's have its start_time set. This improves consistency at virtually no cost and future changes will make use of the timestamp for !bio requests. [ Impact: rq->start_time is valid for all requests ] Signed-off-by: Tejun Heo --- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 89cc05d9a7a..b84250d3019 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -134,6 +134,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq) rq->cmd_len = BLK_MAX_CDB; rq->tag = -1; rq->ref_count = 1; + rq->start_time = jiffies; } EXPORT_SYMBOL(blk_rq_init); @@ -1099,7 +1100,6 @@ void init_request_from_bio(struct request *req, struct bio *bio) req->errors = 0; req->hard_sector = req->sector = bio->bi_sector; req->ioprio = bio_prio(bio); - req->start_time = jiffies; blk_rq_bio_prep(req->q, req, bio); } -- cgit v1.2.3 From 40cbbb781d3eba5d6ac0860db078af490e5c7c6b Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:19 +0900 Subject: block: implement and use [__]blk_end_request_all() There are many [__]blk_end_request() call sites which call it with full request length and expect full completion. Many of them ensure that the request actually completes by doing BUG_ON() the return value, which is awkward and error-prone. This patch adds [__]blk_end_request_all() which takes @rq and @error and fully completes the request. BUG_ON() is added to to ensure that this actually happens. Most conversions are simple but there are a few noteworthy ones. * cdrom/viocd: viocd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/block/dasd: dasd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/char/tape_block: tapeblock_end_request() replaced with direct calls to blk_end_request_all(). [ Impact: cleanup ] Signed-off-by: Tejun Heo Cc: Russell King Cc: Stephen Rothwell Cc: Mike Miller Cc: Martin Schwidefsky Cc: Jeff Garzik Cc: Rusty Russell Cc: Jeremy Fitzhardinge Cc: Alex Dubov Cc: James Bottomley --- block/blk-barrier.c | 9 ++------- block/blk-core.c | 2 +- block/elevator.c | 2 +- 3 files changed, 4 insertions(+), 9 deletions(-) (limited to 'block') diff --git a/block/blk-barrier.c b/block/blk-barrier.c index 20b4111fa05..c8d087655ef 100644 --- a/block/blk-barrier.c +++ b/block/blk-barrier.c @@ -106,10 +106,7 @@ bool blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error) */ q->ordseq = 0; rq = q->orig_bar_rq; - - if (__blk_end_request(rq, q->orderr, blk_rq_bytes(rq))) - BUG(); - + __blk_end_request_all(rq, q->orderr); return true; } @@ -252,9 +249,7 @@ bool blk_do_ordered(struct request_queue *q, struct request **rqp) * with prejudice. */ elv_dequeue_request(q, rq); - if (__blk_end_request(rq, -EOPNOTSUPP, - blk_rq_bytes(rq))) - BUG(); + __blk_end_request_all(rq, -EOPNOTSUPP); *rqp = NULL; return false; } diff --git a/block/blk-core.c b/block/blk-core.c index b84250d3019..0520cc70458 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1780,7 +1780,7 @@ struct request *elv_next_request(struct request_queue *q) break; } else if (ret == BLKPREP_KILL) { rq->cmd_flags |= REQ_QUIET; - __blk_end_request(rq, -EIO, blk_rq_bytes(rq)); + __blk_end_request_all(rq, -EIO); } else { printk(KERN_ERR "%s: bad return=%d\n", __func__, ret); break; diff --git a/block/elevator.c b/block/elevator.c index b03b8752e18..1af5d9f04af 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -810,7 +810,7 @@ void elv_abort_queue(struct request_queue *q) rq = list_entry_rq(q->queue_head.next); rq->cmd_flags |= REQ_QUIET; trace_block_rq_abort(q, rq); - __blk_end_request(rq, -EIO, blk_rq_bytes(rq)); + __blk_end_request_all(rq, -EIO); } } EXPORT_SYMBOL(elv_abort_queue); -- cgit v1.2.3 From 731ec497e5888c6792ad62613ae9be97eebcd7ca Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 23 Apr 2009 11:05:20 +0900 Subject: block: kill rq->data Now that all block request data transfer is done via bio, rq->data isn't used. Kill it. While at it, make the roles of rq->special and buffer clear. [ Impact: drop now unncessary field from struct request ] Signed-off-by: Tejun Heo Cc: Boaz Harrosh --- block/blk-core.c | 5 ++--- block/blk-map.c | 6 +++--- block/scsi_ioctl.c | 1 - 3 files changed, 5 insertions(+), 7 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 0520cc70458..6dd180cf15d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -189,10 +189,9 @@ void blk_dump_rq_flags(struct request *rq, char *msg) (unsigned long long)rq->sector, rq->nr_sectors, rq->current_nr_sectors); - printk(KERN_INFO " bio %p, biotail %p, buffer %p, data %p, len %u\n", + printk(KERN_INFO " bio %p, biotail %p, buffer %p, len %u\n", rq->bio, rq->biotail, - rq->buffer, rq->data, - rq->data_len); + rq->buffer, rq->data_len); if (blk_pc_request(rq)) { printk(KERN_INFO " cdb: "); diff --git a/block/blk-map.c b/block/blk-map.c index f103729b462..694fefad34e 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -156,7 +156,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq, if (!bio_flagged(bio, BIO_USER_MAPPED)) rq->cmd_flags |= REQ_COPY_USER; - rq->buffer = rq->data = NULL; + rq->buffer = NULL; return 0; unmap_rq: blk_rq_unmap_user(bio); @@ -235,7 +235,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, blk_queue_bounce(q, &bio); bio_get(bio); blk_rq_bio_prep(q, rq, bio); - rq->buffer = rq->data = NULL; + rq->buffer = NULL; return 0; } EXPORT_SYMBOL(blk_rq_map_user_iov); @@ -313,7 +313,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, blk_rq_bio_prep(q, rq, bio); blk_queue_bounce(q, &rq->bio); - rq->buffer = rq->data = NULL; + rq->buffer = NULL; return 0; } EXPORT_SYMBOL(blk_rq_map_kern); diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 82a0ca2f672..bb9aa43551b 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -500,7 +500,6 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk, rq = blk_get_request(q, WRITE, __GFP_WAIT); rq->cmd_type = REQ_TYPE_BLOCK_PC; - rq->data = NULL; rq->data_len = 0; rq->extra_len = 0; rq->timeout = BLK_DEFAULT_SG_TIMEOUT; -- cgit v1.2.3 From c2553b5844b06910435e40cfab9e6f384840cb97 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 24 Apr 2009 08:10:11 +0200 Subject: block: make blk_do_io_stat() do the full "is this rq accountable" checks We currently check for file system requests outside of blk_do_io_stat(rq), but we may as well just include it. Signed-off-by: Jens Axboe --- block/blk-core.c | 12 +++--------- block/blk.h | 9 ++++++++- 2 files changed, 11 insertions(+), 10 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 6dd180cf15d..1e3b97f0ae6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -68,7 +68,7 @@ static void drive_stat_acct(struct request *rq, int new_io) int rw = rq_data_dir(rq); int cpu; - if (!blk_fs_request(rq) || !blk_do_io_stat(rq)) + if (!blk_do_io_stat(rq)) return; cpu = part_stat_lock(); @@ -1639,10 +1639,7 @@ EXPORT_SYMBOL(blkdev_dequeue_request); static void blk_account_io_completion(struct request *req, unsigned int bytes) { - if (!blk_do_io_stat(req)) - return; - - if (blk_fs_request(req)) { + if (blk_do_io_stat(req)) { const int rw = rq_data_dir(req); struct hd_struct *part; int cpu; @@ -1656,15 +1653,12 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes) static void blk_account_io_done(struct request *req) { - if (!blk_do_io_stat(req)) - return; - /* * Account IO completion. bar_rq isn't accounted as a normal * IO on queueing nor completion. Accounting the containing * request is enough. */ - if (blk_fs_request(req) && req != &req->q->bar_rq) { + if (blk_do_io_stat(req) && req != &req->q->bar_rq) { unsigned long duration = jiffies - req->start_time; const int rw = rq_data_dir(req); struct hd_struct *part; diff --git a/block/blk.h b/block/blk.h index 9b2c324e440..404c10b25ca 100644 --- a/block/blk.h +++ b/block/blk.h @@ -149,9 +149,16 @@ static inline int blk_cpu_to_group(int cpu) #endif } +/* + * Contribute to IO statistics IFF: + * + * a) it's attached to a gendisk, and + * b) the queue had IO stats enabled when this request was started, and + * c) it's a file system request + */ static inline int blk_do_io_stat(struct request *rq) { - return rq->rq_disk && blk_rq_io_stat(rq); + return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq); } #endif -- cgit v1.2.3 From c69d48540c201394d08cb4d48b905e001313d9b8 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 24 Apr 2009 08:12:19 +0200 Subject: block: include discard requests in IO accounting We currently don't do merging on discard requests, but we potentially could. If we do, then we need to include discard requests in the IO accounting, or merging would end up decrementing in_flight IO counters for an IO which never incremented them. So enable accounting for discard requests. Problem found by Nikanth Karthikesan Signed-off-by: Jens Axboe --- block/blk.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk.h b/block/blk.h index 404c10b25ca..51115599df9 100644 --- a/block/blk.h +++ b/block/blk.h @@ -158,7 +158,8 @@ static inline int blk_cpu_to_group(int cpu) */ static inline int blk_do_io_stat(struct request *rq) { - return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq); + return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) && + blk_discard_rq(rq); } #endif -- cgit v1.2.3 From 9eb55b030c4b3227334ee4482402096cd1d1a6fe Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Mon, 27 Apr 2009 14:53:54 +0200 Subject: block: catch trying to use more bits than request->cmd_flags has Signed-off-by: Nikanth Karthikesan Signed-off-by: Jens Axboe --- block/blk-core.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 1e3b97f0ae6..394c5bd8127 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2097,6 +2097,9 @@ EXPORT_SYMBOL(kblockd_schedule_work); int __init blk_dev_init(void) { + BUILD_BUG_ON(__REQ_NR_BITS > 8 * + sizeof(((struct request *)0)->cmd_flags)); + kblockd_workqueue = create_workqueue("kblockd"); if (!kblockd_workqueue) panic("Failed to create kblockd\n"); -- cgit v1.2.3 From 4c94dece1baf320d925cedb231489c4e0358ac5a Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 28 Apr 2009 13:06:05 +0900 Subject: block: don't init rq fields unnecessarily blk_get_request() always returns properly zeroed requests. Don't set fields to zero/NULL unnecessarily. [ Impact: cleanup ] Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe --- block/scsi_ioctl.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'block') diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index bb9aa43551b..58cf4560f74 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -500,8 +500,6 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk, rq = blk_get_request(q, WRITE, __GFP_WAIT); rq->cmd_type = REQ_TYPE_BLOCK_PC; - rq->data_len = 0; - rq->extra_len = 0; rq->timeout = BLK_DEFAULT_SG_TIMEOUT; rq->cmd[0] = cmd; rq->cmd[4] = data; -- cgit v1.2.3 From 22a7c31a9659deaddafbbcec6562d44141e84474 Mon Sep 17 00:00:00 2001 From: "Alan D. Brunelle" Date: Mon, 4 May 2009 16:35:08 -0400 Subject: blktrace: from-sector redundant in trace_block_remap Remove redundant from-sector parameter: it's /always/ the bio's sector passed in. [ Impact: cleanup ] Signed-off-by: Alan D. Brunelle Reviewed-by: Li Zefan Reviewed-by: KOSAKI Motohiro Cc: Jens Axboe Cc: Arnaldo Carvalho de Melo LKML-Reference: <49FF517C.7000503@hp.com> Signed-off-by: Ingo Molnar --- block/blk-core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 07ab75403e1..a5f747a8312 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1275,7 +1275,7 @@ static inline void blk_partition_remap(struct bio *bio) bio->bi_bdev = bdev->bd_contains; trace_block_remap(bdev_get_queue(bio->bi_bdev), bio, - bdev->bd_dev, bio->bi_sector, + bdev->bd_dev, bio->bi_sector - p->start_sect); } } @@ -1444,8 +1444,7 @@ static inline void __generic_make_request(struct bio *bio) goto end_io; if (old_sector != -1) - trace_block_remap(q, bio, old_dev, bio->bi_sector, - old_sector); + trace_block_remap(q, bio, old_dev, old_sector); trace_block_bio_queue(q, bio); -- cgit v1.2.3 From c3a4d78c580de4edc9ef0f7c59812fb02ceb037f Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 7 May 2009 22:24:37 +0900 Subject: block: add rq->resid_len rq->data_len served two purposes - the length of data buffer on issue and the residual count on completion. This duality creates some headaches. First of all, block layer and low level drivers can't really determine what rq->data_len contains while a request is executing. It could be the total request length or it coulde be anything else one of the lower layers is using to keep track of residual count. This complicates things because blk_rq_bytes() and thus [__]blk_end_request_all() relies on rq->data_len for PC commands. Drivers which want to report residual count should first cache the total request length, update rq->data_len and then complete the request with the cached data length. Secondly, it makes requests default to reporting full residual count, ie. reporting that no data transfer occurred. The residual count is an exception not the norm; however, the driver should clear rq->data_len to zero to signify the normal cases while leaving it alone means no data transfer occurred at all. This reverse default behavior complicates code unnecessarily and renders block PC on some drivers (ide-tape/floppy) unuseable. This patch adds rq->resid_len which is used only for residual count. While at it, remove now unnecessasry blk_rq_bytes() caching in ide_pc_intr() as rq->data_len is not changed anymore. Boaz : spotted missing conversion in osd Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape [ Impact: cleanup residual count handling, report 0 resid by default ] Signed-off-by: Tejun Heo Cc: James Bottomley Cc: Bartlomiej Zolnierkiewicz Cc: Borislav Petkov Cc: Sergei Shtylyov Cc: Mike Miller Cc: Eric Moore Cc: Alan Stern Cc: FUJITA Tomonori Cc: Doug Gilbert Cc: Mike Miller Cc: Eric Moore Cc: Darrick J. Wong Cc: Pete Zaitcev Cc: Boaz Harrosh Signed-off-by: Jens Axboe --- block/bsg.c | 8 ++++---- block/scsi_ioctl.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'block') diff --git a/block/bsg.c b/block/bsg.c index 206060e795d..2d746e34f4c 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -445,14 +445,14 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr, } if (rq->next_rq) { - hdr->dout_resid = rq->data_len; - hdr->din_resid = rq->next_rq->data_len; + hdr->dout_resid = rq->resid_len; + hdr->din_resid = rq->next_rq->resid_len; blk_rq_unmap_user(bidi_bio); blk_put_request(rq->next_rq); } else if (rq_data_dir(rq) == READ) - hdr->din_resid = rq->data_len; + hdr->din_resid = rq->resid_len; else - hdr->dout_resid = rq->data_len; + hdr->dout_resid = rq->resid_len; /* * If the request generated a negative error number, return it diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 58cf4560f74..a9670dd4b5d 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -230,7 +230,7 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr, hdr->info = 0; if (hdr->masked_status || hdr->host_status || hdr->driver_status) hdr->info |= SG_INFO_CHECK; - hdr->resid = rq->data_len; + hdr->resid = rq->resid_len; hdr->sb_len_wr = 0; if (rq->sense_len && hdr->sbp) { -- cgit v1.2.3 From 5b93629b4509c03ffa87a9316412fedf6f58cb37 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 7 May 2009 22:24:38 +0900 Subject: block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones Implement accessors - blk_rq_pos(), blk_rq_sectors() and blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors and rq->hard_cur_sectors respectively and convert direct references of the said fields to the accessors. This is in preparation of request data length handling cleanup. Geert : suggested adding const to struct request * parameter to accessors Sergei : spotted error in patch description [ Impact: cleanup ] Signed-off-by: Tejun Heo Acked-by: Geert Uytterhoeven Acked-by: Stephen Rothwell Tested-by: Grant Likely Acked-by: Grant Likely Ackec-by: Sergei Shtylyov Cc: Bartlomiej Zolnierkiewicz Cc: Borislav Petkov Cc: James Bottomley Signed-off-by: Jens Axboe --- block/blk-barrier.c | 2 +- block/blk-core.c | 2 +- block/cfq-iosched.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk-barrier.c b/block/blk-barrier.c index c8d087655ef..c167de5b9ef 100644 --- a/block/blk-barrier.c +++ b/block/blk-barrier.c @@ -163,7 +163,7 @@ static inline bool start_ordered(struct request_queue *q, struct request **rqp) * For an empty barrier, there's no actual BAR request, which * in turn makes POSTFLUSH unnecessary. Mask them off. */ - if (!rq->hard_nr_sectors) { + if (!blk_rq_sectors(rq)) { q->ordered &= ~(QUEUE_ORDERED_DO_BAR | QUEUE_ORDERED_DO_POSTFLUSH); /* diff --git a/block/blk-core.c b/block/blk-core.c index 394c5bd8127..895e55b74a4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1683,7 +1683,7 @@ static void blk_account_io_done(struct request *req) unsigned int blk_rq_bytes(struct request *rq) { if (blk_fs_request(rq)) - return rq->hard_nr_sectors << 9; + return blk_rq_sectors(rq) << 9; return rq->data_len; } diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index def0c698a4b..575083a9ffe 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -760,7 +760,7 @@ static void cfq_activate_request(struct request_queue *q, struct request *rq) cfq_log_cfqq(cfqd, RQ_CFQQ(rq), "activate rq, drv=%d", cfqd->rq_in_driver); - cfqd->last_position = rq->hard_sector + rq->hard_nr_sectors; + cfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq); } static void cfq_deactivate_request(struct request_queue *q, struct request *rq) -- cgit v1.2.3 From 83096ebf1263b2c1ee5e653ba37d993d02e3eb7b Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 7 May 2009 22:24:39 +0900 Subject: block: convert to pos and nr_sectors accessors With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo Acked-by: Geert Uytterhoeven Tested-by: Grant Likely Acked-by: Grant Likely Tested-by: Adrian McMenamin Acked-by: Adrian McMenamin Acked-by: Mike Miller Cc: James Bottomley Cc: Bartlomiej Zolnierkiewicz Cc: Borislav Petkov Cc: Sergei Shtylyov Cc: Eric Moore Cc: Alan Stern Cc: FUJITA Tomonori Cc: Pete Zaitcev Cc: Stephen Rothwell Cc: Paul Clements Cc: Tim Waugh Cc: Jeff Garzik Cc: Jeremy Fitzhardinge Cc: Alex Dubov Cc: David Woodhouse Cc: Martin Schwidefsky Cc: Dario Ballabio Cc: David S. Miller Cc: Rusty Russell Cc: unsik Kim Cc: Laurent Vivier Signed-off-by: Jens Axboe --- block/as-iosched.c | 18 ++++++++++-------- block/blk-barrier.c | 2 +- block/blk-core.c | 17 ++++++++--------- block/blk-merge.c | 10 +++++----- block/cfq-iosched.c | 18 +++++++++--------- block/deadline-iosched.c | 2 +- block/elevator.c | 22 +++++++++++----------- 7 files changed, 45 insertions(+), 44 deletions(-) (limited to 'block') diff --git a/block/as-iosched.c b/block/as-iosched.c index 45bd07059c2..7a12cf6ee1d 100644 --- a/block/as-iosched.c +++ b/block/as-iosched.c @@ -306,8 +306,8 @@ as_choose_req(struct as_data *ad, struct request *rq1, struct request *rq2) data_dir = rq_is_sync(rq1); last = ad->last_sector[data_dir]; - s1 = rq1->sector; - s2 = rq2->sector; + s1 = blk_rq_pos(rq1); + s2 = blk_rq_pos(rq2); BUG_ON(data_dir != rq_is_sync(rq2)); @@ -566,13 +566,15 @@ static void as_update_iohist(struct as_data *ad, struct as_io_context *aic, as_update_thinktime(ad, aic, thinktime); /* Calculate read -> read seek distance */ - if (aic->last_request_pos < rq->sector) - seek_dist = rq->sector - aic->last_request_pos; + if (aic->last_request_pos < blk_rq_pos(rq)) + seek_dist = blk_rq_pos(rq) - + aic->last_request_pos; else - seek_dist = aic->last_request_pos - rq->sector; + seek_dist = aic->last_request_pos - + blk_rq_pos(rq); as_update_seekdist(ad, aic, seek_dist); } - aic->last_request_pos = rq->sector + rq->nr_sectors; + aic->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq); set_bit(AS_TASK_IOSTARTED, &aic->state); spin_unlock(&aic->lock); } @@ -587,7 +589,7 @@ static int as_close_req(struct as_data *ad, struct as_io_context *aic, { unsigned long delay; /* jiffies */ sector_t last = ad->last_sector[ad->batch_data_dir]; - sector_t next = rq->sector; + sector_t next = blk_rq_pos(rq); sector_t delta; /* acceptable close offset (in sectors) */ sector_t s; @@ -981,7 +983,7 @@ static void as_move_to_dispatch(struct as_data *ad, struct request *rq) * This has to be set in order to be correctly updated by * as_find_next_rq */ - ad->last_sector[data_dir] = rq->sector + rq->nr_sectors; + ad->last_sector[data_dir] = blk_rq_pos(rq) + blk_rq_sectors(rq); if (data_dir == BLK_RW_SYNC) { struct io_context *ioc = RQ_IOC(rq); diff --git a/block/blk-barrier.c b/block/blk-barrier.c index c167de5b9ef..8713c2fbc4f 100644 --- a/block/blk-barrier.c +++ b/block/blk-barrier.c @@ -324,7 +324,7 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector) /* * The driver must store the error location in ->bi_sector, if * it supports it. For non-stacked drivers, this should be copied - * from rq->sector. + * from blk_rq_pos(rq). */ if (error_sector) *error_sector = bio->bi_sector; diff --git a/block/blk-core.c b/block/blk-core.c index 895e55b74a4..82dc20621c0 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -72,7 +72,7 @@ static void drive_stat_acct(struct request *rq, int new_io) return; cpu = part_stat_lock(); - part = disk_map_sector_rcu(rq->rq_disk, rq->sector); + part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq)); if (!new_io) part_stat_inc(cpu, part, merges[rw]); @@ -185,10 +185,9 @@ void blk_dump_rq_flags(struct request *rq, char *msg) rq->rq_disk ? rq->rq_disk->disk_name : "?", rq->cmd_type, rq->cmd_flags); - printk(KERN_INFO " sector %llu, nr/cnr %lu/%u\n", - (unsigned long long)rq->sector, - rq->nr_sectors, - rq->current_nr_sectors); + printk(KERN_INFO " sector %llu, nr/cnr %u/%u\n", + (unsigned long long)blk_rq_pos(rq), + blk_rq_sectors(rq), blk_rq_cur_sectors(rq)); printk(KERN_INFO " bio %p, biotail %p, buffer %p, len %u\n", rq->bio, rq->biotail, rq->buffer, rq->data_len); @@ -1557,7 +1556,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (rq->nr_sectors > q->max_sectors || + if (blk_rq_sectors(rq) > q->max_sectors || rq->data_len > q->max_hw_sectors << 9) { printk(KERN_ERR "%s: over max size limit.\n", __func__); return -EIO; @@ -1645,7 +1644,7 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes) int cpu; cpu = part_stat_lock(); - part = disk_map_sector_rcu(req->rq_disk, req->sector); + part = disk_map_sector_rcu(req->rq_disk, blk_rq_pos(req)); part_stat_add(cpu, part, sectors[rw], bytes >> 9); part_stat_unlock(); } @@ -1665,7 +1664,7 @@ static void blk_account_io_done(struct request *req) int cpu; cpu = part_stat_lock(); - part = disk_map_sector_rcu(req->rq_disk, req->sector); + part = disk_map_sector_rcu(req->rq_disk, blk_rq_pos(req)); part_stat_inc(cpu, part, ios[rw]); part_stat_add(cpu, part, ticks[rw], duration); @@ -1846,7 +1845,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) if (error && (blk_fs_request(req) && !(req->cmd_flags & REQ_QUIET))) { printk(KERN_ERR "end_request: I/O error, dev %s, sector %llu\n", req->rq_disk ? req->rq_disk->disk_name : "?", - (unsigned long long)req->sector); + (unsigned long long)blk_rq_pos(req)); } blk_account_io_completion(req, nr_bytes); diff --git a/block/blk-merge.c b/block/blk-merge.c index 23d2a6fe34a..bf62a87a9da 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -259,7 +259,7 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, else max_sectors = q->max_sectors; - if (req->nr_sectors + bio_sectors(bio) > max_sectors) { + if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) { req->cmd_flags |= REQ_NOMERGE; if (req == q->last_merge) q->last_merge = NULL; @@ -284,7 +284,7 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req, max_sectors = q->max_sectors; - if (req->nr_sectors + bio_sectors(bio) > max_sectors) { + if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) { req->cmd_flags |= REQ_NOMERGE; if (req == q->last_merge) q->last_merge = NULL; @@ -315,7 +315,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, /* * Will it become too large? */ - if ((req->nr_sectors + next->nr_sectors) > q->max_sectors) + if ((blk_rq_sectors(req) + blk_rq_sectors(next)) > q->max_sectors) return 0; total_phys_segments = req->nr_phys_segments + next->nr_phys_segments; @@ -345,7 +345,7 @@ static void blk_account_io_merge(struct request *req) int cpu; cpu = part_stat_lock(); - part = disk_map_sector_rcu(req->rq_disk, req->sector); + part = disk_map_sector_rcu(req->rq_disk, blk_rq_pos(req)); part_round_stats(cpu, part); part_dec_in_flight(part); @@ -366,7 +366,7 @@ static int attempt_merge(struct request_queue *q, struct request *req, /* * not contiguous */ - if (req->sector + req->nr_sectors != next->sector) + if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next)) return 0; if (rq_data_dir(req) != rq_data_dir(next) diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 575083a9ffe..db4d990a66b 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -349,8 +349,8 @@ cfq_choose_req(struct cfq_data *cfqd, struct request *rq1, struct request *rq2) else if (rq_is_meta(rq2) && !rq_is_meta(rq1)) return rq2; - s1 = rq1->sector; - s2 = rq2->sector; + s1 = blk_rq_pos(rq1); + s2 = blk_rq_pos(rq2); last = cfqd->last_position; @@ -949,10 +949,10 @@ static struct cfq_queue *cfq_set_active_queue(struct cfq_data *cfqd, static inline sector_t cfq_dist_from_last(struct cfq_data *cfqd, struct request *rq) { - if (rq->sector >= cfqd->last_position) - return rq->sector - cfqd->last_position; + if (blk_rq_pos(rq) >= cfqd->last_position) + return blk_rq_pos(rq) - cfqd->last_position; else - return cfqd->last_position - rq->sector; + return cfqd->last_position - blk_rq_pos(rq); } #define CIC_SEEK_THR 8 * 1024 @@ -1918,10 +1918,10 @@ cfq_update_io_seektime(struct cfq_data *cfqd, struct cfq_io_context *cic, if (!cic->last_request_pos) sdist = 0; - else if (cic->last_request_pos < rq->sector) - sdist = rq->sector - cic->last_request_pos; + else if (cic->last_request_pos < blk_rq_pos(rq)) + sdist = blk_rq_pos(rq) - cic->last_request_pos; else - sdist = cic->last_request_pos - rq->sector; + sdist = cic->last_request_pos - blk_rq_pos(rq); /* * Don't allow the seek distance to get too large from the @@ -2071,7 +2071,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq, cfq_update_io_seektime(cfqd, cic, rq); cfq_update_idle_window(cfqd, cfqq, cic); - cic->last_request_pos = rq->sector + rq->nr_sectors; + cic->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq); if (cfqq == cfqd->active_queue) { /* diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c index c4d991d4ade..b547cbca7b2 100644 --- a/block/deadline-iosched.c +++ b/block/deadline-iosched.c @@ -138,7 +138,7 @@ deadline_merge(struct request_queue *q, struct request **req, struct bio *bio) __rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector); if (__rq) { - BUG_ON(sector != __rq->sector); + BUG_ON(sector != blk_rq_pos(__rq)); if (elv_rq_merge_ok(__rq, bio)) { ret = ELEVATOR_FRONT_MERGE; diff --git a/block/elevator.c b/block/elevator.c index 1af5d9f04af..918920056e4 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -52,7 +52,7 @@ static const int elv_hash_shift = 6; #define ELV_HASH_FN(sec) \ (hash_long(ELV_HASH_BLOCK((sec)), elv_hash_shift)) #define ELV_HASH_ENTRIES (1 << elv_hash_shift) -#define rq_hash_key(rq) ((rq)->sector + (rq)->nr_sectors) +#define rq_hash_key(rq) (blk_rq_pos(rq) + blk_rq_sectors(rq)) DEFINE_TRACE(block_rq_insert); DEFINE_TRACE(block_rq_issue); @@ -119,9 +119,9 @@ static inline int elv_try_merge(struct request *__rq, struct bio *bio) * we can merge and sequence is ok, check if it's possible */ if (elv_rq_merge_ok(__rq, bio)) { - if (__rq->sector + __rq->nr_sectors == bio->bi_sector) + if (blk_rq_pos(__rq) + blk_rq_sectors(__rq) == bio->bi_sector) ret = ELEVATOR_BACK_MERGE; - else if (__rq->sector - bio_sectors(bio) == bio->bi_sector) + else if (blk_rq_pos(__rq) - bio_sectors(bio) == bio->bi_sector) ret = ELEVATOR_FRONT_MERGE; } @@ -370,9 +370,9 @@ struct request *elv_rb_add(struct rb_root *root, struct request *rq) parent = *p; __rq = rb_entry(parent, struct request, rb_node); - if (rq->sector < __rq->sector) + if (blk_rq_pos(rq) < blk_rq_pos(__rq)) p = &(*p)->rb_left; - else if (rq->sector > __rq->sector) + else if (blk_rq_pos(rq) > blk_rq_pos(__rq)) p = &(*p)->rb_right; else return __rq; @@ -400,9 +400,9 @@ struct request *elv_rb_find(struct rb_root *root, sector_t sector) while (n) { rq = rb_entry(n, struct request, rb_node); - if (sector < rq->sector) + if (sector < blk_rq_pos(rq)) n = n->rb_left; - else if (sector > rq->sector) + else if (sector > blk_rq_pos(rq)) n = n->rb_right; else return rq; @@ -441,14 +441,14 @@ void elv_dispatch_sort(struct request_queue *q, struct request *rq) break; if (pos->cmd_flags & stop_flags) break; - if (rq->sector >= boundary) { - if (pos->sector < boundary) + if (blk_rq_pos(rq) >= boundary) { + if (blk_rq_pos(pos) < boundary) continue; } else { - if (pos->sector >= boundary) + if (blk_rq_pos(pos) >= boundary) break; } - if (rq->sector >= pos->sector) + if (blk_rq_pos(rq) >= blk_rq_pos(pos)) break; } -- cgit v1.2.3 From 2e46e8b27aa57c6bd34b3102b40ee4d0144b4fab Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 7 May 2009 22:24:41 +0900 Subject: block: drop request->hard_* and *nr_sectors struct request has had a few different ways to represent some properties of a request. ->hard_* represent block layer's view of the request progress (completion cursor) and the ones without the prefix are supposed to represent the issue cursor and allowed to be updated as necessary by the low level drivers. The thing is that as block layer supports partial completion, the two cursors really aren't necessary and only cause confusion. In addition, manual management of request detail from low level drivers is cumbersome and error-prone at the very least. Another interesting duplicate fields are rq->[hard_]nr_sectors and rq->{hard_cur|current}_nr_sectors against rq->data_len and rq->bio->bi_size. This is more convoluted than the hard_ case. rq->[hard_]nr_sectors are initialized for requests with bio but blk_rq_bytes() uses it only for !pc requests. rq->data_len is initialized for all request but blk_rq_bytes() uses it only for pc requests. This causes good amount of confusion throughout block layer and its drivers and determining the request length has been a bit of black magic which may or may not work depending on circumstances and what the specific LLD is actually doing. rq->{hard_cur|current}_nr_sectors represent the number of sectors in the contiguous data area at the front. This is mainly used by drivers which transfers data by walking request segment-by-segment. This value always equals rq->bio->bi_size >> 9. However, data length for pc requests may not be multiple of 512 bytes and using this field becomes a bit confusing. In general, having multiple fields to represent the same property leads only to confusion and subtle bugs. With recent block low level driver cleanups, no driver is accessing or manipulating these duplicate fields directly. Drop all the duplicates. Now rq->sector means the current sector, rq->data_len the current total length and rq->bio->bi_size the current segment length. Everything else is defined in terms of these three and available only through accessors. * blk_recalc_rq_sectors() is collapsed into blk_update_request() and now handles pc and fs requests equally other than rq->sector update. This means that now pc requests can use partial completion too (no in-kernel user yet tho). * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer now uses byte count as the primary data length. * blk_rq_pos() is now guranteed to be always correct. In-block users converted. * blk_rq_bytes() is now guaranteed to be always valid as is blk_rq_sectors(). In-block users converted. * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9. More convenient one is used. * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const pointer to request. [ Impact: API cleanup, single way to represent one property of a request ] Signed-off-by: Tejun Heo Cc: Boaz Harrosh Signed-off-by: Jens Axboe --- block/blk-core.c | 81 ++++++++++++++++++++--------------------------------- block/blk-merge.c | 36 +++--------------------- block/blk.h | 1 - block/cfq-iosched.c | 10 +++---- 4 files changed, 39 insertions(+), 89 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 82dc20621c0..3596ca71909 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -127,7 +127,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq) INIT_LIST_HEAD(&rq->timeout_list); rq->cpu = -1; rq->q = q; - rq->sector = rq->hard_sector = (sector_t) -1; + rq->sector = (sector_t) -1; INIT_HLIST_NODE(&rq->hash); RB_CLEAR_NODE(&rq->rb_node); rq->cmd = rq->__cmd; @@ -189,8 +189,7 @@ void blk_dump_rq_flags(struct request *rq, char *msg) (unsigned long long)blk_rq_pos(rq), blk_rq_sectors(rq), blk_rq_cur_sectors(rq)); printk(KERN_INFO " bio %p, biotail %p, buffer %p, len %u\n", - rq->bio, rq->biotail, - rq->buffer, rq->data_len); + rq->bio, rq->biotail, rq->buffer, blk_rq_bytes(rq)); if (blk_pc_request(rq)) { printk(KERN_INFO " cdb: "); @@ -1096,7 +1095,7 @@ void init_request_from_bio(struct request *req, struct bio *bio) req->cmd_flags |= REQ_NOIDLE; req->errors = 0; - req->hard_sector = req->sector = bio->bi_sector; + req->sector = bio->bi_sector; req->ioprio = bio_prio(bio); blk_rq_bio_prep(req->q, req, bio); } @@ -1113,14 +1112,13 @@ static inline bool queue_should_plug(struct request_queue *q) static int __make_request(struct request_queue *q, struct bio *bio) { struct request *req; - int el_ret, nr_sectors; + int el_ret; + unsigned int bytes = bio->bi_size; const unsigned short prio = bio_prio(bio); const int sync = bio_sync(bio); const int unplug = bio_unplug(bio); int rw_flags; - nr_sectors = bio_sectors(bio); - /* * low level driver can indicate that it wants pages above a * certain limit bounced to low memory (ie for highmem, or even @@ -1145,7 +1143,7 @@ static int __make_request(struct request_queue *q, struct bio *bio) req->biotail->bi_next = bio; req->biotail = bio; - req->nr_sectors = req->hard_nr_sectors += nr_sectors; + req->data_len += bytes; req->ioprio = ioprio_best(req->ioprio, prio); if (!blk_rq_cpu_valid(req)) req->cpu = bio->bi_comp_cpu; @@ -1171,10 +1169,8 @@ static int __make_request(struct request_queue *q, struct bio *bio) * not touch req->buffer either... */ req->buffer = bio_data(bio); - req->current_nr_sectors = bio_cur_sectors(bio); - req->hard_cur_sectors = req->current_nr_sectors; - req->sector = req->hard_sector = bio->bi_sector; - req->nr_sectors = req->hard_nr_sectors += nr_sectors; + req->sector = bio->bi_sector; + req->data_len += bytes; req->ioprio = ioprio_best(req->ioprio, prio); if (!blk_rq_cpu_valid(req)) req->cpu = bio->bi_comp_cpu; @@ -1557,7 +1553,7 @@ EXPORT_SYMBOL(submit_bio); int blk_rq_check_limits(struct request_queue *q, struct request *rq) { if (blk_rq_sectors(rq) > q->max_sectors || - rq->data_len > q->max_hw_sectors << 9) { + blk_rq_bytes(rq) > q->max_hw_sectors << 9) { printk(KERN_ERR "%s: over max size limit.\n", __func__); return -EIO; } @@ -1675,35 +1671,6 @@ static void blk_account_io_done(struct request *req) } } -/** - * blk_rq_bytes - Returns bytes left to complete in the entire request - * @rq: the request being processed - **/ -unsigned int blk_rq_bytes(struct request *rq) -{ - if (blk_fs_request(rq)) - return blk_rq_sectors(rq) << 9; - - return rq->data_len; -} -EXPORT_SYMBOL_GPL(blk_rq_bytes); - -/** - * blk_rq_cur_bytes - Returns bytes left to complete in the current segment - * @rq: the request being processed - **/ -unsigned int blk_rq_cur_bytes(struct request *rq) -{ - if (blk_fs_request(rq)) - return rq->current_nr_sectors << 9; - - if (rq->bio) - return rq->bio->bi_size; - - return rq->data_len; -} -EXPORT_SYMBOL_GPL(blk_rq_cur_bytes); - struct request *elv_next_request(struct request_queue *q) { struct request *rq; @@ -1736,7 +1703,7 @@ struct request *elv_next_request(struct request_queue *q) if (rq->cmd_flags & REQ_DONTPREP) break; - if (q->dma_drain_size && rq->data_len) { + if (q->dma_drain_size && blk_rq_bytes(rq)) { /* * make sure space for the drain appears we * know we can do this because max_hw_segments @@ -1759,7 +1726,7 @@ struct request *elv_next_request(struct request_queue *q) * avoid resource deadlock. REQ_STARTED will * prevent other fs requests from passing this one. */ - if (q->dma_drain_size && rq->data_len && + if (q->dma_drain_size && blk_rq_bytes(rq) && !(rq->cmd_flags & REQ_DONTPREP)) { /* * remove the space for the drain we added @@ -1911,8 +1878,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) * can find how many bytes remain in the request * later. */ - req->nr_sectors = req->hard_nr_sectors = 0; - req->current_nr_sectors = req->hard_cur_sectors = 0; + req->data_len = 0; return false; } @@ -1926,8 +1892,25 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) bio_iovec(bio)->bv_len -= nr_bytes; } - blk_recalc_rq_sectors(req, total_bytes >> 9); + req->data_len -= total_bytes; + req->buffer = bio_data(req->bio); + + /* update sector only for requests with clear definition of sector */ + if (blk_fs_request(req) || blk_discard_rq(req)) + req->sector += total_bytes >> 9; + + /* + * If total number of sectors is less than the first segment + * size, something has gone terribly wrong. + */ + if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) { + printk(KERN_ERR "blk: request botched\n"); + req->data_len = blk_rq_cur_bytes(req); + } + + /* recalculate the number of segments */ blk_recalc_rq_segments(req); + return true; } EXPORT_SYMBOL_GPL(blk_update_request); @@ -2049,11 +2032,7 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq, rq->nr_phys_segments = bio_phys_segments(q, bio); rq->buffer = bio_data(bio); } - rq->current_nr_sectors = bio_cur_sectors(bio); - rq->hard_cur_sectors = rq->current_nr_sectors; - rq->hard_nr_sectors = rq->nr_sectors = bio_sectors(bio); rq->data_len = bio->bi_size; - rq->bio = rq->biotail = bio; if (bio->bi_bdev) diff --git a/block/blk-merge.c b/block/blk-merge.c index bf62a87a9da..b8df66aef0f 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,35 +9,6 @@ #include "blk.h" -void blk_recalc_rq_sectors(struct request *rq, int nsect) -{ - if (blk_fs_request(rq) || blk_discard_rq(rq)) { - rq->hard_sector += nsect; - rq->hard_nr_sectors -= nsect; - - /* - * Move the I/O submission pointers ahead if required. - */ - if ((rq->nr_sectors >= rq->hard_nr_sectors) && - (rq->sector <= rq->hard_sector)) { - rq->sector = rq->hard_sector; - rq->nr_sectors = rq->hard_nr_sectors; - rq->hard_cur_sectors = bio_cur_sectors(rq->bio); - rq->current_nr_sectors = rq->hard_cur_sectors; - rq->buffer = bio_data(rq->bio); - } - - /* - * if total number of sectors is less than the first segment - * size, something has gone terribly wrong - */ - if (rq->nr_sectors < rq->current_nr_sectors) { - printk(KERN_ERR "blk: request botched\n"); - rq->nr_sectors = rq->current_nr_sectors; - } - } -} - static unsigned int __blk_recalc_rq_segments(struct request_queue *q, struct bio *bio) { @@ -199,8 +170,9 @@ new_segment: if (unlikely(rq->cmd_flags & REQ_COPY_USER) && - (rq->data_len & q->dma_pad_mask)) { - unsigned int pad_len = (q->dma_pad_mask & ~rq->data_len) + 1; + (blk_rq_bytes(rq) & q->dma_pad_mask)) { + unsigned int pad_len = + (q->dma_pad_mask & ~blk_rq_bytes(rq)) + 1; sg->length += pad_len; rq->extra_len += pad_len; @@ -398,7 +370,7 @@ static int attempt_merge(struct request_queue *q, struct request *req, req->biotail->bi_next = next->bio; req->biotail = next->biotail; - req->nr_sectors = req->hard_nr_sectors += next->hard_nr_sectors; + req->data_len += blk_rq_bytes(next); elv_merge_requests(q, req, next); diff --git a/block/blk.h b/block/blk.h index 51115599df9..ab54529103c 100644 --- a/block/blk.h +++ b/block/blk.h @@ -101,7 +101,6 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req, int attempt_back_merge(struct request_queue *q, struct request *rq); int attempt_front_merge(struct request_queue *q, struct request *rq); void blk_recalc_rq_segments(struct request *rq); -void blk_recalc_rq_sectors(struct request *rq, int nsect); void blk_queue_congestion_threshold(struct request_queue *q); diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index db4d990a66b..99ac4304d71 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -579,9 +579,9 @@ cfq_prio_tree_lookup(struct cfq_data *cfqd, struct rb_root *root, * Sort strictly based on sector. Smallest to the left, * largest to the right. */ - if (sector > cfqq->next_rq->sector) + if (sector > blk_rq_pos(cfqq->next_rq)) n = &(*p)->rb_right; - else if (sector < cfqq->next_rq->sector) + else if (sector < blk_rq_pos(cfqq->next_rq)) n = &(*p)->rb_left; else break; @@ -611,8 +611,8 @@ static void cfq_prio_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq) return; cfqq->p_root = &cfqd->prio_trees[cfqq->org_ioprio]; - __cfqq = cfq_prio_tree_lookup(cfqd, cfqq->p_root, cfqq->next_rq->sector, - &parent, &p); + __cfqq = cfq_prio_tree_lookup(cfqd, cfqq->p_root, + blk_rq_pos(cfqq->next_rq), &parent, &p); if (!__cfqq) { rb_link_node(&cfqq->p_node, parent, p); rb_insert_color(&cfqq->p_node, cfqq->p_root); @@ -996,7 +996,7 @@ static struct cfq_queue *cfqq_close(struct cfq_data *cfqd, if (cfq_rq_close(cfqd, __cfqq->next_rq)) return __cfqq; - if (__cfqq->next_rq->sector < sector) + if (blk_rq_pos(__cfqq->next_rq) < sector) node = rb_next(&__cfqq->p_node); else node = rb_prev(&__cfqq->p_node); -- cgit v1.2.3 From a2dec7b36364a5cc564c4d76cf16d2e7d33f5c05 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 7 May 2009 22:24:44 +0900 Subject: block: hide request sector and data_len Block low level drivers for some reason have been pretty good at abusing block layer API. Especially struct request's fields tend to get violated in all possible ways. Make it clear that low level drivers MUST NOT access or manipulate rq->sector and rq->data_len directly by prefixing them with double underscores. This change is also necessary to break build of out-of-tree codes which assume the previous block API where internal fields can be manipulated and rq->data_len carries residual count on completion. [ Impact: hide internal fields, block API change ] Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe --- block/blk-core.c | 20 ++++++++++---------- block/blk-map.c | 2 +- block/blk-merge.c | 2 +- 3 files changed, 12 insertions(+), 12 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 3596ca71909..6226a380fb6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -127,7 +127,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq) INIT_LIST_HEAD(&rq->timeout_list); rq->cpu = -1; rq->q = q; - rq->sector = (sector_t) -1; + rq->__sector = (sector_t) -1; INIT_HLIST_NODE(&rq->hash); RB_CLEAR_NODE(&rq->rb_node); rq->cmd = rq->__cmd; @@ -1095,7 +1095,7 @@ void init_request_from_bio(struct request *req, struct bio *bio) req->cmd_flags |= REQ_NOIDLE; req->errors = 0; - req->sector = bio->bi_sector; + req->__sector = bio->bi_sector; req->ioprio = bio_prio(bio); blk_rq_bio_prep(req->q, req, bio); } @@ -1143,7 +1143,7 @@ static int __make_request(struct request_queue *q, struct bio *bio) req->biotail->bi_next = bio; req->biotail = bio; - req->data_len += bytes; + req->__data_len += bytes; req->ioprio = ioprio_best(req->ioprio, prio); if (!blk_rq_cpu_valid(req)) req->cpu = bio->bi_comp_cpu; @@ -1169,8 +1169,8 @@ static int __make_request(struct request_queue *q, struct bio *bio) * not touch req->buffer either... */ req->buffer = bio_data(bio); - req->sector = bio->bi_sector; - req->data_len += bytes; + req->__sector = bio->bi_sector; + req->__data_len += bytes; req->ioprio = ioprio_best(req->ioprio, prio); if (!blk_rq_cpu_valid(req)) req->cpu = bio->bi_comp_cpu; @@ -1878,7 +1878,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) * can find how many bytes remain in the request * later. */ - req->data_len = 0; + req->__data_len = 0; return false; } @@ -1892,12 +1892,12 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) bio_iovec(bio)->bv_len -= nr_bytes; } - req->data_len -= total_bytes; + req->__data_len -= total_bytes; req->buffer = bio_data(req->bio); /* update sector only for requests with clear definition of sector */ if (blk_fs_request(req) || blk_discard_rq(req)) - req->sector += total_bytes >> 9; + req->__sector += total_bytes >> 9; /* * If total number of sectors is less than the first segment @@ -1905,7 +1905,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) */ if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) { printk(KERN_ERR "blk: request botched\n"); - req->data_len = blk_rq_cur_bytes(req); + req->__data_len = blk_rq_cur_bytes(req); } /* recalculate the number of segments */ @@ -2032,7 +2032,7 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq, rq->nr_phys_segments = bio_phys_segments(q, bio); rq->buffer = bio_data(bio); } - rq->data_len = bio->bi_size; + rq->__data_len = bio->bi_size; rq->bio = rq->biotail = bio; if (bio->bi_bdev) diff --git a/block/blk-map.c b/block/blk-map.c index 694fefad34e..56082bea450 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -20,7 +20,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq, rq->biotail->bi_next = bio; rq->biotail = bio; - rq->data_len += bio->bi_size; + rq->__data_len += bio->bi_size; } return 0; } diff --git a/block/blk-merge.c b/block/blk-merge.c index b8df66aef0f..4974dd5767e 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -370,7 +370,7 @@ static int attempt_merge(struct request_queue *q, struct request *req, req->biotail->bi_next = next->bio; req->biotail = next->biotail; - req->data_len += blk_rq_bytes(next); + req->__data_len += blk_rq_bytes(next); elv_merge_requests(q, req, next); -- cgit v1.2.3 From 9934c8c04561413609d2bc38c6b9f268cba774a4 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 8 May 2009 11:54:16 +0900 Subject: block: implement and enforce request peek/start/fetch Till now block layer allowed two separate modes of request execution. A request is always acquired from the request queue via elv_next_request(). After that, drivers are free to either dequeue it or process it without dequeueing. Dequeue allows elv_next_request() to return the next request so that multiple requests can be in flight. Executing requests without dequeueing has its merits mostly in allowing drivers for simpler devices which can't do sg to deal with segments only without considering request boundary. However, the benefit this brings is dubious and declining while the cost of the API ambiguity is increasing. Segment based drivers are usually for very old or limited devices and as converting to dequeueing model isn't difficult, it doesn't justify the API overhead it puts on block layer and its more modern users. Previous patches converted all block low level drivers to dequeueing model. This patch completes the API transition by... * renaming elv_next_request() to blk_peek_request() * renaming blkdev_dequeue_request() to blk_start_request() * adding blk_fetch_request() which is combination of peek and start * disallowing completion of queued (not started) requests * applying new API to all LLDs Renamings are for consistency and to break out of tree code so that it's apparent that out of tree drivers need updating. [ Impact: block request issue API cleanup, no functional change ] Signed-off-by: Tejun Heo Cc: Rusty Russell Cc: James Bottomley Cc: Mike Miller Cc: unsik Kim Cc: Paul Clements Cc: Tim Waugh Cc: Geert Uytterhoeven Cc: David S. Miller Cc: Laurent Vivier Cc: Jeff Garzik Cc: Jeremy Fitzhardinge Cc: Grant Likely Cc: Adrian McMenamin Cc: Stephen Rothwell Cc: Bartlomiej Zolnierkiewicz Cc: Borislav Petkov Cc: Sergei Shtylyov Cc: Alex Dubov Cc: Pierre Ossman Cc: David Woodhouse Cc: Markus Lidel Cc: Stefan Weinhuber Cc: Martin Schwidefsky Cc: Pete Zaitcev Cc: FUJITA Tomonori Signed-off-by: Jens Axboe --- block/blk-barrier.c | 4 +- block/blk-core.c | 105 ++++++++++++++++++++++++++++++++++++++-------------- block/blk-tag.c | 2 +- block/blk.h | 1 + 4 files changed, 81 insertions(+), 31 deletions(-) (limited to 'block') diff --git a/block/blk-barrier.c b/block/blk-barrier.c index 8713c2fbc4f..0ab81a0a750 100644 --- a/block/blk-barrier.c +++ b/block/blk-barrier.c @@ -180,7 +180,7 @@ static inline bool start_ordered(struct request_queue *q, struct request **rqp) } /* stash away the original request */ - elv_dequeue_request(q, rq); + blk_dequeue_request(rq); q->orig_bar_rq = rq; rq = NULL; @@ -248,7 +248,7 @@ bool blk_do_ordered(struct request_queue *q, struct request **rqp) * Queue ordering not supported. Terminate * with prejudice. */ - elv_dequeue_request(q, rq); + blk_dequeue_request(rq); __blk_end_request_all(rq, -EOPNOTSUPP); *rqp = NULL; return false; diff --git a/block/blk-core.c b/block/blk-core.c index 6226a380fb6..93691d2ac5a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -902,6 +902,8 @@ EXPORT_SYMBOL(blk_get_request); */ void blk_requeue_request(struct request_queue *q, struct request *rq) { + BUG_ON(blk_queued_rq(rq)); + blk_delete_timer(rq); blk_clear_rq_complete(rq); trace_block_rq_requeue(q, rq); @@ -1610,28 +1612,6 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq) } EXPORT_SYMBOL_GPL(blk_insert_cloned_request); -/** - * blkdev_dequeue_request - dequeue request and start timeout timer - * @req: request to dequeue - * - * Dequeue @req and start timeout timer on it. This hands off the - * request to the driver. - * - * Block internal functions which don't want to start timer should - * call elv_dequeue_request(). - */ -void blkdev_dequeue_request(struct request *req) -{ - elv_dequeue_request(req->q, req); - - /* - * We are now handing the request to the hardware, add the - * timeout handler. - */ - blk_add_timer(req); -} -EXPORT_SYMBOL(blkdev_dequeue_request); - static void blk_account_io_completion(struct request *req, unsigned int bytes) { if (blk_do_io_stat(req)) { @@ -1671,7 +1651,23 @@ static void blk_account_io_done(struct request *req) } } -struct request *elv_next_request(struct request_queue *q) +/** + * blk_peek_request - peek at the top of a request queue + * @q: request queue to peek at + * + * Description: + * Return the request at the top of @q. The returned request + * should be started using blk_start_request() before LLD starts + * processing it. + * + * Return: + * Pointer to the request at the top of @q if available. Null + * otherwise. + * + * Context: + * queue_lock must be held. + */ +struct request *blk_peek_request(struct request_queue *q) { struct request *rq; int ret; @@ -1748,10 +1744,12 @@ struct request *elv_next_request(struct request_queue *q) return rq; } -EXPORT_SYMBOL(elv_next_request); +EXPORT_SYMBOL(blk_peek_request); -void elv_dequeue_request(struct request_queue *q, struct request *rq) +void blk_dequeue_request(struct request *rq) { + struct request_queue *q = rq->q; + BUG_ON(list_empty(&rq->queuelist)); BUG_ON(ELV_ON_HASH(rq)); @@ -1766,6 +1764,58 @@ void elv_dequeue_request(struct request_queue *q, struct request *rq) q->in_flight++; } +/** + * blk_start_request - start request processing on the driver + * @req: request to dequeue + * + * Description: + * Dequeue @req and start timeout timer on it. This hands off the + * request to the driver. + * + * Block internal functions which don't want to start timer should + * call blk_dequeue_request(). + * + * Context: + * queue_lock must be held. + */ +void blk_start_request(struct request *req) +{ + blk_dequeue_request(req); + + /* + * We are now handing the request to the hardware, add the + * timeout handler. + */ + blk_add_timer(req); +} +EXPORT_SYMBOL(blk_start_request); + +/** + * blk_fetch_request - fetch a request from a request queue + * @q: request queue to fetch a request from + * + * Description: + * Return the request at the top of @q. The request is started on + * return and LLD can start processing it immediately. + * + * Return: + * Pointer to the request at the top of @q if available. Null + * otherwise. + * + * Context: + * queue_lock must be held. + */ +struct request *blk_fetch_request(struct request_queue *q) +{ + struct request *rq; + + rq = blk_peek_request(q); + if (rq) + blk_start_request(rq); + return rq; +} +EXPORT_SYMBOL(blk_fetch_request); + /** * blk_update_request - Special helper function for request stacking drivers * @rq: the request being processed @@ -1937,12 +1987,11 @@ static bool blk_update_bidi_request(struct request *rq, int error, */ static void blk_finish_request(struct request *req, int error) { + BUG_ON(blk_queued_rq(req)); + if (blk_rq_tagged(req)) blk_queue_end_tag(req->q, req); - if (blk_queued_rq(req)) - elv_dequeue_request(req->q, req); - if (unlikely(laptop_mode) && blk_fs_request(req)) laptop_io_completion(); diff --git a/block/blk-tag.c b/block/blk-tag.c index 3c518e3303a..c260f7c30dd 100644 --- a/block/blk-tag.c +++ b/block/blk-tag.c @@ -374,7 +374,7 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq) rq->cmd_flags |= REQ_QUEUED; rq->tag = tag; bqt->tag_index[tag] = rq; - blkdev_dequeue_request(rq); + blk_start_request(rq); list_add(&rq->queuelist, &q->tag_busy_list); return 0; } diff --git a/block/blk.h b/block/blk.h index ab54529103c..9e0042ca949 100644 --- a/block/blk.h +++ b/block/blk.h @@ -13,6 +13,7 @@ extern struct kobj_type blk_queue_ktype; void init_request_from_bio(struct request *req, struct bio *bio); void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio); +void blk_dequeue_request(struct request *rq); void __blk_queue_free_tags(struct request_queue *q); void blk_unplug_work(struct work_struct *work); -- cgit v1.2.3 From b1f744937f1be3e6d3009382a755679133cf782d Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori Date: Mon, 11 May 2009 17:56:09 +0900 Subject: block: move completion related functions back to blk-core.c Let's put the completion related functions back to block/blk-core.c where they have lived. We can also unexport blk_end_bidi_request() and __blk_end_bidi_request(), which nobody uses. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe --- block/blk-core.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 122 insertions(+), 6 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 93691d2ac5a..a2d97de1a12 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2026,8 +2026,8 @@ static void blk_finish_request(struct request *req, int error) * %false - we are done with this request * %true - still buffers pending for this request **/ -bool blk_end_bidi_request(struct request *rq, int error, - unsigned int nr_bytes, unsigned int bidi_bytes) +static bool blk_end_bidi_request(struct request *rq, int error, + unsigned int nr_bytes, unsigned int bidi_bytes) { struct request_queue *q = rq->q; unsigned long flags; @@ -2041,7 +2041,6 @@ bool blk_end_bidi_request(struct request *rq, int error, return false; } -EXPORT_SYMBOL_GPL(blk_end_bidi_request); /** * __blk_end_bidi_request - Complete a bidi request with queue lock held @@ -2058,8 +2057,8 @@ EXPORT_SYMBOL_GPL(blk_end_bidi_request); * %false - we are done with this request * %true - still buffers pending for this request **/ -bool __blk_end_bidi_request(struct request *rq, int error, - unsigned int nr_bytes, unsigned int bidi_bytes) +static bool __blk_end_bidi_request(struct request *rq, int error, + unsigned int nr_bytes, unsigned int bidi_bytes) { if (blk_update_bidi_request(rq, error, nr_bytes, bidi_bytes)) return true; @@ -2068,7 +2067,124 @@ bool __blk_end_bidi_request(struct request *rq, int error, return false; } -EXPORT_SYMBOL_GPL(__blk_end_bidi_request); + +/** + * blk_end_request - Helper function for drivers to complete the request. + * @rq: the request being processed + * @error: %0 for success, < %0 for error + * @nr_bytes: number of bytes to complete + * + * Description: + * Ends I/O on a number of bytes attached to @rq. + * If @rq has leftover, sets it up for the next range of segments. + * + * Return: + * %false - we are done with this request + * %true - still buffers pending for this request + **/ +bool blk_end_request(struct request *rq, int error, unsigned int nr_bytes) +{ + return blk_end_bidi_request(rq, error, nr_bytes, 0); +} +EXPORT_SYMBOL_GPL(blk_end_request); + +/** + * blk_end_request_all - Helper function for drives to finish the request. + * @rq: the request to finish + * @err: %0 for success, < %0 for error + * + * Description: + * Completely finish @rq. + */ +void blk_end_request_all(struct request *rq, int error) +{ + bool pending; + unsigned int bidi_bytes = 0; + + if (unlikely(blk_bidi_rq(rq))) + bidi_bytes = blk_rq_bytes(rq->next_rq); + + pending = blk_end_bidi_request(rq, error, blk_rq_bytes(rq), bidi_bytes); + BUG_ON(pending); +} +EXPORT_SYMBOL_GPL(blk_end_request_all); + +/** + * blk_end_request_cur - Helper function to finish the current request chunk. + * @rq: the request to finish the current chunk for + * @err: %0 for success, < %0 for error + * + * Description: + * Complete the current consecutively mapped chunk from @rq. + * + * Return: + * %false - we are done with this request + * %true - still buffers pending for this request + */ +bool blk_end_request_cur(struct request *rq, int error) +{ + return blk_end_request(rq, error, blk_rq_cur_bytes(rq)); +} +EXPORT_SYMBOL_GPL(blk_end_request_cur); + +/** + * __blk_end_request - Helper function for drivers to complete the request. + * @rq: the request being processed + * @error: %0 for success, < %0 for error + * @nr_bytes: number of bytes to complete + * + * Description: + * Must be called with queue lock held unlike blk_end_request(). + * + * Return: + * %false - we are done with this request + * %true - still buffers pending for this request + **/ +bool __blk_end_request(struct request *rq, int error, unsigned int nr_bytes) +{ + return __blk_end_bidi_request(rq, error, nr_bytes, 0); +} +EXPORT_SYMBOL_GPL(__blk_end_request); + +/** + * __blk_end_request_all - Helper function for drives to finish the request. + * @rq: the request to finish + * @err: %0 for success, < %0 for error + * + * Description: + * Completely finish @rq. Must be called with queue lock held. + */ +void __blk_end_request_all(struct request *rq, int error) +{ + bool pending; + unsigned int bidi_bytes = 0; + + if (unlikely(blk_bidi_rq(rq))) + bidi_bytes = blk_rq_bytes(rq->next_rq); + + pending = __blk_end_bidi_request(rq, error, blk_rq_bytes(rq), bidi_bytes); + BUG_ON(pending); +} +EXPORT_SYMBOL_GPL(__blk_end_request_all); + +/** + * __blk_end_request_cur - Helper function to finish the current request chunk. + * @rq: the request to finish the current chunk for + * @err: %0 for success, < %0 for error + * + * Description: + * Complete the current consecutively mapped chunk from @rq. Must + * be called with queue lock held. + * + * Return: + * %false - we are done with this request + * %true - still buffers pending for this request + */ +bool __blk_end_request_cur(struct request *rq, int error) +{ + return __blk_end_request(rq, error, blk_rq_cur_bytes(rq)); +} +EXPORT_SYMBOL_GPL(__blk_end_request_cur); void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio) -- cgit v1.2.3 From 5f49f63178360b07a095bd33b0d850d60edf7590 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 19 May 2009 18:33:05 +0900 Subject: block: set rq->resid_len to blk_rq_bytes() on issue In commit c3a4d78c580de4edc9ef0f7c59812fb02ceb037f, while introducing rq->resid_len, the default value of residue count was changed from full count to zero. The conversion was done under the assumption that when a request fails residue count wasn't defined. However, Boaz and James pointed out that this wasn't true and the residue count should be preserved for failed requests too. This patchset restores the original behavior by setting rq->resid_len to blk_rq_bytes(rq) on request start and restoring explicit clearing in affected drivers. While at it, take advantage of the fact that rq->resid_len is set to full count where applicable. * ide-cd: rq->resid_len cleared on pc success * mptsas: req->resid_len cleared on success * sas_expander: rsp/req->resid_len cleared on success * mpt2sas_transport: req->resid_len cleared on success * ide-cd, ide-tape, mptsas, sas_host_smp, mpt2sas_transport, ub: take advantage of initial full count to simplify code Boaz Harrosh spotted bug in resid_len initialization. Fixed as suggested. Signed-off-by: Tejun Heo Acked-by: Borislav Petkov Cc: Boaz Harrosh Cc: James Bottomley Cc: Pete Zaitcev Cc: Bartlomiej Zolnierkiewicz Cc: Sergei Shtylyov Cc: Eric Moore Cc: Darrick J. Wong Signed-off-by: Jens Axboe --- block/blk-core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index a2d97de1a12..e3f7e6a3a09 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1783,9 +1783,10 @@ void blk_start_request(struct request *req) blk_dequeue_request(req); /* - * We are now handing the request to the hardware, add the - * timeout handler. + * We are now handing the request to the hardware, initialize + * resid_len to full count and add the timeout handler. */ + req->resid_len = blk_rq_bytes(req); blk_add_timer(req); } EXPORT_SYMBOL(blk_start_request); -- cgit v1.2.3 From 3a5a39276d2a32b05b1ee25b384516805b17cf87 Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Sun, 17 May 2009 18:55:18 +0300 Subject: block: allow blk_rq_map_kern to append to requests Use blk_rq_append_bio() internally instead of blk_rq_bio_prep() so blk_rq_map_kern can be called multiple times, to map multiple buffers. This is in the effort to un-export blk_rq_append_bio() Signed-off-by: James Bottomley Signed-off-by: Boaz Harrosh Signed-off-by: Jens Axboe --- block/blk-map.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/blk-map.c b/block/blk-map.c index 56082bea450..caa05a66774 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -282,7 +282,8 @@ EXPORT_SYMBOL(blk_rq_unmap_user); * * Description: * Data will be mapped directly if possible. Otherwise a bounce - * buffer is used. + * buffer is used. Can be called multple times to append multple + * buffers. */ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, unsigned int len, gfp_t gfp_mask) @@ -290,6 +291,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, int reading = rq_data_dir(rq) == READ; int do_copy = 0; struct bio *bio; + int ret; if (len > (q->max_hw_sectors << 9)) return -EINVAL; @@ -311,7 +313,13 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, if (do_copy) rq->cmd_flags |= REQ_COPY_USER; - blk_rq_bio_prep(q, rq, bio); + ret = blk_rq_append_bio(q, rq, bio); + if (unlikely(ret)) { + /* request is too big */ + bio_put(bio); + return ret; + } + blk_queue_bounce(q, &rq->bio); rq->buffer = NULL; return 0; -- cgit v1.2.3 From 79eb63e9e5875b84341a3a05f8e6ae9cdb4bb6f6 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh Date: Sun, 17 May 2009 18:57:15 +0300 Subject: block: Add blk_make_request(), takes bio, returns a request New block API: given a struct bio allocates a new request. This is the parallel of generic_make_request for BLOCK_PC commands users. The passed bio may be a chained-bio. The bio is bounced if needed inside the call to this member. This is in the effort of un-exporting blk_rq_append_bio(). Signed-off-by: Boaz Harrosh CC: Jeff Garzik Signed-off-by: Jens Axboe --- block/blk-core.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index e3f7e6a3a09..bec1d69952d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -890,6 +890,51 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask) } EXPORT_SYMBOL(blk_get_request); +/** + * blk_make_request - given a bio, allocate a corresponding struct request. + * + * @bio: The bio describing the memory mappings that will be submitted for IO. + * It may be a chained-bio properly constructed by block/bio layer. + * + * blk_make_request is the parallel of generic_make_request for BLOCK_PC + * type commands. Where the struct request needs to be farther initialized by + * the caller. It is passed a &struct bio, which describes the memory info of + * the I/O transfer. + * + * The caller of blk_make_request must make sure that bi_io_vec + * are set to describe the memory buffers. That bio_data_dir() will return + * the needed direction of the request. (And all bio's in the passed bio-chain + * are properly set accordingly) + * + * If called under none-sleepable conditions, mapped bio buffers must not + * need bouncing, by calling the appropriate masked or flagged allocator, + * suitable for the target device. Otherwise the call to blk_queue_bounce will + * BUG. + */ +struct request *blk_make_request(struct request_queue *q, struct bio *bio, + gfp_t gfp_mask) +{ + struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask); + + if (unlikely(!rq)) + return ERR_PTR(-ENOMEM); + + for_each_bio(bio) { + struct bio *bounce_bio = bio; + int ret; + + blk_queue_bounce(q, &bounce_bio); + ret = blk_rq_append_bio(q, rq, bounce_bio); + if (unlikely(ret)) { + blk_put_request(rq); + return ERR_PTR(ret); + } + } + + return rq; +} +EXPORT_SYMBOL(blk_make_request); + /** * blk_requeue_request - put a request back on queue * @q: request queue where request should be inserted -- cgit v1.2.3 From a411f4bbb89f1f08687b344064d6775bce1e4658 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh Date: Sun, 17 May 2009 19:00:01 +0300 Subject: block: Un-export blk_rq_append_bio OSD was the last in-tree user of blk_rq_append_bio(). Now that it is fixed blk_rq_append_bio is un-exported and is only used internally by block layer. Signed-off-by: Boaz Harrosh Signed-off-by: Jens Axboe --- block/blk-map.c | 1 - block/blk.h | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-map.c b/block/blk-map.c index caa05a66774..ef2492adca7 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -24,7 +24,6 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq, } return 0; } -EXPORT_SYMBOL(blk_rq_append_bio); static int __blk_rq_unmap_user(struct bio *bio) { diff --git a/block/blk.h b/block/blk.h index 9e0042ca949..c863ec2281e 100644 --- a/block/blk.h +++ b/block/blk.h @@ -13,6 +13,8 @@ extern struct kobj_type blk_queue_ktype; void init_request_from_bio(struct request *req, struct bio *bio); void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio); +int blk_rq_append_bio(struct request_queue *q, struct request *rq, + struct bio *bio); void blk_dequeue_request(struct request *rq); void __blk_queue_free_tags(struct request_queue *q); -- cgit v1.2.3 From 53674ac5a997a8eedbb2dfdc308b895170746c8b Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 19 May 2009 19:52:35 +0200 Subject: block: add warning to blk_make_request() Add a note about how one needs to be careful when setting up these bio chains. Extracted from Boaz's updated patch. Signed-off-by: Jens Axboe --- block/blk-core.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index bec1d69952d..49065075d46 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -910,6 +910,15 @@ EXPORT_SYMBOL(blk_get_request); * need bouncing, by calling the appropriate masked or flagged allocator, * suitable for the target device. Otherwise the call to blk_queue_bounce will * BUG. + * + * WARNING: When allocating/cloning a bio-chain, careful consideration should be + * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for + * anything but the first bio in the chain. Otherwise you risk waiting for IO + * completion of a bio that hasn't been submitted yet, thus resulting in a + * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead + * of bio_alloc(), as that avoids the mempool deadlock. + * If possible a big IO should be split into smaller parts when allocation + * fails. Partial allocation should not be an error, or you risk a live-lock. */ struct request *blk_make_request(struct request_queue *q, struct bio *bio, gfp_t gfp_mask) -- cgit v1.2.3 From 0a7ae2ff0d29bb3b327edff4c8ab67b3834fa811 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 20 May 2009 08:54:31 +0200 Subject: block: change the tag sync vs async restriction logic Make them fully share the tag space, but disallow async requests using the last any two slots. Signed-off-by: Jens Axboe --- block/blk-barrier.c | 2 +- block/blk-core.c | 2 +- block/blk-tag.c | 15 +++++++++------ block/elevator.c | 8 ++++---- 4 files changed, 15 insertions(+), 12 deletions(-) (limited to 'block') diff --git a/block/blk-barrier.c b/block/blk-barrier.c index 0ab81a0a750..0d98054cdbd 100644 --- a/block/blk-barrier.c +++ b/block/blk-barrier.c @@ -218,7 +218,7 @@ static inline bool start_ordered(struct request_queue *q, struct request **rqp) } else skip |= QUEUE_ORDSEQ_PREFLUSH; - if ((q->ordered & QUEUE_ORDERED_BY_DRAIN) && q->in_flight) + if ((q->ordered & QUEUE_ORDERED_BY_DRAIN) && queue_in_flight(q)) rq = NULL; else skip |= QUEUE_ORDSEQ_DRAIN; diff --git a/block/blk-core.c b/block/blk-core.c index 49065075d46..1c748403882 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1815,7 +1815,7 @@ void blk_dequeue_request(struct request *rq) * the driver side. */ if (blk_account_rq(rq)) - q->in_flight++; + q->in_flight[rq_is_sync(rq)]++; } /** diff --git a/block/blk-tag.c b/block/blk-tag.c index c260f7c30dd..2e5cfeb5933 100644 --- a/block/blk-tag.c +++ b/block/blk-tag.c @@ -336,7 +336,7 @@ EXPORT_SYMBOL(blk_queue_end_tag); int blk_queue_start_tag(struct request_queue *q, struct request *rq) { struct blk_queue_tag *bqt = q->queue_tags; - unsigned max_depth, offset; + unsigned max_depth; int tag; if (unlikely((rq->cmd_flags & REQ_QUEUED))) { @@ -355,13 +355,16 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq) * to starve sync IO on behalf of flooding async IO. */ max_depth = bqt->max_depth; - if (rq_is_sync(rq)) - offset = 0; - else - offset = max_depth >> 2; + if (!rq_is_sync(rq) && max_depth > 1) { + max_depth -= 2; + if (!max_depth) + max_depth = 1; + if (q->in_flight[0] > max_depth) + return 1; + } do { - tag = find_next_zero_bit(bqt->tag_map, max_depth, offset); + tag = find_first_zero_bit(bqt->tag_map, max_depth); if (tag >= max_depth) return 1; diff --git a/block/elevator.c b/block/elevator.c index 918920056e4..ebee948293e 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -546,7 +546,7 @@ void elv_requeue_request(struct request_queue *q, struct request *rq) * in_flight count again */ if (blk_account_rq(rq)) { - q->in_flight--; + q->in_flight[rq_is_sync(rq)]--; if (blk_sorted_rq(rq)) elv_deactivate_rq(q, rq); } @@ -685,7 +685,7 @@ void elv_insert(struct request_queue *q, struct request *rq, int where) if (unplug_it && blk_queue_plugged(q)) { int nrq = q->rq.count[BLK_RW_SYNC] + q->rq.count[BLK_RW_ASYNC] - - q->in_flight; + - queue_in_flight(q); if (nrq >= q->unplug_thresh) __generic_unplug_device(q); @@ -823,7 +823,7 @@ void elv_completed_request(struct request_queue *q, struct request *rq) * request is released from the driver, io must be done */ if (blk_account_rq(rq)) { - q->in_flight--; + q->in_flight[rq_is_sync(rq)]--; if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn) e->ops->elevator_completed_req_fn(q, rq); } @@ -838,7 +838,7 @@ void elv_completed_request(struct request_queue *q, struct request *rq) if (!list_empty(&q->queue_head)) next = list_entry_rq(q->queue_head.next); - if (!q->in_flight && + if (!queue_in_flight(q) && blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN && (!next || blk_ordered_req_seq(next) > QUEUE_ORDSEQ_DRAIN)) { blk_ordered_complete_seq(q, QUEUE_ORDSEQ_DRAIN, 0); -- cgit v1.2.3 From e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 22 May 2009 17:17:49 -0400 Subject: block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-integrity.c | 2 +- block/blk-settings.c | 21 ++++++++++----------- block/blk-sysfs.c | 12 +++++++++--- block/compat_ioctl.c | 2 +- block/ioctl.c | 2 +- 5 files changed, 22 insertions(+), 17 deletions(-) (limited to 'block') diff --git a/block/blk-integrity.c b/block/blk-integrity.c index 91fa8e06b6a..73e28d35568 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -340,7 +340,7 @@ int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template) kobject_uevent(&bi->kobj, KOBJ_ADD); bi->flags |= INTEGRITY_FLAG_READ | INTEGRITY_FLAG_WRITE; - bi->sector_size = disk->queue->hardsect_size; + bi->sector_size = queue_logical_block_size(disk->queue); disk->integrity = bi; } else bi = disk->integrity; diff --git a/block/blk-settings.c b/block/blk-settings.c index 57af728d94b..15c3164537b 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -134,7 +134,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) q->backing_dev_info.state = 0; q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY; blk_queue_max_sectors(q, SAFE_MAX_SECTORS); - blk_queue_hardsect_size(q, 512); + blk_queue_logical_block_size(q, 512); blk_queue_dma_alignment(q, 511); blk_queue_congestion_threshold(q); q->nr_batching = BLK_BATCH_REQ; @@ -288,21 +288,20 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size) EXPORT_SYMBOL(blk_queue_max_segment_size); /** - * blk_queue_hardsect_size - set hardware sector size for the queue + * blk_queue_logical_block_size - set logical block size for the queue * @q: the request queue for the device - * @size: the hardware sector size, in bytes + * @size: the logical block size, in bytes * * Description: - * This should typically be set to the lowest possible sector size - * that the hardware can operate on (possible without reverting to - * even internal read-modify-write operations). Usually the default - * of 512 covers most hardware. + * This should be set to the lowest possible block size that the + * storage device can address. The default of 512 covers most + * hardware. **/ -void blk_queue_hardsect_size(struct request_queue *q, unsigned short size) +void blk_queue_logical_block_size(struct request_queue *q, unsigned short size) { - q->hardsect_size = size; + q->logical_block_size = size; } -EXPORT_SYMBOL(blk_queue_hardsect_size); +EXPORT_SYMBOL(blk_queue_logical_block_size); /* * Returns the minimum that is _not_ zero, unless both are zero. @@ -324,7 +323,7 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b) t->max_phys_segments = min_not_zero(t->max_phys_segments, b->max_phys_segments); t->max_hw_segments = min_not_zero(t->max_hw_segments, b->max_hw_segments); t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size); - t->hardsect_size = max(t->hardsect_size, b->hardsect_size); + t->logical_block_size = max(t->logical_block_size, b->logical_block_size); if (!t->queue_lock) WARN_ON_ONCE(1); else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) { diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 3ff9bba3379..13d38b7e4d0 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -100,9 +100,9 @@ static ssize_t queue_max_sectors_show(struct request_queue *q, char *page) return queue_var_show(max_sectors_kb, (page)); } -static ssize_t queue_hw_sector_size_show(struct request_queue *q, char *page) +static ssize_t queue_logical_block_size_show(struct request_queue *q, char *page) { - return queue_var_show(q->hardsect_size, page); + return queue_var_show(queue_logical_block_size(q), page); } static ssize_t @@ -249,7 +249,12 @@ static struct queue_sysfs_entry queue_iosched_entry = { static struct queue_sysfs_entry queue_hw_sector_size_entry = { .attr = {.name = "hw_sector_size", .mode = S_IRUGO }, - .show = queue_hw_sector_size_show, + .show = queue_logical_block_size_show, +}; + +static struct queue_sysfs_entry queue_logical_block_size_entry = { + .attr = {.name = "logical_block_size", .mode = S_IRUGO }, + .show = queue_logical_block_size_show, }; static struct queue_sysfs_entry queue_nonrot_entry = { @@ -283,6 +288,7 @@ static struct attribute *default_attrs[] = { &queue_max_sectors_entry.attr, &queue_iosched_entry.attr, &queue_hw_sector_size_entry.attr, + &queue_logical_block_size_entry.attr, &queue_nonrot_entry.attr, &queue_nomerges_entry.attr, &queue_rq_affinity_entry.attr, diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c index f87615dea46..9eaa1940273 100644 --- a/block/compat_ioctl.c +++ b/block/compat_ioctl.c @@ -763,7 +763,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) case BLKBSZGET_32: /* get the logical block size (cf. BLKSSZGET) */ return compat_put_int(arg, block_size(bdev)); case BLKSSZGET: /* get block device hardware sector size */ - return compat_put_int(arg, bdev_hardsect_size(bdev)); + return compat_put_int(arg, bdev_logical_block_size(bdev)); case BLKSECTGET: return compat_put_ushort(arg, bdev_get_queue(bdev)->max_sectors); diff --git a/block/ioctl.c b/block/ioctl.c index ad474d4bbcc..7aa97f65da8 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -311,7 +311,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, case BLKBSZGET: /* get the logical block size (cf. BLKSSZGET) */ return put_int(arg, block_size(bdev)); case BLKSSZGET: /* get block device hardware sector size */ - return put_int(arg, bdev_hardsect_size(bdev)); + return put_int(arg, bdev_logical_block_size(bdev)); case BLKSECTGET: return put_ushort(arg, bdev_get_queue(bdev)->max_sectors); case BLKRASET: -- cgit v1.2.3 From ae03bf639a5027d27270123f5f6e3ee6a412781d Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 22 May 2009 17:17:50 -0400 Subject: block: Use accessor functions for queue limits Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-barrier.c | 8 ++++---- block/blk-core.c | 16 ++++++++-------- block/blk-map.c | 4 ++-- block/blk-merge.c | 27 ++++++++++++++------------- block/blk-settings.c | 15 ++++++++++++--- block/blk-sysfs.c | 8 ++++---- block/compat_ioctl.c | 2 +- block/ioctl.c | 10 +++++----- block/scsi_ioctl.c | 8 ++++---- 9 files changed, 54 insertions(+), 44 deletions(-) (limited to 'block') diff --git a/block/blk-barrier.c b/block/blk-barrier.c index 0d98054cdbd..30022b4e2f6 100644 --- a/block/blk-barrier.c +++ b/block/blk-barrier.c @@ -388,10 +388,10 @@ int blkdev_issue_discard(struct block_device *bdev, bio->bi_sector = sector; - if (nr_sects > q->max_hw_sectors) { - bio->bi_size = q->max_hw_sectors << 9; - nr_sects -= q->max_hw_sectors; - sector += q->max_hw_sectors; + if (nr_sects > queue_max_hw_sectors(q)) { + bio->bi_size = queue_max_hw_sectors(q) << 9; + nr_sects -= queue_max_hw_sectors(q); + sector += queue_max_hw_sectors(q); } else { bio->bi_size = nr_sects << 9; nr_sects = 0; diff --git a/block/blk-core.c b/block/blk-core.c index 59c4af52311..7a4c40184a6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1437,11 +1437,11 @@ static inline void __generic_make_request(struct bio *bio) goto end_io; } - if (unlikely(nr_sectors > q->max_hw_sectors)) { + if (unlikely(nr_sectors > queue_max_hw_sectors(q))) { printk(KERN_ERR "bio too big device %s (%u > %u)\n", - bdevname(bio->bi_bdev, b), - bio_sectors(bio), - q->max_hw_sectors); + bdevname(bio->bi_bdev, b), + bio_sectors(bio), + queue_max_hw_sectors(q)); goto end_io; } @@ -1608,8 +1608,8 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (blk_rq_sectors(rq) > q->max_sectors || - blk_rq_bytes(rq) > q->max_hw_sectors << 9) { + if (blk_rq_sectors(rq) > queue_max_sectors(q) || + blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) { printk(KERN_ERR "%s: over max size limit.\n", __func__); return -EIO; } @@ -1621,8 +1621,8 @@ int blk_rq_check_limits(struct request_queue *q, struct request *rq) * limitation. */ blk_recalc_rq_segments(rq); - if (rq->nr_phys_segments > q->max_phys_segments || - rq->nr_phys_segments > q->max_hw_segments) { + if (rq->nr_phys_segments > queue_max_phys_segments(q) || + rq->nr_phys_segments > queue_max_hw_segments(q)) { printk(KERN_ERR "%s: over max segments limit.\n", __func__); return -EIO; } diff --git a/block/blk-map.c b/block/blk-map.c index ef2492adca7..9083cf0180c 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -115,7 +115,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq, struct bio *bio = NULL; int ret; - if (len > (q->max_hw_sectors << 9)) + if (len > (queue_max_hw_sectors(q) << 9)) return -EINVAL; if (!len) return -EINVAL; @@ -292,7 +292,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, struct bio *bio; int ret; - if (len > (q->max_hw_sectors << 9)) + if (len > (queue_max_hw_sectors(q) << 9)) return -EINVAL; if (!len || !kbuf) return -EINVAL; diff --git a/block/blk-merge.c b/block/blk-merge.c index 4974dd5767e..39ce64432ba 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -32,11 +32,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, * never considered part of another segment, since that * might change with the bounce page. */ - high = page_to_pfn(bv->bv_page) > q->bounce_pfn; + high = page_to_pfn(bv->bv_page) > queue_bounce_pfn(q); if (high || highprv) goto new_segment; if (cluster) { - if (seg_size + bv->bv_len > q->max_segment_size) + if (seg_size + bv->bv_len + > queue_max_segment_size(q)) goto new_segment; if (!BIOVEC_PHYS_MERGEABLE(bvprv, bv)) goto new_segment; @@ -91,7 +92,7 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio, return 0; if (bio->bi_seg_back_size + nxt->bi_seg_front_size > - q->max_segment_size) + queue_max_segment_size(q)) return 0; if (!bio_has_data(bio)) @@ -134,7 +135,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, int nbytes = bvec->bv_len; if (bvprv && cluster) { - if (sg->length + nbytes > q->max_segment_size) + if (sg->length + nbytes > queue_max_segment_size(q)) goto new_segment; if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec)) @@ -205,8 +206,8 @@ static inline int ll_new_hw_segment(struct request_queue *q, { int nr_phys_segs = bio_phys_segments(q, bio); - if (req->nr_phys_segments + nr_phys_segs > q->max_hw_segments - || req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) { + if (req->nr_phys_segments + nr_phys_segs > queue_max_hw_segments(q) || + req->nr_phys_segments + nr_phys_segs > queue_max_phys_segments(q)) { req->cmd_flags |= REQ_NOMERGE; if (req == q->last_merge) q->last_merge = NULL; @@ -227,9 +228,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, unsigned short max_sectors; if (unlikely(blk_pc_request(req))) - max_sectors = q->max_hw_sectors; + max_sectors = queue_max_hw_sectors(q); else - max_sectors = q->max_sectors; + max_sectors = queue_max_sectors(q); if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) { req->cmd_flags |= REQ_NOMERGE; @@ -251,9 +252,9 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req, unsigned short max_sectors; if (unlikely(blk_pc_request(req))) - max_sectors = q->max_hw_sectors; + max_sectors = queue_max_hw_sectors(q); else - max_sectors = q->max_sectors; + max_sectors = queue_max_sectors(q); if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) { @@ -287,7 +288,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, /* * Will it become too large? */ - if ((blk_rq_sectors(req) + blk_rq_sectors(next)) > q->max_sectors) + if ((blk_rq_sectors(req) + blk_rq_sectors(next)) > queue_max_sectors(q)) return 0; total_phys_segments = req->nr_phys_segments + next->nr_phys_segments; @@ -299,10 +300,10 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, total_phys_segments--; } - if (total_phys_segments > q->max_phys_segments) + if (total_phys_segments > queue_max_phys_segments(q)) return 0; - if (total_phys_segments > q->max_hw_segments) + if (total_phys_segments > queue_max_hw_segments(q)) return 0; /* Merge is OK... */ diff --git a/block/blk-settings.c b/block/blk-settings.c index 15c3164537b..0b32f984eed 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -219,6 +219,15 @@ void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors) } EXPORT_SYMBOL(blk_queue_max_sectors); +void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_sectors) +{ + if (BLK_DEF_MAX_SECTORS > max_sectors) + q->max_hw_sectors = BLK_DEF_MAX_SECTORS; + else + q->max_hw_sectors = max_sectors; +} +EXPORT_SYMBOL(blk_queue_max_hw_sectors); + /** * blk_queue_max_phys_segments - set max phys segments for a request for this queue * @q: the request queue for the device @@ -395,11 +404,11 @@ int blk_queue_dma_drain(struct request_queue *q, dma_drain_needed_fn *dma_drain_needed, void *buf, unsigned int size) { - if (q->max_hw_segments < 2 || q->max_phys_segments < 2) + if (queue_max_hw_segments(q) < 2 || queue_max_phys_segments(q) < 2) return -EINVAL; /* make room for appending the drain */ - --q->max_hw_segments; - --q->max_phys_segments; + blk_queue_max_hw_segments(q, queue_max_hw_segments(q) - 1); + blk_queue_max_phys_segments(q, queue_max_phys_segments(q) - 1); q->dma_drain_needed = dma_drain_needed; q->dma_drain_buffer = buf; q->dma_drain_size = size; diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 13d38b7e4d0..142a4acddd4 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -95,7 +95,7 @@ queue_ra_store(struct request_queue *q, const char *page, size_t count) static ssize_t queue_max_sectors_show(struct request_queue *q, char *page) { - int max_sectors_kb = q->max_sectors >> 1; + int max_sectors_kb = queue_max_sectors(q) >> 1; return queue_var_show(max_sectors_kb, (page)); } @@ -109,7 +109,7 @@ static ssize_t queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) { unsigned long max_sectors_kb, - max_hw_sectors_kb = q->max_hw_sectors >> 1, + max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1, page_kb = 1 << (PAGE_CACHE_SHIFT - 10); ssize_t ret = queue_var_store(&max_sectors_kb, page, count); @@ -117,7 +117,7 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) return -EINVAL; spin_lock_irq(q->queue_lock); - q->max_sectors = max_sectors_kb << 1; + blk_queue_max_sectors(q, max_sectors_kb << 1); spin_unlock_irq(q->queue_lock); return ret; @@ -125,7 +125,7 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page) { - int max_hw_sectors_kb = q->max_hw_sectors >> 1; + int max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1; return queue_var_show(max_hw_sectors_kb, (page)); } diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c index 9eaa1940273..df18a156d01 100644 --- a/block/compat_ioctl.c +++ b/block/compat_ioctl.c @@ -766,7 +766,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) return compat_put_int(arg, bdev_logical_block_size(bdev)); case BLKSECTGET: return compat_put_ushort(arg, - bdev_get_queue(bdev)->max_sectors); + queue_max_sectors(bdev_get_queue(bdev))); case BLKRASET: /* compatible, but no compat_ptr (!) */ case BLKFRASET: if (!capable(CAP_SYS_ADMIN)) diff --git a/block/ioctl.c b/block/ioctl.c index 7aa97f65da8..500e4c73cc5 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -152,10 +152,10 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start, bio->bi_private = &wait; bio->bi_sector = start; - if (len > q->max_hw_sectors) { - bio->bi_size = q->max_hw_sectors << 9; - len -= q->max_hw_sectors; - start += q->max_hw_sectors; + if (len > queue_max_hw_sectors(q)) { + bio->bi_size = queue_max_hw_sectors(q) << 9; + len -= queue_max_hw_sectors(q); + start += queue_max_hw_sectors(q); } else { bio->bi_size = len << 9; len = 0; @@ -313,7 +313,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, case BLKSSZGET: /* get block device hardware sector size */ return put_int(arg, bdev_logical_block_size(bdev)); case BLKSECTGET: - return put_ushort(arg, bdev_get_queue(bdev)->max_sectors); + return put_ushort(arg, queue_max_sectors(bdev_get_queue(bdev))); case BLKRASET: case BLKFRASET: if(!capable(CAP_SYS_ADMIN)) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index a9670dd4b5d..5f8e798ede4 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -75,7 +75,7 @@ static int sg_set_timeout(struct request_queue *q, int __user *p) static int sg_get_reserved_size(struct request_queue *q, int __user *p) { - unsigned val = min(q->sg_reserved_size, q->max_sectors << 9); + unsigned val = min(q->sg_reserved_size, queue_max_sectors(q) << 9); return put_user(val, p); } @@ -89,8 +89,8 @@ static int sg_set_reserved_size(struct request_queue *q, int __user *p) if (size < 0) return -EINVAL; - if (size > (q->max_sectors << 9)) - size = q->max_sectors << 9; + if (size > (queue_max_sectors(q) << 9)) + size = queue_max_sectors(q) << 9; q->sg_reserved_size = size; return 0; @@ -264,7 +264,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, if (hdr->cmd_len > BLK_MAX_CDB) return -EINVAL; - if (hdr->dxfer_len > (q->max_hw_sectors << 9)) + if (hdr->dxfer_len > (queue_max_hw_sectors(q) << 9)) return -EIO; if (hdr->dxfer_len) -- cgit v1.2.3 From 025146e13b63483add912706c101fb0fb6f015cc Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 22 May 2009 17:17:51 -0400 Subject: block: Move queue limits to an embedded struct To accommodate stacking drivers that do not have an associated request queue we're moving the limits to a separate, embedded structure. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-settings.c | 55 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 21 deletions(-) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 0b32f984eed..b0f547cecfb 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -179,16 +179,16 @@ void blk_queue_bounce_limit(struct request_queue *q, u64 dma_mask) */ if (b_pfn < (min_t(u64, 0xffffffffUL, BLK_BOUNCE_HIGH) >> PAGE_SHIFT)) dma = 1; - q->bounce_pfn = max_low_pfn; + q->limits.bounce_pfn = max_low_pfn; #else if (b_pfn < blk_max_low_pfn) dma = 1; - q->bounce_pfn = b_pfn; + q->limits.bounce_pfn = b_pfn; #endif if (dma) { init_emergency_isa_pool(); q->bounce_gfp = GFP_NOIO | GFP_DMA; - q->bounce_pfn = b_pfn; + q->limits.bounce_pfn = b_pfn; } } EXPORT_SYMBOL(blk_queue_bounce_limit); @@ -211,10 +211,10 @@ void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors) } if (BLK_DEF_MAX_SECTORS > max_sectors) - q->max_hw_sectors = q->max_sectors = max_sectors; + q->limits.max_hw_sectors = q->limits.max_sectors = max_sectors; else { - q->max_sectors = BLK_DEF_MAX_SECTORS; - q->max_hw_sectors = max_sectors; + q->limits.max_sectors = BLK_DEF_MAX_SECTORS; + q->limits.max_hw_sectors = max_sectors; } } EXPORT_SYMBOL(blk_queue_max_sectors); @@ -222,9 +222,9 @@ EXPORT_SYMBOL(blk_queue_max_sectors); void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_sectors) { if (BLK_DEF_MAX_SECTORS > max_sectors) - q->max_hw_sectors = BLK_DEF_MAX_SECTORS; + q->limits.max_hw_sectors = BLK_DEF_MAX_SECTORS; else - q->max_hw_sectors = max_sectors; + q->limits.max_hw_sectors = max_sectors; } EXPORT_SYMBOL(blk_queue_max_hw_sectors); @@ -247,7 +247,7 @@ void blk_queue_max_phys_segments(struct request_queue *q, __func__, max_segments); } - q->max_phys_segments = max_segments; + q->limits.max_phys_segments = max_segments; } EXPORT_SYMBOL(blk_queue_max_phys_segments); @@ -271,7 +271,7 @@ void blk_queue_max_hw_segments(struct request_queue *q, __func__, max_segments); } - q->max_hw_segments = max_segments; + q->limits.max_hw_segments = max_segments; } EXPORT_SYMBOL(blk_queue_max_hw_segments); @@ -292,7 +292,7 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size) __func__, max_size); } - q->max_segment_size = max_size; + q->limits.max_segment_size = max_size; } EXPORT_SYMBOL(blk_queue_max_segment_size); @@ -308,7 +308,7 @@ EXPORT_SYMBOL(blk_queue_max_segment_size); **/ void blk_queue_logical_block_size(struct request_queue *q, unsigned short size) { - q->logical_block_size = size; + q->limits.logical_block_size = size; } EXPORT_SYMBOL(blk_queue_logical_block_size); @@ -325,14 +325,27 @@ EXPORT_SYMBOL(blk_queue_logical_block_size); void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b) { /* zero is "infinity" */ - t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); - t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors); - t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, b->seg_boundary_mask); - - t->max_phys_segments = min_not_zero(t->max_phys_segments, b->max_phys_segments); - t->max_hw_segments = min_not_zero(t->max_hw_segments, b->max_hw_segments); - t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size); - t->logical_block_size = max(t->logical_block_size, b->logical_block_size); + t->limits.max_sectors = min_not_zero(queue_max_sectors(t), + queue_max_sectors(b)); + + t->limits.max_hw_sectors = min_not_zero(queue_max_hw_sectors(t), + queue_max_hw_sectors(b)); + + t->limits.seg_boundary_mask = min_not_zero(queue_segment_boundary(t), + queue_segment_boundary(b)); + + t->limits.max_phys_segments = min_not_zero(queue_max_phys_segments(t), + queue_max_phys_segments(b)); + + t->limits.max_hw_segments = min_not_zero(queue_max_hw_segments(t), + queue_max_hw_segments(b)); + + t->limits.max_segment_size = min_not_zero(queue_max_segment_size(t), + queue_max_segment_size(b)); + + t->limits.logical_block_size = max(queue_logical_block_size(t), + queue_logical_block_size(b)); + if (!t->queue_lock) WARN_ON_ONCE(1); else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) { @@ -430,7 +443,7 @@ void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask) __func__, mask); } - q->seg_boundary_mask = mask; + q->limits.seg_boundary_mask = mask; } EXPORT_SYMBOL(blk_queue_segment_boundary); -- cgit v1.2.3 From cd43e26f071524647e660706b784ebcbefbd2e44 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 22 May 2009 17:17:52 -0400 Subject: block: Expose stacked device queues in sysfs Currently stacking devices do not have a queue directory in sysfs. However, many of the I/O characteristics like sector size, maximum request size, etc. are queue properties. This patch enables the queue directory for MD/DM devices. The elevator code has been modified to deal with queues that do not have an I/O scheduler. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-sysfs.c | 6 +++--- block/elevator.c | 13 ++++++++++++- 2 files changed, 15 insertions(+), 4 deletions(-) (limited to 'block') diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 142a4acddd4..3ccdadb8e20 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -395,9 +395,6 @@ int blk_register_queue(struct gendisk *disk) if (WARN_ON(!q)) return -ENXIO; - if (!q->request_fn) - return 0; - ret = kobject_add(&q->kobj, kobject_get(&disk_to_dev(disk)->kobj), "%s", "queue"); if (ret < 0) @@ -405,6 +402,9 @@ int blk_register_queue(struct gendisk *disk) kobject_uevent(&q->kobj, KOBJ_ADD); + if (!q->request_fn) + return 0; + ret = elv_register_queue(q); if (ret) { kobject_uevent(&q->kobj, KOBJ_REMOVE); diff --git a/block/elevator.c b/block/elevator.c index ebee948293e..bd78f693fcf 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -575,6 +575,9 @@ void elv_drain_elevator(struct request_queue *q) */ void elv_quiesce_start(struct request_queue *q) { + if (!q->elevator) + return; + queue_flag_set(QUEUE_FLAG_ELVSWITCH, q); /* @@ -1050,6 +1053,9 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name, char elevator_name[ELV_NAME_MAX]; struct elevator_type *e; + if (!q->elevator) + return count; + strlcpy(elevator_name, name, sizeof(elevator_name)); strstrip(elevator_name); @@ -1073,10 +1079,15 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name, ssize_t elv_iosched_show(struct request_queue *q, char *name) { struct elevator_queue *e = q->elevator; - struct elevator_type *elv = e->elevator_type; + struct elevator_type *elv; struct elevator_type *__e; int len = 0; + if (!q->elevator) + return sprintf(name, "none\n"); + + elv = e->elevator_type; + spin_lock(&elv_list_lock); list_for_each_entry(__e, &elv_list, list) { if (!strcmp(elv->elevator_name, __e->elevator_name)) -- cgit v1.2.3 From c72758f33784e5e2a1a4bb9421ef3e6de8f9fcf3 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 22 May 2009 17:17:53 -0400 Subject: block: Export I/O topology for block devices and partitions To support devices with physical block sizes bigger than 512 bytes we need to ensure proper alignment. This patch adds support for exposing I/O topology characteristics as devices are stacked. logical_block_size is the smallest unit the device can address. physical_block_size indicates the smallest I/O the device can write without incurring a read-modify-write penalty. The io_min parameter is the smallest preferred I/O size reported by the device. In many cases this is the same as the physical block size. However, the io_min parameter can be scaled up when stacking (RAID5 chunk size > physical block size). The io_opt characteristic indicates the optimal I/O size reported by the device. This is usually the stripe width for arrays. The alignment_offset parameter indicates the number of bytes the start of the device/partition is offset from the device's natural alignment. Partition tools and MD/DM utilities can use this to pad their offsets so filesystems start on proper boundaries. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-settings.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++ block/blk-sysfs.c | 33 +++++++++ block/genhd.c | 11 +++ 3 files changed, 230 insertions(+) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index b0f547cecfb..5649f34adb4 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -309,9 +309,94 @@ EXPORT_SYMBOL(blk_queue_max_segment_size); void blk_queue_logical_block_size(struct request_queue *q, unsigned short size) { q->limits.logical_block_size = size; + + if (q->limits.physical_block_size < size) + q->limits.physical_block_size = size; + + if (q->limits.io_min < q->limits.physical_block_size) + q->limits.io_min = q->limits.physical_block_size; } EXPORT_SYMBOL(blk_queue_logical_block_size); +/** + * blk_queue_physical_block_size - set physical block size for the queue + * @q: the request queue for the device + * @size: the physical block size, in bytes + * + * Description: + * This should be set to the lowest possible sector size that the + * hardware can operate on without reverting to read-modify-write + * operations. + */ +void blk_queue_physical_block_size(struct request_queue *q, unsigned short size) +{ + q->limits.physical_block_size = size; + + if (q->limits.physical_block_size < q->limits.logical_block_size) + q->limits.physical_block_size = q->limits.logical_block_size; + + if (q->limits.io_min < q->limits.physical_block_size) + q->limits.io_min = q->limits.physical_block_size; +} +EXPORT_SYMBOL(blk_queue_physical_block_size); + +/** + * blk_queue_alignment_offset - set physical block alignment offset + * @q: the request queue for the device + * @alignment: alignment offset in bytes + * + * Description: + * Some devices are naturally misaligned to compensate for things like + * the legacy DOS partition table 63-sector offset. Low-level drivers + * should call this function for devices whose first sector is not + * naturally aligned. + */ +void blk_queue_alignment_offset(struct request_queue *q, unsigned int offset) +{ + q->limits.alignment_offset = + offset & (q->limits.physical_block_size - 1); + q->limits.misaligned = 0; +} +EXPORT_SYMBOL(blk_queue_alignment_offset); + +/** + * blk_queue_io_min - set minimum request size for the queue + * @q: the request queue for the device + * @io_min: smallest I/O size in bytes + * + * Description: + * Some devices have an internal block size bigger than the reported + * hardware sector size. This function can be used to signal the + * smallest I/O the device can perform without incurring a performance + * penalty. + */ +void blk_queue_io_min(struct request_queue *q, unsigned int min) +{ + q->limits.io_min = min; + + if (q->limits.io_min < q->limits.logical_block_size) + q->limits.io_min = q->limits.logical_block_size; + + if (q->limits.io_min < q->limits.physical_block_size) + q->limits.io_min = q->limits.physical_block_size; +} +EXPORT_SYMBOL(blk_queue_io_min); + +/** + * blk_queue_io_opt - set optimal request size for the queue + * @q: the request queue for the device + * @io_opt: optimal request size in bytes + * + * Description: + * Drivers can call this function to set the preferred I/O request + * size for devices that report such a value. + */ +void blk_queue_io_opt(struct request_queue *q, unsigned int opt) +{ + q->limits.io_opt = opt; +} +EXPORT_SYMBOL(blk_queue_io_opt); + /* * Returns the minimum that is _not_ zero, unless both are zero. */ @@ -357,6 +442,107 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b) } EXPORT_SYMBOL(blk_queue_stack_limits); +/** + * blk_stack_limits - adjust queue_limits for stacked devices + * @t: the stacking driver limits (top) + * @bdev: the underlying queue limits (bottom) + * @offset: offset to beginning of data within component device + * + * Description: + * Merges two queue_limit structs. Returns 0 if alignment didn't + * change. Returns -1 if adding the bottom device caused + * misalignment. + */ +int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, + sector_t offset) +{ + t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); + t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors); + + t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, + b->seg_boundary_mask); + + t->max_phys_segments = min_not_zero(t->max_phys_segments, + b->max_phys_segments); + + t->max_hw_segments = min_not_zero(t->max_hw_segments, + b->max_hw_segments); + + t->max_segment_size = min_not_zero(t->max_segment_size, + b->max_segment_size); + + t->logical_block_size = max(t->logical_block_size, + b->logical_block_size); + + t->physical_block_size = max(t->physical_block_size, + b->physical_block_size); + + t->io_min = max(t->io_min, b->io_min); + t->no_cluster |= b->no_cluster; + + /* Bottom device offset aligned? */ + if (offset && + (offset & (b->physical_block_size - 1)) != b->alignment_offset) { + t->misaligned = 1; + return -1; + } + + /* If top has no alignment offset, inherit from bottom */ + if (!t->alignment_offset) + t->alignment_offset = + b->alignment_offset & (b->physical_block_size - 1); + + /* Top device aligned on logical block boundary? */ + if (t->alignment_offset & (t->logical_block_size - 1)) { + t->misaligned = 1; + return -1; + } + + return 0; +} + +/** + * disk_stack_limits - adjust queue limits for stacked drivers + * @t: MD/DM gendisk (top) + * @bdev: the underlying block device (bottom) + * @offset: offset to beginning of data within component device + * + * Description: + * Merges the limits for two queues. Returns 0 if alignment + * didn't change. Returns -1 if adding the bottom device caused + * misalignment. + */ +void disk_stack_limits(struct gendisk *disk, struct block_device *bdev, + sector_t offset) +{ + struct request_queue *t = disk->queue; + struct request_queue *b = bdev_get_queue(bdev); + + offset += get_start_sect(bdev) << 9; + + if (blk_stack_limits(&t->limits, &b->limits, offset) < 0) { + char top[BDEVNAME_SIZE], bottom[BDEVNAME_SIZE]; + + disk_name(disk, 0, top); + bdevname(bdev, bottom); + + printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n", + top, bottom); + } + + if (!t->queue_lock) + WARN_ON_ONCE(1); + else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) { + unsigned long flags; + + spin_lock_irqsave(t->queue_lock, flags); + if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) + queue_flag_clear(QUEUE_FLAG_CLUSTER, t); + spin_unlock_irqrestore(t->queue_lock, flags); + } +} +EXPORT_SYMBOL(disk_stack_limits); + /** * blk_queue_dma_pad - set pad mask * @q: the request queue for the device diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 3ccdadb8e20..9337e17f911 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -105,6 +105,21 @@ static ssize_t queue_logical_block_size_show(struct request_queue *q, char *page return queue_var_show(queue_logical_block_size(q), page); } +static ssize_t queue_physical_block_size_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_physical_block_size(q), page); +} + +static ssize_t queue_io_min_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_io_min(q), page); +} + +static ssize_t queue_io_opt_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_io_opt(q), page); +} + static ssize_t queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) { @@ -257,6 +272,21 @@ static struct queue_sysfs_entry queue_logical_block_size_entry = { .show = queue_logical_block_size_show, }; +static struct queue_sysfs_entry queue_physical_block_size_entry = { + .attr = {.name = "physical_block_size", .mode = S_IRUGO }, + .show = queue_physical_block_size_show, +}; + +static struct queue_sysfs_entry queue_io_min_entry = { + .attr = {.name = "minimum_io_size", .mode = S_IRUGO }, + .show = queue_io_min_show, +}; + +static struct queue_sysfs_entry queue_io_opt_entry = { + .attr = {.name = "optimal_io_size", .mode = S_IRUGO }, + .show = queue_io_opt_show, +}; + static struct queue_sysfs_entry queue_nonrot_entry = { .attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR }, .show = queue_nonrot_show, @@ -289,6 +319,9 @@ static struct attribute *default_attrs[] = { &queue_iosched_entry.attr, &queue_hw_sector_size_entry.attr, &queue_logical_block_size_entry.attr, + &queue_physical_block_size_entry.attr, + &queue_io_min_entry.attr, + &queue_io_opt_entry.attr, &queue_nonrot_entry.attr, &queue_nomerges_entry.attr, &queue_rq_affinity_entry.attr, diff --git a/block/genhd.c b/block/genhd.c index 1a4916e0173..fe7ccc0a618 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -852,11 +852,21 @@ static ssize_t disk_capability_show(struct device *dev, return sprintf(buf, "%x\n", disk->flags); } +static ssize_t disk_alignment_offset_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct gendisk *disk = dev_to_disk(dev); + + return sprintf(buf, "%d\n", queue_alignment_offset(disk->queue)); +} + static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL); static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL); static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL); static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL); static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL); +static DEVICE_ATTR(alignment_offset, S_IRUGO, disk_alignment_offset_show, NULL); static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL); static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL); #ifdef CONFIG_FAIL_MAKE_REQUEST @@ -875,6 +885,7 @@ static struct attribute *disk_attrs[] = { &dev_attr_removable.attr, &dev_attr_ro.attr, &dev_attr_size.attr, + &dev_attr_alignment_offset.attr, &dev_attr_capability.attr, &dev_attr_stat.attr, #ifdef CONFIG_FAIL_MAKE_REQUEST -- cgit v1.2.3 From ba396a6c104682dfe5c8b4fbbf5974d5ac9f3687 Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Wed, 27 May 2009 14:17:08 +0200 Subject: block: fix oops with block tag queueing commit e8939a50466fd963eb1ba9118c34b9ffb7ff6aa6 Author: Tejun Heo Date: Fri May 8 11:54:16 2009 +0900 block: implement and enforce request peek/start/fetch Added a BUG_ON(blk_queued_rq(req)) to the top of blk_finish_req(). Unfortunately, this checks whether req->queuelist is empty. This list is doing double duty both as the queue list and the tag list, so tagged requests come in here with this not empty and boom (the tag list is emptied by blk_queue_end_tag() lower down). Fix this by moving the BUG_ON to below the end tag we also seem vulnerable to this in blk_requeue_request() as well. I think all uses of blk_queued_rq() need auditing because the check is clearly wrong in the tagged case. Signed-off-by: James Bottomley Signed-off-by: Jens Axboe --- block/blk-core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 7a4c40184a6..8b3b74e6918 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -956,8 +956,6 @@ EXPORT_SYMBOL(blk_make_request); */ void blk_requeue_request(struct request_queue *q, struct request *rq) { - BUG_ON(blk_queued_rq(rq)); - blk_delete_timer(rq); blk_clear_rq_complete(rq); trace_block_rq_requeue(q, rq); @@ -965,6 +963,8 @@ void blk_requeue_request(struct request_queue *q, struct request *rq) if (blk_rq_tagged(rq)) blk_queue_end_tag(q, rq); + BUG_ON(blk_queued_rq(rq)); + elv_requeue_request(q, rq); } EXPORT_SYMBOL(blk_requeue_request); @@ -2042,11 +2042,11 @@ static bool blk_update_bidi_request(struct request *rq, int error, */ static void blk_finish_request(struct request *req, int error) { - BUG_ON(blk_queued_rq(req)); - if (blk_rq_tagged(req)) blk_queue_end_tag(req->q, req); + BUG_ON(blk_queued_rq(req)); + if (unlikely(laptop_mode) && blk_fs_request(req)) laptop_io_completion(); -- cgit v1.2.3 From 3c4198e874cde694f5ea1463706873e7907bdb18 Mon Sep 17 00:00:00 2001 From: Kiyoshi Ueda Date: Wed, 27 May 2009 14:50:02 +0200 Subject: block: fix no diskstat problem The commit below in 2.6-block/for-2.6.31 causes no diskstat problem because the blk_discard_rq() check was added with '&&'. It should be 'blk_fs_request() || blk_discard_rq()'. This patch does it and fixes the no diskstat problem. Please review and apply. ------ /proc/diskstat without this patch ------------------------------------- 8 0 sda 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------ ----- /proc/diskstat with this patch applied --------------------------------- 8 0 sda 4186 303 373621 61600 9578 3859 107468 169479 2 89755 231059 ------------------------------------------------------------------------------ -------------------------------------------------------------------------- commit c69d48540c201394d08cb4d48b905e001313d9b8 Author: Jens Axboe Date: Fri Apr 24 08:12:19 2009 +0200 block: include discard requests in IO accounting We currently don't do merging on discard requests, but we potentially could. If we do, then we need to include discard requests in the IO accounting, or merging would end up decrementing in_flight IO counters for an IO which never incremented them. So enable accounting for discard requests. static inline int blk_do_io_stat(struct request *rq) { - return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq); + return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) && + blk_discard_rq(rq); } -------------------------------------------------------------------------- Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe --- block/blk.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk.h b/block/blk.h index c863ec2281e..3fae6add543 100644 --- a/block/blk.h +++ b/block/blk.h @@ -156,12 +156,12 @@ static inline int blk_cpu_to_group(int cpu) * * a) it's attached to a gendisk, and * b) the queue had IO stats enabled when this request was started, and - * c) it's a file system request + * c) it's a file system request or a discard request */ static inline int blk_do_io_stat(struct request *rq) { - return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) && - blk_discard_rq(rq); + return rq->rq_disk && blk_rq_io_stat(rq) && + (blk_fs_request(rq) || blk_discard_rq(rq)); } #endif -- cgit v1.2.3 From 5d85d3247cc3555215d7d11c78576a396c98e4d9 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 28 May 2009 11:04:53 +0200 Subject: block: export blk_stack_limits() DM needs to use blk_stack_limits(), so it needs to be exported. Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-settings.c | 1 + 1 file changed, 1 insertion(+) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 5649f34adb4..8d339349289 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -500,6 +500,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, return 0; } +EXPORT_SYMBOL(blk_stack_limits); /** * disk_stack_limits - adjust queue limits for stacked drivers -- cgit v1.2.3 From c143dc903d7c0b15f5052e00b2c7de33a8b4299c Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Sat, 30 May 2009 06:43:49 +0200 Subject: block: fix an oops on BLKPREP_KILL Doing a bit of torture testing, I ran across a BUG in the block subsystem (at blk-core.c:2048): the test for if the request is queued. It turns out the trigger was a BLKPREP_KILL coming out of the SCSI prep function. Currently for BLKPREP_KILL requests, we send them straight into __blk_end_request_all() with an error, but they've never been dequeued, so they trip the bug. Fix this by starting requests before killing them. Signed-off-by: James Bottomley Signed-off-by: Jens Axboe --- block/blk-core.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 8b3b74e6918..7ae83a1e2ac 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1789,6 +1789,11 @@ struct request *blk_peek_request(struct request_queue *q) break; } else if (ret == BLKPREP_KILL) { rq->cmd_flags |= REQ_QUIET; + /* + * Mark this request as started so we don't trigger + * any debug logic in the end I/O path. + */ + blk_start_request(rq); __blk_end_request_all(rq, -EIO); } else { printk(KERN_ERR "%s: bad return=%d\n", __func__, ret); -- cgit v1.2.3 From 53c663ce0f39ba8e8ef652e400b317bc60ac7f19 Mon Sep 17 00:00:00 2001 From: Kiyoshi Ueda Date: Tue, 2 Jun 2009 08:44:01 +0200 Subject: block: fix a possible oops on elv_abort_queue() I found one more mis-conversion to the 'request is always dequeued when completing' model in elv_abort_queue() during code inspection. Although I haven't hit any problem caused by this mis-conversion yet and just done compile/boot test, please apply if you have no problem. Request must be dequeued when it completes. However, elv_abort_queue() completes requests without dequeueing. This will cause oops in the __blk_end_request_all(). This patch fixes the oops. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe --- block/elevator.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'block') diff --git a/block/elevator.c b/block/elevator.c index bd78f693fcf..a029cfed80d 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -813,6 +813,11 @@ void elv_abort_queue(struct request_queue *q) rq = list_entry_rq(q->queue_head.next); rq->cmd_flags |= REQ_QUIET; trace_block_rq_abort(q, rq); + /* + * Mark this request as started so we don't trigger + * any debug logic in the end I/O path. + */ + blk_start_request(rq); __blk_end_request_all(rq, -EIO); } } -- cgit v1.2.3 From a05c0205ba031c01bba33a21bf0a35920eb64833 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Wed, 3 Jun 2009 09:33:18 +0200 Subject: block: Fix bounce limit setting in DM blk_queue_bounce_limit() is more than a wrapper about the request queue limits.bounce_pfn variable. Introduce blk_queue_bounce_pfn() which can be called by stacking drivers that wish to set the bounce limit explicitly. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-settings.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 8d339349289..9acd0b7e802 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -193,6 +193,23 @@ void blk_queue_bounce_limit(struct request_queue *q, u64 dma_mask) } EXPORT_SYMBOL(blk_queue_bounce_limit); +/** + * blk_queue_bounce_pfn - set the bounce buffer limit for queue + * @q: the request queue for the device + * @pfn: max address + * + * Description: + * This function is similar to blk_queue_bounce_limit except it + * neither changes allocation flags, nor does it set up the ISA DMA + * pool. This function should only be used by stacking drivers. + * Hardware drivers should use blk_queue_bounce_limit instead. + */ +void blk_queue_bounce_pfn(struct request_queue *q, u64 pfn) +{ + q->limits.bounce_pfn = pfn; +} +EXPORT_SYMBOL(blk_queue_bounce_pfn); + /** * blk_queue_max_sectors - set max sectors for a request for this queue * @q: the request queue for the device -- cgit v1.2.3 From dbb66c4be020b01dc2f3d7c609ddb0e09d2c0af7 Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori Date: Tue, 9 Jun 2009 05:47:10 +0200 Subject: block: needs to set the residual length of a bidi request Tejun's "block: set rq->resid_len to blk_rq_bytes() on issue" patch seems to be incomplete; It doesn't set rq->resid_len to blk_rq_bytes() for a bidi request (req->next_rq). As a result, all bidi users are broken. Signed-off-by: FUJITA Tomonori Acked-by: Tejun Heo Signed-off-by: Jens Axboe --- block/blk-core.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 7ae83a1e2ac..03c5a64b6cc 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1846,6 +1846,9 @@ void blk_start_request(struct request *req) * resid_len to full count and add the timeout handler. */ req->resid_len = blk_rq_bytes(req); + if (unlikely(blk_bidi_rq(req))) + req->next_rq->resid_len = blk_rq_bytes(req->next_rq); + blk_add_timer(req); } EXPORT_SYMBOL(blk_start_request); -- cgit v1.2.3 From 9df1bb9b516daeece159ab7fb262d01a0359247c Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 9 Jun 2009 06:22:57 +0200 Subject: Revert "block: Fix bounce limit setting in DM" This reverts commit a05c0205ba031c01bba33a21bf0a35920eb64833. DM doesn't need to access the bounce_pfn directly. Signed-off-by: Jens Axboe --- block/blk-settings.c | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 9acd0b7e802..8d339349289 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -193,23 +193,6 @@ void blk_queue_bounce_limit(struct request_queue *q, u64 dma_mask) } EXPORT_SYMBOL(blk_queue_bounce_limit); -/** - * blk_queue_bounce_pfn - set the bounce buffer limit for queue - * @q: the request queue for the device - * @pfn: max address - * - * Description: - * This function is similar to blk_queue_bounce_limit except it - * neither changes allocation flags, nor does it set up the ISA DMA - * pool. This function should only be used by stacking drivers. - * Hardware drivers should use blk_queue_bounce_limit instead. - */ -void blk_queue_bounce_pfn(struct request_queue *q, u64 pfn) -{ - q->limits.bounce_pfn = pfn; -} -EXPORT_SYMBOL(blk_queue_bounce_pfn); - /** * blk_queue_max_sectors - set max sectors for a request for this queue * @q: the request queue for the device -- cgit v1.2.3 From 77634f33d4078542cf1087995cced0ffdae25aa2 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Tue, 9 Jun 2009 06:23:22 +0200 Subject: block: Add missing bounce_pfn stacking and fix comments DM no longer needs to set limits explicitly when calling blk_stack_limits. Let the latter automatically deal with bounce_pfn scaling. Fix kerneldoc variable names. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-settings.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 8d339349289..1c4df9bf681 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -445,7 +445,7 @@ EXPORT_SYMBOL(blk_queue_stack_limits); /** * blk_stack_limits - adjust queue_limits for stacked devices * @t: the stacking driver limits (top) - * @bdev: the underlying queue limits (bottom) + * @b: the underlying queue limits (bottom) * @offset: offset to beginning of data within component device * * Description: @@ -458,6 +458,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, { t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors); + t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn); t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, b->seg_boundary_mask); @@ -504,7 +505,7 @@ EXPORT_SYMBOL(blk_stack_limits); /** * disk_stack_limits - adjust queue limits for stacked drivers - * @t: MD/DM gendisk (top) + * @disk: MD/DM gendisk (top) * @bdev: the underlying block device (bottom) * @offset: offset to beginning of data within component device * -- cgit v1.2.3 From 55782138e47d9baf2f7d3a7af9e7cf42adf72c56 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Tue, 9 Jun 2009 13:43:05 +0800 Subject: tracing/events: convert block trace points to TRACE_EVENT() TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt --- block/blk-core.c | 16 ++++------------ block/elevator.c | 8 ++------ 2 files changed, 6 insertions(+), 18 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 1306de9cce0..9475bf99b89 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -28,22 +28,14 @@ #include #include #include -#include + +#define CREATE_TRACE_POINTS +#include #include "blk.h" -DEFINE_TRACE(block_plug); -DEFINE_TRACE(block_unplug_io); -DEFINE_TRACE(block_unplug_timer); -DEFINE_TRACE(block_getrq); -DEFINE_TRACE(block_sleeprq); -DEFINE_TRACE(block_rq_requeue); -DEFINE_TRACE(block_bio_backmerge); -DEFINE_TRACE(block_bio_frontmerge); -DEFINE_TRACE(block_bio_queue); -DEFINE_TRACE(block_rq_complete); -DEFINE_TRACE(block_remap); /* Also used in drivers/md/dm.c */ EXPORT_TRACEPOINT_SYMBOL_GPL(block_remap); +EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); static int __make_request(struct request_queue *q, struct bio *bio); diff --git a/block/elevator.c b/block/elevator.c index 7073a907257..e220f0c543e 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -33,17 +33,16 @@ #include #include #include -#include #include #include +#include + #include "blk.h" static DEFINE_SPINLOCK(elv_list_lock); static LIST_HEAD(elv_list); -DEFINE_TRACE(block_rq_abort); - /* * Merge hash stuff. */ @@ -55,9 +54,6 @@ static const int elv_hash_shift = 6; #define rq_hash_key(rq) ((rq)->sector + (rq)->nr_sectors) #define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash)) -DEFINE_TRACE(block_rq_insert); -DEFINE_TRACE(block_rq_issue); - /* * Query io scheduler to see if the current process issuing bio may be * merged with rq. -- cgit v1.2.3 From d9c7d394a8ebacb60097b192939ae9f15235225e Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Wed, 10 Jun 2009 12:57:06 -0700 Subject: block: prevent possible io_context->refcount overflow Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: Nikanth Karthikesan Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe --- block/blk-ioc.c | 12 ++++++------ block/cfq-iosched.c | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'block') diff --git a/block/blk-ioc.c b/block/blk-ioc.c index 012f065ac8e..d4ed6000147 100644 --- a/block/blk-ioc.c +++ b/block/blk-ioc.c @@ -35,9 +35,9 @@ int put_io_context(struct io_context *ioc) if (ioc == NULL) return 1; - BUG_ON(atomic_read(&ioc->refcount) == 0); + BUG_ON(atomic_long_read(&ioc->refcount) == 0); - if (atomic_dec_and_test(&ioc->refcount)) { + if (atomic_long_dec_and_test(&ioc->refcount)) { rcu_read_lock(); if (ioc->aic && ioc->aic->dtor) ioc->aic->dtor(ioc->aic); @@ -90,7 +90,7 @@ struct io_context *alloc_io_context(gfp_t gfp_flags, int node) ret = kmem_cache_alloc_node(iocontext_cachep, gfp_flags, node); if (ret) { - atomic_set(&ret->refcount, 1); + atomic_long_set(&ret->refcount, 1); atomic_set(&ret->nr_tasks, 1); spin_lock_init(&ret->lock); ret->ioprio_changed = 0; @@ -151,7 +151,7 @@ struct io_context *get_io_context(gfp_t gfp_flags, int node) ret = current_io_context(gfp_flags, node); if (unlikely(!ret)) break; - } while (!atomic_inc_not_zero(&ret->refcount)); + } while (!atomic_long_inc_not_zero(&ret->refcount)); return ret; } @@ -163,8 +163,8 @@ void copy_io_context(struct io_context **pdst, struct io_context **psrc) struct io_context *dst = *pdst; if (src) { - BUG_ON(atomic_read(&src->refcount) == 0); - atomic_inc(&src->refcount); + BUG_ON(atomic_long_read(&src->refcount) == 0); + atomic_long_inc(&src->refcount); put_io_context(dst); *pdst = src; } diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 99ac4304d71..ef2f72d4243 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1282,7 +1282,7 @@ static void cfq_dispatch_request(struct cfq_data *cfqd, struct cfq_queue *cfqq) if (!cfqd->active_cic) { struct cfq_io_context *cic = RQ_CIC(rq); - atomic_inc(&cic->ioc->refcount); + atomic_long_inc(&cic->ioc->refcount); cfqd->active_cic = cic; } } -- cgit v1.2.3 From b0fd271d5fba0b2d00888363f3869e3f9b26caa9 Mon Sep 17 00:00:00 2001 From: Kiyoshi Ueda Date: Thu, 11 Jun 2009 13:10:16 +0200 Subject: block: add request clone interface (v2) This patch adds the following 2 interfaces for request-stacking drivers: - blk_rq_prep_clone(struct request *clone, struct request *orig, struct bio_set *bs, gfp_t gfp_mask, int (*bio_ctr)(struct bio *, struct bio*, void *), void *data) * Clones bios in the original request to the clone request (bio_ctr is called for each cloned bios.) * Copies attributes of the original request to the clone request. The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. - blk_rq_unprep_clone(struct request *clone) * Frees cloned bios from the clone request. Request stacking drivers (e.g. request-based dm) need to make a clone request for a submitted request and dispatch it to other devices. To allocate request for the clone, request stacking drivers may not be able to use blk_get_request() because the allocation may be done in an irq-disabled context. So blk_rq_prep_clone() takes a request allocated by the caller as an argument. For each clone bio in the clone request, request stacking drivers should be able to set up their own completion handler. So blk_rq_prep_clone() takes a callback function which is called for each clone bio, and a pointer for private data which is passed to the callback. NOTE: blk_rq_prep_clone() doesn't copy any actual data of the original request. Pages are shared between original bios and cloned bios. So caller must not complete the original request before the clone request. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Cc: Boaz Harrosh Signed-off-by: Jens Axboe --- block/blk-core.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 100 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 03c5a64b6cc..02a9252107a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2295,6 +2295,106 @@ int blk_lld_busy(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_lld_busy); +/** + * blk_rq_unprep_clone - Helper function to free all bios in a cloned request + * @rq: the clone request to be cleaned up + * + * Description: + * Free all bios in @rq for a cloned request. + */ +void blk_rq_unprep_clone(struct request *rq) +{ + struct bio *bio; + + while ((bio = rq->bio) != NULL) { + rq->bio = bio->bi_next; + + bio_put(bio); + } +} +EXPORT_SYMBOL_GPL(blk_rq_unprep_clone); + +/* + * Copy attributes of the original request to the clone request. + * The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. + */ +static void __blk_rq_prep_clone(struct request *dst, struct request *src) +{ + dst->cpu = src->cpu; + dst->cmd_flags = (rq_data_dir(src) | REQ_NOMERGE); + dst->cmd_type = src->cmd_type; + dst->__sector = blk_rq_pos(src); + dst->__data_len = blk_rq_bytes(src); + dst->nr_phys_segments = src->nr_phys_segments; + dst->ioprio = src->ioprio; + dst->extra_len = src->extra_len; +} + +/** + * blk_rq_prep_clone - Helper function to setup clone request + * @rq: the request to be setup + * @rq_src: original request to be cloned + * @bs: bio_set that bios for clone are allocated from + * @gfp_mask: memory allocation mask for bio + * @bio_ctr: setup function to be called for each clone bio. + * Returns %0 for success, non %0 for failure. + * @data: private data to be passed to @bio_ctr + * + * Description: + * Clones bios in @rq_src to @rq, and copies attributes of @rq_src to @rq. + * The actual data parts of @rq_src (e.g. ->cmd, ->buffer, ->sense) + * are not copied, and copying such parts is the caller's responsibility. + * Also, pages which the original bios are pointing to are not copied + * and the cloned bios just point same pages. + * So cloned bios must be completed before original bios, which means + * the caller must complete @rq before @rq_src. + */ +int blk_rq_prep_clone(struct request *rq, struct request *rq_src, + struct bio_set *bs, gfp_t gfp_mask, + int (*bio_ctr)(struct bio *, struct bio *, void *), + void *data) +{ + struct bio *bio, *bio_src; + + if (!bs) + bs = fs_bio_set; + + blk_rq_init(NULL, rq); + + __rq_for_each_bio(bio_src, rq_src) { + bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs); + if (!bio) + goto free_and_out; + + __bio_clone(bio, bio_src); + + if (bio_integrity(bio_src) && + bio_integrity_clone(bio, bio_src, gfp_mask)) + goto free_and_out; + + if (bio_ctr && bio_ctr(bio, bio_src, data)) + goto free_and_out; + + if (rq->bio) { + rq->biotail->bi_next = bio; + rq->biotail = bio; + } else + rq->bio = rq->biotail = bio; + } + + __blk_rq_prep_clone(rq, rq_src); + + return 0; + +free_and_out: + if (bio) + bio_free(bio, bs); + blk_rq_unprep_clone(rq); + + return -ENOMEM; +} +EXPORT_SYMBOL_GPL(blk_rq_prep_clone); + int kblockd_schedule_work(struct request_queue *q, struct work_struct *work) { return queue_work(kblockd_workqueue, work); -- cgit v1.2.3 From 8ebf975608aaebd7feb33d77f07ba21a6380e086 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 11 Jun 2009 20:00:41 -0700 Subject: block: fix kernel-doc in recent block/ changes Fix kernel-doc warnings in recently changed block/ source code. Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- block/blk-core.c | 21 +++++++++++---------- block/blk-settings.c | 6 +++--- 2 files changed, 14 insertions(+), 13 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index d17d71c71d4..f6452f69250 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -884,9 +884,10 @@ EXPORT_SYMBOL(blk_get_request); /** * blk_make_request - given a bio, allocate a corresponding struct request. - * + * @q: target request queue * @bio: The bio describing the memory mappings that will be submitted for IO. * It may be a chained-bio properly constructed by block/bio layer. + * @gfp_mask: gfp flags to be used for memory allocation * * blk_make_request is the parallel of generic_make_request for BLOCK_PC * type commands. Where the struct request needs to be farther initialized by @@ -1872,14 +1873,14 @@ EXPORT_SYMBOL(blk_fetch_request); /** * blk_update_request - Special helper function for request stacking drivers - * @rq: the request being processed + * @req: the request being processed * @error: %0 for success, < %0 for error - * @nr_bytes: number of bytes to complete @rq + * @nr_bytes: number of bytes to complete @req * * Description: - * Ends I/O on a number of bytes attached to @rq, but doesn't complete - * the request structure even if @rq doesn't have leftover. - * If @rq has leftover, sets it up for the next range of segments. + * Ends I/O on a number of bytes attached to @req, but doesn't complete + * the request structure even if @req doesn't have leftover. + * If @req has leftover, sets it up for the next range of segments. * * This special helper function is only for request stacking drivers * (e.g. request-based dm) so that they can handle partial completion. @@ -2145,7 +2146,7 @@ EXPORT_SYMBOL_GPL(blk_end_request); /** * blk_end_request_all - Helper function for drives to finish the request. * @rq: the request to finish - * @err: %0 for success, < %0 for error + * @error: %0 for success, < %0 for error * * Description: * Completely finish @rq. @@ -2166,7 +2167,7 @@ EXPORT_SYMBOL_GPL(blk_end_request_all); /** * blk_end_request_cur - Helper function to finish the current request chunk. * @rq: the request to finish the current chunk for - * @err: %0 for success, < %0 for error + * @error: %0 for success, < %0 for error * * Description: * Complete the current consecutively mapped chunk from @rq. @@ -2203,7 +2204,7 @@ EXPORT_SYMBOL_GPL(__blk_end_request); /** * __blk_end_request_all - Helper function for drives to finish the request. * @rq: the request to finish - * @err: %0 for success, < %0 for error + * @error: %0 for success, < %0 for error * * Description: * Completely finish @rq. Must be called with queue lock held. @@ -2224,7 +2225,7 @@ EXPORT_SYMBOL_GPL(__blk_end_request_all); /** * __blk_end_request_cur - Helper function to finish the current request chunk. * @rq: the request to finish the current chunk for - * @err: %0 for success, < %0 for error + * @error: %0 for success, < %0 for error * * Description: * Complete the current consecutively mapped chunk from @rq. Must diff --git a/block/blk-settings.c b/block/blk-settings.c index 1c4df9bf681..d71cedc09c4 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -343,7 +343,7 @@ EXPORT_SYMBOL(blk_queue_physical_block_size); /** * blk_queue_alignment_offset - set physical block alignment offset * @q: the request queue for the device - * @alignment: alignment offset in bytes + * @offset: alignment offset in bytes * * Description: * Some devices are naturally misaligned to compensate for things like @@ -362,7 +362,7 @@ EXPORT_SYMBOL(blk_queue_alignment_offset); /** * blk_queue_io_min - set minimum request size for the queue * @q: the request queue for the device - * @io_min: smallest I/O size in bytes + * @min: smallest I/O size in bytes * * Description: * Some devices have an internal block size bigger than the reported @@ -385,7 +385,7 @@ EXPORT_SYMBOL(blk_queue_io_min); /** * blk_queue_io_opt - set optimal request size for the queue * @q: the request queue for the device - * @io_opt: optimal request size in bytes + * @opt: optimal request size in bytes * * Description: * Drivers can call this function to set the preferred I/O request -- cgit v1.2.3 From b03f38b685e2e1db174fb8982930e789a516f414 Mon Sep 17 00:00:00 2001 From: Kay Sievers Date: Thu, 30 Apr 2009 15:23:42 +0200 Subject: Driver Core: block: add nodename support for block drivers. This adds support for block drivers to report their requested nodename to userspace. It also updates a number of block drivers to provide the needed subdirectory and device name to be used for them. Signed-off-by: Kay Sievers Signed-off-by: Jan Blunck Signed-off-by: Greg Kroah-Hartman --- block/genhd.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'block') diff --git a/block/genhd.c b/block/genhd.c index fe7ccc0a618..f4c64c2b303 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -996,10 +996,20 @@ struct class block_class = { .name = "block", }; +static char *block_nodename(struct device *dev) +{ + struct gendisk *disk = dev_to_disk(dev); + + if (disk->nodename) + return disk->nodename(disk); + return NULL; +} + static struct device_type disk_type = { .name = "disk", .groups = disk_attr_groups, .release = disk_release, + .nodename = block_nodename, }; #ifdef CONFIG_PROC_FS -- cgit v1.2.3 From 2bdf914915e98fe82495d05741a57e37f4b604e8 Mon Sep 17 00:00:00 2001 From: Kay Sievers Date: Thu, 30 Apr 2009 15:23:42 +0200 Subject: Driver Core: bsg: add nodename for bsg driver This adds support to the BSG driver to report the proper device name to userspace for the bsg devices. Signed-off-by: Kay Sievers Signed-off-by: Jan Blunck Signed-off-by: Greg Kroah-Hartman --- block/bsg.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'block') diff --git a/block/bsg.c b/block/bsg.c index 5358f9ae13c..54106f052f7 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -1065,6 +1065,11 @@ EXPORT_SYMBOL_GPL(bsg_register_queue); static struct cdev bsg_cdev; +static char *bsg_nodename(struct device *dev) +{ + return kasprintf(GFP_KERNEL, "bsg/%s", dev_name(dev)); +} + static int __init bsg_init(void) { int ret, i; @@ -1085,6 +1090,7 @@ static int __init bsg_init(void) ret = PTR_ERR(bsg_class); goto destroy_kmemcache; } + bsg_class->nodename = bsg_nodename; ret = alloc_chrdev_region(&devid, 0, BSG_MAX_DEVS, "bsg"); if (ret) -- cgit v1.2.3 From 81be834713a7d6b9463663145d493fe9daee2d61 Mon Sep 17 00:00:00 2001 From: Gui Jianfeng Date: Fri, 12 Jun 2009 15:27:50 +0200 Subject: cfq: cleanup for last_end_request in cfq_data Actually, last_end_request in cfq_data isn't used now. So lets just remove it. Signed-off-by: Gui Jianfeng Signed-off-by: Jens Axboe --- block/cfq-iosched.c | 5 ----- 1 file changed, 5 deletions(-) (limited to 'block') diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index ef2f72d4243..8447fde9550 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -122,7 +122,6 @@ struct cfq_data { struct cfq_queue *async_idle_cfqq; sector_t last_position; - unsigned long last_end_request; /* * tunables, see top of file @@ -2164,9 +2163,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq) if (cfq_cfqq_sync(cfqq)) cfqd->sync_flight--; - if (!cfq_class_idle(cfqq)) - cfqd->last_end_request = now; - if (sync) RQ_CIC(rq)->last_end_request = now; @@ -2479,7 +2475,6 @@ static void *cfq_init_queue(struct request_queue *q) INIT_WORK(&cfqd->unplug_work, cfq_kick_queue); - cfqd->last_end_request = jiffies; cfqd->cfq_quantum = cfq_quantum; cfqd->cfq_fifo_expire[0] = cfq_fifo_expire[0]; cfqd->cfq_fifo_expire[1] = cfq_fifo_expire[1]; -- cgit v1.2.3 From 0989a025d2f4f9f51ea641bd26c563c53dc63f55 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 12 Jun 2009 14:42:56 +0200 Subject: block: don't overwrite bdi->state after bdi_init() has been run Move the defaults to where we do the init of the backing_dev_info. Signed-off-by: Jens Axboe --- block/blk-core.c | 5 +++++ block/blk-settings.c | 4 ---- 2 files changed, 5 insertions(+), 4 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index f6452f69250..94d88fabc4b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -498,6 +498,11 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug; q->backing_dev_info.unplug_io_data = q; + q->backing_dev_info.ra_pages = + (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; + q->backing_dev_info.state = 0; + q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY; + err = bdi_init(&q->backing_dev_info); if (err) { kmem_cache_free(blk_requestq_cachep, q); diff --git a/block/blk-settings.c b/block/blk-settings.c index d71cedc09c4..138610b9895 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -129,10 +129,6 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE); q->make_request_fn = mfn; - q->backing_dev_info.ra_pages = - (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; - q->backing_dev_info.state = 0; - q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY; blk_queue_max_sectors(q, SAFE_MAX_SECTORS); blk_queue_logical_block_size(q, 512); blk_queue_dma_alignment(q, 511); -- cgit v1.2.3 From 6923715ae39ed39ac2fc1993e5061668f4f71ad0 Mon Sep 17 00:00:00 2001 From: Jeff Moyer Date: Fri, 12 Jun 2009 15:29:30 +0200 Subject: cfq: remove extraneous '\n' in blktrace output I noticed a blank line in blktrace output. This patch fixes that. Signed-off-by: Jeff Moyer Signed-off-by: Jens Axboe --- block/cfq-iosched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 8447fde9550..833ec18eaa6 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1252,7 +1252,7 @@ static int cfq_forced_dispatch(struct cfq_data *cfqd) BUG_ON(cfqd->busy_queues); - cfq_log(cfqd, "forced_dispatch=%d\n", dispatched); + cfq_log(cfqd, "forced_dispatch=%d", dispatched); return dispatched; } -- cgit v1.2.3 From e475bba2fdee9c3dbfe25f026f8fb8de69508ad2 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Tue, 16 Jun 2009 08:23:52 +0200 Subject: block: Introduce helper to reset queue limits to default values DM reuses the request queue when swapping in a new device table Introduce blk_set_default_limits() which can be used to reset the the queue_limits prior to stacking devices. Signed-off-by: Martin K. Petersen Acked-by: Alasdair G Kergon Acked-by: Mike Snitzer Signed-off-by: Jens Axboe --- block/blk-settings.c | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 138610b9895..7541ea4bf9f 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -95,6 +95,31 @@ void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn) } EXPORT_SYMBOL_GPL(blk_queue_lld_busy); +/** + * blk_set_default_limits - reset limits to default values + * @limits: the queue_limits structure to reset + * + * Description: + * Returns a queue_limit struct to its default state. Can be used by + * stacking drivers like DM that stage table swaps and reuse an + * existing device queue. + */ +void blk_set_default_limits(struct queue_limits *lim) +{ + lim->max_phys_segments = MAX_PHYS_SEGMENTS; + lim->max_hw_segments = MAX_HW_SEGMENTS; + lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK; + lim->max_segment_size = MAX_SEGMENT_SIZE; + lim->max_sectors = lim->max_hw_sectors = SAFE_MAX_SECTORS; + lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; + lim->bounce_pfn = BLK_BOUNCE_ANY; + lim->alignment_offset = 0; + lim->io_opt = 0; + lim->misaligned = 0; + lim->no_cluster = 0; +} +EXPORT_SYMBOL(blk_set_default_limits); + /** * blk_queue_make_request - define an alternate make_request function for a device * @q: the request queue for the device to be affected @@ -123,14 +148,8 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) * set defaults */ q->nr_requests = BLKDEV_MAX_RQ; - blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS); - blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS); - blk_queue_segment_boundary(q, BLK_SEG_BOUNDARY_MASK); - blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE); q->make_request_fn = mfn; - blk_queue_max_sectors(q, SAFE_MAX_SECTORS); - blk_queue_logical_block_size(q, 512); blk_queue_dma_alignment(q, 511); blk_queue_congestion_threshold(q); q->nr_batching = BLK_BATCH_REQ; @@ -143,6 +162,8 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) q->unplug_timer.function = blk_unplug_timeout; q->unplug_timer.data = (unsigned long)q; + blk_set_default_limits(&q->limits); + /* * by default assume old behaviour and bounce for any highmem page */ -- cgit v1.2.3 From e212d6f25084e8e9b02a04ba514d7bb1e4a4924a Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Tue, 16 Jun 2009 11:19:36 +0200 Subject: block: remove some includings of blktrace_api.h When porting blktrace to tracepoints, we changed to trace/block.h for trace prober declarations. Signed-off-by: Li Zefan Signed-off-by: Jens Axboe --- block/blk-core.c | 1 - 1 file changed, 1 deletion(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index 94d88fabc4b..b06cf5c2a82 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -26,7 +26,6 @@ #include #include #include -#include #include #define CREATE_TRACE_POINTS -- cgit v1.2.3 From 3a02c8e8142f7f133d4c6e72bc3e1d830e6b8b9e Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Thu, 18 Jun 2009 09:56:03 +0200 Subject: block: Fix bounce_pfn setting Correct stacking bounce_pfn limit setting and prevent warnings on 32-bit. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/blk-settings.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 7541ea4bf9f..81bc0d1fdb5 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -112,7 +112,7 @@ void blk_set_default_limits(struct queue_limits *lim) lim->max_segment_size = MAX_SEGMENT_SIZE; lim->max_sectors = lim->max_hw_sectors = SAFE_MAX_SECTORS; lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; - lim->bounce_pfn = BLK_BOUNCE_ANY; + lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT); lim->alignment_offset = 0; lim->io_opt = 0; lim->misaligned = 0; -- cgit v1.2.3 From 90c699a9ee4be165966d40f1837909ccb8890a68 Mon Sep 17 00:00:00 2001 From: Bartlomiej Zolnierkiewicz Date: Fri, 19 Jun 2009 08:08:50 +0200 Subject: block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Jens Axboe --- block/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/Kconfig b/block/Kconfig index 2c39527aa7d..95a86adc33a 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -23,8 +23,8 @@ menuconfig BLOCK if BLOCK -config LBD - bool "Support for large block devices and files" +config LBDAF + bool "Support for large (2TB+) block devices and files" depends on !64BIT default y help -- cgit v1.2.3 From f740f5ca056f0a4eff3abdf272a8a4ba3965d57d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 19 Jun 2009 09:18:32 +0200 Subject: Fix kernel-doc parameter name typo in blk-settings.c: Warning(block/blk-settings.c:108): No description found for parameter 'lim' Warning(block/blk-settings.c:108): Excess function parameter 'limits' description in 'blk_set_default_limits' Signed-off-by: Randy Dunlap Signed-off-by: Jens Axboe --- block/blk-settings.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-settings.c b/block/blk-settings.c index 81bc0d1fdb5..bd582a7f531 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -97,7 +97,7 @@ EXPORT_SYMBOL_GPL(blk_queue_lld_busy); /** * blk_set_default_limits - reset limits to default values - * @limits: the queue_limits structure to reset + * @lim: the queue_limits structure to reset * * Description: * Returns a queue_limit struct to its default state. Can be used by -- cgit v1.2.3 From 111986593561fc4c94a1fba3f3cd84476fb40b22 Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori Date: Wed, 17 Jun 2009 15:10:11 +0900 Subject: block: revert "bsg: setting rq->bio to NULL" The SMP handler (sas_smp_request) was fixed to use the block API properly, so we don't need this workaround to avoid blk_put_request() warning. Signed-off-by: FUJITA Tomonori Signed-off-by: James Bottomley --- block/bsg.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'block') diff --git a/block/bsg.c b/block/bsg.c index 54106f052f7..e7d47525424 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -315,7 +315,6 @@ out: blk_put_request(rq); if (next_rq) { blk_rq_unmap_user(next_rq->bio); - next_rq->bio = NULL; blk_put_request(next_rq); } return ERR_PTR(ret); @@ -449,7 +448,6 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr, hdr->dout_resid = rq->resid_len; hdr->din_resid = rq->next_rq->resid_len; blk_rq_unmap_user(bidi_bio); - rq->next_rq->bio = NULL; blk_put_request(rq->next_rq); } else if (rq_data_dir(rq) == READ) hdr->din_resid = rq->resid_len; @@ -468,7 +466,6 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr, blk_rq_unmap_user(bio); if (rq->cmd != rq->__cmd) kfree(rq->cmd); - rq->bio = NULL; blk_put_request(rq); return ret; -- cgit v1.2.3