aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2009-09-11scsi,block: update SCSI to handle mixed merge failuresTejun Heo
Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to merge requests with different failfast settings and still behave correctly on errors. Allow merge of requests of different failfast settings. As SCSI is currently the only subsystem which follows failfast status, there's no need to worry about other block drivers for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: implement mixed merge of different failfast requestsTejun Heo
Failfast has characteristics from other attributes. When issuing, executing and successuflly completing requests, failfast doesn't make any difference. It only affects how a request is handled on failure. Allowing requests with different failfast settings to be merged cause normal IOs to fail prematurely while not allowing has performance penalties as failfast is used for read aheads which are likely to be located near in-flight or to-be-issued normal IOs. This patch introduces the concept of 'mixed merge'. A request is a mixed merge if it is merge of segments which require different handling on failure. Currently the only mixable attributes are failfast ones (or lack thereof). When a bio with different failfast settings is added to an existing request or requests of different failfast settings are merged, the merged request is marked mixed. Each bio carries failfast settings and the request always tracks failfast state of the first bio. When the request fails, blk_rq_err_bytes() can be used to determine how many bytes can be safely failed without crossing into an area which requires further retrials. This allows request merging regardless of failfast settings while keeping the failure handling correct. This patch only implements mixed merge but doesn't enable it. The next one will update SCSI to make use of mixed merge. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: use the same failfast bits for bio and requestTejun Heo
bio and request use the same set of failfast bits. This patch makes the following changes to simplify things. * enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_* bits coincide with __REQ_FAILFAST_* bits. * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV but the matching is useless anyway. init_request_from_bio() is responsible for setting FAILFAST bits on FS requests and non-FS requests never use BIO_RW_AHEAD. Drop the code and comment from blk_rq_bio_prep(). * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and simplify FAILFAST flags handling in init_request_from_bio(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: add name to backing_dev_infoJens Axboe
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-01block: Allow changing max_sectors_kb above the default 512Nikanth Karthikesan
The patch "block: Use accessor functions for queue limits" (ae03bf639a5027d27270123f5f6e3ee6a412781d) changed queue_max_sectors_store() to use blk_queue_max_sectors() instead of directly assigning the value. But blk_queue_max_sectors() differs a bit 1. It sets both max_sectors_kb, and max_hw_sectors_kb 2. Never allows one to change max_sectors_kb above BLK_DEF_MAX_SECTORS. If one specifies a value greater then max_hw_sectors is set to that value but max_sectors is set to BLK_DEF_MAX_SECTORS I am not sure whether blk_queue_max_sectors() should be changed, as it seems to be that way for a long time. And there may be callers dependent on that behaviour. This patch simply reverts to the older way of directly assigning the value to max_sectors as it was before. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-14Merge branch 'percpu-for-linus' into percpu-for-nextTejun Heo
Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-04Make SCSI SG v4 driver enabled by default and remove EXPERIMENTAL ↵John Stoffel
dependency, since udev depends on BSG Make Block Layer SG support v4 the default, since recent udev versions depend on this to access serial numbers and other low level info properly. This should be backported to older kernels as well, since most distros have enabled this for a long time. Signed-off-by: John Stoffel <john@stoffel.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-01block: Update topology documentationMartin K. Petersen
Update topology comments and sysfs documentation based upon discussions with Neil Brown. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-01block: Stack optimal I/O sizeMartin K. Petersen
When stacking block devices ensure that optimal I/O size is scaled accordingly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-01block: Add a wrapper for setting minimum request size without a queueMartin K. Petersen
Introduce blk_limits_io_min() and make blk_queue_io_min() call it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-01block: Make blk_queue_stack_limits use the new stacking interfaceMartin K. Petersen
blk_queue_stack_limits() has been superceded by blk_stack_limits() and disk_stack_limits(). Wrap the function call for now, we'll deprecate it later. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-28block: make the end_io functions be non-GPL exportsJens Axboe
Prior to the change for more sane end_io functions, we exported the helpers with the normal EXPORT_SYMBOL(). That got changed to _GPL() for the new interface. Revert that particular change, on the basis that this is basic functionality and doesn't dip into internal structures. If these exports can't be non-GPL, then we may as well make EXPORT_SYMBOL() imply GPL for everything. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-28block: fix improper kobject release in blk_integrity_unregisterXiaotian Feng
blk_integrity_unregister should use kobject_put to release the kobject, otherwise after bi is freed, memory of bi->kobj->name is leaked. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-28block: always assign default lock to queuesJens Axboe
Move the assignment of a default lock below blk_init_queue() to blk_queue_make_request(), so we also get to set the default lock for ->make_request_fn() based drivers. This is important since the queue flag locking requires a lock to be in place. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-17block: sysfs fix mismatched queue_var_{store,show} in 64bit kernelXiaotian Feng
In blk-sysfs.c, queue_var_store uses unsigned long to store data, but queue_var_show uses unsigned int to show data. This causes, # echo 70000000000 > /sys/block/<dev>/queue/read_ahead_kb # cat /sys/block/<dev>/queue/read_ahead_kb => get wrong value Fix it by using unsigned long. While at it, convert queue_rq_affinity_show() such that it uses bool variable instead of explicit != 0 testing. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2009-07-17block: fix failfast merge testing in elv_rq_merge_ok()Tejun Heo
Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merge of requests with different failfast settings. In elv_rq_merge_ok(), it compares new bio's failfast flags against the merge target request's. However, the flag testing accessors for bio and blk don't return boolean but the tested bit value directly and FAILFAST on bio and blk don't match, so directly comparing them with == results in false negative unnecessary preventing merge of readahead requests. This patch convert the results to boolean by negating them before comparison. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jeff Garzik <jeff@garzik.org>
2009-07-10cfq-iosched: reset oom_cfqq in cfq_set_request()Vivek Goyal
In case memory is scarce, we now default to oom_cfqq. Once memory is available again, we should allocate a new cfqq and stop using oom_cfqq for a particular io context. Once a new request comes in, check if we are using oom_cfqq, and if yes, try to allocate a new cfqq. Tested the patch by forcing the use of oom_cfqq and upon next request thread realized that it was using oom_cfqq and it allocated a new cfqq. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-10block: call blk_scsi_ioctl_init()FUJITA Tomonori
Currently, blk_scsi_ioctl_init() is not called since it lacks an initcall marking. This causes the command table to be unitialized, hence somce commands are block when they should not have been. This fixes a regression introduced by commit 018e0446890661504783f92388ecce7138c1566d Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-04Merge branch 'master' into for-nextTejun Heo
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h
2009-07-03block: don't merge requests of different failfast settingsTejun Heo
Block layer used to merge requests and bios with different failfast settings. This caused regular IOs to fail prematurely when they were merged into failfast requests for readahead. Niel Lambrechts could trigger the problem semi-reliably on ext4 when resuming from STR. ext4 uses readahead when reading inodes and combined with the deterministic extra SATA PHY exception cycle during resume on the specific configuration, non-readahead inode read would fail causing ext4 errors. Please read the following thread for details. http://lkml.org/lkml/2009/5/23/21 This patch makes block layer reject merging if the failfast settings don't match. This is correct but likely to lower IO performance by preventing regular IOs from mingling into surrounding readahead requests. Changes to allow such mixed merges and handle errors correctly will be added later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Jens Axboe <axboe@carl.(none)>
2009-07-01cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()Shan Wei
With the changes for falling back to an oom_cfqq, we never fail to find/allocate a queue in cfq_get_queue(). So remove the check. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-01blocK: Restore barrier support for md and probably other virtual devices.NeilBrown
The next_ordered flag is only meaningful for devices that use __make_request. So move the test against next_ordered out of generic code and in to __make_request Since this test was added, barriers have not worked on md or any devices that don't use __make_request and so don't bother to set next_ordered. (dm explicitly sets something other than QUEUE_ORDERED_NONE since commit 99360b4c18f7675b50d283301d46d755affe75fd but notes in the comments that it is otherwise meaningless). Cc: Ken Milmore <ken.milmore@googlemail.com> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-01block: get rid of queue-private command filterJens Axboe
The initial patches to support this through sysfs export were broken and have been if 0'ed out in any release. So lets just kill the code and reclaim some space in struct request_queue, if anyone would later like to fixup the sysfs bits, the git history can easily restore the removed bits. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-01block: Create bip slabs with embedded integrity vectorsMartin K. Petersen
This patch restores stacking ability to the block layer integrity infrastructure by creating a set of dedicated bip slabs. Each bip slab has an embedded bio_vec array at the end. This cuts down on memory allocations and also simplifies the code compared to the original bvec version. Only the largest bip slab is backed by a mempool. The pool is contained in the bio_set so stacking drivers can ensure forward progress. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>
2009-07-01cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()Jens Axboe
Setup an emergency fallback cfqq that we allocate at IO scheduler init time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue() never fails without having to ensure free memory. On cfqq lookup, always try to allocate a new cfqq if the given cfq io context has the oom_cfqq assigned. This ensures that we only temporarily punt to this shared queue. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-01cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()Jens Axboe
We're going to be needing that init code outside of that function to get rid of the __GFP_NOFAIL in cfqq allocation. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-24percpu: clean up percpu variable definitionsTejun Heo
Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>
2009-06-21block: revert "bsg: setting rq->bio to NULL"FUJITA Tomonori
The SMP handler (sas_smp_request) was fixed to use the block API properly, so we don't need this workaround to avoid blk_put_request() warning. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-19Fix kernel-doc parameter name typo in blk-settings.c:Randy Dunlap
Warning(block/blk-settings.c:108): No description found for parameter 'lim' Warning(block/blk-settings.c:108): Excess function parameter 'limits' description in 'blk_set_default_limits' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-19block: rename CONFIG_LBD to CONFIG_LBDAFBartlomiej Zolnierkiewicz
Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-18block: Fix bounce_pfn settingMartin K. Petersen
Correct stacking bounce_pfn limit setting and prevent warnings on 32-bit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits) debugfs: use specified mode to possibly mark files read/write only debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem. xen: remove driver_data direct access of struct device from more drivers usb: gadget: at91_udc: remove driver_data direct access of struct device uml: remove driver_data direct access of struct device block/ps3: remove driver_data direct access of struct device s390: remove driver_data direct access of struct device parport: remove driver_data direct access of struct device parisc: remove driver_data direct access of struct device of_serial: remove driver_data direct access of struct device mips: remove driver_data direct access of struct device ipmi: remove driver_data direct access of struct device infiniband: ehca: remove driver_data direct access of struct device ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device hvcs: remove driver_data direct access of struct device xen block: remove driver_data direct access of struct device thermal: remove driver_data direct access of struct device scsi: remove driver_data direct access of struct device pcmcia: remove driver_data direct access of struct device PCIE: remove driver_data direct access of struct device ... Manually fix up trivial conflicts due to different direct driver_data direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}
2009-06-16block: remove some includings of blktrace_api.hLi Zefan
When porting blktrace to tracepoints, we changed to trace/block.h for trace prober declarations. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-16block: Introduce helper to reset queue limits to default valuesMartin K. Petersen
DM reuses the request queue when swapping in a new device table Introduce blk_set_default_limits() which can be used to reset the the queue_limits prior to stacking devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-16cfq: remove extraneous '\n' in blktrace outputJeff Moyer
I noticed a blank line in blktrace output. This patch fixes that. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-16block: don't overwrite bdi->state after bdi_init() has been runJens Axboe
Move the defaults to where we do the init of the backing_dev_info. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-16cfq: cleanup for last_end_request in cfq_dataGui Jianfeng
Actually, last_end_request in cfq_data isn't used now. So lets just remove it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-15Driver Core: bsg: add nodename for bsg driverKay Sievers
This adds support to the BSG driver to report the proper device name to userspace for the bsg devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15Driver Core: block: add nodename support for block drivers.Kay Sievers
This adds support for block drivers to report their requested nodename to userspace. It also updates a number of block drivers to provide the needed subdirectory and device name to be used for them. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11block: fix kernel-doc in recent block/ changesRandy Dunlap
Fix kernel-doc warnings in recently changed block/ source code. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c
2009-06-11Merge branch 'for-2.6.31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits) ide-tape: fix debug call alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC ide-dma: don't reset request fields on dma_timeout_retry() ide: drop rq->data handling from ide_map_sg() ide-atapi: kill unused fields and callbacks ide-tape: simplify read/write functions ide-tape: use byte size instead of sectors on rw issue functions ide-tape: unify r/w init paths ide-tape: kill idetape_bh ide-tape: use standard data transfer mechanism ide-tape: use single continuous buffer ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len ide-tape,floppy: fix failed command completion after request sense ide-pm: don't abuse rq->data ide-cd,atapi: use bio for internal commands ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer ide-cd: convert to using generic sense request ide: add helpers for preparing sense requests ide-cd: don't abuse rq->buffer ide-atapi: don't abuse rq->buffer ...
2009-06-11block: add request clone interface (v2)Kiyoshi Ueda
This patch adds the following 2 interfaces for request-stacking drivers: - blk_rq_prep_clone(struct request *clone, struct request *orig, struct bio_set *bs, gfp_t gfp_mask, int (*bio_ctr)(struct bio *, struct bio*, void *), void *data) * Clones bios in the original request to the clone request (bio_ctr is called for each cloned bios.) * Copies attributes of the original request to the clone request. The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. - blk_rq_unprep_clone(struct request *clone) * Frees cloned bios from the clone request. Request stacking drivers (e.g. request-based dm) need to make a clone request for a submitted request and dispatch it to other devices. To allocate request for the clone, request stacking drivers may not be able to use blk_get_request() because the allocation may be done in an irq-disabled context. So blk_rq_prep_clone() takes a request allocated by the caller as an argument. For each clone bio in the clone request, request stacking drivers should be able to set up their own completion handler. So blk_rq_prep_clone() takes a callback function which is called for each clone bio, and a pointer for private data which is passed to the callback. NOTE: blk_rq_prep_clone() doesn't copy any actual data of the original request. Pages are shared between original bios and cloned bios. So caller must not complete the original request before the clone request. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-10Merge branch 'tracing-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...
2009-06-10block: prevent possible io_context->refcount overflowNikanth Karthikesan
Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-09tracing/events: convert block trace points to TRACE_EVENT()Li Zefan
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09bsg: setting rq->bio to NULLFUJITA Tomonori
Due to commit 1cd96c242a829d52f7a5ae98f554ca9775429685 ("block: WARN in __blk_put_request() for potential bio leak"), BSG SMP requests get the false warnings: WARNING: at block/blk-core.c:1068 __blk_put_request+0x52/0xc0() This sets rq->bio to NULL to avoid that false warnings. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-09block: Add missing bounce_pfn stacking and fix commentsMartin K. Petersen
DM no longer needs to set limits explicitly when calling blk_stack_limits. Let the latter automatically deal with bounce_pfn scaling. Fix kerneldoc variable names. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-09Revert "block: Fix bounce limit setting in DM"Jens Axboe
This reverts commit a05c0205ba031c01bba33a21bf0a35920eb64833. DM doesn't need to access the bounce_pfn directly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-09block: needs to set the residual length of a bidi requestFUJITA Tomonori
Tejun's "block: set rq->resid_len to blk_rq_bytes() on issue" patch seems to be incomplete; It doesn't set rq->resid_len to blk_rq_bytes() for a bidi request (req->next_rq). As a result, all bidi users are broken. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>