aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-04-21block: fix blk_register_queue() return valueAkinobu Mita
blk_register_queue() returns -ENXIO when queue->request_fn is NULL. But there are some block drivers that call blk_register_queue() via add_disk() with queue->request_fn == NULL. (For example, brd, loop) Although no one checks return value of blk_register_queue(), this patch makes it return 0 instead of -ENXIO when queue->request_fn is NULL, Also this patch adds warning when blk_register_queue() and blk_unregister_queue() are called with queue == NULL rather than ignore invalid usage silently. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21block: fix memory hotplug and bouncing in block layerAndi Kleen
Only noticed this while hacking something else, no test case. blk_max_low_pfn is initialized once at bootup by the block layer from max_low_pfn. But max_low_pfn is not necessarily constant over the runtime of the system when you consider memory hotplug. What could happen if that someone adds memory later the block layer wouldn't get updated and then start bouncing memory unnecessarily. Also on 64bit blk_max_low_pfn actually isn't needed because it just disables bouncing essentially and there is no highmem. And nobody can pass pfns > max_low_pfn to the block layer, because those wouldn't have a struct page and I suspect block layer wouldn't be very happy without that. So set BLK_BOUNCE_HIGH to infinity (-1ULL) on 64bit. That avoids the problem of having to update it on memory hotadd. On 32bit I kept the same behaviour because at least on i386 memory hotadd only adds HIGHMEM, never lowmem. BLK_BOUNCE_ANY is always set to infinity on both 32 and 64bit. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21block: replace remaining __FUNCTION__ occurrencesHarvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21Kconfig: clean up block/Kconfig help descriptionsNick Andrew
Modify the help descriptions of block/Kconfig for clarity, accuracy and consistency. Refactor the BLOCK description a bit. The wording "This permits ... to be removed" isn't quite right; the block layer is removed when the option is disabled, whereas most descriptions talk about what happens when the option is enabled. Reformat the list of what is affected by disabling the block layer. Add more examples of large block devices to LBD and strive for technical accuracy; block devices of size _exactly_ 2TB require CONFIG_LBD, not only "bigger than 2TB". Also try to say (perhaps not very clearly) that the config option is only needed when you want to have individual block devices of size >= 2TB, for example if you had 3 x 1TB disks in your computer you'd have a total storage size of 3TB but you wouldn't need the option unless you want to aggregate those disks into a RAID or LVM. Improve terminology and grammar on BLK_DEV_IO_TRACE. I also added the boilerplate "If unsure, say N" to most options. Precisely say "2TB and larger" for LSF. Indent the help text for BLK_DEV_BSG by 2 spaces in accordance with the standard. Signed-off-by: Nick Andrew <nick@nick-andrew.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cciss: fix warning oops on rmmod of driverscameron@beardog.cca.cpqcorp.net
* Fix oops on cciss rmmod due to calling pci_free_consistent with irqs disabled. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cciss: Fix race between disk-adding code and interrupt handlerscameron@beardog.cca.cpqcorp.net
Fix race condition between cciss_init_one(), cciss_update_drive_info(), and cciss_check_queues(). Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21block: move the padding adjustment to blk_rq_map_sgFUJITA Tomonori
blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule that req->data_len (the true data length) is equal to sum(bio). It broke the scsi command completion code. commit e97a294ef6938512b655b1abf17656cf2b26f709 was introduced to fix the above issue. However, the partial completion code doesn't work with it. The commit is also a layer violation (scsi mid-layer should not know about the block layer's padding). This patch moves the padding adjustment to blk_rq_map_sg (suggested by James). The padding works like the drain buffer. This patch breaks the rule that req->data_len is equal to sum(sg), however, the drain buffer already broke it. So this patch just restores the rule that req->data_len is equal to sub(bio) without breaking anything new. Now when a low level driver needs padding, blk_rq_map_user and blk_rq_map_user_iov guarantee there's enough room for padding. blk_rq_map_sg can safely extend the last entry of a scatter list. blk_rq_map_sg must extend the last entry of a scatter list only for a request that got through bio_copy_user_iov. This patches introduces new REQ_COPY_USER flag. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21block: add bio_copy_user_iov support to blk_rq_map_user_iovFUJITA Tomonori
With this patch, blk_rq_map_user_iov uses bio_copy_user_iov when a low level driver needs padding or a buffer in sg_iovec isn't aligned. That is, it uses temporary kernel buffers instead of mapping user pages directly. When a LLD needs padding, later blk_rq_map_sg needs to extend the last entry of a scatter list. bio_copy_user_iov guarantees that there is enough space for padding by using temporary kernel buffers instead of user pages. blk_rq_map_user_iov needs buffers in sg_iovec to be aligned. The comment in blk_rq_map_user_iov indicates that drivers/scsi/sg.c also needs buffers in sg_iovec to be aligned. Actually, drivers/scsi/sg.c works with unaligned buffers in sg_iovec (it always uses temporary kernel buffers). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21block: convert bio_copy_user to bio_copy_user_iovFUJITA Tomonori
This patch enables bio_copy_user to take struct sg_iovec (renamed bio_copy_user_iov). bio_copy_user uses bio_copy_user_iov internally as bio_map_user uses bio_map_user_iov. The major changes are: - adds sg_iovec array to struct bio_map_data - adds __bio_copy_iov that copy data between bio and sg_iovec. bio_copy_user_iov and bio_uncopy_user use it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21loop: manage partitions in disk imageLaurent Vivier
This patch allows to use loop device with partitionned disk image. Original behavior of loop is not modified. A new parameter is introduced to define how many partition we want to be able to manage per loop device. This parameter is "max_part". For instance, to manage 63 partitions / loop device, we will do: # modprobe loop max_part=63 # ls -l /dev/loop?* brw-rw---- 1 root disk 7, 0 2008-03-05 14:55 /dev/loop0 brw-rw---- 1 root disk 7, 64 2008-03-05 14:55 /dev/loop1 brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2 brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3 brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4 brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5 brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6 brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7 And to attach a raw partitionned disk image, the original losetup is used: # losetup -f etch.img # ls -l /dev/loop?* brw-rw---- 1 root disk 7, 0 2008-03-05 14:55 /dev/loop0 brw-rw---- 1 root disk 7, 1 2008-03-05 14:57 /dev/loop0p1 brw-rw---- 1 root disk 7, 2 2008-03-05 14:57 /dev/loop0p2 brw-rw---- 1 root disk 7, 5 2008-03-05 14:57 /dev/loop0p5 brw-rw---- 1 root disk 7, 64 2008-03-05 14:55 /dev/loop1 brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2 brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3 brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4 brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5 brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6 brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7 # mount /dev/loop0p1 /mnt # ls /mnt bench cdrom home lib mnt root srv usr bin dev initrd lost+found opt sbin sys var boot etc initrd.img media proc selinux tmp vmlinuz # umount /mnt # losetup -d /dev/loop0 Of course, the same behavior can be done using kpartx on a loop device, but modifying loop avoids to stack several layers of block device (loop + device mapper), this is a very light modification (40% of modifications are to manage the new parameter). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cdrom: use kmalloced buffers instead of buffers on stackThomas Bogendoerfer
If cdrom commands are issued to a scsi drive in most cases the buffer will be filled via dma. This leads to bad stack corruption on non coherent platforms, because the buffers are neither cache line aligned nor is the size a multiple of the cache line size. Using kmalloced buffers avoids this. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cdrom: make unregister_cdrom() return voidAkinobu Mita
Now unregister_cdrom() always returns 0. Make it return void and update all callers that check the return value. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk> Cc: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cdrom: use list_head for cdrom_device_info listAkinobu Mita
Use list_head for cdrom_device_info list instead of opencoded singly list handling. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cdrom: protect cdrom_device_info list by mutexAkinobu Mita
This patch protects the list of cdrom_device_info by cdrom_mutex when the file in /proc/sys/dev/cdrom/ is written. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cdrom: cleanup hardcoded error-codeAkinobu Mita
This patch eliminates hardcoded return value of register_cdrom(). It also changes the return value to -EINVAL. It is more appropriate than -2 (-ENOENT) because it is only happen invalid usage of register_cdrom() by broken cdrom driver. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21cdrom: remove ifdef CONFIG_SYSCTLAkinobu Mita
This patch removes #ifdef for CONFIG_SYSCTL by defining empty cdrom_sysctl_register and cdrom_sysctl_unregister when CONFIG_SYSCTL is not defined. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-21hrtimer: optimize the softirq time optimizationThomas Gleixner
The previous optimization did not take the case into account where a clock provides its own softirq_get_time() function. Check for the availablitiy of the clock get time function first and then check if we need to retrieve the time for both clocks via hrtimer_softirq_gettime() to avoid a double evaluation of time in that case as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-21hrtimer: reduce calls to hrtimer_get_softirq_time()Dimitri Sivanich
It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more often than it needs to. This can cause frequent contention on systems with large numbers of processors/cores. With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if there is a pending timer in one of the hrtimer bases, and only once. This also combines hrtimer_run_queues() and the inline run_hrtimer_queue() into one function. [ tglx@linutronix.de: coding style ] Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-21clockevents: fix typo in tick-broadcast.cGlauber Costa
braodcast -> broadcast Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-21jiffies: add time_is_after_jiffies and others which compare with jiffiesDave Young
Most of time_after like macros usages just compare jiffies and another number, so here add some time_is_* macros for convenience. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-20PCI: Change PCI subsystem MAINTAINERGreg Kroah-Hartman
Jesse foolishly volunteered to handle PCI patches. Update MAINTAINERS to reflect this. Woho! Time to kick back and relax... Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: pci-iommu-iotlb-flushing-speedupmark gross
The following patch is an update to use an array instead of a list of IOVA's in the implementation of defered iotlb flushes. It takes inspiration from sba_iommu.c I like this implementation better as it encapsulates the batch process within intel-iommu.c, and no longer touches iova.h (which is shared) Performance data: Netperf 32byte UDP streaming 2.6.25-rc3-mm1: IOMMU-strict : 58Mps @ 62% cpu NO-IOMMU : 71Mbs @ 41% cpu List-based IOMMU-default-batched-IOTLB flush: 66Mbps @ 57% cpu with this patch: IOMMU-strict : 73Mps @ 75% cpu NO-IOMMU : 74Mbs @ 42% cpu Array-based IOMMU-default-batched-IOTLB flush: 72Mbps @ 62% cpu Signed-off-by: <mgross@linux.intel.com> Cc: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: pci_setup_bridge() mustn't be __devinitAdrian Bunk
WARNING: drivers/pci/built-in.o(.text+0x28ee9): Section mismatch in reference from the function pci_bus_assign_resources() to the function .devinit.text:pci_setup_bridge() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: pci_bus_size_cardbus() mustn't be __devinitAdrian Bunk
WARNING: drivers/pci/built-in.o(.text+0x28e1f): Section mismatch in reference from the function pci_bus_size_bridges() to the function .devinit.text:pci_bus_size_cardbus() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: pci_scan_device() mustn't be __devinitAdrian Bunk
WARNING: drivers/pci/built-in.o(.text+0x150f): Section mismatch in reference from the function pci_scan_single_device() to the function .devinit.text:pci_scan_device() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: pci_alloc_child_bus() mustn't be __devinitAdrian Bunk
WARNING: drivers/pci/built-in.o(.text+0xc4c): Section mismatch in reference from the function pci_add_new_bus() to the function .devinit.text:pci_alloc_child_bus() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: replace remaining __FUNCTION__ occurrencesHarvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: Hotplug: fakephp: Return success, not ENODEV, when bus rescan is triggeredTrent Piepho
The 'power' attribute of the fakephp driver originally only let one turn a slot off. If one tried to turn a slot on (echo 1 > .../power), it would return ENODEV, as fakephp did not support this function. An old (pre-git) patch changed this: 2004/11/11 16:33:31-08:00 jdittmer [PATCH] fakephp: add pci bus rescan ability http://article.gmane.org/gmane.linux.kernel/251183 Now writing "1" to the power attribute has the effect of triggering a bus rescan, but it still returns ENODEV, probably an oversight in the above patch. Using the BusyBox echo will not produce an error message, but will trigger *two* bus rescans (and return an exit code of 1): ~ # strace echo -n 1 > /sys/bus/pci/slots/0000:00:00.0/power ... write(1, "1", 1) = -1 ENODEV (No such device) write(1, "1", 1) = -1 ENODEV (No such device) exit(1) = ? Using cp gives a write error, even though the write did happen and a rescan was triggered: ~ # echo -n 1 > tmp ; cp tmp /sys/bus/pci/slots/0000:00:00.0/power cp: Write Error: No such device It seems much better to return success instead of failure. The actual status of the bus rescan is hard to return. It happens asynchronously in a work thread, so the sysfs store functions returns before any status is ready (the whole point of the work queue). And even if it didn't do this, the rescan doesn't have any clear status to return. Signed-off-by: Trent Piepho <tpiepho@freescale.com> CC: Jan Dittmer <jdittmer@ppp0.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: Hotplug: Fix leaks in IBM Hot Plug Controller Driver - ibmphp_init_devno()Jesper Juhl
In drivers/pci/hotplug/ibmphp_core.c::ibmphp_init_devno() we allocate space dynamically for a PCI irq routing table by calling pcibios_get_irq_routing_table(), but we never free the allocated space. This patch frees the allocated space at the function exit points. Spotted by the Coverity checker. Compile tested only. Please consider applying. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: clean up resource alignment managementIvan Kokshaysky
Done per Linus' request and suggestions. Linus has explained that better than I'll be able to explain: On Thu, Mar 27, 2008 at 10:12:10AM -0700, Linus Torvalds wrote: > Actually, before we go any further, there might be a less intrusive > alternative: add just a couple of flags to the resource flags field (we > still have something like 8 unused bits on 32-bit), and use those to > implement a generic "resource_alignment()" routine. > > Two flags would do it: > > - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device > resources) > > - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources > during probing) > > and then the case of both flags zero (or both bits set) would actually be > "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we > actually allocate the resource (so that we don't use the "start" field as > alignment incorrectly when it no longer indicates alignment). > > That wouldn't be totally generic, but it would have the nice property of > automatically at least add sanity checking for that whole "res->start has > the odd meaning of 'alignment' during probing" and remove the need for a > new field, and it would allow us to have a generic "resource_alignment()" > routine that just gets a resource pointer. Besides, I removed IORESOURCE_BUS_HAS_VGA flag which was unused for ages. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: aerdrv_acpi.c: remove unneeded NULL checkAdrian Bunk
There's no reason for checking pdev->bus for being NULL here (and we'd anyway Oops 3 lines below if it was). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: Update VIA CX700 quirkTim Yamin
This follows up 53a9bf4267b8b1f958dbeb7c8c1ef21c82229b71. Some newer CX700 BIOSes from our vendor have PCI Bus Parking disabled but PCI Master read caching enabled. This creates problems such as system freezing when both the network controller and the USB controller are active and one of them is pretty busy (e.g. heavy network traffic). This patch separates the checks and both the bus parking and the read caching are disabled independently if either is enabled by the BIOS. Signed-off-by: Tim Yamin <tim.yamin@zonbu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: Expose PCI VPD through sysfsBen Hutchings
Vital Product Data (VPD) may be exposed by PCI devices in several ways. It is generally unsafe to read this information through the existing interfaces to user-land because of stateful interfaces. This adds: - abstract operations for VPD access (struct pci_vpd_ops) - VPD state information in struct pci_dev (struct pci_vpd) - an implementation of the VPD access method specified in PCI 2.2 (in access.c) - a 'vpd' binary file in sysfs directories for PCI devices with VPD operations defined It adds a probe for PCI 2.2 VPD in pci_scan_device() and release of VPD state in pci_release_dev(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: iommu: iotlb flushingmark gross
This patch is for batching up the flushing of the IOTLB for the DMAR implementation found in the Intel VT-d hardware. It works by building a list of to be flushed IOTLB entries and a bitmap list of which DMAR engine they are from. After either a high water mark (250 accessible via debugfs) or 10ms the list of iova's will be reclaimed and the DMAR engines associated are IOTLB-flushed. This approach recovers 15 to 20% of the performance lost when using the IOMMU for my netperf udp stream benchmark with small packets. It can be disabled with a kernel boot parameter "intel_iommu=strict". Its use does weaken the IOMMU protections a bit. Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: simplify quirk debug outputBjorn Helgaas
print_fn_descriptor_symbol() prints the address if we don't have a symbol, so no need to print both. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: iova RB tree setup tweakmark gross
The following patch merges two functions into one allowing for a 3% reduction in overhead in locating, allocating and inserting pages for use in IOMMU operations. Its a bit of a eye-crosser so I welcome any RB-tree / MM experts to take a look. It works by re-using some of the information gathered in the search for the pages to use in setting up the IOTLB's in the insertion of the iova structure into the RB tree. Signed-off-by: <mgross@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: parisc: use generic pci_enable_resources()Bjorn Helgaas
Use the generic pci_enable_resources() instead of the arch-specific code. Unlike this arch-specific code, the generic version: - checks PCI_NUM_RESOURCES (11), not DEVICE_COUNT_RESOURCE (12), resources - skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set - skips ROM resources unless IORESOURCE_ROM_ENABLE is set - checks for resource collisions with "!r->parent" Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: ppc: use generic pci_enable_resources()Bjorn Helgaas
Use the generic pci_enable_resources() instead of the arch-specific code. Unlike this arch-specific code, the generic version: - checks PCI_NUM_RESOURCES (11), not 6, resources - skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set - skips ROM resources unless IORESOURCE_ROM_ENABLE is set - checks for resource collisions with "!r->parent", not IORESOURCE_UNSET Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: powerpc: use generic pci_enable_resources()Bjorn Helgaas
Use the generic pci_enable_resources() instead of the arch-specific code. The generic version is functionally equivalent, but uses dev_printk. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: ia64: use generic pci_enable_resources()Bjorn Helgaas
Use the generic pci_enable_resources() instead of the arch-specific code. Unlike this arch-specific code, the generic version: - does not check for a NULL dev pointer - skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: alpha: use generic pci_enable_resources()Bjorn Helgaas
Use the generic pci_enable_resources() instead of the arch-specific code. Unlike this arch-specific code, the generic version: - skips resources unless requested in "mask" - skips ROM resources unless IORESOURCE_ROM_ENABLE is set - checks for resource collisions with "!r->parent" Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: x86: use generic pci_enable_resources()Bjorn Helgaas
Use the generic pci_enable_resources() instead of the arch-specific code. Unlike this arch-specific code, the generic version: - checks for resource collisions with "!r->parent" Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: add generic pci_enable_resources()Bjorn Helgaas
Each architecture has its own pcibios_enable_resources() implementation. These differ in many minor ways that have nothing to do with actual architectural differences. Follow-on patches will make most arches use this generic version instead. This version is based on powerpc, which seemed most up-to-date. The only functional difference from the x86 version is that this uses "!r->parent" to check for resource collisions instead of "!r->start && r->end". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: add PCI Express ASPM supportShaohua Li
PCI Express ASPM defines a protocol for PCI Express components in the D0 state to reduce Link power by placing their Links into a low power state and instructing the other end of the Link to do likewise. This capability allows hardware-autonomous, dynamic Link power reduction beyond what is achievable by software-only controlled power management. However, The device should be configured by software appropriately. Enabling ASPM will save power, but will introduce device latency. This patch adds ASPM support in Linux. It introduces a global policy for ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control it. The interface can be used as a boot option too. Currently we have below setting: -default, BIOS default setting -powersave, highest power saving mode, enable all available ASPM state and clock power management -performance, highest performance, disable ASPM and clock power management By default, the 'default' policy is used currently. In my test, power difference between powersave mode and performance mode is about 1.3w in a system with 3 PCIE links. Note: some devices might not work well with aspm, either because chipset issue or device issue. The patch provide API (pci_disable_link_state), driver can disable ASPM for specific device. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: remove "pci=routeirq" noise from dmesgBjorn Helgaas
The "pci=routeirq" option was added in 2004, and I don't get any valid reports anymore. The option is still mentioned in kernel-parameters.txt. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: Include PCI domain in PCI bus names on x86/x86_64Gary Hade
The PCI bus names included in /proc/iomem and /proc/ioports are of the form 'PCI Bus #XX' where XX is the bus number. This patch changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain number and YY is the bus number. For example, PCI bus 14 in domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'. This change makes the naming consistent with other architectures such as ia64 where multiple PCI domain support has been around longer. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: #if 0 pci_cleanup_aer_correct_error_status()Adrian Bunk
#if 0 the no longer used pci_cleanup_aer_correct_error_status(). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: pcie AER: don't check _OSC when acpi is disabledYinghai Lu
[PATCH] pcie AER: don't check _OSC when acpi is disabled when acpi=off or pci=noacpi, get warning AER service couldn't init device 0000:00:0a.0:pcie01 - no _OSC support AER service couldn't init device 0000:00:0e.0:pcie01 - no _OSC support AER service couldn't init device 0000:00:0f.0:pcie01 - no _OSC support AER service couldn't init device 0000:80:0b.0:pcie01 - no _OSC support AER service couldn't init device 0000:80:0e.0:pcie01 - no _OSC support AER service couldn't init device 0000:80:0f.0:pcie01 - no _OSC support so don't check _OSC in aer_osc_setup Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: remove global list of PCI devicesGreg Kroah-Hartman
This patch finally removes the global list of PCI devices. We are relying entirely on the list held in the driver core now, and do not need a separate "shadow" list as no one uses it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20PCI: remove parisc consumer of the pci global_listJames Bottomley
Remove the parisc usage of the global_list, as it's not needed anymore. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>