aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2007-10-23Remove old lguest I/O infrrasructure.Rusty Russell
This patch gets rid of the old lguest host I/O infrastructure and replaces it with a single hypercall "LHCALL_NOTIFY" which takes an address. The main change is the removal of io.c: that mainly did inter-guest I/O, which virtio doesn't yet support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Remove old lguest bus and drivers.Rusty Russell
This gets rid of the lguest bus, drivers and DMA mechanism, to make way for a generic virtio mechanism. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Virtio helper routines for a descriptor ringbuffer implementationRusty Russell
These helper routines supply most of the virtqueue_ops for hypervisors which want to use a ring for virtio. Unlike the previous lguest implementation: 1) The rings are variable sized (2^n-1 elements). 2) They have an unfortunate limit of 65535 bytes per sg element. 3) The page numbers are always 64 bit (PAE anyone?) 4) They no longer place used[] on a separate page, just a separate cacheline. 5) We do a modulo on a variable. We could be tricky if we cared. 6) Interrupts and notifies are suppressed using flags within the rings. Users need only get the ring pages and provide a notify hook (KVM wants the guest to allocate the rings, lguest does it sanely). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dor Laor <dor.laor@qumranet.com>
2007-10-23Virtio console driverRusty Russell
This is an hvc-based virtio console driver. It's suboptimal becuase hvc expects to have raw access to interrupts and virtio doesn't assume that, so it currently polls. There are two solutions: expose hvc's "kick" interface, or wean off hvc. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Block driver using virtio.Rusty Russell
The block driver uses scatter-gather lists with sg[0] being the request information (struct virtio_blk_outhdr) with the type, sector and inbuf id. The next N sg entries are the bio itself, then the last sg is the status byte. Whether the N entries are in or out depends on whether it's a read or a write. We accept the normal (SCSI) ioctls: they get handed through to the other side which can then handle it or reply that it's unsupported. It's not clear that this actually works in general, since I don't know if blk_pc_request() requests have an accurate rq_data_dir(). Although we try to reply -ENOTTY on unsupported commands, ioctl(fd, CDROMEJECT) returns success to userspace. This needs a separate patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <jens.axboe@oracle.com>
2007-10-23Net driver using virtioRusty Russell
The network driver uses two virtqueues: one for input packets and one for output packets. This has nice locking properties (ie. we don't do any for recv vs send). TODO: 1) Big packets. 2) Multi-client devices (maybe separate driver?). 3) Resolve freeing of old xmit skbs (Christian Borntraeger) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org
2007-10-23Virtio interfaceRusty Russell
This attempts to implement a "virtual I/O" layer which should allow common drivers to be efficiently used across most virtual I/O mechanisms. It will no-doubt need further enhancement. The virtio drivers add buffers to virtio queues; as the buffers are consumed the driver "interrupt" callbacks are invoked. There is also a generic implementation of config space which drivers can query to get setup information from the host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dor Laor <dor.laor@qumranet.com> Cc: Arnd Bergmann <arnd@arndb.de>
2007-10-23Boot with virtual == physical to get closer to native Linux.Rusty Russell
1) This allows us to get alot closer to booting bzImages. 2) It means we don't have to know page_offset. 3) The Guest needs to modify the boot pagetables to create the PAGE_OFFSET mapping before jumping to C code. 4) guest_pa() walks the page tables rather than using page_offset. 5) We don't use page_offset to figure out whether to emulate: it was always kinda quesationable, and won't work for instructions done before remapping (bzImage unpacking in particular). 6) We still want the kernel address for tlb flushing: have the initial hypercall give us that, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Allow guest to specify syscall vector to use.Rusty Russell
(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch). This patch allows Guests to specify what system call vector they want, and we try to reserve it. We only allow one non-Linux system call vector, to try to avoid DoS on the Host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23lguest.h declares a struct timespec, make it include linux/time.hJes Sorensen
Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Make hypercalls arch-independent.Jes Sorensen
Clean up the hypercall code to make the code in hypercalls.c architecture independent. First process the common hypercalls and then call lguest_arch_do_hcall() if the call hasn't been handled. Rename struct hcall_ring to hcall_args. This patch requires the previous patch which reorganize the layout of struct lguest_regs on i386 so they match the layout of struct hcall_args. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Reorder guest saved regs to match hyperall orderJes Sorensen
Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they will be located together and allow it to map directly to a struct hcall_ring entry (which will be renamed struct hcall_args as in a subsequent patch). This is in preparation for making the code hcall code architecture independent. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Move i386 part of core.c to x86/core.c.Jes Sorensen
Separate i386 architecture specific from core.c and move it to x86/core.c and add x86/lguest.h header file to match. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Remove fixed limit on number of guests, and lguests array.Rusty Russell
Back when we had all the Guest state in the switcher, we had a fixed array of them. This is no longer necessary. If we switch the network code to using random_ether_addr (46 bits is enough to avoid clashes), we can get rid of the concept of "guest id" altogether. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Move lguest hcalls to arch-specific headerJes Sorensen
Move architecture specific portion of lg_hcall code to asm-i386/lg_hcall.h and have it included from linux/lguest.h. [Changed to asm-i386/lguest_hcall.h so documentation finds it -RR] Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jes Sorensen <jes@sgi.com>
2007-10-23Make lguest_launcher.h types userspace-friendlyRusty Russell
lguest_launcher.h uses "u32" not "__u32", which sets a bad example. Fix that, and include <linux/types.h>. This means we need to use -I on the Launcher build line so types.h is found. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Make asm-x86/bootparam.h includable from userspace.Rusty Russell
To actually write a bootloader (or, say, the lguest launcher) currently requires duplication of these structures. Making them includable from userspace is much nicer. We merge the common userspace-required definitions of e820_32/64.h into e820.h for export. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-22Expand hwif->host_flags so that it fits new flags.David Miller
Commit 238e4f142c33bb34440cc64029dde7b9fbc4e65f ("ide: add IDE_HFLAG_NO_LBA48 and IDE_HFLAG_NO_LBA48_DMA host flags") caused a regression because the host_flags in struct hwif_s wasn't expanded to cope with the fact that the host flags no longer fit in 16 bits. Signed-off-by: David S. Miller <davem@davemloft.net> [ I hate having to add good commit descriptions. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: appletouch - apply idle reset logic to all touchpads Input: usbtouchscreen - add support for GoTop tablet devices Input: bf54x-keys - return real error when request_irq() fails Input: i8042 - export i8042_command()
2007-10-22Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] 4level-fixup cleanup [S390] Cleanup page table definitions. [S390] Introduce follow_table in uaccess_pt.c [S390] Remove unused user_seg from thread structure. [S390] tlb flush fix. [S390] kernel: Fix dump on panic for DASDs under LPAR. [S390] struct class_device -> struct device conversion. [S390] cio: Fix incomplete commit for uevent suppression. [S390] cio: Use to_channelpath() for device to channel path conversion. [S390] Add per-cpu idle time / idle count sysfs attributes. [S390] Update default configuration.
2007-10-22Merge branch 'master' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) [IPSEC] IPV6: Fix to add tunnel mode SA correctly. [NET]: Cut off the queue_mapping field from sk_buff [NET]: Hide the queue_mapping field inside netif_subqueue_stopped [NET]: Make and use skb_get_queue_mapping [NET]: Use the skb_set_queue_mapping where appropriate [INET]: Use MODULE_ALIAS_NET_PF_PROTO_TYPE where possible. [INET]: Let inet_diag and friends autoload [NIU]: Cleanup PAGE_SIZE checks a bit [NET]: Fix SKB_WITH_OVERHEAD calculation [ATM]: Fix clip module reload crash. [TG3]: Update version to 3.85 [TG3]: PCI command adjustment [TG3]: Add management FW version to ethtool report [TG3]: Add 5723 support [Bluetooth] Convert RFCOMM to use kthread API [Bluetooth] Add constant for Bluetooth socket options level [Bluetooth] Add support for handling simple eSCO links [Bluetooth] Add address and channel attribute to RFCOMM TTY device [Bluetooth] Fix wrong argument in debug code of HIDP [Bluetooth] Add generic driver for Bluetooth USB devices ...
2007-10-22Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Enable restart support for lite5200 board [POWERPC] Add restart support for mpc52xx based platforms [POWERPC] Update device tree binding for mpc5200 gpt [POWERPC] Add mpc52xx_find_and_map_path(), refactor utility functions [POWERPC] bestcomm: Restrict bus prefetch bugfix to original mpc5200 silicon.
2007-10-22Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] time: Make c0_compare_int_usable more bullet proof [MIPS] Kbuild: Use the new cc-cross-prefix feature. [MIPS] Fix include wrapper symbol to something sane. [MIPS] Malta: Delete dead code. [MIPS] time: Add GT641xx timer0 clockevent driver [MIPS] time: SMP-proofing of Sibyte clockevent/clocksource code. [MIPS] time: SMP/NUMA-proofing of IP27 HUB RT timer code. [MIPS] time: Fix calculation in clockevent_set_clock()
2007-10-22Merge branch 'master' of ↵Linus Torvalds
ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb * 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (37 commits) V4L/DVB (6382): saa7134: fix NULL dereference at suspend time for cards without IR receiver V4L/DVB (6380): ivtvfb: Removal of the 'osd_compat' module option V4L/DVB (6379): patch which improves GotView Saa7135 remote control V4L/DVB (6378b): Updates info about the removal of V4L1 at feature-removal-schedule.txt V4L/DVB (6378a): Removal of VIDIOC_[G|S]_MPEGCOMP from feature-removal-schedule.txt V4L/DVB (6378): DiB0700-device: Using 1.10 firmware V4L/DVB (6357): pvrusb2: Improve encoder chip health tracking V4L/DVB (6356): "while (!ca->wakeup)" breaks the CAM initialisation V4L/DVB (6352): ir-kbd-i2c: Missing break statement V4L/DVB (6350): V4L: possible leak in em28xx_init_isoc V4L/DVB (6348): ivtv: undo video mute when closing the radio V4L/DVB (6347): ivtv: fix video mute when radio is used V4L/DVB (6346): ivtvfb: YUV output size fix when ivtvfb is not loaded V4L/DVB (6345): ivtvfb: YUV handling of an image which is not visible in the display area V4L/DVB (6343): ivtvfb: check return value of unregister_framebuffer V4L/DVB (6342): ivtv: fix circular locking (bug 9037) V4L/DVB (6341): ivtv: fix resizing MPEG1 streams V4L/DVB (6340): ivtvfb: screen mode change sometimes goes wrong V4L/DVB (6339): ivtv: set the video color to black instead of green when capturing from the radio V4L/DVB (6338): ivtv: fix incorrect EBUSY return ...
2007-10-22ppc: fix AT_VECTOR_SIZE on arch/ppcGrant Likely
Commit 4f9a58d75bfe82ab2b8ba5b8506dfb190a267834 ("increase AT_VECTOR_SIZE to terminate saved_auxv properly") changes the size of AT_VECTOR_SIZE from hard coded '44' to a calculation based on the value of AT_VECTOR_SIZE_ARCH and AT_VECTOR_SIZE_BASE. The change works for arch/powerpc, but it breaks arch/ppc because the needed AT_VECTOR_SIZE_ARCH is not present in include/asm-ppc/system.h and a default value of 0 is used instead. This results in AT_VECTOR_SIZE being too small and it causes a kernel crash on loading init. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Merge branch 'sg' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'sg' of git://git.kernel.dk/linux-2.6-block: Add CONFIG_DEBUG_SG sg validation Change table chaining layout Update arch/ to use sg helpers Update swiotlb to use sg helpers Update net/ to use sg helpers Update fs/ to use sg helpers [SG] Update drivers to use sg helpers [SG] Update crypto/ to sg helpers [SG] Update block layer to use sg helpers [SG] Add helpers for manipulating SG entries
2007-10-23Merge branch 'for-2.6.24' of git://git.secretlab.ca/git/linux-2.6-mpc52xxPaul Mackerras
2007-10-22[MIPS] Fix include wrapper symbol to something sane.Ralf Baechle
And why are there i8253.h and 8253pit.h ... Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-22[MIPS] time: Add GT641xx timer0 clockevent driverYoichi Yuasa
And make use of it for Cobalt. A few others such as the Malta could make use of it as well. Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-22[MIPS] time: SMP-proofing of Sibyte clockevent/clocksource code.Ralf Baechle
The BCM148 has 4 cores but there are also just 4 generic timers available so use the ZBbus cycle counter instead of it. In addition the ZBbus counter also offers a much higher resolution and 64-bit counting so I'm considering a later complete conversion to it once I figure out if all members of the Sibyte SOC family support it - the docs seem to agree but the headers files seem to disagree ... Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-22Add CONFIG_DEBUG_SG sg validationJens Axboe
Add a Kconfig entry which will toggle some sanity checks on the sg entry and tables. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22Change table chaining layoutJens Axboe
Change the page member of the scatterlist structure to be an unsigned long, and encode more stuff in the lower bits: - Bits 0 and 1 zero: this is a normal sg entry. Next sg entry is located at sg + 1. - Bit 0 set: this is a chain entry, the next real entry is at ->page_link with the two low bits masked off. - Bit 1 set: this is the final entry in the sg entry. sg_next() will return NULL when passed such an entry. It's thus important that sg table users use the proper accessors to get and set the page member. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22exportfs: update documentationChristoph Hellwig
Update documentation to the current state of affairs. Remove duplicated method descruptions in exportfs.h and point to Documentation/filesystems/ Exporting instead. Add a little file header comment in expfs.c describing what's going on and mentioning Neils and my copyright [1]. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22exportfs: make struct export_operations constChristoph Hellwig
Now that nfsd has stopped writing to the find_exported_dentry member we an mark the export_operations const Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22exportfs: remove old methodsChristoph Hellwig
Now that all filesystems are converted remove support for the old methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22reiserfs: new export opsChristoph Hellwig
Another nice little cleanup by using the new methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22efs: new export opsChristoph Hellwig
Trivial switch over to the new generic helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22exportfs: add new methodsChristoph Hellwig
Add the guts for the new filesystem API to exportfs. There's now a fh_to_dentry method that returns a dentry for the object looked for given a filehandle fragment, and a fh_to_parent operation that returns the dentry for the encoded parent directory in case the file handle contains it. There are default implementations for these methods that only take a callback for an nfs-enhanced iget variant and implement the rest of the semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22exportfs: add fid typeChristoph Hellwig
This patchset is a medium scale rewrite of the export operations interface. The goal is to make the interface less complex, and easier to understand from the filesystem side, aswell as preparing generic support for exporting of 64bit inode numbers. This touches all nfs exporting filesystems, and I've done testing on all of the filesystems I have here locally (xfs, ext2, ext3, reiserfs, jfs) This patch: Add a structured fid type so that we don't have to pass an array of u32 values around everywhere. It's a union of possible layouts. As a start there's only the u32 array and the traditional 32bit inode format, but there will be more in one of my next patchset when I start to document the various filehandle formats we have in lowlevel filesystems better. Also add an enum that gives the various filehandle types human- readable names. Note: Some people might think the struct containing an anonymous union is ugly, but I didn't want to pass around a raw union type. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22kexec: add BSS to resource treeBernhard Walle
Add the BSS to the resource tree just as kernel text and kernel data are in the resource tree. The main reason behind this is to avoid crashkernel reservation in that area. While it's not strictly necessary to have the BSS in the resource tree (the actual collision detection is done in the reserve_bootmem() function before), the usage of the BSS resource should be presented to the user in /proc/iomem just as Kernel data and Kernel code. Note: The patch currently is only implemented for x86 and ia64 (because efi_initialize_iomem_resources() has the same signature on i386 and ia64). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: <linux-arch@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22intel-iommu: fix for IOMMU early crashKeshavamurthy, Anil S
pci_dev's->sysdata is highly overloaded and currently IOMMU is broken due to IOMMU code depending on this field. This patch introduces new field in pci_dev's dev.archdata struct to hold IOMMU specific per device IOMMU private data. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg KH <greg@kroah.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Intel IOMMU: DMAR fault handling supportKeshavamurthy, Anil S
MSI interrupt handler registrations and fault handling support for Intel-IOMMU hadrware. This patch enables the MSI interrupts for the DMA remapping units and in the interrupt handler read the fault cause and outputs the same on to the console. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Intel IOMMU: Intel IOMMU driverKeshavamurthy, Anil S
Actual intel IOMMU driver. Hardware spec can be found at: http://www.intel.com/technology/virtualization This driver sets X86_64 'dma_ops', so hook into standard DMA APIs. In this way, PCI driver will get virtual DMA address. This change is transparent to PCI drivers. [akpm@linux-foundation.org: remove unneeded cast] [akpm@linux-foundation.org: build fix] [bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line] Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Intel IOMMU: clflush_cache_range now takes size paramKeshavamurthy, Anil S
Introduce the size param for clflush_cache_range(). Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Intel IOMMU: PCI generic helper functionKeshavamurthy, Anil S
When devices are under a p2p bridge, upstream transactions get replaced by the device id of the bridge as it owns the PCIE transaction. Hence its necessary to setup translations on behalf of the bridge as well. Due to this limitation all devices under a p2p share the same domain in a DMAR. We just cache the type of device, if its a native PCIe device or not for later use. [akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover] Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22Intel IOMMU: DMAR detection and parsing logicKeshavamurthy, Anil S
This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R) Virtualization Technology for Directed I/O Architecture and the hardware spec for the same can be found here http://www.intel.com/technology/virtualization/index.htm FAQ! (questions from akpm, answers from ak) > So... what's all this code for? > > I assume that the intent here is to speed things up under Xen, etc? Yes in some cases, but not this code. That would be the Xen version of this code that could potentially assign whole devices to guests. I expect this to be only useful in some special cases though because most hardware is not virtualizable and you typically want an own instance for each guest. Ok at some point KVM might implement this too; i likely would use this code for this. > Do we > have any benchmark results to help us to decide whether a merge would be > justified? The main advantage for doing it in the normal kernel is not performance, but more safety. Broken devices won't be able to corrupt memory by doing random DMA. Unfortunately that doesn't work for graphics yet, for that need user space interfaces for the X server are needed. There are some potential performance benefits too: - When you have a device that cannot address the complete address range an IOMMU can remap its memory instead of bounce buffering. Remapping is likely cheaper than copying. - The IOMMU can merge sg lists into a single virtual block. This could potentially speed up SG IO when the device is slow walking SG lists. [I long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but it probably depends a lot on the HBA] And you get better driver debugging because unexpected memory accesses from the devices will cause a trappable event. > > Does it slow anything down? It adds more overhead to each IO so yes. This patch: Add support for early detection and parsing of DMAR's (DMA Remapping) reported to OS via ACPI tables. DMA remapping(DMAR) devices support enables independent address translations for Direct Memory Access(DMA) from Devices. These DMA remapping devices are reported via ACPI tables and includes pci device scope covered by these DMA remapping device. For detailed info on the specification of "Intel(R) Virtualization Technology for Directed I/O Architecture" please see http://www.intel.com/technology/virtualization/index.htm Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22ext2: avoid rec_len overflow with 64KB block sizeJan Kara
With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry length. So we store 0xffff instead and convert the value when read from / written to disk. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22capabilities: clean up file capability readingSerge E. Hallyn
Simplify the vfs_cap_data structure. Also fix get_file_caps which was declaring __le32 v1caps[XATTR_CAPS_SZ] on the stack, but XATTR_CAPS_SZ is already * sizeof(__le32). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22memory hotplug: make kmem_cache_node for SLUB on memory online avoid panicYasunori Goto
Fix a panic due to access NULL pointer of kmem_cache_node at discard_slab() after memory online. When memory online is called, kmem_cache_nodes are created for all SLUBs for new node whose memory are available. slab_mem_going_online_callback() is called to make kmem_cache_node() in callback of memory online event. If it (or other callbacks) fails, then slab_mem_offline_callback() is called for rollback. In memory offline, slab_mem_going_offline_callback() is called to shrink all slub cache, then slab_mem_offline_callback() is called later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: locking fix] [akpm@linux-foundation.org: build fix] Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22memory hotplug: rearrange memory hotplug notifierYasunori Goto
Current memory notifier has some defects yet. (Fortunately, nothing uses it.) This patch is to fix and rearrange for them. - Add information of start_pfn, nr_pages, and node id if node status is changes from/to memoryless node for callback functions. Callbacks can't do anything without those information. - Add notification going-online status. It is necessary for creating per node structure before the node's pages are available. - Move GOING_OFFLINE status notification after page isolation. It is good place for return memory like cache for callback, because returned page is not used again. - Make CANCEL events for rollingback when error occurs. - Delete MEM_MAPPING_INVALID notification. It will be not used. - Fix compile error of (un)register_memory_notifier(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>