aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2009-06-11fsnotify: unified filesystem notification backendEric Paris
fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-06-11Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c
2009-06-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (266 commits) sh: Tie sparseirq in to Kconfig. sh: Wire up sys_rt_tgsigqueueinfo. sh: Fix sys_pwritev() syscall table entry for sh32. sh: Fix sh4a llsc-based cmpxchg() sh: sh7724: Add JPU support sh: sh7724: INTC setting update sh: sh7722 clock framework rewrite sh: sh7366 clock framework rewrite sh: sh7343 clock framework rewrite sh: sh7724 clock framework rewrite V3 sh: sh7723 clock framework rewrite V2 sh: add enable()/disable()/set_rate() to div6 code sh: add AP325RXA mode pin configuration sh: add Migo-R mode pin configuration sh: sh7722 mode pin definitions sh: sh7724 mode pin comments sh: sh7723 mode pin V2 sh: rework mode pin code sh: clock div6 helper code sh: clock div4 frequency table offset fix ...
2009-06-11Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: Prevent overflow in largepages calculation KVM: Disable large pages on misaligned memory slots KVM: Add VT-x machine check support KVM: VMX: Rename rmode.active to rmode.vm86_active KVM: Move "exit due to NMI" handling into vmx_complete_interrupts() KVM: Disable CR8 intercept if tpr patching is active KVM: Do not migrate pending software interrupts. KVM: inject NMI after IRET from a previous NMI, not before. KVM: Always request IRQ/NMI window if an interrupt is pending KVM: Do not re-execute INTn instruction. KVM: skip_emulated_instruction() decode instruction if size is not known KVM: Remove irq_pending bitmap KVM: Do not allow interrupt injection from userspace if there is a pending event. KVM: Unprotect a page if #PF happens during NMI injection. KVM: s390: Verify memory in kvm run KVM: s390: Sanity check on validity intercept KVM: s390: Unlink vcpu on destroy - v2 KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock KVM: s390: use hrtimer for clock wakeup from idle - v2 KVM: s390: Fix memory slot versus run - v3 ...
2009-06-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (44 commits) nommu: Provide mmap_min_addr definition. TOMOYO: Add description of lists and structures. TOMOYO: Remove unused field. integrity: ima audit dentry_open failure TOMOYO: Remove unused parameter. security: use mmap_min_addr indepedently of security models TOMOYO: Simplify policy reader. TOMOYO: Remove redundant markers. SELinux: define audit permissions for audit tree netlink messages TOMOYO: Remove unused mutex. tomoyo: avoid get+put of task_struct smack: Remove redundant initialization. integrity: nfsd imbalance bug fix rootplug: Remove redundant initialization. smack: do not beyond ARRAY_SIZE of data integrity: move ima_counts_get integrity: path_check update IMA: Add __init notation to ima functions IMA: Minimal IMA policy and boot param for TCB IMA policy selinux: remove obsolete read buffer limit from sel_read_bool ...
2009-06-11Merge branch 'for-2.6.31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits) ide-tape: fix debug call alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC ide-dma: don't reset request fields on dma_timeout_retry() ide: drop rq->data handling from ide_map_sg() ide-atapi: kill unused fields and callbacks ide-tape: simplify read/write functions ide-tape: use byte size instead of sectors on rw issue functions ide-tape: unify r/w init paths ide-tape: kill idetape_bh ide-tape: use standard data transfer mechanism ide-tape: use single continuous buffer ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len ide-tape,floppy: fix failed command completion after request sense ide-pm: don't abuse rq->data ide-cd,atapi: use bio for internal commands ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer ide-cd: convert to using generic sense request ide: add helpers for preparing sense requests ide-cd: don't abuse rq->buffer ide-atapi: don't abuse rq->buffer ...
2009-06-11Merge branch 'serial-from-alan'Linus Torvalds
* serial-from-alan: (79 commits) moxa: prevent opening unavailable ports imx: serial: use tty_encode_baud_rate to set true rate imx: serial: add IrDA support to serial driver imx: serial: use rational library function lib: isolate rational fractions helper function imx: serial: handle initialisation failure correctly imx: serial: be sure to stop xmit upon shutdown imx: serial: notify higher layers in case xmit IRQ was not called imx: serial: fix one bit field type imx: serial: fix whitespaces (no changes in functionality) tty: use prepare/finish_wait tty: remove sleep_on sierra: driver interface blacklisting sierra: driver urb handling improvements tty: resolve some sierra breakage timbuart: Fix the termios logic serial: Added Timberdale UART driver tty: Add URL for ttydev queue devpts: unregister the file system on error tty: Untangle termios and mm mutex dependencies ...
2009-06-11lib: isolate rational fractions helper functionOskar Schirmer
Provide a helper function to determine optimum numerator denominator value pairs taking into account restricted register size. Useful especially with PLL and other clock configurations. Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11serial: Added Timberdale UART driverRichard Röjfors
Driver for the UART found in the Timberdale FPGA Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11serial: add support for the TI AR7 internal UARTFlorian Fainelli
This patch adds support for the TI AR7 internal UART. Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: rewrite the ldisc lockingAlan Cox
There are several pretty much unfixable races in the old ldisc code, especially with respect to pty behaviour and also to hangup. It's easier to rewrite the code than simply try and patch it up. This patch - splits the ldisc from the tty (so we will be able to refcount it more cleanly later) - introduces a mutex lock for ldisc changing on an active device - fixes the complete mess that hangup caused - implements hopefully correct setldisc/close/hangup locking There are still some problems around pty pairs that have always been there but at least it is now possible to understand the code and fix further problems. This fixes the following known bugs - hang up can leak ldisc references - hang up may not call open/close on ldisc in a matched way - pty/tty pairs can deadlock during an ldisc change - reading the ldisc proc files can cause every ldisc to be loaded and probably a few other of the mysterious ldisc race reports. I'm sure it also adds the odd new one. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: Extract various bits of ldisc codeAlan Cox
Before trying to tackle the ldisc bugs the code needs to be a good deal more readable, so do the simple extractions of routines first. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: throttling race fixAlan Cox
The tty throttling code can race due to the lock drops. It takes very high loads but this has been observed and verified by Rob Duncan. The basic problem is that on an SMP box we can go CPU #1 CPU #2 need to throttle ? suppose we should buffer space cleared are we throttled yes ? - unthrottle call throttle method This changeet take the termios lock to protect against this. The termios lock isn't the initial obvious candidate but many implementations of throttle methods already need to poke around their own termios structures (and nobody really locks them against a racing change of flow control). This does mean that anyone who is setting tty->low_latency = 1 and then calling tty_flip_buffer_push from their unthrottle method is going to end up collapsing in a pile of locks. However we've removed all the known bogus users of low_latency = 1 and such use isn't safe anyway for other reasons so catching it would be an improvement. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-118250_pci: add the OXCB950 chip to the 8250 PCI driver.Andre Przywara
This adds support for the following serial controller chip: Oxford Semiconductor OXCB950 for PCI Cardbus interface http://www.transdimension.com/products/serial/OXCB950.html on this card: ExSys EX-1370 1 port high-speed serial card for ExpressCard/34 slot Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11serial: refactor ASYNC_ flagsJiri Slaby
Define ASYNCB_* flags which are bit numbers of the ASYNC_* flags. This is useful for {test,set,clear}_bit. Also convert each ASYNC_% to be (1 << ASYNCB_%) and define masks with the macros, not constants. Tested with: #include "PATH_TO_KERNEL/include/linux/serial.h" static struct { unsigned int new, old; } as[] = { { ASYNC_HUP_NOTIFY, 0x0001 }, { ASYNC_FOURPORT, 0x0002 }, ... { ASYNC_BOOT_ONLYMCA, 0x00400000 }, { ASYNC_INTERNAL_FLAGS, 0xFFC00000 } }; ... for (a = 0; a < ARRAY_SIZE(as); a++) if (as[a].old != as[a].new) printf("%.8x != %.8x\n", as[a].old, as[a].new); Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: cyclades, remove typedefsJiri Slaby
They are unused anyway. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: cyclades, cache HW versionJiri Slaby
Store HW version locally to not read it all the time in interrupts and alike. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: cyclades, plx9060 casts cleanupJiri Slaby
Remove ugly all-over-the-code casts of ctl_addr to 9060 space. Add an union to the cyclades_card structure, which contains a pointer to both 9050 and 9060 spaces. The 9050 space layout is unknown, so let it still as a void __iomem pointer. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: Bring the usb tty port structure into more useAlan Cox
This allows us to clean stuff up, but is probably also going to cause some app breakage with buggy apps as we now implement proper POSIX behaviour for USB ports matching all the other ports. This does also mean other apps that break on USB will now work properly. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: Implement a drain delay in the tty portAlan Cox
We need this for devices that cannot flush and wait, but which do not order data and modem events. Without it we will hang up before all the data clears the hardware. Needed for the USB changes. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11tty: Add carrier processing on close to the tty_port coreAlan Cox
Some drivers implement this internally, others miss it out. Push the behaviour into the core code as that way everyone will do it consistently. Update the dtr rts method to raise or lower depending upon flags. Having a single method in this style fits most of the implementations more cleanly than two funtions. We need this in place before we tackle the USB side Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11block: add request clone interface (v2)Kiyoshi Ueda
This patch adds the following 2 interfaces for request-stacking drivers: - blk_rq_prep_clone(struct request *clone, struct request *orig, struct bio_set *bs, gfp_t gfp_mask, int (*bio_ctr)(struct bio *, struct bio*, void *), void *data) * Clones bios in the original request to the clone request (bio_ctr is called for each cloned bios.) * Copies attributes of the original request to the clone request. The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. - blk_rq_unprep_clone(struct request *clone) * Frees cloned bios from the clone request. Request stacking drivers (e.g. request-based dm) need to make a clone request for a submitted request and dispatch it to other devices. To allocate request for the clone, request stacking drivers may not be able to use blk_get_request() because the allocation may be done in an irq-disabled context. So blk_rq_prep_clone() takes a request allocated by the caller as an argument. For each clone bio in the clone request, request stacking drivers should be able to set up their own completion handler. So blk_rq_prep_clone() takes a callback function which is called for each clone bio, and a pointer for private data which is passed to the callback. NOTE: blk_rq_prep_clone() doesn't copy any actual data of the original request. Pages are shared between original bios and cloned bios. So caller must not complete the original request before the clone request. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-11Merge branch 'master' of ↵Paul Mundt
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2009-06-10Merge branch 'tracing-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...
2009-06-10Merge branch 'signal-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hookup sys_rt_tgsigqueueinfo signals: implement sys_rt_tgsigqueueinfo signals: split do_tkill
2009-06-10Merge branch 'rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: rcu_sched_grace_period(): kill the bogus flush_signals() rculist: use list_entry_rcu in places where it's appropriate rculist.h: introduce list_entry_rcu() and list_first_entry_rcu() rcu: Update RCU tracing documentation for __rcu_pending rcu: Add __rcu_pending tracing to hierarchical RCU RCU: make treercu be default
2009-06-11Merge branch 'next' into for-linusJames Morris
2009-06-10Merge branch 'locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: spinlock: Add missing __raw_spin_lock_flags() stub for UP mutex: add atomic_dec_and_mutex_lock(), fix locking, rtmutex.c: Documentation cleanup mutex: add atomic_dec_and_mutex_lock()
2009-06-10Merge branch 'iommu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) amd-iommu: remove unnecessary "AMD IOMMU: " prefix amd-iommu: detach device explicitly before attaching it to a new domain amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling dma-debug: simplify logic in driver_filter() dma-debug: disable/enable irqs only once in device_dma_allocations dma-debug: use pr_* instead of printk(KERN_* ...) dma-debug: code style fixes dma-debug: comment style fixes dma-debug: change hash_bucket_find from first-fit to best-fit x86: enable GART-IOMMU only after setting up protection methods amd_iommu: fix lock imbalance dma-debug: add documentation for the driver filter dma-debug: add dma_debug_driver kernel command line dma-debug: add debugfs file for driver filter dma-debug: add variables and checks for driver filter dma-debug: fix debug_dma_sync_sg_for_cpu and debug_dma_sync_sg_for_device dma-debug: use sg_dma_len accessor dma-debug: use sg_dma_address accessor instead of using dma_address directly amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS ...
2009-06-10Merge branch 'futexes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: fix restart in wait_requeue_pi futex: fix restart for early wakeup in futex_wait_requeue_pi() futex: cleanup error exit futex: remove the wait queue futex: add requeue-pi documentation futex: remove FUTEX_REQUEUE_PI (non CMP) futex: fix futex_wait_setup key handling sparc64: extend TI_RESTART_BLOCK space by 8 bytes futex: fixup unlocked requeue pi case futex: add requeue_pi functionality futex: split out futex value validation code futex: distangle futex_requeue() futex: add FUTEX_HAS_TIMEOUT flag to restart.futex.flags rt_mutex: add proxy lock routines futex: split out fixup owner logic from futex_lock_pi() futex: split out atomic logic from futex_lock_pi() futex: add helper to find the top prio waiter of a futex futex: separate futex_wait_queue_me() logic from futex_wait()
2009-06-10Merge branch 'x86-xen-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...
2009-06-10Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) x86: fix system without memory on node0 x86, mm: Fix node_possible_map logic mm, x86: remove MEMORY_HOTPLUG_RESERVE related code x86: make sparse mem work in non-NUMA mode x86: process.c, remove useless headers x86: merge process.c a bit x86: use sparse_memory_present_with_active_regions() on UMA x86: unify 64-bit UMA and NUMA paging_init() x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB x86: Sanity check the e820 against the SRAT table using e820 map only x86: clean up and and print out initial max_pfn_mapped x86/pci: remove rounding quirk from e820_setup_gap() x86, e820, pci: reserve extra free space near end of RAM x86: fix typo in address space documentation x86: 46 bit physical address support on 64 bits x86, mm: fault.c, use printk_once() in is_errata93() x86: move per-cpu mmu_gathers to mm/init.c x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c x86: unify noexec handling x86: remove (null) in /sys kernel_page_tables ...
2009-06-10Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: fix typo in sched-rt-group.txt file ftrace: fix typo about map of kernel priority in ftrace.txt file. sched: properly define the sched_group::cpumask and sched_domain::span fields sched, timers: cleanup avenrun users sched, timers: move calc_load() to scheduler sched: Don't export sched_mc_power_savings on multi-socket single core system sched: emit thread info flags with stack trace sched: rt: document the risk of small values in the bandwidth settings sched: Replace first_cpu() with cpumask_first() in ILB nomination code sched: remove extra call overhead for schedule() sched: use group_first_cpu() instead of cpumask_first(sched_group_cpus()) wait: don't use __wake_up_common() sched: Nominate a power-efficient ilb in select_nohz_balancer() sched: Nominate idle load balancer from a semi-idle package. sched: remove redundant hierarchy walk in check_preempt_wakeup
2009-06-10Merge branch 'irq-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits) x86, apic: Fix dummy apic read operation together with broken MP handling x86, apic: Restore irqs on fail paths x86: Print real IOAPIC version for x86-64 x86: enable_update_mptable should be a macro sparseirq: Allow early irq_desc allocation x86, io-apic: Don't mark pin_programmed early x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled x86, irq: update_mptable needs pci_routeirq x86: don't call read_apic_id if !cpu_has_apic x86, apic: introduce io_apic_irq_attr x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix x86: read apic ID in the !acpi_lapic case x86: apic: Fixmap apic address even if apic disabled x86: display extended apic registers with print_local_APIC and cpu_debug code x86: read apic ID in the !acpi_lapic case x86: clean up and fix setup_clear/force_cpu_cap handling x86: apic: Check rev 3 fadt correctly for physical_apic bit x86/pci: update pirq_enable_irq() to setup io apic routing x86/acpi: move setup io apic routing out of CONFIG_ACPI scope x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector() ...
2009-06-10block: prevent possible io_context->refcount overflowNikanth Karthikesan
Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-10tracing: do not translate event helper macros in print formatSteven Rostedt
By moving the macro that creates the print format code above the defining of the event macro helpers (__get_str, __print_symbolic, and __get_dynamic_array), we get a little cleaner print format. Instead of: (char *)((void *)REC + REC->__data_loc_name) we get: __get_str(name) Instead of: ({ static const struct trace_print_flags symbols[] = { { HI_SOFTIRQ, "HI" }, { we get: __print_symbolic(REC->vec, { HI_SOFTIRQ, "HI" }, { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-10tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCKLi Zefan
Fix building failures when CONFIG_BLOCK == n. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2F1520.8020003@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-10spinlock: Add missing __raw_spin_lock_flags() stub for UPBenjamin Herrenschmidt
This was only defined with CONFIG_DEBUG_SPINLOCK set, but some obscure arch/powerpc code wants it always. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-10KVM: protect assigned dev workqueue, int handler and irq ackerMarcelo Tosatti
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the interrupt handler function. It does: if (dev->host_irq_disabled) { enable_irq(dev->host_irq); dev->host_irq_disabled = false; } If an interrupt triggers before the host->dev_irq_disabled assignment, it will disable the interrupt and set dev->host_irq_disabled to true. On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to false, and the next kvm_assigned_dev_ack_irq call will fail to reenable it. Other than that, having the interrupt handler and work handlers run in parallel sounds like asking for trouble (could not spot any obvious problem, but better not have to, its fragile). CC: sheng.yang@intel.com Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: use smp_send_reschedule in kvm_vcpu_kickMarcelo Tosatti
KVM uses a function call IPI to cause the exit of a guest running on a physical cpu. For virtual interrupt notification there is no need to wait on IPI receival, or to execute any function. This is exactly what the reschedule IPI does, without the overhead of function IPI. So use it instead of smp_call_function_single in kvm_vcpu_kick. Also change the "guest_mode" variable to a bit in vcpu->requests, and use that to collapse multiple IPI's that would be issued between the first one and zeroing of guest mode. This allows kvm_vcpu_kick to called with interrupts disabled. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Enable snooping control for supported hardwareSheng Yang
Memory aliases with different memory type is a problem for guest. For the guest without assigned device, the memory type of guest memory would always been the same as host(WB); but for the assigned device, some part of memory may be used as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of host memory type then be a potential issue. Snooping control can guarantee the cache correctness of memory go through the DMA engine of VT-d. [avi: fix build on ia64] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Make kvm header C++ friendlynathan binkert
Two things needed fixing: 1) g++ does not allow a named structure type within an anonymous union and 2) Avoid name clash between two padding fields within the same struct by giving them different names as is done elsewhere in the header. Signed-off-by: Nathan Binkert <nate@binkert.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Fix interrupt unhalting a vcpu when it shouldn'tGleb Natapov
kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Device assignment framework reworkSheng Yang
After discussion with Marcelo, we decided to rework device assignment framework together. The old problems are kernel logic is unnecessary complex. So Marcelo suggest to split it into a more elegant way: 1. Split host IRQ assign and guest IRQ assign. And userspace determine the combination. Also discard msi2intx parameter, userspace can specific KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI to INTx convertion. 2. Split assign IRQ and deassign IRQ. Import two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ. This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the old interface). [avi: replace homemade bitcount() by hweight_long()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: APIC: get rid of deliver_bitmaskGleb Natapov
Deliver interrupt during destination matching loop. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: consolidate ioapic/ipi interrupt delivery logicGleb Natapov
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in apic_send_ipi() to figure out ipi destination instead of reimplementing the logic. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: ioapic/msi interrupt delivery consolidationGleb Natapov
ioapic_deliver() and kvm_set_msi() have code duplication. Move the code into ioapic_deliver_entry() function and call it from both places. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: declare ioapic functions only on affected hardwareChristian Borntraeger
Since "KVM: Unify the delivery of IOAPIC and MSI interrupts" I get the following warnings: CC [M] arch/s390/kvm/kvm-s390.o In file included from arch/s390/kvm/kvm-s390.c:22: include/linux/kvm_host.h:357: warning: 'struct kvm_ioapic' declared inside parameter list include/linux/kvm_host.h:357: warning: its scope is only this definition or declaration, which is probably not what you want This patch limits IOAPIC functions for architectures that have one. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Enable MSI-X for KVM assigned deviceSheng Yang
This patch finally enable MSI-X. What we need for MSI-X: 1. Intercept one page in MMIO region of device. So that we can get guest desired MSI-X table and set up the real one. Now this have been done by guest, and transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY. 2. Information for incoming interrupt. Now one device can have more than one interrupt, and they are all handled by one workqueue structure. So we need to identify them. The previous patch enable gsi_msg_pending_bitmap get this done. 3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X message address/data. We used same entry number for the host and guest here, so that it's easy to find the correlated guest gsi. What we lack for now: 1. The PCI spec said nothing can existed with MSI-X table in the same page of MMIO region, except pending bits. The patch ignore pending bits as the first step (so they are always 0 - no pending). 2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch didn't support this, and Linux also don't work in this way. 3. The patch didn't implement MSI-X mask all and mask single entry. I would implement the former in driver/pci/msi.c later. And for single entry, userspace should have reposibility to handle it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Add MSI-X interrupt injection logicSheng Yang
We have to handle more than one interrupt with one handler for MSI-X. Avi suggested to use a flag to indicate the pending. So here is it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>