aboutsummaryrefslogtreecommitdiff
path: root/virt/kvm
AgeCommit message (Collapse)Author
2009-12-22anonfd: Allow making anon files read-onlyRoland Dreier
It seems a couple places such as arch/ia64/kernel/perfmon.c and drivers/infiniband/core/uverbs_main.c could use anon_inode_getfile() instead of a private pseudo-fs + alloc_file(), if only there were a way to get a read-only file. So provide this by having anon_inode_getfile() create a read-only file if we pass O_RDONLY in flags. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-09Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Conflicts: include/linux/kvm.h
2009-12-03KVM: Allow internal errors reported to userspace to carry extra dataAvi Kivity
Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: only clear irq_source_id if irqchip is presentMarcelo Tosatti
Otherwise kvm might attempt to dereference a NULL pointer. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Enable 32bit dirty log pointers on 64bit hostArnd Bergmann
With big endian userspace, we can't quite figure out if a pointer is 32 bit (shifted >> 32) or 64 bit when we read a 64 bit pointer. This is what happens with dirty logging. To get the pointer interpreted correctly, we thus need Arnd's patch to implement a compat layer for the ioctl: A better way to do this is to add a separate compat_ioctl() method that converts this for you. Based on initial patch from Arnd Bergmann. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: fix irq_source_id size verificationMarcelo Tosatti
find_first_zero_bit works with bit numbers, not bytes. Fixes https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831 Reported-by: "Xu, Jiajun" <jiajun.xu@intel.com> Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: introduce kvm_vcpu_on_spinZhai, Edwin
Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing once the cpu detects pause-based looping. Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: fix lock imbalance in kvm_*_irq_source_id()Jiri Slaby
Stanse found 2 lock imbalances in kvm_request_irq_source_id and kvm_free_irq_source_id. They omit to unlock kvm->irq_lock on fail paths. Fix that by adding unlock labels at the end of the functions and jump there from the fail paths. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Activate Virtualization On DemandAlexander Graf
X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means, that instead virtualization is enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Move assigned device code to own fileAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Drop kvm->irq_lock lock from irq injection pathGleb Natapov
The only thing it protects now is interrupt injection into lapic and this can work lockless. Even now with kvm->irq_lock in place access to lapic is not entirely serialized since vcpu access doesn't take kvm->irq_lock. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Move IO APIC to its own lockGleb Natapov
The allows removal of irq_lock from the injection path. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Convert irq notifiers lists to RCU lockingGleb Natapov
Use RCU locking for mask/ack notifiers lists. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Move irq ack notifier list to arch independent codeGleb Natapov
Mask irq notifier list is already there. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Move irq routing data structure to rcu lockingGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Maintain back mapping from irqchip/pin to gsiGleb Natapov
Maintain back mapping from irqchip/pin to gsi to speedup interrupt acknowledgment notifications. [avi: build fix on non-x86/ia64] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Change irq routing table to use gsi indexed arrayGleb Natapov
Use gsi indexed array instead of scanning all entries on each interrupt injection. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Move irq sharing information to irqchip levelGleb Natapov
This removes assumptions that max GSIs is smaller than number of pins. Sharing is tracked on pin level not GSI level. [avi: no PIC on ia64] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Don't wrap schedule() with vcpu_put()/vcpu_load()Avi Kivity
Preemption notifiers will do that for us automatically. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-11-05Use Little Endian for Dirty BitmapAlexander Graf
We currently use host endian long types to store information in the dirty bitmap. This works reasonably well on Little Endian targets, because the u32 after the first contains the next 32 bits. On Big Endian this breaks completely though, forcing us to be inventive here. So Ben suggested to always use Little Endian, which looks reasonable. We only have dirty bitmap implemented in Little Endian targets so far and since PowerPC would be the first Big Endian platform, we can just as well switch to Little Endian always with little effort without breaking existing targets. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-16KVM: Prevent kvm_init from corrupting debugfs structuresDarrick J. Wong
I'm seeing an oops condition when kvm-intel and kvm-amd are modprobe'd during boot (say on an Intel system) and then rmmod'd: # modprobe kvm-intel kvm_init() kvm_init_debug() kvm_arch_init() <-- stores debugfs dentries internally (success, etc) # modprobe kvm-amd kvm_init() kvm_init_debug() <-- second initialization clobbers kvm's internal pointers to dentries kvm_arch_init() kvm_exit_debug() <-- and frees them # rmmod kvm-intel kvm_exit() kvm_exit_debug() <-- double free of debugfs files! *BOOM* If execution gets to the end of kvm_init(), then the calling module has been established as the kvm provider. Move the debugfs initialization to the end of the function, and remove the now-unnecessary call to kvm_exit_debug() from the error path. That way we avoid trampling on the debugfs entries and freeing them twice. Cc: stable@kernel.org Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04KVM: add support for change_pte mmu notifiersIzik Eidus
this is needed for kvm if it want ksm to directly map pages into its shadow page tables. [marcelo: cast pfn assignment to u64] Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-01const: constify remaining file_operationsAlexey Dobriyan
[akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27const: mark struct vm_struct_operationsAlexey Dobriyan
* mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cpumask: use zalloc_cpumask_var() where possibleLi Zefan
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-10KVM: correct error-handling codeJulia Lawall
This code is not executed before file has been initialized to the result of calling eventfd_fget. This function returns an ERR_PTR value in an error case instead of NULL. Thus the test that file is not NULL is always true. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @match exists@ expression x, E; statement S1, S2; @@ x = eventfd_fget(...) ... when != x = E ( * if (x == NULL || ...) S1 else S2 | * if (x == NULL && ...) S1 else S2 ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: fix compile warnings on s390Heiko Carstens
CC arch/s390/kvm/../../../virt/kvm/kvm_main.o arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_set_memory_region': arch/s390/kvm/../../../virt/kvm/kvm_main.c:485: warning: unused variable 'j' arch/s390/kvm/../../../virt/kvm/kvm_main.c:484: warning: unused variable 'lpages' arch/s390/kvm/../../../virt/kvm/kvm_main.c:483: warning: unused variable 'ugfn' Cc: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10KVM: Fix coalesced interrupt reporting in IOAPICGleb Natapov
This bug was introduced by b4a2f5e723e4f7df467. Cc: stable@kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Move #endif KVM_CAP_IRQ_ROUTING to correct placeAvi Kivity
The symbol only controls irq routing, not MSI-X. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: fix kvm_init() error handlingXiao Guangrong
Remove debugfs file if kvm_arch_init() return error Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Drop obsolete cpu_get/put in make_all_cpus_requestJan Kiszka
spin_lock disables preemption, so we can simply read the current cpu. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10KVM: Reduce runnability interface with arch support codeGleb Natapov
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from interface between general code and arch code. kvm_arch_vcpu_runnable() checks for interrupts instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: add ioeventfd supportGregory Haskins
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via ioeventfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 110000 iops, 9.09us rtt ioeventfd-mmio: 200100 iops, 5.00us rtt ioeventfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt ioeventfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. -------------------- Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: make io_bus interface more robustGregory Haskins
Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes: 1) Add a return value to the registration function 2) Fix up all the callsites to check the return code, handle any failures, and percolate the error up to the caller. 3) Add an unregister function that collapses holes in the array Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Add trace points in irqchip codeGleb Natapov
Add tracepoint in msi/ioapic/pic set_irq() functions, in IPI sending and in the point where IRQ is placed into apic's IRR. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: ignore msi request if !levelMichael S. Tsirkin
Irqfd sets level for interrupt to 1 and then to 0. For MSI, check level so that a single message is sent. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Use temporary variable to shorten lines.Gleb Natapov
Cosmetic only. No logic is changed by this patch. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Trace irq level and source idAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: fix lock imbalanceJiri Slaby
There is a missing unlock on one fail path in ioapic_mmio_write, fix that. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: document lock nesting ruleMichael S. Tsirkin
Document kvm->lock nesting within kvm->slots_lock Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: remove in_range from io devicesMichael S. Tsirkin
This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write functions. in_range now becomes unused so it is removed from device ops in favor of read/write callbacks performing range checks internally. This allows aliasing (mostly for in-kernel virtio), as well as better error handling by making it possible to pass errors up to userspace. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: convert bus to slots_lockMichael S. Tsirkin
Use slots_lock to protect device list on the bus. slots_lock is already taken for read everywhere, so we only need to take it for write when registering devices. This is in preparation to removing in_range and kvm->lock around it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: switch coalesced mmio changes to slots_lockMichael S. Tsirkin
switch coalesced mmio slots_lock. slots_lock is already taken for read everywhere, so we only need to take it for write when changing zones. This is in preparation to removing in_range and kvm->lock around it. [avi: fix build] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: document locking for kvm_io_device_opsMichael S. Tsirkin
slots_lock is taken everywhere when device ops are called. Document this as we will use this to rework locking for io. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: remove old KVMTRACE support codeMarcelo Tosatti
Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl pathsMarcelo Tosatti
Correct missing locking in a few places in x86's vm_ioctl handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Prepare memslot data structures for multiple hugepage sizesJoerg Roedel
[avi: fix build on non-x86] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: s390: Fix memslot initialization for userspace_addr != 0Christian Borntraeger
Since commit 854b5338196b1175706e99d63be43a4f8d8ab607 Author: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> KVM: s390: streamline memslot handling s390 uses the values of the memslot instead of doing everything in the arch ioctl handler of the KVM_SET_USER_MEMORY_REGION. Unfortunately we missed to set the userspace_addr of our memslot due to our s390 ifdef in __kvm_set_memory_region. Old s390 userspace launchers did not notice, since they started the guest at userspace address 0. Because of CONFIG_DEFAULT_MMAP_MIN_ADDR we now put the guest at 1M userspace, which does not work. This patch makes sure that new.userspace_addr is set on s390. This fix should go in quickly. Nevertheless, looking at the code we should clean up that ifdef in the long term. Any kernel janitors? Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: convert custom marker based tracing to event tracesMarcelo Tosatti
This allows use of the powerful ftrace infrastructure. See Documentation/trace/ for usage information. [avi, stephen: various build fixes] [sheng: fix control register breakage] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: VMX: conditionally disable 2M pagesMarcelo Tosatti
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled. [avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>