aboutsummaryrefslogtreecommitdiff
path: root/virt/kvm
AgeCommit message (Collapse)Author
2009-09-10KVM: make io_bus interface more robustGregory Haskins
Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes: 1) Add a return value to the registration function 2) Fix up all the callsites to check the return code, handle any failures, and percolate the error up to the caller. 3) Add an unregister function that collapses holes in the array Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Add trace points in irqchip codeGleb Natapov
Add tracepoint in msi/ioapic/pic set_irq() functions, in IPI sending and in the point where IRQ is placed into apic's IRR. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: ignore msi request if !levelMichael S. Tsirkin
Irqfd sets level for interrupt to 1 and then to 0. For MSI, check level so that a single message is sent. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Use temporary variable to shorten lines.Gleb Natapov
Cosmetic only. No logic is changed by this patch. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Trace irq level and source idAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: fix lock imbalanceJiri Slaby
There is a missing unlock on one fail path in ioapic_mmio_write, fix that. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: document lock nesting ruleMichael S. Tsirkin
Document kvm->lock nesting within kvm->slots_lock Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: remove in_range from io devicesMichael S. Tsirkin
This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write functions. in_range now becomes unused so it is removed from device ops in favor of read/write callbacks performing range checks internally. This allows aliasing (mostly for in-kernel virtio), as well as better error handling by making it possible to pass errors up to userspace. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: convert bus to slots_lockMichael S. Tsirkin
Use slots_lock to protect device list on the bus. slots_lock is already taken for read everywhere, so we only need to take it for write when registering devices. This is in preparation to removing in_range and kvm->lock around it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: switch coalesced mmio changes to slots_lockMichael S. Tsirkin
switch coalesced mmio slots_lock. slots_lock is already taken for read everywhere, so we only need to take it for write when changing zones. This is in preparation to removing in_range and kvm->lock around it. [avi: fix build] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: document locking for kvm_io_device_opsMichael S. Tsirkin
slots_lock is taken everywhere when device ops are called. Document this as we will use this to rework locking for io. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: remove old KVMTRACE support codeMarcelo Tosatti
Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl pathsMarcelo Tosatti
Correct missing locking in a few places in x86's vm_ioctl handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Prepare memslot data structures for multiple hugepage sizesJoerg Roedel
[avi: fix build on non-x86] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: s390: Fix memslot initialization for userspace_addr != 0Christian Borntraeger
Since commit 854b5338196b1175706e99d63be43a4f8d8ab607 Author: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> KVM: s390: streamline memslot handling s390 uses the values of the memslot instead of doing everything in the arch ioctl handler of the KVM_SET_USER_MEMORY_REGION. Unfortunately we missed to set the userspace_addr of our memslot due to our s390 ifdef in __kvm_set_memory_region. Old s390 userspace launchers did not notice, since they started the guest at userspace address 0. Because of CONFIG_DEFAULT_MMAP_MIN_ADDR we now put the guest at 1M userspace, which does not work. This patch makes sure that new.userspace_addr is set on s390. This fix should go in quickly. Nevertheless, looking at the code we should clean up that ifdef in the long term. Any kernel janitors? Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: convert custom marker based tracing to event tracesMarcelo Tosatti
This allows use of the powerful ftrace infrastructure. See Documentation/trace/ for usage information. [avi, stephen: various build fixes] [sheng: fix control register breakage] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: VMX: conditionally disable 2M pagesMarcelo Tosatti
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled. [avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Use macro to iterate over vcpus.Gleb Natapov
[christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Break dependency between vcpu index in vcpus array and vcpu_id.Gleb Natapov
Archs are free to use vcpu_id as they see fit. For x86 it is used as vcpu's apic id. New ioctl is added to configure boot vcpu id that was assumed to be 0 till now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Introduce kvm_vcpu_is_bsp() function.Gleb Natapov
Use it instead of open code "vcpu_id zero is BSP" assumption. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: switch irq injection/acking data structures to irq_lockMarcelo Tosatti
Protect irq injection/acking data structures with a separate irq_lock mutex. This fixes the following deadlock: CPU A CPU B kvm_vm_ioctl_deassign_dev_irq() mutex_lock(&kvm->lock); worker_thread() -> kvm_deassign_irq() -> kvm_assigned_dev_interrupt_work_handler() -> deassign_host_irq() mutex_lock(&kvm->lock); -> cancel_work_sync() [blocked] [gleb: fix ia64 path] Reported-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: introduce irq_lock, use it to protect ioapicMarcelo Tosatti
Introduce irq_lock, and use to protect ioapic data structures. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: move coalesced_mmio locking to its own deviceMarcelo Tosatti
Move coalesced_mmio locking to its own device, instead of relying on kvm->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Calculate available entries in coalesced mmio ringAvi Kivity
Instead of checking whether we'll wrap around, calculate how many entries are available, and check whether we have enough (just one) for the pending mmio. By itself, this doesn't change anything, but it paves the way for making this function lockless. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: cleanup io_device codeGregory Haskins
We modernize the io_device code so that we use container_of() instead of dev->private, and move the vtable to a separate ops structure (theoretically allows better caching for multiple instances of the same ops structure) Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Clean up coalesced_mmio destructionGregory Haskins
We invoke kfree() on a data member instead of the structure. This works today because the kvm_io_device is the first element of the private structure, but this could change in the future, so lets clean this up. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: No disable_irq for MSI/MSI-X interrupt on device assignmentSheng Yang
Disable interrupt at interrupt handler and enable it when guest ack is for the level triggered interrupt, to prevent reinjected interrupt. MSI/MSI-X don't need it. One possible problem is multiply same vector interrupt injected between irq handler and scheduled work handler would be merged as one for MSI/MSI-X. But AFAIK, the drivers handle it well. The patch fixed the oplin card performance issue(MSI-X performance is half of MSI/INTx). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: irqfdGregory Haskins
KVM provides a complete virtual system environment for guests, including support for injecting interrupts modeled after the real exception/interrupt facilities present on the native platform (such as the IDT on x86). Virtual interrupts can come from a variety of sources (emulated devices, pass-through devices, etc) but all must be injected to the guest via the KVM infrastructure. This patch adds a new mechanism to inject a specific interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal on the irqfd (using eventfd semantics from either userspace or kernel) will translate into an injected interrupt in the guest at the next available interrupt window. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Move common KVM Kconfig items to new file virt/kvm/KconfigAvi Kivity
Reduce Kconfig code duplication. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-09KVM: Avoid redelivery of edge interrupt before next edgeGleb Natapov
The check for an edge is broken in current ioapic code. ioapic->irr is cleared on each edge interrupt by ioapic_service() and this makes old_irr != ioapic->irr condition in kvm_ioapic_set_irq() to be always true. The patch fixes the code to properly recognise edge. Some HW emulation calls set_irq() without level change. If each such call is propagated to an OS it may confuse a device driver. This is the case with keyboard device emulation and Windows XP x64 installer on SMP VM. Each keystroke produce two interrupts (down/up) one interrupt is submitted to CPU0 and another to CPU1. This confuses Windows somehow and it ignores keystrokes. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05KVM: fix ack not being delivered when msi presentMichael S. Tsirkin
kvm_notify_acked_irq does not check irq type, so that it sometimes interprets msi vector as irq. As a result, ack notifiers are not called, which typially hangs the guest. The fix is to track and check irq type. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-28KVM: protect concurrent make_all_cpus_requestMarcelo Tosatti
make_all_cpus_request contains a race condition which can trigger false request completed status, as follows: CPU0 CPU1 if (test_and_set_bit(req,&vcpu->requests)) .... if (test_and_set_bit(req,&vcpu->requests)) .. return proceed to smp_call_function_many(wait=1) Use a spinlock to serialize concurrent CPUs. Cc: stable@kernel.org Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-28KVM: Fix dirty bit tracking for slots with large pagesIzik Eidus
When slot is already allocated and being asked to be tracked we need to break the large pages. This code flush the mmu when someone ask a slot to start dirty bit tracking. Cc: stable@kernel.org Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-11kvm: remove the duplicated cpumask_clearYinghai Lu
zalloc_cpumask_var already cleared it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-10KVM: Prevent overflow in largepages calculationAvi Kivity
If userspace specifies a memory slot that is larger than 8 petabytes, it could overflow the largepages variable. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Disable large pages on misaligned memory slotsAvi Kivity
If a slots guest physical address and host virtual address unequal (mod large page size), then we would erronously try to back guest large pages with host large pages. Detect this misalignment and diable large page support for the trouble slot. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: take mmu_lock when updating a deleted slotMarcelo Tosatti
kvm_handle_hva relies on mmu_lock protection to safely access the memslot structures. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: protect assigned dev workqueue, int handler and irq ackerMarcelo Tosatti
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the interrupt handler function. It does: if (dev->host_irq_disabled) { enable_irq(dev->host_irq); dev->host_irq_disabled = false; } If an interrupt triggers before the host->dev_irq_disabled assignment, it will disable the interrupt and set dev->host_irq_disabled to true. On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to false, and the next kvm_assigned_dev_ack_irq call will fail to reenable it. Other than that, having the interrupt handler and work handlers run in parallel sounds like asking for trouble (could not spot any obvious problem, but better not have to, its fragile). CC: sheng.yang@intel.com Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Trivial format fix in setup_routing_entry()Chris Wright
Remove extra tab. Signed-off-by: Chris Wright <chrisw@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: VMX: Disable VMX when system shutdownSheng Yang
Intel TXT(Trusted Execution Technology) required VMX off for all cpu to work when system shutdown. CC: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Enable snooping control for supported hardwareSheng Yang
Memory aliases with different memory type is a problem for guest. For the guest without assigned device, the memory type of guest memory would always been the same as host(WB); but for the assigned device, some part of memory may be used as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of host memory type then be a potential issue. Snooping control can guarantee the cache correctness of memory go through the DMA engine of VT-d. [avi: fix build on ia64] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Fix interrupt unhalting a vcpu when it shouldn'tGleb Natapov
kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Timer event should not unconditionally unhalt vcpu.Gleb Natapov
Currently timer events are processed before entering guest mode. Move it to main vcpu event loop since timer events should be processed even while vcpu is halted. Timer may cause interrupt/nmi to be injected and only then vcpu will be unhalted. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: MMU: do not free active mmu pages in free_mmu_pages()Gleb Natapov
free_mmu_pages() should only undo what alloc_mmu_pages() does. Free mmu pages from the generic VM destruction function, kvm_destroy_vm(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Device assignment framework reworkSheng Yang
After discussion with Marcelo, we decided to rework device assignment framework together. The old problems are kernel logic is unnecessary complex. So Marcelo suggest to split it into a more elegant way: 1. Split host IRQ assign and guest IRQ assign. And userspace determine the combination. Also discard msi2intx parameter, userspace can specific KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI to INTx convertion. 2. Split assign IRQ and deassign IRQ. Import two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ. This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the old interface). [avi: replace homemade bitcount() by hweight_long()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: APIC: get rid of deliver_bitmaskGleb Natapov
Deliver interrupt during destination matching loop. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: change the way how lowest priority vcpu is calculatedGleb Natapov
The new way does not require additional loop over vcpus to calculate the one with lowest priority as one is chosen during delivery bitmap construction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: consolidate ioapic/ipi interrupt delivery logicGleb Natapov
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in apic_send_ipi() to figure out ipi destination instead of reimplementing the logic. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: ioapic/msi interrupt delivery consolidationGleb Natapov
ioapic_deliver() and kvm_set_msi() have code duplication. Move the code into ioapic_deliver_entry() function and call it from both places. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: APIC: kvm_apic_set_irq deliver all kinds of interruptsGleb Natapov
Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>