aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2008-04-27KVM: SVM: force a new asid when initializing the vmcbAvi Kivity
Shutdown interception clears the vmcb, leaving the asid at zero (which is illegal. so force a new asid on vmcb initialization. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: fix kvm_vcpu_kick vs __vcpu_run raceMarcelo Tosatti
There is a window open between testing of pending IRQ's and assignment of guest_mode in __vcpu_run. Injection of IRQ's can race with __vcpu_run as follows: CPU0 CPU1 kvm_x86_ops->run() vcpu->guest_mode = 0 SET_IRQ_LINE ioctl .. kvm_x86_ops->inject_pending_irq kvm_cpu_has_interrupt() apic_test_and_set_irr() kvm_vcpu_kick if (vcpu->guest_mode) send_ipi() vcpu->guest_mode = 1 So move guest_mode=1 assignment before ->inject_pending_irq, and make sure that it won't reorder after it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: add ioctls to save/store mpstateMarcelo Tosatti
So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*Avi Kivity
We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: hlt emulation should take in-kernel APIC/PIT timers into accountMarcelo Tosatti
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: do not intercept task switch with NPTJoerg Roedel
When KVM uses NPT there is no reason to intercept task switches. This patch removes the intercept for it in that case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add kvm trace userspace interfaceFeng(Eric) Liu
This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add trace markersFeng (Eric) Liu
Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: add intercept for machine check exceptionJoerg Roedel
To properly forward a MCE occured while the guest is running to the host, we have to intercept this exception and call the host handler by hand. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: align shadow CR4.MCE with hostJoerg Roedel
This patch aligns the host version of the CR4.MCE bit with the CR4 active in the guest. This is necessary to get MCE exceptions when the guest is running. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: indent svm_set_cr4 with tabs instead of spacesJoerg Roedel
The svm_set_cr4 function is indented with spaces. This patch replaces them with tabs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: Don't assume struct page for x86Anthony Liguori
This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely because a number of places assume they can do gfn_to_page() and then kmap() the results. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straight forward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests. [avi: fix overflow when shifting left pfns by adding casts] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: prepopulate guest pages after write-protectingMarcelo Tosatti
Zdenek reported a bug where a looping "dmsetup status" eventually hangs on SMP guests. The problem is that kvm_mmu_get_page() prepopulates the shadow MMU before write protecting the guest page tables. By doing so, it leaves a window open where the guest can mark a pte as present while the host has shadow cached such pte as "notrap". Accesses to such address will fault in the guest without the host having a chance to fix the situation. Fix by moving the write protection before the pte prefetch. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: Only mark_page_accessed() if the page was accessed by the guestAvi Kivity
If the accessed bit is not set, the guest has never accessed this page (at least through this spte), so there's no need to mark the page accessed. This provides more accurate data for the eviction algortithm. Noted by Andrea Arcangeli. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Free apic access page on vm destructionAvi Kivity
Noticed by Marcelo Tosatti. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: allow the vm to shrink the kvm mmu shadow cachesIzik Eidus
Allow the Linux memory manager to reclaim memory in the kvm shadow cache. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: unify slots_lock usageMarcelo Tosatti
Unify slots_lock acquision around vcpu_run(). This is simpler and less error-prone. Also fix some callsites that were not grabbing the lock properly. [avi: drop slots_lock while in guest mode to avoid holding the lock for indefinite periods] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: VMX: Enable MSR Bitmap featureSheng Yang
MSR Bitmap controls whether the accessing of an MSR causes VM Exit. Eliminating exits on automatically saved and restored MSRs yields a small performance gain. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86: hardware task switching supportIzik Eidus
This emulates the x86 hardware task switch mechanism in software, as it is unsupported by either vmx or svm. It allows operating systems which use it, like freedos, to run as kvm guests. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86: add functions to get the cpl of vcpuIzik Eidus
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: VMX: Add module option to disable flexpriorityAvi Kivity
Useful for debugging. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: no longer EXPERIMENTALAvi Kivity
Long overdue. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: Introduce and use spte_to_page()Avi Kivity
Encapsulate the pte mask'n'shift in a function. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: fix dirty bit setting when removing write permissionsIzik Eidus
When mmu_set_spte() checks if a page related to spte should be release as dirty or clean, it check if the shadow pte was writeble, but in case rmap_write_protect() is called called it is possible for shadow ptes that were writeble to become readonly and therefor mmu_set_spte will release the pages as clean. This patch fix this issue by marking the page as dirty inside rmap_write_protect(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: Set the accessed bit on non-speculative shadow ptesAvi Kivity
If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: hypercall based pte updates and TLB flushesMarcelo Tosatti
Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Provide unlocked version of emulator_write_phys()Avi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: add basic paravirt supportMarcelo Tosatti
Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add reset support for in kernel PITSheng Yang
Separate the reset part and prepare for reset support. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add save/restore supporting of in kernel PITSheng Yang
Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: In kernel PIT modelSheng Yang
The patch moves the PIT model from userspace to kernel, and increases the timer accuracy greatly. [marcelo: make last_injected_time per-guest] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Remove pointless desc_ptr #ifdefAvi Kivity
The desc_struct changes left an unnecessary #ifdef; remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: VMX: Don't adjust tsc offset forwardAvi Kivity
Most Intel hosts have a stable tsc, and playing with the offset only reduces accuracy. By limiting tsc offset adjustment only to forward updates, we effectively disable tsc offset adjustment on these hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: replace remaining __FUNCTION__ occurancesHarvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: detect if VCPU triple faultsJoerg Roedel
In the current inject_page_fault path KVM only checks if there is another PF pending and injects a DF then. But it has to check for a pending DF too to detect a shutdown condition in the VCPU. If this is not detected the VCPU goes to a PF -> DF -> PF loop when it should triple fault. This patch detects this condition and handles it with an KVM_SHUTDOWN exit to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Prefix control register accessors with kvm_ to avoid namespace pollutionAvi Kivity
Names like 'set_cr3()' look dangerously close to affecting the host. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: large page supportMarcelo Tosatti
Create large pages mappings if the guest PTE's are marked as such and the underlying memory is hugetlbfs backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on ram mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: ignore zapped root pagetablesMarcelo Tosatti
Mark zapped root pagetables as invalid and ignore such pages during lookup. This is a problem with the cr3-target feature, where a zapped root table fools the faulting code into creating a read-only mapping. The result is a lockup if the instruction can't be emulated. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Implement dummy values for MSR_PERF_STATUSAlexander Graf
Darwin relies on this and ceases to work without. Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: sparse fixes for kvm/x86.cHarvey Harrison
In two case statements, use the ever popular 'i' instead of index: arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here Make it static. arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static? Drop the return statements. arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: make iopm_base staticHarvey Harrison
Fixes sparse warning as well. arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: fix sparse warnings in x86_emulate.cHarvey Harrison
Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed variable warnings on the internal variable _tmp used by both macros. Change the outer macro to use __tmp. Avoids a sparse warning like the following at every call site of __emulate_2op arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one arch/x86/kvm/x86_emulate.c:1091:3: originally declared here [18 more warnings suppressed] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add stat counter for hypercallsAmit Shah
Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Use x86's segment descriptor struct instead of private definitionAvi Kivity
The x86 desc_struct unification allows us to remove segment_descriptor.h. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add API for determining the number of supported memory slotsAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add API to retrieve the number of supported vcpus per vmAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: make register_address_increment and JMP_REL static inlinesHarvey Harrison
Change jmp_rel() to a function as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: make register_address, address_mask static inlinesHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: add ad_mask static inlineHarvey Harrison
Replaces open-coded mask calculation in macros. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: paravirtualized clocksource: host partGlauber de Oliveira Costa
This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writing to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. [marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME] [avi: save full value of the msr, even if enable bit is clear] [avi: clear previous value of time_page] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>