aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2010-03-01KVM: modify memslots layout in struct kvmMarcelo Tosatti
Have a pointer to an allocated region inside struct kvm. [alex: fix ppc book 3s] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: Add KVM_MMIO kconfig itemAvi Kivity
s390 doesn't have mmio, this will simplify ifdefing it out. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: SVM: Adjust tsc_offset only if tsc_unstableJoerg Roedel
The tsc_offset adjustment in svm_vcpu_load is executed unconditionally even if Linux considers the host tsc as stable. This causes a Linux guest detecting an unstable tsc in any case. This patch removes the tsc_offset adjustment if the host tsc is stable. The guest will now get the benefit of a stable tsc too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Add instruction rdtscp support for guestSheng Yang
Before enabling, execution of "rdtscp" in guest would result in #UD. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: Add cpuid_update() callback to kvm_x86_opsSheng Yang
Sometime, we need to adjust some state in order to reflect guest CPUID setting, e.g. if we don't expose rdtscp to guest, we won't want to enable it on hardware. cpuid_update() is introduced for this purpose. Also export kvm_find_cpuid_entry() for later use. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: Extended shared_msr_global to per CPUSheng Yang
shared_msr_global saved host value of relevant MSRs, but it have an assumption that all MSRs it tracked shared the value across the different CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX. Extend it to per CPU to provide the support of MSR_TSC_AUX, and more alike MSRs. Notice now the shared_msr_global still have one assumption: it can only deal with the MSRs that won't change in host after KVM module loaded. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Remove redundant variableSheng Yang
It's no longer necessary. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Fold ept_update_paging_mode_cr4() into its callerAvi Kivity
ept_update_paging_mode_cr4() accesses vcpu->arch.cr4 directly, which usually needs to be accessed via kvm_read_cr4(). In this case, we can't, since cr4 is in the process of being updated. Instead of adding inane comments, fold the function into its caller (vmx_set_cr4), so it can use the not-yet-committed cr4 directly. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: When using ept, allow the guest to own cr4.pgeAvi Kivity
We make no use of cr4.pge if ept is enabled, but the guest does (to flush global mappings, as with vmap()), so give the guest ownership of this bit. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Make guest cr4 mask more conservativeAvi Kivity
Instead of specifying the bits which we want to trap on, specify the bits which we allow the guest to change transparently. This is safer wrt future changes to cr4. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: Add accessor for reading cr4 (or some bits of cr4)Avi Kivity
Some bits of cr4 can be owned by the guest on vmx, so when we read them, we copy them to the vcpu structure. In preparation for making the set of guest-owned bits dynamic, use helpers to access these bits so we don't need to know where the bit resides. No changes to svm since all bits are host-owned there. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Move some cr[04] related constants to vmx.cAvi Kivity
They have no place in common code. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Trap and invalid MWAIT/MONITOR instructionSheng Yang
We don't support these instructions, but guest can execute them even if the feature('monitor') haven't been exposed in CPUID. So we would trap and inject a #UD if guest try this way. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: MMU: Report spte not found in rmap before BUG()Avi Kivity
In the past we've had errors of single-bit in the other two cases; the printk() may confirm it for the third case (many->many). Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: x86: raise TSS exception for NULL CS and SS segmentsMarcelo Tosatti
Windows 2003 uses task switch to triple fault and reboot (the other exception being reserved pdptrs bits). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: x86: make double/triple fault promotion generic to all exceptionsEddie Dong
Move Double-Fault generation logic out of page fault exception generating function to cover more generic case. Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09KVM: PIT: control word is write-onlyMarcelo Tosatti
PIT control word (address 0x43) is write-only, reads are undefined. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09kvmclock: count total_sleep_time when updating guest clockJason Wang
Current kvm wallclock does not consider the total_sleep_time which could cause wrong wallclock in guest after host suspend/resume. This patch solve this issue by counting total_sleep_time to get the correct host boot time. Cc: stable@kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: x86: Fix leak of free lapic date in kvm_arch_vcpu_init()Wei Yongjun
In function kvm_arch_vcpu_init(), if the memory malloc for vcpu->arch.mce_banks is fail, it does not free the memory of lapic date. This patch fixed it. Cc: stable@kernel.org Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: x86: Fix probable memory leak of vcpu->arch.mce_banksWei Yongjun
vcpu->arch.mce_banks is malloc in kvm_arch_vcpu_init(), but never free in any place, this may cause memory leak. So this patch fixed to free it in kvm_arch_vcpu_uninit(). Cc: stable@kernel.org Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: MMU: bail out pagewalk on kvm_read_guest errorMarcelo Tosatti
Exit the guest pagetable walk loop if reading gpte failed. Otherwise its possible to enter an endless loop processing the previous present pte. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: x86: Fix host_mapping_level()Sheng Yang
When found a error hva, should not return PAGE_SIZE but the level... Also clean up the coding style of the following loop. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: Fix race between APIC TMR and IRRAvi Kivity
When we queue an interrupt to the local apic, we set the IRR before the TMR. The vcpu can pick up the IRR and inject the interrupt before setting the TMR, and perhaps even EOI it, causing incorrect behaviour. The race is really insignificant since it can only occur on the first interrupt (usually following interrupts will not change TMR), but it's better closed than open. Fixed by reordering setting the TMR vs IRR. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-27KVM: x86: Extend KVM_SET_VCPU_EVENTS with selective updatesJan Kiszka
User space may not want to overwrite asynchronously changing VCPU event states on write-back. So allow to skip nmi.pending and sipi_vector by setting corresponding bits in the flags field of kvm_vcpu_events. [avi: advertise the bits in KVM_GET_VCPU_EVENTS] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-27KVM: LAPIC: make sure IRR bitmap is scanned after vm loadMarcelo Tosatti
The vcpus are initialized with irr_pending set to false, but loading the LAPIC registers with pending IRR fails to reset the irr_pending variable. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-27KVM: MMU: remove prefault from invlpg handlerMarcelo Tosatti
The invlpg prefault optimization breaks Windows 2008 R2 occasionally. The visible effect is that the invlpg handler instantiates a pte which is, microseconds later, written with a different gfn by another vcpu. The OS could have other mechanisms to prevent a present translation from being used, which the hypervisor is unaware of. While the documentation states that the cpu is at liberty to prefetch tlb entries, it looks like this is not heeded, so remove tlb prefetch from invlpg. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-12-10x86: i8254.c: Add pr_fmt(fmt)Joe Perches
- Add pr_fmt(fmt) "pit: " fmt - Strip pit: prefixes from pr_debug Signed-off-by: Joe Perches <joe@perches.com> LKML-Reference: <bbd4de532f18bb7c11f64ba20d224c08291cb126.1260383912.git.joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-08Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits) KVM: VMX: Fix comparison of guest efer with stale host value KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c KVM: Drop user return notifier when disabling virtualization on a cpu KVM: VMX: Disable unrestricted guest when EPT disabled KVM: x86 emulator: limit instructions to 15 bytes KVM: s390: Make psw available on all exits, not just a subset KVM: x86: Add KVM_GET/SET_VCPU_EVENTS KVM: VMX: Report unexpected simultaneous exceptions as internal errors KVM: Allow internal errors reported to userspace to carry extra data KVM: Reorder IOCTLs in main kvm.h KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG KVM: only clear irq_source_id if irqchip is present KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic KVM: x86: disallow multiple KVM_CREATE_IRQCHIP KVM: VMX: Remove vmx->msr_offset_efer KVM: MMU: update invlpg handler comment KVM: VMX: move CR3/PDPTR update to vmx_set_cr3 KVM: remove duplicated task_switch check KVM: powerpc: Fix BUILD_BUG_ON condition KVM: VMX: Use shared msr infrastructure ... Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
2009-12-03KVM: VMX: Fix comparison of guest efer with stale host valueAvi Kivity
update_transition_efer() masks out some efer bits when deciding whether to switch the msr during guest entry; for example, NX is emulated using the mmu so we don't need to disable it, and LMA/LME are handled by the hardware. However, with shared msrs, the comparison is made against a stale value; at the time of the guest switch we may be running with another guest's efer. Fix by deferring the mask/compare to the actual point of guest entry. Noted by Marcelo. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Drop user return notifier when disabling virtualization on a cpuAvi Kivity
This way, we don't leave a dangling notifier on cpu hotunplug or module unload. In particular, module unload leaves the notifier pointing into freed memory. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: VMX: Disable unrestricted guest when EPT disabledSheng Yang
Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest supported processors. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: x86 emulator: limit instructions to 15 bytesAvi Kivity
While we are never normally passed an instruction that exceeds 15 bytes, smp games can cause us to attempt to interpret one, which will cause large latencies in non-preempt hosts. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: x86: Add KVM_GET/SET_VCPU_EVENTSJan Kiszka
This new IOCTL exports all yet user-invisible states related to exceptions, interrupts, and NMIs. Together with appropriate user space changes, this fixes sporadic problems of vmsave/restore, live migration and system reset. [avi: future-proof abi by adding a flags field] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: Report unexpected simultaneous exceptions as internal errorsAvi Kivity
These happen when we trap an exception when another exception is being delivered; we only expect these with MCEs and page faults. If something unexpected happens, things probably went south and we're better off reporting an internal error and freezing. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Allow internal errors reported to userspace to carry extra dataAvi Kivity
Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUGJan Kiszka
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance, avoid triggering the WARN_ON in kvm_queue_exception if there is already an exception pending and reject such invalid requests. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapicMarcelo Tosatti
Otherwise kvm might attempt to dereference a NULL pointer. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: x86: disallow multiple KVM_CREATE_IRQCHIPMarcelo Tosatti
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP. Also serialize multiple accesses with kvm->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: Remove vmx->msr_offset_eferAvi Kivity
This variable is used to communicate between a caller and a callee; switch to a function argument instead. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: MMU: update invlpg handler commentMarcelo Tosatti
Large page translations are always synchronized (either in level 3 or level 2), so its not necessary to properly deal with them in the invlpg handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: move CR3/PDPTR update to vmx_set_cr3Marcelo Tosatti
GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from outside guest context. Similarly pdptrs are updated via load_pdptrs. Let kvm_set_cr3 perform the update, removing it from the vcpu_run fast path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: remove duplicated task_switch checkGleb Natapov
Probably introduced by a bad merge. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: Use shared msr infrastructureAvi Kivity
Instead of reloading syscall MSRs on every preemption, use the new shared msr infrastructure to reload them at the last possible minute (just before exit to userspace). Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be reloaded as well). [jan: fix slot index missing indirection] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: x86 shared msr infrastructureAvi Kivity
The various syscall-related MSRs are fairly expensive to switch. Currently we switch them on every vcpu preemption, which is far too often: - if we're switching to a kernel thread (idle task, threaded interrupt, kernel-mode virtio server (vhost-net), for example) and back, then there's no need to switch those MSRs since kernel threasd won't be exiting to userspace. - if we're switching to another guest running an identical OS, most likely those MSRs will have the same value, so there's little point in reloading them. - if we're running the same OS on the guest and host, the MSRs will have identical values and reloading is unnecessary. This patch uses the new user return notifiers to implement last-minute switching, and checks the msr values to avoid unnecessary reloading. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr areaAvi Kivity
Currently MSR_KERNEL_GS_BASE is saved and restored as part of the guest/host msr reloading. Since we wish to lazy-restore all the other msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using the common code. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: SVM: init_vmcb(): remove redundant save->cr0 initializationEduardo Habkost
The svm_set_cr0() call will initialize save->cr0 properly even when npt is enabled, clearing the NW and CD bits as expected, so we don't need to initialize it manually for npt_enabled anymore. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: SVM: Reset cr0 properly on vcpu resetEduardo Habkost
svm_vcpu_reset() was not properly resetting the contents of the guest-visible cr0 register, causing the following issue: https://bugzilla.redhat.com/show_bug.cgi?id=525699 Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine with paging enabled, making the vcpu get a pagefault exception while trying to run it. Instead of setting vmcb->save.cr0 directly, the new code just resets kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0(). kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: Use macros instead of hex value on cr0 initializationEduardo Habkost
This should have no effect, it is just to make the code clearer. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: allow userspace to adjust kvmclock offsetGlauber Costa
When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>