aboutsummaryrefslogtreecommitdiff
path: root/drivers/kvm/kvm.h
AgeCommit message (Collapse)Author
2007-10-13KVM: Emulate hlt in the kernelEddie Dong
By sleeping in the kernel when hlt is executed, we simplify the in-kernel guest interrupt path considerably. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: In-kernel I/O APIC modelEddie Dong
This allows in-kernel host-side device drivers to raise guest interrupts without going to userspace. [avi: fix level-triggered interrupt redelivery on eoi] [avi: add missing #include] [avi: avoid redelivery of edge-triggered interrupt] [avi: implement polarity] [avi: don't deliver edge-triggered interrupts when unmasking] [avi: fix host oops on invalid guest access] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Emulate local APIC in kernelEddie Dong
Because lightweight exits (exits which don't involve userspace) are many times faster than heavyweight exits, it makes sense to emulate high usage devices in the kernel. The local APIC is one such device, especially for Windows and for SMP, so we add an APIC model to kvm. It also allows in-kernel host-side drivers to inject interrupts without going through userspace. [compile fix on i386 from Jindrich Makovicka] Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Define and use cr8 access functionsEddie Dong
This patch is to wrap APIC base register and CR8 operation which can provide a unique API for user level irqchip and kernel irqchip. This is a preparation of merging lapic/ioapic patch. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Add support for in-kernel PIC emulationEddie Dong
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Support more memory slotsIzik Eidus
Needed for mapping memory at 4GB. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Clean up kvm_setup_pio()Laurent Vivier
Split kvm_setup_pio() into two functions, one to setup in/out pio (kvm_emulate_pio()) and one to setup ins/outs pio (kvm_emulate_pio_string()). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Add and use pr_unimpl for standard formatting of unimplemented featuresRusty Russell
All guest-invokable printks should be ratelimited to prevent malicious guests from flooding logs. This is a start. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: VMX: Add cpu consistency checkYang, Sheng
All the physical CPUs on the board should support the same VMX feature set. Add check_processor_compatibility to kvm_arch_ops for the consistency check. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Use alignment properties of vcpu to simplify FPU opsRusty Russell
Now we use a kmem cache for allocating vcpus, we can get the 16-byte alignment required by fxsave & fxrstor instructions, and avoid manually aligning the buffer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Use kmem cache for allocating vcpusRusty Russell
Avi wants the allocations of vcpus centralized again. The easiest way is to add a "size" arg to kvm_init_arch, and expose the thus-prepared cache to the modules. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Remove kvm_{read,write}_guest()Laurent Vivier
... in favor of the more general emulator_{read,write}_*. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Convert vm lock to a mutexShaohua Li
This allows the kvm mmu to perform sleepy operations, such as memory allocation. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Use the scheduler preemption notifiers to make kvm preemptibleAvi Kivity
Current kvm disables preemption while the new virtualization registers are in use. This of course is not very good for latency sensitive workloads (one use of virtualization is to offload user interface and other latency insensitive stuff to a container, so that it is easier to analyze the remaining workload). This patch re-enables preemption for kvm; preemption is now only disabled when switching the registers in and out, and during the switch to guest mode and back. Contains fixes from Shaohua Li <shaohua.li@intel.com>. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Dynamically allocate vcpusRusty Russell
This patch converts the vcpus array in "struct kvm" to a pointer array, and changes the "vcpu_create" and "vcpu_setup" hooks into one "vcpu_create" call which does the allocation and initialization of the vcpu (calling back into the kvm_vcpu_init core helper). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Remove arch specific components from the general codeGregory Haskins
struct kvm_vcpu has vmx-specific members; remove them to a private structure. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Move gfn_to_page out of kmap/unmap pairsShaohua Li
gfn_to_page might sleep with swap support. Move it out of the kmap calls. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed headerRusty Russell
Creating one's own BITMAP macro seems suboptimal: if we use manual arithmetic in the one place exposed to userspace, we can use standard macros elsewhere. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Use standard CR4 flags, tighten checkingRusty Russell
On this machine (Intel), writing to the CR4 bits 0x00000800 and 0x00001000 cause a GPF. The Intel manual is a little unclear, but AFIACT they're reserved, too. Also fix spelling of CR4_RESEVED_BITS. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Use standard CR3 flags, tighten checkingRusty Russell
The kernel now has asm/cpu-features.h: use those macros instead of inventing our own. Also spell out definition of CR3_RESEVED_BITS, fix spelling and tighten it for the non-PAE case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.hRusty Russell
The kernel now has asm/cpu-features.h: use those macros instead of inventing our own. Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: SMP: Add vcpu_id field in struct vcpuQing He
This patch adds a `vcpu_id' field in `struct vcpu', so we can differentiate BSP and APs without pointer comparison or arithmetic. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-14KVM: MMU: Fix rare oops on guest context switchAvi Kivity
A guest context switch to an uncached cr3 can require allocation of shadow pages, but we only recycle shadow pages in kvm_mmu_page_fault(). Move shadow page recycling to mmu_topup_memory_caches(), which is called from both the page fault handler and from guest cr3 reload. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-20KVM: x86 emulator: implement rdmsr and wrmsrAvi Kivity
Allow real-mode emulation of rdmsr and wrmsr. This allows smp Windows to boot, presumably for its sipi trampoline. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-20KVM: Fix memory slot management functions for guest smpAvi Kivity
The memory slot management functions were oriented against vcpu 0, where they should be kvm-wide. This causes hangs starting X on guest smp. Fix by making the functions (and resultant tail in the mmu) non-vcpu-specific. Unfortunately this reduces the efficiency of the mmu object cache a bit. We may have to revisit this later. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-20KVM: MMU: Store nx bit for large page shadowsAvi Kivity
We need to distinguish between large page shadows which have the nx bit set and those which don't. The problem shows up when booting a newer smp Linux kernel, where the trampoline page (which is in real mode, which uses the same shadow pages as large pages) is using the same mapping as a kernel data page, which is mapped using nx, causing kvm to spin on that page. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Add support for in-kernel pio handlersEddie Dong
Useful for the PIC and PIT. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Adds support for in-kernel mmio handlersGregory Haskins
Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Flush remote tlbs when reducing shadow pte permissionsAvi Kivity
When a vcpu causes a shadow tlb entry to have reduced permissions, it must also clear the tlb on remote vcpus. We do that by: - setting a bit on the vcpu that requests a tlb flush before the next entry - if the vcpu is currently executing, we send an ipi to make sure it exits before we continue Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Keep an upper bound of initialized vcpusAvi Kivity
That way, we don't need to loop for KVM_MAX_VCPUS for a single vcpu vm. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Emulate hlt on real mode for IntelAvi Kivity
This has two use cases: the bios can't boot from disk, and guest smp bootstrap. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Move duplicate halt handling code into kvm_main.cAvi Kivity
Will soon have a thid user. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Enable guest smpAvi Kivity
As we don't support guest tlb shootdown yet, this is only reliable for real-mode guests. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Lazy guest cr3 switchingAvi Kivity
Switch guest paging context may require us to allocate memory, which might fail. Instead of wiring up error paths everywhere, make context switching lazy and actually do the switch before the next guest entry, where we can return an error if allocation fails. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: MMU: Use slab caches for shadow pages and their headersAvi Kivity
Use slab caches instead of a simple custom list. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Fix includesMarkus Rechberger
KVM compilation fails for some .configs. This fixes it. Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexitEddie Dong
MSR_EFER.LME/LMA bits are automatically save/restored by VMX hardware, KVM only needs to save NX/SCE bits at time of heavy weight VM Exit. But clearing NX bits in host envirnment may cause system hang if the host page table is using EXB bits, thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we can do guest page table EXB bits check before inserting a shadow pte (though no guest is expecting to see this kind of gp fault). If host NX=0, we present guest no Execute-Disable feature to guest, thus no host NX=0, guest NX=1 combination. This patch reduces raw vmexit time by ~27%. Me: fix compile warnings on i386. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: VMX: Avoid saving and restoring msrs on lightweight vmexitEddie Dong
In a lightweight exit (where we exit and reenter the guest without scheduling or exiting to userspace in between), we don't need various msrs on the host, and avoiding shuffling them around reduces raw exit time by 8%. i386 compile fix by Daniel Hecken <dh@bahntechnik.de>. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: MMU: Store shadow page tables as kernel virtual addresses, not physicalAvi Kivity
Simpifies things a bit. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Set cr0.mp for guestsAvi Kivity
This allows fwait instructions to be trapped when the guest fpu is not loaded. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Consolidate guest fpu activation and deactivationAvi Kivity
Easier to keep track of where the fpu is this way. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Fix potential guest state leak into hostAvi Kivity
The lightweight vmexit path avoids saving and reloading certain host state. However in certain cases lightweight vmexit handling can schedule() which requires reloading the host state. So we store the host state in the vcpu structure, and reloaded it if we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Increase mmu shadow cache to 1024 pagesAvi Kivity
This improves kbuild times by about 10%, bringing it within a respectable 25% of native. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()Avi Kivity
Instead of calling two functions and repeating expensive checks, call one function and provide it with before/after information. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16KVM: Avoid saving and restoring some host CPU state on lightweight vmexitAvi Kivity
Many msrs and the like will only be used by the host if we schedule() or return to userspace. Therefore, we avoid saving them if we handle the exit within the kernel, and if a reschedule is not requested. Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of fixes by me. Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-06-15KVM: Prevent guest fpu state from leaking into the hostAvi Kivity
The lazy fpu changes did not take into account that some vmexit handlers can sleep. Move loading the guest state into the inner loop so that it can be reloaded if necessary, and move loading the host state into vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-21Detach sched.h from mm.hAlexey Dobriyan
First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-03KVM: Remove extraneous guest entry on mmio readAvi Kivity
When emulating an mmio read, we actually emulate twice: once to determine the physical address of the mmio, and, after we've exited to userspace to get the mmio value, we emulate again to place the value in the result register and update any flags. But we don't really need to enter the guest again for that, only to take an immediate vmexit. So, if we detect that we're doing an mmio read, emulate a single instruction before entering the guest again. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: VMX: Properly shadow the CR0 register in the vcpu structAnthony Liguori
Set all of the host mask bits for CR0 so that we can maintain a proper shadow of CR0. This exposes CR0.TS, paving the way for lazy fpu handling. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Lazy FPU support for SVMAnthony Liguori
Avoid saving and restoring the guest fpu state on every exit. This shaves ~100 cycles off the guest/host switch. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>