aboutsummaryrefslogtreecommitdiff
path: root/drivers/kvm/mmu.c
AgeCommit message (Collapse)Author
2007-03-18KVM: MMU: Fix host memory corruption on i386 with >= 4GB ramAvi Kivity
PAGE_MASK is an unsigned long, so using it to mask physical addresses on i386 (which are 64-bit wide) leads to truncation. This can result in page->private of unrelated memory pages being modified, with disasterous results. Fix by not using PAGE_MASK for physical addresses; instead calculate the correct value directly from PAGE_SIZE. Also fix a similar BUG_ON(). Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-18KVM: MMU: Fix guest writes to nonpae pdeAvi Kivity
KVM shadow page tables are always in pae mode, regardless of the guest setting. This means that a guest pde (mapping 4MB of memory) is mapped to two shadow pdes (mapping 2MB each). When the guest writes to a pte or pde, we intercept the write and emulate it. We also remove any shadowed mappings corresponding to the write. Since the mmu did not account for the doubling in the number of pdes, it removed the wrong entry, resulting in a mismatch between shadow page tables and guest page tables, followed shortly by guest memory corruption. This patch fixes the problem by detecting the special case of writing to a non-pae pde and adjusting the address and number of shadow pdes zapped accordingly. Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Use page_private()/set_page_private() apisMarkus Rechberger
Besides using an established api, this allows using kvm in older kernels. Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-02-09[PATCH] misc NULL noise removalAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] KVM: MMU: Report nx faults to the guestAvi Kivity
With the recent guest page fault change, we perform access checks on our own instead of relying on the cpu. This means we have to perform the nx checks as well. Software like the google toolbar on windows appears to rely on this somehow. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] KVM: MMU: Perform access checks in walk_addr()Avi Kivity
Check pte permission bits in walk_addr(), instead of scattering the checks all over the code. This has the following benefits: 1. We no longer set the accessed bit for accessed which fail permission checks. 2. Setting the accessed bit is simplified. 3. Under some circumstances, we used to pretend a page fault was fixed when it would actually fail the access checks. This caused an unnecessary vmexit. 4. The error code for guest page faults is now correct. The fix helps netbsd further along booting, and allows kvm to pass the new mmu testsuite. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-05[PATCH] KVM: Simplify mmu_alloc_roots()Ingo Molnar
Small optimization/cleanup: page == page_header(page->page_hpa) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: Avoid oom on cr3 switchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: add audit code to check mappings, etc are correctAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Flush guest tlb when reducing permissions on a pteAvi Kivity
If we reduce permissions on a pte, we must flush the cached copy of the pte from the guest's tlb. This is implemented at the moment by flushing the entire guest tlb, and can be improved by flushing just the relevant virtual address, if it is known. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Detect oom conditions and propagate error to userspaceAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Replace atomic allocations by preallocated objectsAvi Kivity
The mmu sometimes needs memory for reverse mapping and parent pte chains. however, we can't allocate from within the mmu because of the atomic context. So, move the allocations to a central place that can be executed before the main mmu machinery, where we can bail out on failure before any damage is done. (error handling is deffered for now, but the basic structure is there) Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Free pages on kvm destructionAvi Kivity
Because mmu pages have attached rmap and parent pte chain structures, we need to zap them before freeing so the attached structures are freed. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Never free a shadow page actively serving as a rootAvi Kivity
We always need cr3 to point to something valid, so if we detect that we're freeing a root page, simply push it back to the top of the active list. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Page table write flood protectionAvi Kivity
In fork() (or when we protect a page that is no longer a page table), we can experience floods of writes to a page, which have to be emulated. This is expensive. So, if we detect such a flood, zap the page so subsequent writes can proceed natively. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: If an empty shadow page is not empty, report more infoAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Ensure freed shadow pages are cleanAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: <ove is_empty_shadow_page() above kvm_mmu_free_page()Avi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Handle misaligned accesses to write protected guest page ↵Avi Kivity
tables A misaligned access affects two shadow ptes instead of just one. Since a misaligned access is unlikely to occur on a real page table, just zap the page out of existence, avoiding further trouble. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Remove release_pt_page_64()Avi Kivity
Unused. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Remove invlpg interceptionAvi Kivity
Since we write protect shadowed guest page tables, there is no need to trap page invalidations (the guest will always change the mapping before issuing the invlpg instruction). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: oom handlingAvi Kivity
When beginning to process a page fault, make sure we have enough shadow pages available to service the fault. If not, free some pages. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: kvm_mmu_put_page() only removes one link to the pageAvi Kivity
... and so must not free it unconditionally. Move the freeing to kvm_mmu_zap_page(). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Implement child shadow unlinkingAvi Kivity
When removing a page table, we must maintain the parent_pte field all child shadow page tables. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: If emulating an instruction fails, try unprotecting the pageAvi Kivity
A page table may have been recycled into a regular page, and so any instruction can be executed on it. Unprotect the page and let the cpu do its thing. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Zap shadow page table entries on writes to guest page tablesAvi Kivity
Iterate over all shadow pages which correspond to a the given guest page table and remove the mappings. A subsequent page fault will reestablish the new mapping. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Support emulated writes into RAMAvi Kivity
As the mmu write protects guest page table, we emulate those writes. Since they are not mmio, there is no need to go to userspace to perform them. So, perform the writes in the kernel if possible, and notify the mmu about them so it can take the approriate action. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Let the walker extract the target page gfn from the pteAvi Kivity
This fixes a problem where set_pte_common() looked for shadowed pages based on the page directory gfn (a huge page) instead of the actual gfn being mapped. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Write protect guest pages when a shadow is created for themAvi Kivity
When we cache a guest page table into a shadow page table, we need to prevent further access to that page by the guest, as that would render the cache incoherent. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Shadow page table cachingAvi Kivity
Define a hashtable for caching shadow page tables. Look up the cache on context switch (cr3 change) or during page faults. The key to the cache is a combination of - the guest page table frame number - the number of paging levels in the guest * we can cache real mode, 32-bit mode, pae, and long mode page tables simultaneously. this is useful for smp bootup. - the guest page table table * some kernels use a page as both a page table and a page directory. this allows multiple shadow pages to exist for that page, one per level - the "quadrant" * 32-bit mode page tables span 4MB, whereas a shadow page table spans 2MB. similarly, a 32-bit page directory spans 4GB, while a shadow page directory spans 1GB. the quadrant allows caching up to 4 shadow page tables for one guest page in one level. - a "metaphysical" bit * for real mode, and for pse pages, there is no guest page table, so set the bit to avoid write protecting the page. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Make kvm_mmu_alloc_page() return a kvm_mmu_page pointerAvi Kivity
This allows further manipulation on the shadow page table. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MU: Special treatment for shadow pae root pagesAvi Kivity
Since we're not going to cache the pae-mode shadow root pages, allocate a single pae shadow that will hold the four lower-level pages, which will act as roots. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] KVM: MMU: Implement simple reverse mappingAvi Kivity
Keep in each host page frame's page->private a pointer to the shadow pte which maps it. If there are multiple shadow ptes mapping the page, set bit 0 of page->private, and use the rest as a pointer to a linked list of all such mappings. Reverse mappings are needed because we when we cache shadow page tables, we must protect the guest page tables from being modified by the guest, as that would invalidate the cached ptes. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30[PATCH] kvm: fix GFP_KERNEL allocation in atomic section in ↵Ingo Molnar
kvm_dev_ioctl_create_vcpu() fix an GFP_KERNEL allocation in atomic section: kvm_dev_ioctl_create_vcpu() called kvm_mmu_init(), which calls alloc_pages(), while holding the vcpu. The fix is to set up the MMU state in two phases: kvm_mmu_create() and kvm_mmu_setup(). (NOTE: free_vcpus does an kvm_mmu_destroy() call so there's no need for any extra teardown branch on allocation/init failure here.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30[PATCH] KVM: Simplify is_long_mode()Avi Kivity
Instead of doing tricky stuff with the arch dependent virtualization registers, take a peek at the guest's efer. This simlifies some code, and fixes some confusion in the mmu branch. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] KVM: Use more traditional error handling in kvm_mmu_init()Avi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13[PATCH] KVM: MMU: Ignore pcd, pwt, and pat bits on ptesAvi Kivity
The pcd, pwt, and pat bits on page table entries affect the cpu cache. Since the cache is a host resource, the guest should not be able to control it. Moreover, the meaning of these bits changes depending on whether pat is enabled or not. So, force these bits to zero on shadow page table entries at all times. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] kvm: userspace interfaceAvi Kivity
web site: http://kvm.sourceforge.net mailing list: kvm-devel@lists.sourceforge.net (http://lists.sourceforge.net/lists/listinfo/kvm-devel) The following patchset adds a driver for Intel's hardware virtualization extensions to the x86 architecture. The driver adds a character device (/dev/kvm) that exposes the virtualization capabilities to userspace. Using this driver, a process can run a virtual machine (a "guest") in a fully virtualized PC containing its own virtual hard disks, network adapters, and display. Using this driver, one can start multiple virtual machines on a host. Each virtual machine is a process on the host; a virtual cpu is a thread in that process. kill(1), nice(1), top(1) work as expected. In effect, the driver adds a third execution mode to the existing two: we now have kernel mode, user mode, and guest mode. Guest mode has its own address space mapping guest physical memory (which is accessible to user mode by mmap()ing /dev/kvm). Guest mode has no access to any I/O devices; any such access is intercepted and directed to user mode for emulation. The driver supports i386 and x86_64 hosts and guests. All combinations are allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae and non-pae paging modes are supported. SMP hosts and UP guests are supported. At the moment only Intel hardware is supported, but AMD virtualization support is being worked on. Performance currently is non-stellar due to the naive implementation of the mmu virtualization, which throws away most of the shadow page table entries every context switch. We plan to address this in two ways: - cache shadow page tables across tlb flushes - wait until AMD and Intel release processors with nested page tables Currently a virtual desktop is responsive but consumes a lot of CPU. Under Windows I tried playing pinball and watching a few flash movies; with a recent CPU one can hardly feel the virtualization. Linux/X is slower, probably due to X being in a separate process. In addition to the driver, you need a slightly modified qemu to provide I/O device emulation and the BIOS. Caveats (akpm: might no longer be true): - The Windows install currently bluescreens due to a problem with the virtual APIC. We are working on a fix. A temporary workaround is to use an existing image or install through qemu - Windows 64-bit does not work. That's also true for qemu, so it's probably a problem with the device model. [bero@arklinux.org: build fix] [simon.kagstrom@bth.se: build fix, other fixes] [uril@qumranet.com: KVM: Expose interrupt bitmap] [akpm@osdl.org: i386 build fix] [mingo@elte.hu: i386 fixes] [rdreier@cisco.com: add log levels to all printks] [randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings] [anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support] Signed-off-by: Yaniv Kamay <yaniv@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Simon Kagstrom <simon.kagstrom@bth.se> Cc: Bernhard Rosenkraenzer <bero@arklinux.org> Signed-off-by: Uri Lublin <uril@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Anthony Liguori <anthony@codemonkey.ws> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>