aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm/mmu.c
AgeCommit message (Collapse)Author
2010-04-20KVM: MMU: fix kvm_mmu_zap_page() and its calling pathXiao Guangrong
This patch fix: - calculate zapped page number properly in mmu_zap_unsync_children() - calculate freeed page number properly kvm_mmu_change_mmu_pages() - if zapped children page it shoud restart hlist walking KVM-Stable-Tag. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-01KVM: x86 emulator: fix memory access during x86 emulationGleb Natapov
Currently when x86 emulator needs to access memory, page walk is done with broadest permission possible, so if emulated instruction was executed by userspace process it can still access kernel memory. Fix that by providing correct memory access to page walker during emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: MMU: Add tracepoint for guest page agingAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: emulate accessed bit for EPTRik van Riel
Currently KVM pretends that pages with EPT mappings never got accessed. This has some side effects in the VM, like swapping out actively used guest pages and needlessly breaking up actively used hugepages. We can avoid those very costly side effects by emulating the accessed bit for EPT PTEs, which should only be slightly costly because pages pass through page_referenced infrequently. TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young(). This seems to help prevent KVM guests from being swapped out when they should not on my system. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: Introduce kvm_host_page_sizeJoerg Roedel
This patch introduces a generic function to find out the host page size for a given gfn. This function is needed by the kvm iommu code. This patch also simplifies the x86 host_mapping_level function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: MMU: Remove some useless code from alloc_mmu_pages()Wei Yongjun
If we fail to alloc page for vcpu->arch.mmu.pae_root, call to free_mmu_pages() is unnecessary, which just do free the page malloc for vcpu->arch.mmu.pae_root. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: Rename vcpu->shadow_efer to eferAvi Kivity
None of the other registers have the shadow_ prefix. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: Move cr0/cr4/efer related helpers to x86.hAvi Kivity
They have more general scope than the mmu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: rename is_writeble_pte() to is_writable_pte()Takuya Yoshikawa
There are two spellings of "writable" in arch/x86/kvm/mmu.c and paging_tmpl.h . This patch renames is_writeble_pte() to is_writable_pte() and makes grepping easy. New name is consistent with the definition of itself: return pte & PT_WRITABLE_MASK; Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: Replace read accesses of vcpu->arch.cr0 by an accessorAvi Kivity
Since we'd like to allow the guest to own a few bits of cr0 at times, we need to know when we access those bits. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: VMX: Enable EPT 1GB page supportSheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: x86: Moving PT_*_LEVEL to mmu.hSheng Yang
We can use them in x86.c and vmx.c now... Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: switch vcpu context to use SRCUMarcelo Tosatti
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU updateMarcelo Tosatti
Use two steps for memslot deletion: mark the slot invalid (which stops instantiation of new shadow pages for that slot, but allows destruction), then instantiate the new empty slot. Also simplifies kvm_handle_hva locking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: modify memslots layout in struct kvmMarcelo Tosatti
Have a pointer to an allocated region inside struct kvm. [alex: fix ppc book 3s] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: MMU: Report spte not found in rmap before BUG()Avi Kivity
In the past we've had errors of single-bit in the other two cases; the printk() may confirm it for the third case (many->many). Signed-off-by: Avi Kivity <avi@redhat.com>
2010-01-25KVM: x86: Fix host_mapping_level()Sheng Yang
When found a error hva, should not return PAGE_SIZE but the level... Also clean up the coding style of the following loop. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: Allow internal errors reported to userspace to carry extra dataAvi Kivity
Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Don't pass kvm_run argumentsAvi Kivity
They're just copies of vcpu->run, which is readily accessible. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-10-16KVM: MMU: fix pointer castFrederik Deweerdt
On a 32 bits compile, commit 3da0dd433dc399a8c0124d0614d82a09b6a49bce introduced the following warnings: arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’: arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different size arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’: arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different size The following patch uses 'unsigned long' instead of u64 to match the pointer size on both arches. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04KVM: add support for change_pte mmu notifiersIzik Eidus
this is needed for kvm if it want ksm to directly map pages into its shadow page tables. [marcelo: cast pfn assignment to u64] Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04KVM: MMU: add SPTE_HOST_WRITEABLE flag to the shadow ptesIzik Eidus
this flag notify that the host physical page we are pointing to from the spte is write protected, and therefore we cant change its access to be write unless we run get_user_pages(write = 1). (this is needed for change_pte support in kvm) Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04KVM: MMU: dont hold pagecount reference for mapped sptes pagesIzik Eidus
When using mmu notifiers, we are allowed to remove the page count reference tooken by get_user_pages to a specific page that is mapped inside the shadow page tables. This is needed so we can balance the pagecount against mapcount checking. (Right now kvm increase the pagecount and does not increase the mapcount when mapping page into shadow page table entry, so when comparing pagecount against mapcount, you have no reliable result.) Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10KVM: Optimize kvm_mmu_unprotect_page_virt() for tdpAvi Kivity
We know no pages are protected, so we can short-circuit the whole thing (including fairly nasty guest memory accesses). Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: fix bogus alloc_mmu_pages assignmentMarcelo Tosatti
Remove the bogus n_free_mmu_pages assignment from alloc_mmu_pages. It breaks accounting of mmu pages, since n_free_mmu_pages is modified but the real number of pages remains the same. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: make __kvm_mmu_free_some_pages handle empty listIzik Eidus
First check if the list is empty before attempting to look at list entries. Cc: stable@kernel.org Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: shadow support for 1gb pagesJoerg Roedel
This patch adds support for shadow paging to the 1gb page table code in KVM. With this code the guest can use 1gb pages even if the host does not support them. [ Marcelo: fix shadow page collision on pmd level if a guest 1gb page is mapped with 4kb ptes on host level ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: make page walker aware of mapping levelsJoerg Roedel
The page walker may be used with nested paging too when accessing mmio areas. Make it support the additional page-level too. [ Marcelo: fix reserved bit check for 1gb pte ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: make direct mapping paths aware of mapping levelsJoerg Roedel
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: rename is_largepage_backed to mapping_levelJoerg Roedel
With the new name and the corresponding backend changes this function can now support multiple hugepage sizes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: make rmap code aware of mapping levelsJoerg Roedel
This patch removes the largepage parameter from the rmap_add function. Together with rmap_remove this function now uses the role.level field to find determine if the page is a huge page. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: fix missing locking in alloc_mmu_pagesMarcelo Tosatti
n_requested_mmu_pages/n_free_mmu_pages are used by kvm_mmu_change_mmu_pages to calculate the number of pages to zap. alloc_mmu_pages, called from the vcpu initialization path, modifies this variables without proper locking, which can result in a negative value in kvm_mmu_change_mmu_pages (say, with cpu hotplug). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10KVM: Discard unnecessary kvm_mmu_flush_tlb() in kvm_mmu_load()Sheng Yang
set_cr3() should already cover the TLB flushing. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10KVM: MMU: Fix MMU_DEBUG compile breakageJoerg Roedel
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Trace shadow page lifecycleAvi Kivity
Create, sync, unsync, zap. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: Trace guest pagetable walkerAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Prepare memslot data structures for multiple hugepage sizesJoerg Roedel
[avi: fix build on non-x86] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: add kvm_mmu_get_spte_hierarchy helperMarcelo Tosatti
Required by EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: make for_each_shadow_entry aware of largepagesMarcelo Tosatti
This way there is no need to add explicit checks in every for_each_shadow_entry user. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU audit: largepage handlingMarcelo Tosatti
Make the audit code aware of largepages. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU audit: audit_mappings tweaksMarcelo Tosatti
- Fail early in case gfn_to_pfn returns is_error_pfn. - For the pre pte write case, avoid spurious "gva is valid but spte is notrap" messages (the emulation code does the guest write first, so this particular case is OK). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU audit: nontrapping ptes in nonleaf levelMarcelo Tosatti
It is valid to set non leaf sptes as notrap. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU audit: update audit_write_protectionMarcelo Tosatti
- Unsync pages contain writable sptes in the rmap. - rmaps do not exclusively contain writable sptes anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU audit: update count_writable_mappings / count_rmapsMarcelo Tosatti
Under testing, count_writable_mappings returns a value that is 2 integers larger than what count_rmaps returns. Suspicion is that either of the two functions is counting a duplicate (either positively or negatively). Modifying check_writable_mappings_rmap to check for rmap existance on all present MMU pages fails to trigger an error, which should keep Avi happy. Also introduce mmu_spte_walk to invoke a callback on all present sptes visible to the current vcpu, might be useful in the future. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: introduce is_last_spte helperMarcelo Tosatti
Hiding some of the last largepage / level interaction (which is useful for gbpages and for zero based levels). Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Return to userspace on emulation failureAvi Kivity
Instead of mindlessly retrying to execute the instruction, report the failure to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: Use macro to iterate over vcpus.Gleb Natapov
[christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: s/shadow_pte/spte/Avi Kivity
We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pteAvi Kivity
Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity <avi@redhat.com>