aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2008-01-30KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UDSheng Yang
When executing a test program called "crashme", we found the KVM guest cannot survive more than ten seconds, then encounterd kernel panic. The basic concept of "crashme" is generating random assembly code and trying to execute it. After some fixes on emulator insn validity judgment, we found it's hard to get the current emulator handle the invalid instructions correctly, for the #UD trap for hypercall patching caused troubles. The problem is, if the opcode itself was OK, but combination of opcode and modrm_reg was invalid, and one operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch the memory operand first rather than checking the validity, and may encounter an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem. In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall, then return from emulate_instruction() and inject a #UD to guest. With the patch, the guest had been running for more than 12 hours. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: MMU: Switch to mmu spinlockMarcelo Tosatti
Convert the synchronization of the shadow handling to a separate mmu_lock spinlock. Also guard fetch() by mmap_sem in read-mode to protect against alias and memslot changes. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()Avi Kivity
Since gfn_to_page() is a sleeping function, and we want to make the core mmu spinlocked, we need to pass the page from the walker context (which can sleep) to the shadow context (which cannot). [marcelo: avoid recursive locking of mmap_sem] Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Add kvm_read_guest_atomic()Marcelo Tosatti
In preparation for a mmu spinlock, add kvm_read_guest_atomic() and use it in fetch() and prefetch_page(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Disable vapic support on Intel machines with FlexPriorityAvi Kivity
FlexPriority accelerates the tpr without any patching. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Accelerated apic supportAvi Kivity
This adds a mechanism for exposing the virtual apic tpr to the guest, and a protocol for letting the guest update the tpr without causing a vmexit if conditions allow (e.g. there is no interrupt pending with a higher priority than the new tpr). Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: local APIC TPR access reporting facilityAvi Kivity
Add a facility to report on accesses to the local apic tpr even if the local apic is emulated in the kernel. This is basically a hack that allows userspace to patch Windows which tends to bang on the tpr a lot. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: MMU: Add cache miss statisticAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Expose ioapic to ia64 save/restore APIsZhang Xiantao
IA64 also needs to see ioapic structure in irqchip. Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Move kvm_vcpu_kick() to x86.cZhang Xiantao
Moving kvm_vcpu_kick() to x86.c. Since it should be common for all archs, put its declarations in <linux/kvm_host.h> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Move arch dependent files to new directory arch/x86/kvm/Avi Kivity
This paves the way for multiple architecture support. Note that while ioapic.c could potentially be shared with ia64, it is also moved. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Export include/linux/kvm.h only if $ARCH actually supports KVMAvi Kivity
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h> includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h only if the arch actually supports it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Add ifdef in irqchip struct for x86 only structuresJerone Young
This patch fixes a small issue where sturctures: kvm_pic_state kvm_ioapic_state are defined inside x86 specific code and may or may not be defined in anyway for other architectures. The problem caused is one cannot compile userspace apps (ex. libkvm) for other archs since a size cannot be determined for these structures. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Enhance guest cpuid managementDan Kenigsberg
The current cpuid management suffers from several problems, which inhibit passing through the host feature set to the guest: - No way to tell which features the host supports While some features can be supported with no changes to kvm, others need explicit support. That means kvm needs to vet the feature set before it is passed to the guest. - No support for indexed or stateful cpuid entries Some cpuid entries depend on ecx as well as on eax, or on internal state in the processor (running cpuid multiple times with the same input returns different output). The current cpuid machinery only supports keying on eax. - No support for save/restore/migrate The internal state above needs to be exposed to userspace so it can be saved or migrated. This patch adds extended cpuid support by means of three new ioctls: - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm) supports - KVM_SET_CPUID2: sets the vcpu's cpuid table - KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state [avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did before] Signed-off-by: Dan Kenigsberg <danken@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Export include/asm-x86/kvm.hAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move cpuid structures to <asm/kvm.h>Jerone Young
This patch moves structures: kvm_cpuid_entry kvm_cpuid from include/linux/kvm.h to include/asm-x86/kvm.h Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move kvm_sregs and msr structures to <asm/kvm.h>Jerone Young
Move structures: kvm_sregs kvm_msr_entry kvm_msrs kvm_msr_list from include/linux/kvm.h to include/asm-x86/kvm.h Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move kvm_segment & kvm_dtable structure to <asm/kvm.h>Jerone Young
This patch moves structures: kvm_segment kvm_dtable from include/linux/kvm.h to include/asm-x86/kvm.h Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move structure lapic_state to <asm/kvm.h>Jerone Young
This patch moves structure lapic_state from include/linux/kvm.h to include/asm-x86/kvm.h Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move kvm_regs to <asm/kvm.h>Jerone Young
This patch moves structure kvm_regs to include/asm-x86/kvm.h. Each architecture will need to create there own version of this structure. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move x86 pic strutcturesJerone Young
This patch moves structures: kvm_pic_state kvm_ioapic_state to inclue/asm-x86/kvm.h. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move kvm_memory_alias to asm/kvm.hJerone Young
This patch moves sturct kvm_memory_alias from include/linux/kvm.h to include/asm-x86/kvm.h. Also have include/linux/kvm.h include include/asm/kvm.h. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Add ioctl to tss address from userspace,Izik Eidus
Currently kvm has a wart in that it requires three extra pages for use as a tss when emulating real mode on Intel. This patch moves the allocation internally, only requiring userspace to tell us where in the physical address space we can place the tss. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Per-architecture hypercall definitionsChristian Borntraeger
Currently kvm provides hypercalls only for x86* architectures. To provide hypercall infrastructure for other kvm architectures I split kvm_para.h into a generic header file and architecture specific definitions. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Support assigning userspace memory to the guestIzik Eidus
Instead of having the kernel allocate memory to the guest, let userspace allocate it and pass the address to the kernel. This is required for s390 support, but also enables features like memory sharing and using hugetlbfs backed memory. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Allow dynamic allocation of the mmu shadow cache sizeIzik Eidus
The user is now able to set how many mmu pages will be allocated to the guest. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Refactor hypercall infrastructure (v3)Anthony Liguori
This patch refactors the current hypercall infrastructure to better support live migration and SMP. It eliminates the hypercall page by trapping the UD exception that would occur if you used the wrong hypercall instruction for the underlying architecture and replacing it with the right one lazily. A fall-out of this patch is that the unhandled hypercalls no longer trap to userspace. There is very little reason though to use a hypercall to communicate with userspace as PIO or MMIO can be used. There is no code in tree that uses userspace hypercalls. [avi: fix #ud injection on vmx] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30x86: use the same pgd_list for PAE and 64-bitJeremy Fitzhardinge
Use a standard list threaded through page->lru for maintaining the pgd list on PAE. This is the same as 64-bit, and seems saner than using a non-standard list via page->index. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: defer cr3 reload when doing pud_clear()Jeremy Fitzhardinge
PAE mode requires that we reload cr3 in order to guarantee that changes to the pgd will be noticed by the processor. This means that in principle pud_clear needs to reload cr3 every time. However, because reloading cr3 implies a tlb flush, we want to avoid it where possible. pud_clear() is only used in a couple of places: - in free_pmd_range(), when pulling down a range of process address space, and - huge_pmd_unshare() In both cases, the calling code will do a a tlb flush anyway, so there's no need to do it within pud_clear(). In free_pmd_range(), the pud_clear is immediately followed by pmd_free_tlb(); we can hook that to make the mmu_gather do an unconditional full flush to make sure cr3 gets reloaded. In huge_pmd_unshare, it is followed by flush_tlb_range, which always results in a full cr3-reload tlb flush. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: William Irwin <wli@holomorphy.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: early boot debugging via FireWire (ohci1394_dma=early)Bernhard Kaindl
This patch adds a new configuration option, which adds support for a new early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch() to decide wether OHCI-1394 FireWire controllers should be initialized and enabled for physical DMA access to allow remote debugging of early problems like issues ACPI or other subsystems which are executed very early. If the config option is not enabled, no code is changed, and if the boot paramenter is not given, no new code is executed, and independent of that, all new code is freed after boot, so the config option can be even enabled in standard, non-debug kernels. With specialized tools, it is then possible to get debugging information from machines which have no serial ports (notebooks) such as the printk buffer contents, or any data which can be referenced from global pointers, if it is stored below the 4GB limit and even memory dumps of of the physical RAM region below the 4GB limit can be taken without any cooperation from the CPU of the host, so the machine can be crashed early, it does not matter. In the extreme, even kernel debuggers can be accessed in this way. I wrote a small kgdb module and an accompanying gdb stub for FireWire which allows to gdb to talk to kgdb using remote remory reads and writes over FireWire. An version of the gdb stub fore FireWire is able to read all global data from a system which is running a a normal kernel without any kernel debugger, without any interruption or support of the system's CPU. That way, e.g. the task struct and so on can be read and even manipulated when the physical DMA access is granted. A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt and I've put a copy online at ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt It also has links to all the tools which are available to make use of it another copy of it is online at: ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff Signed-Off-By: Bernhard Kaindl <bk@suse.de> Tested-By: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: don't special-case pmd allocations as muchJeremy Fitzhardinge
In x86 PAE mode, stop treating pmds as a special case. Previously they were always allocated and freed with the pgd. The modifies the code to be the same as 64-bit mode, where they are allocated on demand. This is a step on the way to unifying 32/64-bit pagetable allocation as much as possible. There is a complicating wart, however. When you install a new reference to a pmd in the pgd, the processor isn't guaranteed to see it unless you reload cr3. Since reloading cr3 also has the side-effect of flushing the tlb, this is an expense that we want to avoid whereever possible. This patch simply avoids reloading cr3 unless the update is to the current pagetable. Later patches will optimise this further. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: William Irwin <wli@holomorphy.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: arch/x86/mm/init_32.c cleanupIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: make ioremap() UC by defaultIngo Molnar
Yes! A mere 120 c_p_a() fixing and rewriting patches later, we are now confident that we can enable UC by default for ioremap(), on x86 too. Every other architectures was doing this already. Doing so makes Linux more robust against MTRR mixups (which might go unnoticed if BIOS writers test other OSs only - where PAT might override bad MTRRs defaults). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: cpa: fix the self-testIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: fix clflush_page_range logicIngo Molnar
only present ptes must be flushed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: add testcases for RODATA and NX protections/attributesArjan van de Ven
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd. I also cleaned up the NX test code quite a bit, and got rid of the ugly exception table sorting stuff. From: Arjan van de Ven <arjan@linux.intel.com> This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option as well as the NX CPU feature/mappings. Both testcases can move to tests/ once that patch gets merged into mainline. (I'm half considering moving the rodata test into mm/init.c but I'll wait with that until init.c is unified) As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h for the RODATA sections, which lead to 1 page less being marked read only. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: remove flush_agp_mappings()Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: cpa: move flush to cpaThomas Gleixner
The set_memory_* and set_pages_* family of API's currently requires the callers to do a global tlb flush after the function call; forgetting this is a very nasty deathtrap. This patch moves the global tlb flush into each of the callers Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: make various pageattr.c functions staticArjan van de Ven
change_page_attr_add is only used in pageattr.c now, so we can make this function static. change_page_attr() isn't used anywere at all anymore; this function is a really bad API anyway so just remove the bloat entirely. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: cpa: set_memory_notpresent()Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: fix ioremap APIThomas Gleixner
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: fix the missing BIOS area check in page_is_ramThomas Gleixner
page_is_ram has a FIXME since ages, which reminds to sanity check the BIOS area between 640k and 1M, which is sometimes falsely reported as RAM in the e820 tables. Implement the sanity check. Move the BIOS range defines from pageattr.c into e820.h to avoid duplicate defines. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: move page_is_ram() functionThomas Gleixner
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: deprecate change_page_attr() for driversArjan van de Ven
With the introduction of the new API, no driver or non-archcore code needs to use c-p-a anymore, so this patch also deprecates the EXPORT_SYMBOL of CPA (it's a horrible API after all). Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: convert CPA users to the new set_page_ APIArjan van de Ven
This patch converts various users of change_page_attr() to the new, more intent driven set_page_*/set_memory_* API set. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: a new API for drivers/etc to control cache and other page attributesArjan van de Ven
Right now, if drivers or other code want to change, say, a cache attribute of a page, the only API they have is change_page_attr(). c-p-a is a really bad API for this, because it forces the caller to know *ALL* the attributes he wants for the page, not just the 1 thing he wants to change. So code that wants to set a page uncachable, needs to be aware of the NX status as well etc etc etc. This patch introduces a set of new APIs for this, set_pages_<attr> and set_memory_<attr>, that offer a logical change to the user, and leave all attributes not implied by the requested logical change alone. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: introduce max_pfn_mappedThomas Gleixner
64bit uses end_pfn_map and 32bit uses max_low_pfn. There are several files which have #ifdef'ed defines which map either to end_pfn_map or max_low_pfn. Replace this by a universal define and clean up all the other instances. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: add PAGE_KERNEL_EXEC_NOCACHEIngo Molnar
add PAGE_KERNEL_EXEC_NOCACHE. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: add PG_LEVEL enumThomas Gleixner
this way PG_LEVEL_1GB will be an easy change. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: clean up lookup_address() declarationsThomas Gleixner
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>