Age | Commit message (Collapse) | Author |
|
This patch adds a tracepoint for every #vmexit we get from a
nested guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
This patch adds a dedicated kvm tracepoint for a nested
vmrun.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
The nested SVM code emulates a #vmexit caused by a request
to open the irq window right in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled but the #vmexit emulation might
sleep. This can cause a schedule()-while-atomic bug and is
fixed with this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.
We don't do this in nested SVM by now which is a bug and
fixed by this patch.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.
This patch moves then to the beginning of the list, and skip testing them.
Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.
Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Now, also remove pre_task_link setting in save_state_to_tss16.
commit b237ac37a149e8b56436fabf093532483bff13b0
Author: Gleb Natapov <gleb@redhat.com>
Date: Mon Mar 30 16:03:24 2009 +0300
KVM: Fix task switch back link handling.
CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.
Backend was not allocating enough structure for all possible CPUs, so
new CPUs coming online could not be hardware enabled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.
Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery. Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.
On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
This patch reorganizes the logic in svm_interrupt_allowed to
make it better to read. This is important because the logic
is a lot more complicated with Nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Remove duplicated #include('s) in
arch/x86/kvm/lapic.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.
[marcelo: remove unused variable]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Add missing decoder flags for or instructions (0xc-0xd).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Not the incorrect -EINVAL.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The allows removal of irq_lock from the injection path.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.
[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Also move call of ack notifiers after pic state change.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
They're just copies of vcpu->run, which is readily accessible.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Introduces a new decode option "No64", which is used for instructions that are
invalid in long mode.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
[avi: avoid buffer overflow]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Conflicts:
arch/x86/kernel/kprobes.c
kernel/trace/Makefile
Merge reason: hw-breakpoints perf integration is looking
good in testing and in reviews, plus conflicts
are mounting up - so merge & resolve.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Wrap in the cpu dr7 check that tells if we have active
breakpoints that need to be restored in the cpu.
This wrapper makes the check more self-explainable and also
reusable for any further other uses.
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
|
|
This patch rebase the implementation of the breakpoints API on top of
perf events instances.
Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..
The new layering is now made as follows:
ptrace kgdb ftrace perf syscall
\ | / /
\ | / /
/
Core breakpoint API /
/
| /
| /
Breakpoints perf events
|
|
Breakpoints PMU ---- Debug Register constraints handling
(Part of core breakpoint API)
|
|
Hardware debug registers
Reasons of this rewrite:
- Use the centralized/optimized pmu registers scheduling,
implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
events, exclusive/non-exclusive, tunable period, etc...)
Impact:
- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
thread breakpoints references.
Todo (in the order):
- Support breakpoints perf counter events for perf tools (ie: implement
perf_bpcounter_event())
- Support from perf tools
Changes in v2:
- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
perf_event_attr.
- Separate core and arch specific headers, drop
asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch
Changes in v3:
- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
changes to kvm when we exit the guest and restore the bp registers
to the host.
Changes in v4:
- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
module
- Restore the breakpoints unconditionally on kvm guest exit:
TIF_DEBUG_THREAD doesn't anymore cover every cases of running
breakpoints and vcpu->arch.switch_db_regs might not always be
set when the guest used debug registers.
(Waiting for a reliable optimization)
Changes in v5:
- Split-up the asm-generic/hw-breakpoint.h moving to
linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
to host. We only want to restore the state if we have active
breakpoints to the host, otherwise we don't care about messed-up
address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c
Changes in v6:
- Fix wrong header inclusion in trace.h (triggered a build
error with CONFIG_FTRACE_SELFTEST
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
|
|
If TSS we are switching to resides in high memory task switch will fail
since address will be truncated. Windows2k3 does this sometimes when
running with more then 4G
Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
This patch updates percpu related symbols in x86 such that percpu
symbols are unique and don't clash with local symbols. This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.
* arch/x86/kernel/cpu/common.c: rename local variable to avoid collision
* arch/x86/kvm/svm.c: s/svm_data/sd/ for local variables to avoid collision
* arch/x86/kernel/cpu/cpu_debug.c: s/cpu_arr/cpud_arr/
s/priv_arr/cpud_priv_arr/
s/cpu_priv_count/cpud_priv_count/
* arch/x86/kernel/cpu/intel_cacheinfo.c: s/cpuid4_info/ici_cpuid4_info/
s/cache_kobject/ici_cache_kobject/
s/index_kobject/ici_index_kobject/
* arch/x86/kernel/ds.c: s/cpu_context/cpu_ds_context/
Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: (kvm) Avi Kivity <avi@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: x86@kernel.org
|
|
Conflicts:
kernel/Makefile
kernel/trace/Makefile
kernel/trace/trace.h
samples/Makefile
Merge reason: We need to be uptodate with the perf events development
branch because we plan to rewrite the breakpoints API on top of
perf events.
|
|
On a 32 bits compile, commit 3da0dd433dc399a8c0124d0614d82a09b6a49bce
introduced the following warnings:
arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different size
arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different size
The following patch uses 'unsigned long' instead of u64 to match the
pointer size on both arches.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
hrtimer->base can be temporarily NULL due to racing hrtimer_start.
See switch_hrtimer_base/lock_hrtimer_base.
Use hrtimer_get_remaining which is robust against it.
CC: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
this is needed for kvm if it want ksm to directly map pages into its
shadow page tables.
[marcelo: cast pfn assignment to u64]
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
this flag notify that the host physical page we are pointing to from
the spte is write protected, and therefore we cant change its access
to be write unless we run get_user_pages(write = 1).
(this is needed for change_pte support in kvm)
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
When using mmu notifiers, we are allowed to remove the page count
reference tooken by get_user_pages to a specific page that is mapped
inside the shadow page tables.
This is needed so we can balance the pagecount against mapcount
checking.
(Right now kvm increase the pagecount and does not increase the
mapcount when mapping page into shadow page table entry,
so when comparing pagecount against mapcount, you have no
reliable result.)
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
The number of entries is multiplied by the entry size, which can
overflow on 32-bit hosts. Bound the entry count instead.
Reported-by: David Wagner <daw@cs.berkeley.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
It is possible that stale EPTP-tagged mappings are used, if a
vcpu migrates to a different pcpu.
Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load, when switching pcpus, which
will invalidate both VPID and EPT mappings on the next vm-entry.
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Don't overflow when computing the 64-bit period from 32-bit registers.
Fixes sourceforge bug #2826486.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
When running nested we need to touch the l1 guests
tsc_offset. Otherwise changes will be lost or a wrong value
be read.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
When svm_vcpu_load is called while the vcpu is running in
guest mode the tsc adjustment made there is lost on the next
emulated #vmexit. This causes the tsc running backwards in
the guest. This patch fixes the issue by also adjusting the
tsc_offset in the emulated hsave area so that it will not
get lost.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Conflicts:
arch/x86/kernel/process_64.c
Semantic conflict fixed in:
arch/x86/kvm/x86.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
MAINTAINERS: update KVM entry
KVM: correct error-handling code
KVM: fix compile warnings on s390
KVM: VMX: Check cpl before emulating debug register access
KVM: fix misreporting of coalesced interrupts by kvm tracer
KVM: x86: drop duplicate kvm_flush_remote_tlb calls
KVM: VMX: call vmx_load_host_state() only if msr is cached
KVM: VMX: Conditionally reload debug register 6
KVM: Use thread debug register storage instead of kvm specific data
KVM guest: do not batch pte updates from interrupt context
KVM: Fix coalesced interrupt reporting in IOAPIC
KVM guest: fix bogus wallclock physical address calculation
KVM: VMX: Fix cr8 exiting control clobbering by EPT
KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
KVM: Document KVM_CAP_IRQCHIP
KVM: Protect update_cr8_intercept() when running without an apic
KVM: VMX: Fix EPT with WP bit change during paging
KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
KVM: x86 emulator: Add adc and sbb missing decoder flags
KVM: Add missing #include
...
|