Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2005-09-10	[PATCH] spinlock consolidation	Ingo Molnar
	This patch (written by me and also containing many suggestions of Arjan van de Ven) does a major cleanup of the spinlock code. It does the following things: - consolidates and enhances the spinlock/rwlock debugging code - simplifies the asm/spinlock.h files - encapsulates the raw spinlock type and moves generic spinlock features (such as ->break_lock) into the generic code. - cleans up the spinlock code hierarchy to get rid of the spaghetti. Most notably there's now only a single variant of the debugging code, located in lib/spinlock_debug.c. (previously we had one SMP debugging variant per architecture, plus a separate generic one for UP builds) Also, i've enhanced the rwlock debugging facility, it will now track write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too. All locks have lockup detection now, which will work for both soft and hard spin/rwlock lockups. The arch-level include files now only contain the minimally necessary subset of the spinlock code - all the rest that can be generalized now lives in the generic headers: include/asm-i386/spinlock_types.h \| 16 include/asm-x86_64/spinlock_types.h \| 16 I have also split up the various spinlock variants into separate files, making it easier to see which does what. The new layout is: SMP \| UP ----------------------------\|----------------------------------- asm/spinlock_types_smp.h \| linux/spinlock_types_up.h linux/spinlock_types.h \| linux/spinlock_types.h asm/spinlock_smp.h \| linux/spinlock_up.h linux/spinlock_api_smp.h \| linux/spinlock_api_up.h linux/spinlock.h \| linux/spinlock.h /* * here's the role of the various spinlock/rwlock related include files: * * on SMP builds: * * asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the * initializers * * linux/spinlock_types.h: * defines the generic type and initializers * * asm/spinlock.h: contains the __raw_spin_()/etc. lowlevel implementations, mostly inline assembly code * * (also included on UP-debug builds:) * * linux/spinlock_api_smp.h: * contains the prototypes for the _spin_() APIs. * linux/spinlock.h: builds the final spin_() APIs. * on UP builds: * * linux/spinlock_type_up.h: * contains the generic, simplified UP spinlock type. * (which is an empty structure on non-debug builds) * * linux/spinlock_types.h: * defines the generic type and initializers * * linux/spinlock_up.h: * contains the __raw_spin_()/etc. version of UP builds. (which are NOPs on non-debug, non-preempt * builds) * * (included on UP-non-debug builds:) * * linux/spinlock_api_up.h: * builds the _spin_() APIs. * linux/spinlock.h: builds the final spin_() APIs. / All SMP and UP architectures are converted by this patch. arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should be mostly fine. From: Grant Grundler <grundler@parisc-linux.org> Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU). Builds 32-bit SMP kernel (not booted or tested). I did not try to build non-SMP kernels. That should be trivial to fix up later if necessary. I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids some ugly nesting of linux/.h and asm/.h files. Those particular locks are well tested and contained entirely inside arch specific code. I do NOT expect any new issues to arise with them. If someone does ever need to use debug/metrics with them, then they will need to unravel this hairball between spinlocks, atomic ops, and bit ops that exist only because parisc has exactly one atomic instruction: LDCW (load and clear word). From: "Luck, Tony" <tony.luck@intel.com> ia64 fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjanv@infradead.org> Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se> Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09	Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild	Linus Torvalds

2005-09-09	[PATCH] fix reboot via keyboard controller reset	Truxton Fulton
	I have a system (Biostar IDEQ210M mini-pc with a VIA chipset) which will not reboot unless a keyboard is plugged in to it. I have tried all combinations of the kernel "reboot=x,y" flags to no avail. Rebooting by any method will leave the system in a wedged state (at the "Restarting system" message). I finally tracked the problem down to the machine's refusal to fully reboot unless the keyboard controller status register had bit 2 set. This is the "System flag" which when set, indicates successful completion of the keyboard controller self-test (Basic Assurance Test, BAT). I suppose that something is trying to protect against sporadic reboots unless the keyboard controller is in a good state (a keyboard is present), but I need this machine to be headless. I found that setting the system flag (via the command byte) before giving the "pulse reset line" command will allow the reboot to proceed. The patch is simple, and I think it should be fine for everybody whether they have this type of machine or not. This affects the "hard" reboot (as done when the kernel boot flags "reboot=c,h" are used). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09	[PATCH] i386: CONFIG_ACPI_SRAT typo fix	Magnus Damm
	Fix a typo involving CONFIG_ACPI_SRAT. Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09	kbuild: full dependency check on asm-offsets.h	Sam Ravnborg
	Building asm-offsets.h has been moved to a seperate Kbuild file located in the top-level directory. This allow us to share the functionality across the architectures. The old rules in architecture specific Makefiles will die in subsequent patches. Furhtermore the usual kbuild dependency tracking is now used when deciding to rebuild asm-offsets.s. So we no longer risk to fail a rebuild caused by asm-offsets.c dependencies being touched. With this common rule-set we now force the same name across all architectures. Following patches will fix the rest. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-09-08	Merge linux-2.6 with linux-acpi-2.6	Len Brown

2005-09-07	[PATCH] Clean up struct flock64 definitions	Stephen Rothwell
	This patch gathers all the struct flock64 definitions (and the operations), puts them under !CONFIG_64BIT and cleans up the arch files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] Clean up struct flock definitions	Stephen Rothwell
	This patch just gathers together all the struct flock definitions except xtensa into asm-generic/fcntl.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] Clean up the fcntl operations	Stephen Rothwell
	This patch puts the most popular of each fcntl operation/flag into asm-generic/fcntl.h and cleans up the arch files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] Clean up the open flags	Stephen Rothwell
	This patch puts the most popular of each open flag into asm-generic/fcntl.h and cleans up the arch files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] Create asm-generic/fcntl.h	Stephen Rothwell
	This set of patches creates asm-generic/fcntl.h and consolidates as much as possible from the asm-/fcntl.h files into it. This patch just gathers all the identical bits of the asm-/fcntl.h files into asm-generic/fcntl.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] remove verify_area(): remove verify_area() from various uaccess.h ↵	Jesper Juhl
	headers Remove the deprecated (and unused) verify_area() from various uaccess.h headers. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] remove asm-*/hdreg.h	Christoph Hellwig
	unused and useless.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] auxiliary vector cleanups	H. J. Lu
	The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE so that we can change it if necessary when a new vector is added. Because of include file ordering problems, doing this necessitated the extraction of the AT_* symbols into a standalone header file. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07	[PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup	Jakub Jelinek
	ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter (which at least on UP usually means an immediate context switch to one of the waiter threads). This waiter wakes up and after a few instructions it attempts to acquire the cv internal lock, but that lock is still held by the thread calling pthread_cond_signal. So it goes to sleep and eventually the signalling thread is scheduled in, unlocks the internal lock and wakes the waiter again. Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal to avoid this performance issue, but it was removed when locks were redesigned to the 3 state scheme (unlocked, locked uncontended, locked contended). Following scenario shows why simply using FUTEX_REQUEUE in pthread_cond_signal together with using lll_mutex_unlock_force in place of lll_mutex_unlock is not enough and probably why it has been disabled at that time: The number is value in cv->__data.__lock. thr1 thr2 thr3 0 pthread_cond_wait 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) 0 lll_futex_wait (&cv->__data.__futex, futexval) 0 pthread_cond_signal 1 lll_mutex_lock (cv->__data.__lock) 1 pthread_cond_signal 2 lll_mutex_lock (cv->__data.__lock) 2 lll_futex_wait (&cv->__data.__lock, 2) 2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock) # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE 2 lll_mutex_unlock_force (cv->__data.__lock) 0 cv->__data.__lock = 0 0 lll_futex_wake (&cv->__data.__lock, 1) 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) # Here, lll_mutex_unlock doesn't know there are threads waiting # on the internal cv's lock Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal, but it will cost us not one, but 2 extra syscalls and, what's worse, one of these extra syscalls will be done for every single waiting loop in pthread_cond_wait. We would need to use lll_mutex_unlock_force in pthread_cond_signal after requeue and lll_mutex_cond_lock in pthread_cond_wait after lll_futex_wait. Another alternative is to do the unlocking pthread_cond_signal needs to do (the lock can't be unlocked before lll_futex_wake, as that is racy) in the kernel. I have implemented both variants, futex-requeue-glibc.patch is the first one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel. The kernel interface allows userland to specify how exactly an unlocking operation should look like (some atomic arithmetic operation with optional constant argument and comparison of the previous futex value with another constant). It has been implemented just for ppc*, x86_64 and i?86, for other architectures I'm including just a stub header which can be used as a starting point by maintainers to write support for their arches and ATM will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running 32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL. With the following benchmark on UP x86-64 I get: for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \ for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done time elf/ld.so --library-path .:nptl-orig /tmp/bench real 0m0.655s user 0m0.253s sys 0m0.403s real 0m0.657s user 0m0.269s sys 0m0.388s time elf/ld.so --library-path .:nptl-requeue /tmp/bench real 0m0.496s user 0m0.225s sys 0m0.271s real 0m0.531s user 0m0.242s sys 0m0.288s time elf/ld.so --library-path .:nptl-wake_op /tmp/bench real 0m0.380s user 0m0.176s sys 0m0.204s real 0m0.382s user 0m0.175s sys 0m0.207s The benchmark is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt Older futex-requeue-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt Older futex-wake_op-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt Will post a new version (just x86-64 fixes so that the patch applies against pthread_cond_signal.S) to libc-hacker ml soon. Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded testcase that will not test the atomicity of the operation, but at least check if the threads that should have been woken up are woken up and whether the arithmetic operation in the kernel gave the expected results. Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and ↵	Laurent Vivier
	general usage Jeff Dike <jdike@addtoit.com>, Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>, Bodo Stroesser <bstroesser@fujitsu-siemens.com> Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL except that the kernel does not execute the requested syscall; this is useful to improve performance for virtual environments, like UML, which want to run the syscall on their own. In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry and on exit, and each time you also have two context switches; with SYSEMU you avoid the 2nd stop and so save two context switches per syscall. Also, some architectures don't have support in the host for changing the syscall number via ptrace(), which is currently needed to skip syscall execution (UML turns any syscall into getpid() to avoid it being executed on the host). Fixing that is hard, while SYSEMU is easier to implement. * This version of the patch includes some suggestions of Jeff Dike to avoid adding any instructions to the syscall fast path, plus some other little changes, by myself, to make it work even when the syscall is executed with SYSENTER (but I'm unsure about them). It has been widely tested for quite a lot of time. * Various fixed were included to handle the various switches between various states, i.e. when for instance a syscall entry is traced with one of PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit. Basically, this is done by remembering which one of them was used even after the call to ptrace_notify(). * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP to make do_syscall_trace() notice that the current syscall was started with SYSEMU on entry, so that no notification ought to be done in the exit path; this is a bit of a hack, so this problem is solved in another way in next patches. * Also, the effects of the patch: "Ptrace - i386: fix Syscall Audit interaction with singlestep" are cancelled; they are restored back in the last patch of this series. Detailed descriptions of the patches doing this kind of processing follow (but I've already summed everything up). * Fix behaviour when changing interception kind #1. In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag only after doing the debugger notification; but the debugger might have changed the status of this flag because he continued execution with PTRACE_SYSCALL, so this is wrong. This patch fixes it by saving the flag status before calling ptrace_notify(). * Fix behaviour when changing interception kind #2: avoid intercepting syscall on return when using SYSCALL again. A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL crashes. The problem is in arch/i386/kernel/entry.S. The current SYSEMU patch inhibits the syscall-handler to be called, but does not prevent do_syscall_trace() to be called after this for syscall completion interception. The appended patch fixes this. It reuses the flag TIF_SYSCALL_EMU to remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since the flag is unused in the depicted situation. * Fix behaviour when changing interception kind #3: avoid intercepting syscall on return when using SINGLESTEP. When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had problems with singlestepping on UML in SKAS with SYSEMU. It looped receiving SIGTRAPs without moving forward. EIP of the traced process was the same for all SIGTRAPs. What's missing is to handle switching from PTRACE_SYSCALL_EMU to PTRACE_SINGLESTEP in a way very similar to what is done for the change from PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE. I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is notified and then wake ups the process; the syscall is executed (or skipped, when do_syscall_trace() returns 0, i.e. when using PTRACE_SYSEMU), and do_syscall_trace() is called again. Since we are on the return path of a SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL), we must still avoid notifying the parent of the syscall exit. Now, this behaviour is extended even to resuming with PTRACE_SINGLESTEP. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] add suspend/resume for timer	Shaohua Li
	The timers lack .suspend/.resume methods. Because of this, jiffies got a big compensation after a S3 resume. And then softlockup watchdog reports an oops. This occured with HPET enabled, but it's also possible for other timers. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386 boottime for_each_cpu broken	Zwane Mwaikambo
	for_each_cpu walks through all processors in cpu_possible_map, which is defined as cpu_callout_map on i386 and isn't initialised until all processors have been booted. This breaks things which do for_each_cpu iterations early during boot. So, define cpu_possible_map as a bitmap with NR_CPUS bits populated. This was triggered by a patch i'm working on which does alloc_percpu before bringing up secondary processors. From: Alexander Nyberg <alexn@telia.com> i386-boottime-for_each_cpu-broken.patch i386-boottime-for_each_cpu-broken-fix.patch The SMP version of __alloc_percpu checks the cpu_possible_map before allocating memory for a certain cpu. With the above patches the BSP cpuid is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor machines (as soon as someone tries to dereference something allocated via __alloc_percpu, which in fact is never allocated since the cpu is not set in cpu_possible_map). Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: encapsulate copying of pgd entries	Zachary Amsden
	Add a clone operation for pgd updates. This helps complete the encapsulation of updates to page tables (or pages about to become page tables) into accessor functions rather than using memcpy() to duplicate them. This is both generally good for consistency and also necessary for running in a hypervisor which requires explicit updates to page table entries. The new function is: clone_pgd_range(pgd_t dst, pgd_t src, int count); dst - pointer to pgd range anwhere on a pgd page src - "" count - the number of pgds to copy. dst and src can be on the same page, but the range must not overlap and must not cross a page boundary. Note that I ommitted using this call to copy pgd entries into the software suspend page root, since this is not technically a live paging structure, rather it is used on resume from suspend. CC'ing Pavel in case he has any feedback on this. Thanks to Chris Wright for noticing that this could be more optimal in PAE compiles by eliminating the memset. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86 NMI: better support for debuggers	George Anzinger
	This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger<george@mvista.com> Cc: Keith Owens <kaos@ocs.com.au> Signed-off-by: George Anzinger <george@mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: introduce a write acessor for updating the current LDT	Zachary Amsden
	Introduce a write acessor for updating the current LDT. This is required for hypervisors like Xen that do not allow LDT pages to be directly written. Testing - here's a fun little LDT test that can be trivially modified to test limits as well. /* * Copyright (c) 2005, Zachary Amsden (zach@vmware.com) * This is licensed under the GPL. / #include <stdio.h> #include <signal.h> #include <asm/ldt.h> #include <asm/segment.h> #include <sys/types.h> #include <unistd.h> #include <sys/mman.h> #define __KERNEL__ #include <asm/page.h> void main(void) { struct user_desc desc; char code; unsigned long long tsc; code = (char )mmap(0, 8192, PROT_EXEC\|PROT_READ\|PROT_WRITE, MAP_PRIVATE \| MAP_ANONYMOUS, -1, 0); desc.entry_number = 0; desc.base_addr = code; desc.limit = 1; desc.seg_32bit = 1; desc.contents = MODIFY_LDT_CONTENTS_CODE; desc.read_exec_only = 0; desc.limit_in_pages = 1; desc.seg_not_present = 0; desc.useable = 1; if (modify_ldt(1, &desc, sizeof(desc)) != 0) { perror("modify_ldt"); } printf("code base is 0x%08x\n", (unsigned)code); code[0x0ffe] = 0x0f; / rdtsc / code[0x0fff] = 0x31; code[0x1000] = 0xcb; / lret */ __asm__ __volatile("lcall $7,$0xffe" : "=A" (tsc)); printf("TSC is 0x%016llx\n", tsc); } Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: make IOPL explicit	Zachary Amsden
	The pushf/popf in switch_to are ONLY used to switch IOPL. Making this explicit in C code is more clear. This pushf/popf pair was added as a bugfix for leaking IOPL to unprivileged processes when using sysenter/sysexit based system calls (sysexit does not restore flags). When requesting an IOPL change in sys_iopl(), it is just as easy to change the current flags and the flags in the stack image (in case an IRET is required), but there is no reason to force an IRET if we came in from the SYSENTER path. This change is the minimal solution for supporting a paravirtualized Linux kernel that allows user processes to run with I/O privilege. Other solutions require radical rewrites of part of the low level fault / system call handling code, or do not fully support sysenter based system calls. Unfortunately, this added one field to the thread_struct. But as a bonus, on P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a win of about 17% in the fast case through this performance critical path. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: privilege cleanup	Zachary Amsden
	Privilege checking cleanup. Originally, these diffs were much greater, but recent cleanups in Linux have already done much of the cleanup. I added some explanatory comments in places where the reasoning behind certain tests is rather subtle. Also, in traps.c, we can skip the user_mode check in handle_BUG(). The reason is, there are only two call chains - one via die_if_kernel() and one via do_page_fault(), both entering from die(). Both of these paths already ensure that a kernel mode failure has happened. Also, the original check here, if (user_mode(regs)) was insufficient anyways, since it would not rule out BUG faults from V8086 mode execution. Saving the %ss segment in show_regs() rather than assuming a fixed value also gives better information about the current kernel state in the register dump. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: more asm cleanups	Zachary Amsden
	Some more assembler cleanups I noticed along the way. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: fix incorrect TSS entry for LDT	Ingo Molnar
	Noticed by Chuck Ebbert: the .ldt entry of the TSS was set up incorrectly. It never mattered since this was a leftover from old times, so remove it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: use set_pte macros in a couple places where they were missing	Zachary Amsden
	Also, setting PDPEs in PAE mode does not require atomic operations, since the PDPEs are cached by the processor, and only reloaded on an explicit or implicit reload of CR3. Since the four PDPEs must always be present in an active root, and the kernel PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using task gates (which reload CR3). Actually, much of this is moot, since the user PDPEs are never updated either, and the only usage of task gates is by the doublefault handler. It appears the only place PGDs get updated in PAE mode is in init_low_mappings() / zap_low_mapping() for initial page table creation and recovery from ACPI sleep state, and these sites are safe by inspection. Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on P4. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: generate better code around descriptor update and access functions	Zachary Amsden
	GCC can generate better code around descriptor update and access functions when there is not an explicit "eax" register constraint. Testing: You won't boot if this is messed up, since the TSS descriptor will be corrupted. Verified the assembler and booted. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task ↵	Zachary Amsden
	register management i386 inline assembler cleanup. This change encapsulates descriptor and task register management. Also, it is possible to improve assembler generation in two cases; savesegment may store the value in a register instead of a memory location, which allows GCC to optimize stack variables into registers, and MOV MEM, SEG is always a 16-bit write to memory, making the casting in math-emu unnecessary. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: cleanup serialize msr	Zachary Amsden
	i386 arch cleanup. Introduce the serialize macro to serialize processor state. Why the microcode update needs it I am not quite sure, since wrmsr() is already a serializing instruction, but it is a microcode update, so I will keep the semantic the same, since this could be a timing workaround. As far as I can tell, this has always been there since the original microcode update source. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] i386: inline asm cleanup	Zachary Amsden
	i386 Inline asm cleanup. Use cr/dr accessor functions. Also, a potential bugfix. Also, some CR accessors really should be volatile. Reads from CR0 (numeric state may change in an exception handler), writes to CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it was not there to begin with, and in no case should kernel memory be clobbered, except when doing a TLB flush, which already has memory clobber. I noticed that page invalidation does not have a memory clobber. I can't find a bug as a result, but there is definitely a potential for a bug here: #define __flush_tlb_single(addr) \ __asm__ __volatile__("invlpg %0": :"m" ((char ) addr)) Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] ES7000 platform update (i386)	Natalie.Protasevich@unisys.com
	This is subarch update for ES7000. I've modified platform check code and removed unnecessary OEM table parsing for newer systems that don't use OEM information during boot. Parsing the table in fact is causing problems, and the platform doesn't get recognized. The patch only affects the ES7000 subach. Signed-off-by: <Natalie.Protasevich@unisys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: sutomatically enable bigsmp when we have more than 8 CPUs	Venkatesh Pallipadi
	i386 generic subarchitecture requires explicit dmi strings or command line to enable bigsmp mode. The patch below removes that restriction, and uses bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and xAPIC support. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: fix EFI memory map parsing	Matt Tolentino
	The memory descriptors that comprise the EFI memory map are not fixed in stone such that the size could change in the future. This uses the memory descriptor size obtained from EFI to iterate over the memory map entries during boot. This enables the removal of an x86 specific pad (and ifdef) in the EFI header. I also couldn't stomach the broken up nature of the function to put EFI runtime calls into virtual mode any longer so I fixed that up a bit as well. For reference, this patch only impacts x86. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] x86: ptep_clear optimization	Zachary Amsden
	Add a new accessor for PTEs, which passes the full hint from the mmu_gather struct; this allows architectures with hardware pagetables to optimize away atomic PTE operations when destroying an address space. Removing the locked operation should allow better pipelining of memory access in this loop. I measured an average savings of 30-35 cycles per zap_pte_range on the first 500 destructions on Pentium-M, but I believe the optimization would win more on older processors which still assert the bus lock on xchg for an exclusive cacheline. Update: I made some new measurements, and this saves exactly 26 cycles over ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180 cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a full address space destruction. pte_clear_full is not yet used, but is provided for future optimizations (in particular, when running inside of a hypervisor that queues page table updates, the full hint allows us to avoid queueing unnecessary page table update for an address space in the process of being destroyed. This is not a huge win, but it does help a bit, and sets the stage for further hypervisor optimization of the mm layer on all architectures. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Christoph Lameter <christoph@lameter.com> Cc: <linux-mm@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] sab: consolidate kmem_bufctl_t	Kyle Moffett
	This is used only in slab.c and each architecture gets to define whcih underlying type is to be used. Seems a bit silly - move it to slab.c and use the same type for all architectures: unsigned int. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] remove hugetlb_clean_stale_pgtable() and fix huge_pte_alloc()	Chen, Kenneth W
	I don't think we need to call hugetlb_clean_stale_pgtable() anymore in 2.6.13 because of the rework with free_pgtables(). It now collect all the pte page at the time of munmap. It used to only collect page table pages when entire one pgd can be freed and left with staled pte pages. Not anymore with 2.6.13. This function will never be called and We should turn it into a BUG_ON. I also spotted two problems here, not Adam's fault :-) (1) in huge_pte_alloc(), it looks like a bug to me that pud is not checked before calling pmd_alloc() (2) in hugetlb_clean_stale_pgtable(), it also missed a call to pmd_free_tlb. I think a tlb flush is required to flush the mapping for the page table itself when we clear out the pmd pointing to a pte page. However, since hugetlb_clean_stale_pgtable() is never called, so it won't trigger the bug. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] hugetlb: add pte_huge() macro	Adam Litke
	This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a patch later in the series. Instead of repeating (_PAGE_PRESENT \| _PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: <linux-mm@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] mm: correct _PAGE_FILE comment	Paolo 'Blaisorblade' Giarrusso
	_PAGE_FILE does not indicate whether a file is in page / swap cache, it is set just for non-linear PTE's. Correct the comment for i386, x86_64, UML. Also clearify _PAGE_NONE. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05	[PATCH] mm: consolidate get_order	Stephen Rothwell
	Someone mentioned that almost all the architectures used basically the same implementation of get_order. This patch consolidates them into asm-generic/page.h and includes that in the appropriate places. The exceptions are ia64 and ppc which have their own (presumably optimised) versions. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-03	Merge linux-2.6 into linux-acpi-2.6 test	Len Brown

2005-08-29	[NET]: Fix ipl=>ihl typo in ip_fast_csum	Thomas Graf
	Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29	[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options	Patrick McHardy
	Allows overriding of sysctl_{wmem,rmrm}_max Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-25	[ACPI] delete CONFIG_ACPI_PCI	Len Brown
	Delete the ability to build an ACPI kernel that does not include PCI support. When such a machine is created and it requires a tuned kernel, send a patch. http://bugzilla.kernel.org/show_bug.cgi?id=1364 Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-24	[ACPI] delete CONFIG_ACPI_BOOT	Len Brown
	it has been a synonym for CONFIG_ACPI since 2.6.12 Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-16	[PATCH] i386 / desc_empty macro is incorrect	Zachary Amsden
	Chuck Ebbert noticed that the desc_empty macro is incorrect. Fix it. Thankfully, this is not used as a security check, but it can falsely overwrite TLS segments with carefully chosen base / limits. I do not believe this is an issue in practice, but it is a kernel bug. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Chris Wright <chrisw@osdl.org> [ x86-64 had the same problem, and the same fix. Linus ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-14	Revert PCIBIOS_MIN_IO changes for 2.6.13	Linus Torvalds
	This reverts commits 71db63acff69618b3d9d3114bd061938150e146b [PATCH] increase PCIBIOS_MIN_IO on x86 and 0b2bfb4e7ff61f286676867c3508569bea6fbf7a ACPI: increase PCIBIOS_MIN_IO on x86 since Lukas Sandströ<lukass@etek.chalmers.se> reports that this breaks his on-board nvidia audio. We should re-visit this later. For now we revert the change Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-07	[PATCH] Make visws compile again	Tom Duffy
	In file included from linux-2.6.13-rc5/arch/i386/kernel/timers/timer_pit.c:20: linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h: In function `do_timer_overflow': linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h:32: error: `i8259A_lock' undeclared (first use in this function) linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h:32: error: (Each undeclared identifier is reported only once linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h:32: error: for each function it appears in.) make[3]: * [arch/i386/kernel/timers/timer_pit.o] Error 1 make[2]: * [arch/i386/kernel/timers] Error 2 make[1]: * [arch/i386/kernel] Error 2 make: * [_all] Error 2 Signed-off-by: Tom Duffy <thomas.duffy.99@alumni.brown.edu> Cc: Andrey Panin <pazke@orbita1.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-02	[PATCH] increase PCIBIOS_MIN_IO on x86	Ivan Kokshaysky
	There is a number of x86 laptops that have some non-PCI IO ports in the 0x1000-0x1fff range, and it's quite hard to control the correct order of resource allocation between PCI and other subsystems controlling these ports. Especially with modular kernel. So just increase PCIBIOS_MIN_IO to 0x4000 to prevent any new PCI resource allocations in the problematic range (this limitation must apply _only_ to the root bus resources - see Linus' change in pci_bus_alloc_resource). As PCIBIOS_MIN_IO and PCIBIOS_MIN_CARDBUS_IO are the same now on i386 and x86-64, we can remove the latter. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29	x86: fix new find_first_bit()	Linus Torvalds
	Some edge problems with the original C rewrite. Thanks go to Cal Peake, who pinpointed the breakage to the rewrite, and tested this fixed version. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28	[PATCH] x86_64: i386/x86_64: remove prototypes for not existing functions in ↵	Andi Kleen
	smp.h Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>