aboutsummaryrefslogtreecommitdiff
path: root/include/asm-um
AgeCommit message (Collapse)Author
2006-04-11[PATCH] uml: fix "extern-vs-static" proto conflict in TLS codePaolo 'Blaisorblade' Giarrusso
Move the prototype from arch-generic to arch-specific includes because on x86_64 these functions are two static inlines. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31[PATCH] uml: check for differences in host supportPaolo 'Blaisorblade' Giarrusso
If running on a host not supporting TLS (for instance 2.4) we should report that cleanly to the user, instead of printing not comprehensible "error 5" for that. Additionally, i386 and x86_64 support different ranges for user_desc->entry_number, and we must account for that; we couldn't pass ourselves -1 because we need to override previously existing TLS descriptors which glibc has possibly set, so test at startup the range to use. x86 and x86_64 existing ranges are hardcoded. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31[PATCH] uml: implement {get,set}_thread_area for i386Paolo 'Blaisorblade' Giarrusso
Implement sys_[gs]et_thread_area and the corresponding ptrace operations for UML. This is the main chunk, additional parts follow. This implementation is now well tested and has run reliably for some time, and we've understood all the previously existing problems. Their implementation saves the new GDT content and then forwards the call to the host when appropriate, i.e. immediately when the target process is running or on context switch otherwise (i.e. on fork and on ptrace() calls). In SKAS mode, we must switch registers on each context switch (because SKAS does not switches tls_array together with current->mm). Also, added get_cpu() locking; this has been done for SKAS mode, since TT does not need it (it does not use smp_processor_id()). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31[PATCH] uml: split ldt.h in arch-independent and arch-dependant codePaolo 'Blaisorblade' Giarrusso
ldt-{i386,x86_64}.h is made of two different parts - some code for parsing of LDT descriptors, which is arch-dependant, and the code to handle uml_ldt_t (an LDT block inside UML), which is mostly arch-independant (among x86 and x86_64, at least). Join the common part in a single file (ldt.h) and split the rest away (host_ldt-{i386,x86_64}.h). This is needed because processor.h, with next patches, will start including the LDT descriptor parsing macros in host_ldt.h, but it can't include ldt.h because it uses semaphores (and to define semaphores one must first include processor.h!). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31[PATCH] uml: sparse cleanupsAl Viro
misc sparse annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: uml pfn_to_pageKAMEZAWA Hiroyuki
UML can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] uml: fix build warnings in __get_userJeff Dike
Fix a gcc warning about losing qualifiers to the first argument of copy_from_user. The typeof change for correctness, and fixes a lot of the warnings, but there are some cases where x has some extra qualifiers, like volatile, which copy_from_user can't know about. For these, the void * cast seems to be necessary. Also cleaned up some of the whitespace and got rid of the emacs comment at the bottom. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23[PATCH] x86: SMP alternativesGerd Hoffmann
Implement SMP alternatives, i.e. switching at runtime between different code versions for UP and SMP. The code can patch both SMP->UP and UP->SMP. The UP->SMP case is useful for CPU hotplug. With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and when the number of CPUs goes down to 1, and switches to SMP when the number of CPUs goes up to 2. Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is patched once at boot time (if needed) and the tables are released afterwards. The changes in detail: * The current alternatives bits are moved to a separate file, the SMP alternatives code is added there. * The patch adds some new elf sections to the kernel: .smp_altinstructions like .altinstructions, also contains a list of alt_instr structs. .smp_altinstr_replacement like .altinstr_replacement, but also has some space to save original instruction before replaving it. .smp_locks list of pointers to lock prefixes which can be nop'ed out on UP. The first two are used to replace more complex instruction sequences such as spinlocks and semaphores. It would be possible to deal with the lock prefixes with that as well, but by handling them as special case the table sizes become much smaller. * The sections are page-aligned and padded up to page size, so they can be free if they are not needed. * Splitted the code to release init pages to a separate function and use it to release the elf sections if they are unused. Signed-off-by: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01[PATCH] uml: avoid "CONFIG_NR_CPUS undeclared" bogus error messagesPaolo 'Blaisorblade' Giarrusso
Olaf Hering <olh@suse.de> Olaf reported UML doesn't build for him with a clear analisys of what happened - we're using NR_CPUS in files linked against glibc headers. Seems like it defines CONFIG_SMP but not CONFIG_NR_CPUS, so we get CONFIG_NR_CPUS undeclared. The fix is to move the declaration away from that header file and move it in asm-um headers, and to add that header where needed. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01[PATCH] uml: skas0-hold-own-ldt fixups for x86-64Paolo 'Blaisorblade' Giarrusso
In a recent fixup i386 code was copied raw to x86_64 subarch to make it compile again. Here there are some little fixups and resyncs needed for it (mainly for cleanliness sake) - I did an audit and found the rest of the code to be safe. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01[PATCH] uml: typo fixupPaolo 'Blaisorblade' Giarrusso
Trivial innocent cosmetical fixup. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] uml: use generic sys_rt_sigsuspendJeff Dike
Use the generic sys_rt_sigsuspend. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] uml: add TIF_RESTORE_SIGMASK supportJeff Dike
Add support for TIF_RESTORE_SIGMASK. I copy the i386 handling of the flag. sys_sigsuspend is also changed to follow i386. Also a bit of cleanup - turn an if into a switch get rid of a couple more emacs formatting comments Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] uml: add __raw_writel definitionJeff Dike
Add implementations of the write* and __raw_write* functions. __raw_writel is needed by lib/iocopy.c, which shouldn't be used in UML, but which is unconditionally linked in anyway. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12[PATCH] death of get_thread_info/put_thread_infoAl Viro
{get,put}_thread_info() were introduced in 2.5.4 and never had been called by anything in the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10[PATCH] dump_thread() cleanupakpm@osdl.org
) From: Adrian Bunk <bunk@stusta.de> - create one common dump_thread() prototype in kernel.h - dump_thread() is only used in fs/binfmt_aout.c and can therefore be removed on all architectures where CONFIG_BINFMT_AOUT is not available Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] mutex subsystem, add default include/asm-*/mutex.h filesArjan van de Ven
add the per-arch mutex.h files for the remaining architectures. We default to asm-generic/mutex-dec.h, because that performs quite well on most arches. Arches that do not have atomic decrement/increment instructions should switch to mutex-xchg.h instead. Arches can also provide their own implementation for the mutex fastpath primitives. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-08[PATCH] remove gcc-2 checksAndrew Morton
Remove various things which were checking for gcc-1.x and gcc-2.x compilers. From: Adrian Bunk <bunk@stusta.de> Some documentation updates and removes some code paths for gcc < 3.2. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] consolidate asm/futex.hJeff Dike
Most of the architectures have the same asm/futex.h. This consolidates them into asm-generic, with the arches including it from their own asm/futex.h. In the case of UML, this reverts the old broken futex.h and goes back to using the same one as almost everyone else. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Kill L1_CACHE_SHIFT_MAXRavikiran G Thirumalai
Kill L1_CACHE_SHIFT from all arches. Since L1_CACHE_SHIFT_MAX is not used anymore with the introduction of INTERNODE_CACHE, kill L1_CACHE_SHIFT_MAX. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] uml: eliminate anonymous union and clean up symlink lossageJeff Dike
This gives a name to the anonymous union introduced in skas-hold-own-ldt, allowing to build on a wider range of gccs. It also removes ldt.h, which somehow became real, and replaces it with a symlink, and creates ldt-x86_64.h as a copy of ldt-i386.h for now. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07[PATCH] uml: maintain own LDT entriesBodo Stroesser
Patch imlements full LDT handling in SKAS: * UML holds it's own LDT table, used to deliver data on modify_ldt(READ) * UML disables the default_ldt, inherited from the host (SKAS3) or resets LDT entries, set by host's clib and inherited in SKAS0 * A new global variable skas_needs_stub is inserted, that can be used to decide, whether stub-pages must be supported or not. * Uses the syscall-stub to replace missing PTRACE_LDT (therefore, write_ldt_entry needs to be modified) Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07[PATCH] uml: switch_mm fixBen Lahaise
Not quite, something along the lines of the patch below works correctly (and makes aio performance not suffer from multiple second delays), as skas0 mode correctly switches mm contexts, unlike TT (which should probably get nuked from the kernel now that skas0 seems to be working). Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] uml: remove old UM_FASTCALL, and make the thing work againPaolo 'Blaisorblade' Giarrusso
This was used in the old dark age of 2.4, ARCH_CFLAGS doesn't work any more since some time, and UM_FASTCALL was never used in 2.6. Instead, reintroduce the thing more properly now, directly in include/asm-um/linkage.h. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] uml: reuse i386 cpu-specific tuningPaolo 'Blaisorblade' Giarrusso
Make UML share the underlying cpu-specific tuning done on i386. Actually, for now many config options aren't used a lot - but that can be done later. Also, UML relies on GCC optimization for things like memcpy and such more than i386, so specifying the correct -march and -mtune should be enough. Later, we may want to correct some other stuff. For instance, since FPU context switching, for us, is done (at least partially, i.e. between our kernelspace and userspace) by the host, we may allow usage of FPU operations by GCC. This doesn't hold for kernelspace vs. kernelspace, but we don't support preemption. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] mm: pte_offset_map_lock loopsHugh Dickins
Convert those common loops using page_table_lock on the outside and pte_offset_map within to use just pte_offset_map_lock within instead. These all hold mmap_sem (some exclusively, some not), so at no level can a page table be whipped away from beneath them. But whereas pte_alloc loops tested with the "atomic" pmd_present, these loops are testing with pmd_none, which on i386 PAE tests both lower and upper halves. That's now unsafe, so add a cast into pmd_none to test only the vital lower half: we lose a little sensitivity to a corrupt middle directory, but not enough to worry about. It appears that i386 and UML were the only architectures vulnerable in this way, and pgd and pud no problem. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28[PATCH] gfp_t: remaining bits of arch/*Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28[PATCH] gfp_t: dma-mapping (simple cases)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-04[PATCH] uml: Fix sysrq-r support for skas modeAllan Graves
The old code had the IP and SP coming from the registers in the thread struct, which are completely wrong since those are the userspace registers. This fixes that by pulling the correct values from the jmp_buf in which the kernel state of each thread is stored. Signed-off-by: Allan Graves <allan.graves@oracle.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30[PATCH] uml get_user() NULL noise removalAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22[PATCH] uml: don't redundantly mark pte as newpage in pte_modifyPaolo 'Blaisorblade' Giarrusso
pte_modify marks a page as needing flush, which is redundant because the resulting PTE is still set with set_pte, which already handles that. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21[PATCH] uml: adapt asm/futex.h to our archPaolo 'Blaisorblade' Giarrusso
Follow up to 4732efbeb997189d9f9b04708dc26bf8613ed721 - uml must just reuse as-is the backing architecture support. There is a micro-fixup is needed for the included file, which won't affect i386 behaviour at all. I've not tested compilation on x86_64, only on x86, but the code is almost the same except the culprit test, so everything should be ok on x86_64 too. Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17[PATCH] uml: UML/i386 cmpxchg fixJeff Dike
Using native cmpxchg offers a slight performance improvement in uml/i386. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17[PATCH] uml: breakpoint an arbitrary threadJeff Dike
This patch implements a stack trace for a thread, not unlike sysrq-t does. The advantage to this is that a break point can be placed on showreqs, so that upon showing the stack, you jump immediately into the debugger. While sysrq-t does the same thing, sysrq-t shows *all* threads stacks. It also doesn't work right now. In the future, I thought it might be acceptable to make this show all pids stacks, but perhaps leaving well enough alone and just using sysrq-t would be okay. For now, upon receiving the stack command, UML switches context to that thread, dumps its registers, and then switches context back to the original thread. Since UML compacts all threads into one of 4 host threads, this sort of mechanism could be expanded in the future to include other debugging helpers that sysrq does not cover. Note by jdike - The main benefit to this is that it brings an arbitrary thread back into context, where it can be examined by gdb. The fact that it dumps it stack is secondary. This provides the capability to examine a sleeping thread, which has existed in tt mode, but not in skas mode until now. Also, the other threads, that sysrq doesn't cover, can be gdb-ed directly anyway. Signed-off-by: Allan Graves<allan.graves@gmail.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10[PATCH] uml spinlock breakageAl Viro
mingo missed that one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10[PATCH] uml: inline mk_pte and various friendsPaolo 'Blaisorblade' Giarrusso
Turns out that, for UML, a *lot* of VM-related trivial functions are not inlined but rather normal functions. In other sections of UML code, this is justified by having files which interact with the host and cannot therefore include kernel headers, but in this case there's no such justification. I've had to turn many of them to macros because of missing declarations. While doing this, I've decided to reuse some already existing macros. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] remove asm-*/hdreg.hChristoph Hellwig
unused and useless.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] auxiliary vector cleanupsH. J. Lu
The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE so that we can change it if necessary when a new vector is added. Because of include file ordering problems, doing this necessitated the extraction of the AT_* symbols into a standalone header file. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedupJakub Jelinek
ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter (which at least on UP usually means an immediate context switch to one of the waiter threads). This waiter wakes up and after a few instructions it attempts to acquire the cv internal lock, but that lock is still held by the thread calling pthread_cond_signal. So it goes to sleep and eventually the signalling thread is scheduled in, unlocks the internal lock and wakes the waiter again. Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal to avoid this performance issue, but it was removed when locks were redesigned to the 3 state scheme (unlocked, locked uncontended, locked contended). Following scenario shows why simply using FUTEX_REQUEUE in pthread_cond_signal together with using lll_mutex_unlock_force in place of lll_mutex_unlock is not enough and probably why it has been disabled at that time: The number is value in cv->__data.__lock. thr1 thr2 thr3 0 pthread_cond_wait 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) 0 lll_futex_wait (&cv->__data.__futex, futexval) 0 pthread_cond_signal 1 lll_mutex_lock (cv->__data.__lock) 1 pthread_cond_signal 2 lll_mutex_lock (cv->__data.__lock) 2 lll_futex_wait (&cv->__data.__lock, 2) 2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock) # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE 2 lll_mutex_unlock_force (cv->__data.__lock) 0 cv->__data.__lock = 0 0 lll_futex_wake (&cv->__data.__lock, 1) 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) # Here, lll_mutex_unlock doesn't know there are threads waiting # on the internal cv's lock Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal, but it will cost us not one, but 2 extra syscalls and, what's worse, one of these extra syscalls will be done for every single waiting loop in pthread_cond_*wait. We would need to use lll_mutex_unlock_force in pthread_cond_signal after requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait. Another alternative is to do the unlocking pthread_cond_signal needs to do (the lock can't be unlocked before lll_futex_wake, as that is racy) in the kernel. I have implemented both variants, futex-requeue-glibc.patch is the first one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel. The kernel interface allows userland to specify how exactly an unlocking operation should look like (some atomic arithmetic operation with optional constant argument and comparison of the previous futex value with another constant). It has been implemented just for ppc*, x86_64 and i?86, for other architectures I'm including just a stub header which can be used as a starting point by maintainers to write support for their arches and ATM will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running 32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL. With the following benchmark on UP x86-64 I get: for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \ for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done time elf/ld.so --library-path .:nptl-orig /tmp/bench real 0m0.655s user 0m0.253s sys 0m0.403s real 0m0.657s user 0m0.269s sys 0m0.388s time elf/ld.so --library-path .:nptl-requeue /tmp/bench real 0m0.496s user 0m0.225s sys 0m0.271s real 0m0.531s user 0m0.242s sys 0m0.288s time elf/ld.so --library-path .:nptl-wake_op /tmp/bench real 0m0.380s user 0m0.176s sys 0m0.204s real 0m0.382s user 0m0.175s sys 0m0.207s The benchmark is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt Older futex-requeue-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt Older futex-wake_op-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt Will post a new version (just x86-64 fixes so that the patch applies against pthread_cond_signal.S) to libc-hacker ml soon. Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded testcase that will not test the atomicity of the operation, but at least check if the threads that should have been woken up are woken up and whether the arithmetic operation in the kernel gave the expected results. Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: fix x86_64 page leakJeff Dike
We were leaking pmd pages when 3_LEVEL_PGTABLES was enabled. This fixes that. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: merge duplicated page table codeJeff Dike
There is a lot of code which is duplicated between the 2 and 3 level implementation, with the only difference that the 3-level implementation is a bit more generalized (instead of accessing directly pte_t.pte, it uses the appropriate access macros). So this code is joined together. As obvious, a "core code nice cleanup" is not a "stability-friendly patch" so usual care applies. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: fixes performance regression in activate_mm and thus exec()Paolo 'Blaisorblade' Giarrusso
Normally, activate_mm() is called from exec(), and thus it used to be a no-op because we use a completely new "MM context" on the host (for instance, a new process), and so we didn't need to flush any "TLB entries" (which for us are the set of memory mappings for the host process from the virtual "RAM" file). Kernel threads, instead, are usually handled in a different way. So, when for AIO we call use_mm(), things used to break and so Benjamin implemented activate_mm(). However, that is only needed for AIO, and could slow down exec() inside UML, so be smart: detect being called for AIO (via PF_BORROWED_MM) and do the full flush only in that situation. Comment also the caller so that people won't go breaking UML without noticing. I also rely on the caller's locks for testing current->flags. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] mm: correct _PAGE_FILE commentPaolo 'Blaisorblade' Giarrusso
_PAGE_FILE does not indicate whether a file is in page / swap cache, it is set just for non-linear PTE's. Correct the comment for i386, x86_64, UML. Also clearify _PAGE_NONE. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] mm: consolidate get_orderStephen Rothwell
Someone mentioned that almost all the architectures used basically the same implementation of get_order. This patch consolidates them into asm-generic/page.h and includes that in the appropriate places. The exceptions are ia64 and ppc which have their own (presumably optimised) versions. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-15um: fix __pa/__va macro expansion problemLinus Torvalds
Proper parentheses around arguments needed, especially as the macros use a high-precedence cast operator on the argument.
2005-07-28[PATCH] uml: vm86 compile fixJeff Dike
We added an include of asm/vm86.h in include/asm-i386/ptrace.h. Since UML includes the underlying arch's ptrace.h, it needs an asm/vm86.h in order to build. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] Add emergency_restart()Eric W. Biederman
When the kernel is working well and we want to restart cleanly kernel_restart is the function to use. But in many instances the kernel wants to reboot when thing are expected to be working very badly such as from panic or a software watchdog handler. This patch adds the function emergency_restart() so that callers can be clear what semantics they expect when calling restart. emergency_restart() is expected to be callable from interrupt context and possibly reliable in even more trying circumstances. This is an initial generic implementation for all architectures. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-14[PATCH] uml: consolidate modify_ldtPaolo 'Blaisorblade' Giarrusso
*) Reorganize the two cases of sys_modify_ldt to share all the reasonably common code. *) Avoid memory allocation when unneeded (i.e. when we are writing and the passed buffer size is known), thus not returning ENOMEM (which isn't allowed for this syscall, even if there is no strict "specification"). *) Add copy_{from,to}_user to modify_ldt for TT mode. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12[PATCH] uml: tlb flushing fixBenjamin LaHaise
This patch fixes a fairly serious tlb flushing bug that makes aio use under uml very unreliable -- SEGVs, Oops and panic()s occur as a result of stale tlb entires being used by uml when aio switches mms due to the fact that uml does not implement the activate_mm() hook. This patch introduces a simple but correct approach (read: hammer) for implementing activate_mm() in uml by doing a force_flush_all() if the new mm is different from old. With this patch in place, uml is able to succeed at the aio test case that was randomly faulting for me before. Cc: Jeff Dike <jdike@addtoit.com> Cc: <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07[PATCH] uml: skas0 - separate kernel address space on stock hostsJeff Dike
UML has had two modes of operation - an insecure, slow mode (tt mode) in which the kernel is mapped into every process address space which requires no host kernel modifications, and a secure, faster mode (skas mode) in which the UML kernel is in a separate host address space, which requires a patch to the host kernel. This patch implements something very close to skas mode for hosts which don't support skas - I'm calling this skas0. It provides the security of the skas host patch, and some of the performance gains. The two main things that are provided by the skas patch, /proc/mm and PTRACE_FAULTINFO, are implemented in a way that require no host patch. For the remote address space changing stuff (mmap, munmap, and mprotect), we set aside two pages in the process above its stack, one of which contains a little bit of code which can call mmap et al. To update the address space, the system call information (system call number and arguments) are written to the stub page above the code. The %esp is set to the beginning of the data, the %eip is set the the start of the stub, and it repeatedly pops the information into its registers and makes the system call until it sees a system call number of zero. This is to amortize the cost of the context switch across multiple address space updates. When the updates are done, it SIGSTOPs itself, and the kernel process continues what it was doing. For a PTRACE_FAULTINFO replacement, we set up a SIGSEGV handler in the child, and let it handle segfaults rather than nullifying them. The handler is in the same page as the mmap stub. The second page is used as the stack. The handler reads cr2 and err from the sigcontext, sticks them at the base of the stack in a faultinfo struct, and SIGSTOPs itself. The kernel then reads the faultinfo and handles the fault. A complication on x86_64 is that this involves resetting the registers to the segfault values when the process is inside the kill system call. This breaks on x86_64 because %rcx will contain %rip because you tell SYSRET where to return to by putting the value in %rcx. So, this corrupts $rcx on return from the segfault. To work around this, I added an arch_finish_segv, which on x86 does nothing, but which on x86_64 ptraces the child back through the sigreturn. This causes %rcx to be restored by sigreturn and avoids the corruption. Ultimately, I think I will replace this with the trick of having it send itself a blocked signal which will be unblocked by the sigreturn. This will allow it to be stopped just after the sigreturn, and PTRACE_SYSCALLed without all the back-and-forth of PTRACE_SYSCALLing it through sigreturn. This runs on a stock host, so theoretically (and hopefully), tt mode isn't needed any more. We need to make sure that this is better in every way than tt mode, though. I'm concerned about the speed of address space updates and page fault handling, since they involve extra round-trips to the child. We can amortize the round-trip cost for large address space updates by writing all of the operations to the data page and having the child execute them all at the same time. This will help fork and exec, but not page faults, since they involve only one page. I can't think of any way to help page faults, except to add something like PTRACE_FAULTINFO to the host. There is PTRACE_SIGINFO, but UML doesn't use siginfo for SIGSEGV (or anything else) because there isn't enough information in the siginfo struct to handle page faults (the faulting operation type is missing). Adding that would make PTRACE_SIGINFO a usable equivalent to PTRACE_FAULTINFO. As for the code itself: - The system call stub is in arch/um/kernel/sys-$(SUBARCH)/stub.S. It is put in its own section of the binary along with stub_segv_handler in arch/um/kernel/skas/process.c. This is manipulated with run_syscall_stub in arch/um/kernel/skas/mem_user.c. syscall_stub will execute any system call at all, but it's only used for mmap, munmap, and mprotect. - The x86_64 stub calls sigreturn by hand rather than allowing the normal sigreturn to happen, because the normal sigreturn is a SA_RESTORER in UML's address space provided by libc. Needless to say, this is not available in the child's address space. Also, it does a couple of odd pops before that which restore the stack to the state it was in at the time the signal handler was called. - There is a new field in the arch mmu_context, which is now a union. This is the pid to be manipulated rather than the /proc/mm file descriptor. Code which deals with this now checks proc_mm to see whether it should use the usual skas code or the new code. - userspace_tramp is now used to create a new host process for every UML process, rather than one per UML processor. It checks proc_mm and ptrace_faultinfo to decide whether to map in the pages above its stack. - start_userspace now makes CLONE_VM conditional on proc_mm since we need separate address spaces now. - switch_mm_skas now just sets userspace_pid[0] to the new pid rather than PTRACE_SWITCH_MM. There is an addition to userspace which updates its idea of the pid being manipulated each time around the loop. This is important on exec, when the pid will change underneath userspace(). - The stub page has a pte, but it can't be mapped in using tlb_flush because it is part of tlb_flush. This is why it's required for it to be mapped in by userspace_tramp. Other random things: - The stub section in uml.lds.S is page aligned. This page is written out to the backing vm file in setup_physmem because it is mapped from there into user processes. - There's some confusion with TASK_SIZE now that there are a couple of extra pages that the process can't use. TASK_SIZE is considered by the elf code to be the usable process memory, which is reasonable, so it is decreased by two pages. This confuses the definition of USER_PGDS_IN_LAST_PML4, making it too small because of the rounding down of the uneven division. So we round it to the nearest PGDIR_SIZE rather than the lower one. - I added a missing PT_SYSCALL_ARG6_OFFSET macro. - um_mmu.h was made into a userspace-usable file. - proc_mm and ptrace_faultinfo are globals which say whether the host supports these features. - There is a bad interaction between the mm.nr_ptes check at the end of exit_mmap, stack randomization, and skas0. exit_mmap will stop freeing pages at the PGDIR_SIZE boundary after the last vma. If the stack isn't on the last page table page, the last pte page won't be freed, as it should be since the stub ptes are there, and exit_mmap will BUG because there is an unfreed page. To get around this, TASK_SIZE is set to the next lowest PGDIR_SIZE boundary and mm->nr_ptes is decremented after the calls to init_stub_pte. This ensures that we know the process stack (and all other process mappings) will be below the top page table page, and thus we know that mm->nr_ptes will be one too many, and can be decremented. Things that need fixing: - We may need better assurrences that the stub code is PIC. - The stub pte is set up in init_new_context_skas. - alloc_pgdir is probably the right place. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>