aboutsummaryrefslogtreecommitdiff
path: root/arch/ia64/kernel/entry.S
AgeCommit message (Collapse)Author
2008-07-25[IA64] Wire up new system callsTony Luck
Six new system calls: signalfd4, eventfd2, epoll_create1, dup3, pipe2 and inotify_init1. Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27[IA64] pvops: paravirtualize entry.SIsaku Yamahata
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel. They include sensitive or performance critical privileged instructions so that they need paravirtualization. To paravirtualize them by single source and multi compile they are converted into indirect jump. And define each pv instances. Cc: Keith Owens <kaos@ocs.com.au> Cc: "Dong, Eddie" <eddie.dong@intel.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14[IA64] trivial cleanup for entry.SHidetoshi Seto
This patch does: - make comment at next to resched check more robust - move "re-check" comments to next to where change predicate regs Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14[IA64] fix interrupt masking for pending works on kernel leaveHidetoshi Seto
[Bug-fix for "[BUG?][2.6.25-mm1] sleeping during IRQ disabled"] This patch does: - enable interrupts before calling schedule() as same as others, ex. x86 - enable interrupts during ia64_do_signal() and ia64_sync_krbs() - do_notify_resume_user() is still called with interrupts disabled, since we can take short path of fsys_mode if-statement quickly. - pfm_handle_work() is also called with interrupts disabled, since it can deal interrupt mask within itself. - fix/add some comments/notes Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-04-22[IA64] disable interrupts on exit of ia64_trace_syscallHidetoshi Seto
While testing with CONFIG_VIRT_CPU_ACCOUNTING=y, I found that I occasionally get very huge system time in some threads. So I dug the issue and finally noticed that it was caused because of an interrupt which interrupt in the following window: > [arch/ia64/kernel/entry.S: (!CONFIG_PREEMPT && CONFIG_VIRT_CPU_ACCOUNTING)] > > ENTRY(ia64_leave_syscall) > : > (pUStk) rsm psr.i > cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall > (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk > .work_processed_syscall: > adds r2=PT(LOADRS)+16,r12 > (pUStk) mov.m r22=ar.itc // fetch time at leave > adds r18=TI_FLAGS+IA64_TASK_SIZE,r13 > ;; > <<< window: from here >>> > (p6) ld4 r31=[r18] // load current_thread_info()->flags > ld8 r19=[r2],PT(B6)-PT(LOADRS) > adds r3=PT(AR_BSPSTORE)+16,r12 > ;; > mov r16=ar.bsp > ld8 r18=[r2],PT(R9)-PT(B6) > (p6) and r15=TIF_WORK_MASK,r31 // any work other than TIF_SYSCALL_TRACE? > ;; > ld8 r23=[r3],PT(R11)-PT(AR_BSPSTORE) > (p6) cmp4.ne.unc p6,p0=r15, r0 // any special work pending? > (p6) br.cond.spnt .work_pending_syscall > ;; > ld8 r9=[r2],PT(CR_IPSR)-PT(R9) > ld8 r11=[r3],PT(CR_IIP)-PT(R11) > (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE! > ;; > invala > <<< window: to here >>> > rsm psr.i | psr.ic // turn off interrupts and interruption collection If pUStk is true, it means we are going to return user mode, hence we fetch ar.itc to get time at leave from system. It seems that it is not possible to interrupt the window if pUStk is true, because interrupts are disabled early. And also disabling interrupt makes sense because it is safe for referring current_thread_info()->flags. However interrupting the window while pUStk is true was possible. The route was: ia64_trace_syscall -> .work_pending_syscall_end -> .work_processed_syscall Only in case entering the window from this route, interrupts are enabled during in the window even if pUStk is true. I suppose interrupts must be disabled here anyway if pUStk is true. I'm not sure but afraid that what kind of bad effect were there, other than crazy system time which I found. FYI, there was a commit 6f6d75825dc49b082906b84537b4df28293c2977 that points out a bug at same point(exit of ia64_trace_syscall) in 2006. It can be said that there was an another bug. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-02-20[IA64] VIRT_CPU_ACCOUNTING (accurate cpu time accounting)Hidetoshi Seto
This patch implements VIRT_CPU_ACCOUNTING for ia64, which enable us to use more accurate cpu time accounting. The VIRT_CPU_ACCOUNTING is an item of kernel config, which s390 and powerpc arch have. By turning this config on, these archs change the mechanism of cpu time accounting from tick-sampling based one to state-transition based one. The state-transition based accounting is done by checking time (cycle counter in processor) at every state-transition point, such as entrance/exit of kernel, interrupt, softirq etc. The difference between point to point is the actual time consumed during in the state. There is no doubt about that this value is more accurate than that of tick-sampling based accounting. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-02-08[IA64] Wire up timerfd_{create,settime,gettime} syscallsTony Luck
Add ia64 hooks for the new syscalls that were added in commit 4d672e7ac79b5ec5cdc90e450823441e20464691 Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-02-05timerfd: new timerfd APIDavide Libenzi
This is the new timerfd API as it is implemented by the following patch: int timerfd_create(int clockid, int flags); int timerfd_settime(int ufd, int flags, const struct itimerspec *utmr, struct itimerspec *otmr); int timerfd_gettime(int ufd, struct itimerspec *otmr); The timerfd_create() API creates an un-programmed timerfd fd. The "clockid" parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME. The timerfd_settime() API give new settings by the timerfd fd, by optionally retrieving the previous expiration time (in case the "otmr" parameter is not NULL). The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit is set in the "flags" parameter. Otherwise it's a relative time. The timerfd_gettime() API returns the next expiration time of the timer, or {0, 0} if the timerfd has not been set yet. Like the previous timerfd API implementation, read(2) and poll(2) are supported (with the same interface). Here's a simple test program I used to exercise the new timerfd APIs: http://www.xmailserver.org/timerfd-test2.c [akpm@linux-foundation.org: coding-style cleanups] [akpm@linux-foundation.org: fix ia64 build] [akpm@linux-foundation.org: fix m68k build] [akpm@linux-foundation.org: fix mips build] [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds] [heiko.carstens@de.ibm.com: fix s390] [akpm@linux-foundation.org: fix powerpc build] [akpm@linux-foundation.org: fix sparc64 more] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19[IA64] fallocate system callDavid Chinner
sys_fallocate for ia64. This uses an empty slot #1303 erroneously marked as reserved for move_pages (which had already been allocated as syscall #1276) Signed-Off-By: Dave Chinner <dgc@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-14[IA64] wire up {signal,timer,event}fd syscallsTony Luck
Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-10[IA64] Wire up epoll_pwait and utimensatTony Luck
Another day, another pair of new system calls. Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-08[IA64] wire up pselect, ppollAlexey Kuznetsov
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-08[IA64] Add TIF_RESTORE_SIGMASKAlexey Dobriyan
Preparation for pselect and ppoll. ia32 compat code not tested. :-( Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-04-30Pull percpu-dtc into release branchTony Luck
2007-02-06[IA64] remove per-cpu ia64_phys_stacked_size_p8Chen, Kenneth W
It's not efficient to use a per-cpu variable just to store how many physical stack register a cpu has. Ever since the incarnation of ia64 up till upcoming Montecito processor, that variable has "glued" to 96. Having a variable in memory means that the kernel is burning an extra cacheline access on every syscall and kernel exit path. Such "static" value is better served with the instruction patching utility exists today. Convert ia64_phys_stacked_size_p8 into dynamic insn patching. This also has a pleasant side effect of eliminating access to per-cpu area while psr.ic=0 in the kernel exit path. (fixable for per-cpu DTC work, but why bother?) There are some concerns with the default value that the instruc- tion encoded in the kernel image. It shouldn't be concerned. The reasons are: (1) cpu_init() is called at CPU initialization. In there, we find out physical stack register size from PAL and patch two instructions in kernel exit code. The code in question can not be executed before the patching is done. (2) current implementation stores zero in ia64_phys_stacked_size_p8, and that's what the current kernel exit path loads the value with. With the new code, it is equivalent that we store reg size 96 in ia64_phys_stacked_size_p8, thus creating a better safety net. Given (1) above can never fail, having (2) is just a bonus. All in all, this patch allow one less memory reference in the kernel exit path, thus reducing syscall and interrupt return latency; and avoid polluting potential useful data in the CPU cache. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-02-05[IA64] Hook up getcpu system call for IA64Fenghua Yu
getcpu system call returns cpu# and node# on which this system call and its caller are running. This patch hooks up its implementation on IA64. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-12-07[IA64] IA64 Kexec/kdumpZou Nan hai
Changes and updates. 1. Remove fake rendz path and related code according to discuss with Khalid Aziz. 2. fc.i offset fix in relocate_kernel.S. 3. iospic shutdown code eoi and mask race fix from Fujitsu. 4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner. 5. Send slave to SAL slave loop patch from Jay Lan. 6. Kdump on non-recoverable MCA event patch from Jay Lan 7. Use CTL_UNNUMBERED in kdump_on_init sysctl. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-10-03fix file specification in commentsUwe Zeisberger
Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-02[PATCH] rename the provided execve functions to kernel_execveArnd Bergmann
Some architectures provide an execve function that does not set errno, but instead returns the result code directly. Rename these to kernel_execve to get the right semantics there. Moreover, there is no reasone for any of these architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so remove these right away. [akpm@osdl.org: build fix] [bunk@stusta.de: build fix] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Acked-by: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26Revert "[IA64] Unwire set/get_robust_list"Tony Luck
This reverts commit 2636255488484e04d6d54303d2b0ec30f7ef7e02. Jakub Jelinek provided the missing futex_atomic_cmpxchg_inatomic() function, so now it should be safe to re-enable these syscalls. Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-09-08[IA64] Unwire set/get_robust_listAndreas Schwab
The syscalls set/get_robust_list must not be wired up until futex_atomic_cmpxchg_inatomic is implemented. Otherwise the kernel will hang in handle_futex_death. Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-23[PATCH] page migration: sys_move_pages(): support moving of individual pagesChristoph Lameter
move_pages() is used to move individual pages of a process. The function can be used to determine the location of pages and to move them onto the desired node. move_pages() returns status information for each page. long move_pages(pid, number_of_pages_to_move, addresses_of_pages[], nodes[] or NULL, status[], flags); The addresses of pages is an array of void * pointing to the pages to be moved. The nodes array contains the node numbers that the pages should be moved to. If a NULL is passed instead of an array then no pages are moved but the status array is updated. The status request may be used to determine the page state before issuing another move_pages() to move pages. The status array will contain the state of all individual page migration attempts when the function terminates. The status array is only valid if move_pages() completed successfullly. Possible page states in status[]: 0..MAX_NUMNODES The page is now on the indicated node. -ENOENT Page is not present -EACCES Page is mapped by multiple processes and can only be moved if MPOL_MF_MOVE_ALL is specified. -EPERM The page has been mlocked by a process/driver and cannot be moved. -EBUSY Page is busy and cannot be moved. Try again later. -EFAULT Invalid address (no VMA or zero page). -ENOMEM Unable to allocate memory on target node. -EIO Unable to write back page. The page must be written back in order to move it since the page is dirty and the filesystem does not provide a migration function that would allow the moving of dirty pages. -EINVAL A dirty page cannot be moved. The filesystem does not provide a migration function and has no ability to write back pages. The flags parameter indicates what types of pages to move: MPOL_MF_MOVE Move pages that are only mapped by the process. MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes. Requires sufficient capabilities. Possible return codes from move_pages() -ENOENT No pages found that would require moving. All pages are either already on the target node, not present, had an invalid address or could not be moved because they were mapped by multiple processes. -EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt to migrate pages in a kernel thread. -EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges. or an attempt to move a process belonging to another user. -EACCES One of the target nodes is not allowed by the current cpuset. -ENODEV One of the target nodes is not online. -ESRCH Process does not exist. -E2BIG Too many pages to move. -ENOMEM Not enough memory to allocate control array. -EFAULT Parameters could not be accessed. A test program for move_pages() may be found with the patches on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3 From: Christoph Lameter <clameter@sgi.com> Detailed results for sys_move_pages() Pass a pointer to an integer to get_new_page() that may be used to indicate where the completion status of a migration operation should be placed. This allows sys_move_pags() to report back exactly what happened to each page. Wish there would be a better way to do this. Looks a bit hacky. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Jes Sorensen <jes@trained-monkey.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andi Kleen <ak@muc.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26[PATCH] Add support for the sys_vmsplice syscallJens Axboe
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice() moves data to a pipe, with the input being a user address range instead. This uses an approach suggested by Linus, where we can hold partial ranges inside the pages[] map. Hopefully this will be useful for network receive support as well. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11[PATCH] splice: add support for sys_tee()Jens Axboe
Basically an in-kernel implementation of tee, which uses splice and the pipe buffers as an intelligent way to pass data around by reference. Where the user space tee consumes the input and produces a stdout and file output, this syscall merely duplicates the data inside a pipe to another pipe. No data is copied, the output just grabs a reference to the input pipe data. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-06[IA64] Wire up new syscalls {set,get}_robust_listTony Luck
Join the dots to enable Ingo's robut futex syscalls. Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-04[IA64] Wire up new syscall sync_file_range()Tony Luck
Also reserve syscall numbers for {set,get}_robust_list Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-30[PATCH] Introduce sys_splice() system callJens Axboe
This adds support for the sys_splice system call. Using a pipe as a transport, it can connect to files or sockets (latter as output only). From the splice.c comments: "splice": joining two ropes together by interweaving their strands. This is the "extended pipe" functionality, where a pipe is used as an arbitrary in-memory buffer. Think of a pipe as a small kernel buffer that you can use to transfer data from one end to the other. The traditional unix read/write is extended with a "splice()" operation that transfers data buffers to or from a pipe buffer. Named by Larry McVoy, original implementation from Linus, extended by Jens to support splicing to files and fixing the initial implementation bugs. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-21Pull delete-sigdelayed into release branchTony Luck
2006-02-16[IA64] Missing check for TIF_WORK if trace/audit enabledJack Steiner
It appears that if auditing is enabled, the kernel fails to check for pending signals before returning to user mode. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-08[IA64] unshare system call registration for ia64Janak Desai
Registers system call for the ia64 architecture. Reserves space for ppoll and pselect, and adds unshare at system call number 1296. Signed-off-by: Janak Desai <janak@us.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-06[IA64] add syscall entry for *at()Chen, Kenneth W
Wire up the ia64 syscalls for *at() functions. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-26[IA64] Delete MCA/INIT sigdelayed codeKeith Owens
The only user of the MCA/INIT sigdelayed code (SGI's I/O probing) has moved from the kernel into SAL. Delete the MCA/INIT sigdelayed code. Signed-off-by: Keith Owens <kaos@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-08[PATCH] Swap Migration V5: sys_migrate_pages interfaceChristoph Lameter
sys_migrate_pages implementation using swap based page migration This is the original API proposed by Ray Bryant in his posts during the first half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org. The intent of sys_migrate is to migrate memory of a process. A process may have migrated to another node. Memory was allocated optimally for the prior context. sys_migrate_pages allows to shift the memory to the new node. sys_migrate_pages is also useful if the processes available memory nodes have changed through cpuset operations to manually move the processes memory. Paul Jackson is working on an automated mechanism that will allow an automatic migration if the cpuset of a process is changed. However, a user may decide to manually control the migration. This implementation is put into the policy layer since it uses concepts and functions that are also needed for mbind and friends. The patch also provides a do_migrate_pages function that may be useful for cpusets to automatically move memory. sys_migrate_pages does not modify policies in contrast to Ray's implementation. The current code here is based on the swap based page migration capability and thus is not able to preserve the physical layout relative to it containing nodeset (which may be a cpuset). When direct page migration becomes available then the implementation needs to be changed to do a isomorphic move of pages between different nodesets. The current implementation simply evicts all pages in source nodeset that are not in the target nodeset. Patch supports ia64, i386 and x86_64. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-16[IA64] Remove warnings for gcc 4.0 IA64 compilation.Peter Chubb
This patch removes some compilation warnings, mostly trivially. acpi.c fix also noted by Kenji Kaneshige. Signed-off-by; Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-09Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild Linus Torvalds
2005-09-09[PATCH] Prefetch kernel stacks to speed up context switchChen, Kenneth W
For architecture like ia64, the switch stack structure is fairly large (currently 528 bytes). For context switch intensive application, we found that significant amount of cache misses occurs in switch_to() function. The following patch adds a hook in the schedule() function to prefetch switch stack structure as soon as 'next' task is determined. This allows maximum overlap in prefetch cache lines for that structure. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09kbuild: ia64 use generic asm-offsets.h supportSam Ravnborg
Delete obsolete stuff from arch Makefile Rename file to asm-offsets.h The trick used in the arch Makefile to circumvent the circular dependency is kept. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-09-07[IA64] minor performance tune-up in ia64_switch_toChen, Kenneth W
The reenabling of psr.ic should really belong to dtr mapping code block. It make the fall through code fast since it doesn't need to execute the predicated-off instruction. Logically make more sense as well since psr.ic was turned off in .map code block. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-08-01[PATCH] remove sys_set_zone_reclaim()Ingo Molnar
This removes sys_set_zone_reclaim() for now. While i'm sure Martin is trying to solve a real problem, we must not hard-code an incomplete and insufficient approach into a syscall, because syscalls are pretty much for eternity. I am quite strongly convinced that this syscall must not hit v2.6.13 in its current form. Firstly, the syscall lacks basic syscall design: e.g. it allows the global setting of VM policy for unprivileged users. (!) [ Imagine an Oracle installation and a SAP installation on the same NUMA box fighting over the 'optimal' setting for this flag. What will they do? Will they try to set the flag to their own preferred value every second or so? ] Secondly, it was added based on a single datapoint from Martin: http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2 where Martin characterizes the numbers the following way: ' Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to see that with reclaim the benchmark still finishes in a reasonable amount of time. ' in other words: the fundamental problem has likely not been solved, only a tendential move into the right direction has been observed, and a handful of numbers were picked out of a set of hugely variable results, without showing the variability data. How much variance is there run-to-run? I'd really suggest to first walk the walk and see what's needed to get stable & predictable kernel compilation numbers on that NUMA box, before adding random syscalls to tune a particular aspect of the VM ... which approach might not even matter once the whole picture has been analyzed and understood! The third, most important point is that the syscall exposes VM tuning internals in a completely unstructured way. What sense does it make to have a _GLOBAL_ per-node setting for 'should we go to another node for reclaim'? If then it might make sense to do this per-app, via numalib or so. The change is minimalistic in that it doesnt remove the syscall and the underlying infrastructure changes, only the user-visible changes. We could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka /proc/sys/vm/swappiness, but even that looks quite counterproductive when the generic approach is that we are trying to reduce the number of external factors in the VM balance picture. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[IA64] inotify: ia64 syscalls.Robert Love
Attached patch adds the inotify syscalls to ia64. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-08[IA64] Fix a typo in arch/ia64/kernel/entry.SH. J. Lu
Both 2.4 and 2.6 kernels need this patch for the next binutils. Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-28Auto merge with /home/aegl/GIT/ia64-testTony Luck
2005-06-27[PATCH] Update cfq io scheduler to time sliced designJens Axboe
This updates the CFQ io scheduler to the new time sliced design (cfq v3). It provides full process fairness, while giving excellent aggregate system throughput even for many competing processes. It supports io priorities, either inherited from the cpu nice value or set directly with the ioprio_get/set syscalls. The latter closely mimic set/getpriority. This import is based on my latest from -mm. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] VM: early zone reclaimMartin Hicks
This is the core of the (much simplified) early reclaim. The goal of this patch is to reclaim some easily-freed pages from a zone before falling back onto another zone. One of the major uses of this is NUMA machines. With the default allocator behavior the allocator would look for memory in another zone, which might be off-node, before trying to reclaim from the current zone. This adds a zone tuneable to enable early zone reclaim. It is selected on a per-zone basis and is turned on/off via syscall. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.c Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-15Auto merge with /home/aegl/GIT/linusTony Luck
2005-05-17Merge with temp tree to get David's gdb inferior calls patchTony Luck
2005-05-10[IA64] Avoid .spillpsp directive in handcoded assemblyDavid Mosberger-Tang
Some time ago, GAS was fixed to bring the .spillpsp directive in line with the Intel assembler manual (there was some disagreement as to whether or not there is a built-in 16-byte offset). Unfortunately, there are two places in the kernel where this directive is used in handwritten assembly files and those of course relied on the "buggy" behavior. As a result, when using a "fixed" assembler, the kernel picks up the UNaT bits from the wrong place (off by 16) and randomly sets NaT bits on the scratch registers. This can be noticed easily by looking at a coredump and finding various scratch registers with unexpected NaT values. The patch below fixes this by using the .spillsp directive instead, which works correctly no matter what assembler is in use. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-05-03[IA64] fix typos caught by new assemblerDavid Mosberger-Tang
Patch below fixes 3 trivial typos which are caught by the new assembler (v2.169.90). Please apply. [Note: fix to memcpy that was also part of this patch was separately applied from patches by H.J. and Andreas ... so the delta here only has the other two fixes. -Tony] Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-05-01[PATCH] consolidate sys_shmatStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>