aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2007-07-16taskstats: add context-switch countersMaxim Uvarov
Make available to the user the following task and process performance statistics: * Involuntary Context Switches (task_struct->nivcsw) * Voluntary Context Switches (task_struct->nvcsw) Statistics information is available from: 1. taskstats interface (Documentation/accounting/) 2. /proc/PID/status (task only). This data is useful for detecting hyperactivity patterns between processes. [akpm@linux-foundation.org: cleanup] Signed-off-by: Maxim Uvarov <muvarov@ru.mvista.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Jonathan Lim <jlim@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Remove capability.h from mm.hAlexey Dobriyan
I forgot to remove capability.h from mm.h while removing sched.h! This patch remedies that, because the only inline function which was using CAP_something was made out of line. Cross-compile tested without regressions on: all powerpc defconfigs all mips defconfigs all m68k defconfigs all arm defconfigs all ia64 defconfigs alpha alpha-allnoconfig alpha-defconfig alpha-up arm i386 i386-allnoconfig i386-defconfig i386-up ia64 ia64-allnoconfig ia64-defconfig ia64-up m68k mips parisc parisc-allnoconfig parisc-defconfig parisc-up powerpc powerpc-up s390 s390-allnoconfig s390-defconfig s390-up sparc sparc-allnoconfig sparc-defconfig sparc-up sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up um-x86_64 x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Add a flag to indicate deferrable timers in /proc/timer_statsVenki Pallipadi
Add a flag in /proc/timer_stats to indicate deferrable timers. This will let developers/users to differentiate between types of tiemrs in /proc/timer_stats. Deferrable timer and normal timer will appear in /proc/timer_stats as below. 10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) Also version of timer_stats changes from v0.1 to v0.2 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16add printk.time option, deprecate 'time'Randy Dunlap
Allow printk_time to be enabled or disabled at boot time. Previously it could be enabled only, but not disabled. Change printk_time from an int to a bool since that's what it is. Make its logical (exposed) name just be "time" (was "printk_time"). Note: Changes kernel boot option syntax from "time" to "printk.time=value". Since printk_time is declared as a module_param, it can also be changed at run-time by modifying /sys/module/printk/parameters/time to a value of 1/Y/y to enabled it or 0/N/n to disable it. Since printk_time is declared as a module_param, its value can also be set at boot-time by using linux printk.time=<bool> If the "time" boot option is used, print a message that it is deprecated and will be removed. Note its planned removal in feature-removal-schedule.txt. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Remove clockevents_{release,request}_deviceAndi Kleen
Not called by anything in tree. Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Reduce cpuset.c write_lock_irq() to read_lock()Paul Menage
cpuset.c:update_nodemask() uses a write_lock_irq() on tasklist_lock to block concurrent forks; a read_lock() suffices and is less intrusive. Signed-off-by: Paul Menage<menage@google.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16vdso: print fatal signalsIngo Molnar
Add the print-fatal-signals=1 boot option and the /proc/sys/kernel/print-fatal-signals runtime switch. This feature prints some minimal information about userspace segfaults to the kernel console. This is useful to find early bootup bugs where userspace debugging is very hard. Defaults to off. [akpm@linux-foundation.org: Don't add new sysctl numbers] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Make /proc/modules use seq_list_xxx helpersPavel Emelianov
Here there is not need even in .show callback altering. The original code passes list_head in *v. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime ↵Satoru Takeuchi
process Fix ksoftirqd termination on cpu hotplug with naughty real time process. Assuming the following case: - Try to hot remove CPU2 from CPU1. - There is a real time process on CPU2, and that process doesn't sleep at all. - That rt process and ksoftirqd/2 is migrated to the CPU0 Then ksoftirqd/2 can't stop becasue that rt process runs everlastingly on CPU0, and CPU1 waiting the ksoftirqd/2's termination hangs up. To fix this problem, set the priority of ksoftirqd/2 to max one before kthread_stop(). [akpm@linux-foundation.org: fix warning] Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Fix stop_machine_run problem with naughty real time processSatoru Takeuchi
stop_machine_run() does its work on "kstopmachine" thread having max priority. However that thread get such priority after woken up. Therefore, in the following case ... - "kstopmachine" try to run on CPU1 - There is a real time process which doesn't relinquish CPU time voluntary on CPU1 ... "kstopmachine" can't start to run and the CPU on which stop_machine_run() is runing hangs up. To fix this problem, call sched_setscheduler() before waking up that thread. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Use boot based time for uptime in /procTomas Janousek
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase during suspend. This may cause confusion so I restore the old behaviour by using the boot based time instead of monotonic for uptime. Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Use boot based time for process start time and boot time in /procTomas Janousek
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and process start times to become invalid after suspend. Using boot based time for those restores the old behaviour and fixes the issue. [akpm@linux-foundation.org: little cleanup] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Introduce boot based timeTomas Janousek
The commits 411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support) c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock support) changed the monotonic time so that it no longer jumps after resume, but it's not possible to use it for boot time and process start time calculations then. Also, the uptime no longer increases during suspend. I add a variable to track the wall_to_monotonic changes, a function to get the real boot time and a function to get the boot based time from the monotonic one. [akpm@linux-foundation.org: remove exports, add comment] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Use write_trylock_irqsave in ptrace_attachSripathi Kodi
This patch makes ptrace_attach use write_trylock_irqsave(). [akpm@linux-foundation.org: remove unneeded initialisation] Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Add generic exit-time stack-depth checking to CONFIG_DEBUG_STACK_USAGEJeff Dike
Add generic exit-time stack-depth checking to CONFIG_DEBUG_STACK_USAGE. This also adds UML support. Tested on UML and i386. [akpm@linux-foundation.org: cleanups, speedups, tweaks] Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16mm: fix improper .init-type section referencesJan Beulich
.. which modpost started warning about. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16change zonelist order: zonelist order selection logicKAMEZAWA Hiroyuki
Make zonelist creation policy selectable from sysctl/boot option v6. This patch makes NUMA's zonelist (of pgdat) order selectable. Available order are Default(automatic)/ Node-based / Zone-based. [Default Order] The kernel selects Node-based or Zone-based order automatically. [Node-based Order] This policy treats the locality of memory as the most important parameter. Zonelist order is created by each zone's locality. This means lower zones (ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion. IOW. ZONE_DMA will be in the middle of zonelist. current 2.6.21 kernel uses this. Pros. * A user can expect local memory as much as possible. Cons. * lower zone will be exhansted before higher zone. This may cause OOM_KILL. Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL because of ZONE_DMA exhaution and you need the best locality. (example) assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL. *node(0)'s memory allocation order: node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL. *node(1)'s memory allocation order: node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA. [Zone-based order] This policy treats the zone type as the most important parameter. Zonelist order is created by zone-type order. This means lower zone never be used bofere higher zone exhaustion. IOW. ZONE_DMA will be always at the tail of zonelist. Pros. * OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted. Cons. * memory locality may not be best. (example) assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL. *node(0)'s memory allocation order: node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA. *node(1)'s memory allocation order: node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA. bootoption "numa_zonelist_order=" and proc/sysctl is supporetd. command: %echo N > /proc/sys/vm/numa_zonelist_order Will rebuild zonelist in Node-based order. command: %echo Z > /proc/sys/vm/numa_zonelist_order Will rebuild zonelist in Zone-based order. Thanks to Lee Schermerhorn, he gives me much help and codes. [Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order] [akpm@linux-foundation.org: build fix] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16serial: convert early_uart to earlycon for 8250Yinghai Lu
Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in serial initializing stage. the console_init=>serial8250_console_init=> register_console=>serial8250_console_setup will return -ENDEV, and console ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in drivers/serial/serial_core.c to call register_console to get console ttyS0. that is too late. Make early_uart to use early_param, so uart console can be used earlier. Make it to be bootconsole with CON_BOOT flag, so can use console handover feature. and it will switch to corresponding normal serial console automatically. new command line will be: console=uart8250,io,0x3f8,9600n8 console=uart8250,mmio,0xff5e0000,115200n8 or earlycon=uart8250,io,0x3f8,9600n8 earlycon=uart8250,mmio,0xff5e0000,115200n8 it will print in very early stage: Early serial console at I/O port 0x3f8 (options '9600n8') console [uart0] enabled later for console it will print: console handover: boot [uart0] -> real [ttyS0] Signed-off-by: <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16console: console handover to preferred consoleYinghai Lu
for earlyprintk=ttyS0,9600 console=tty0 console=ttyS0,9600n8 the handover will happen from earlyser0 to tty0. but what we want is to hand over to ttyS0. Later with serial-convert-early_uart-to-earlycon-for-8250.patch, console=tty0 console=uart8250,io,0x3f8,9600n8 will handover to ttyS0 instead of tty0. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16console: more buf for index parsingYinghai Lu
Change name to buf according to the usage as name + index Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13CFS: Fix missing digit off in wmult tableThomas Gleixner
Roman Zippel noticed another inconsistency of the wmult table. wmult[16] has a missing digit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13Merge branch 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-blockLinus Torvalds
* 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block: splice: fix offset mangling with direct splicing (sendfile) security: revalidate rw permissions for sys_splice and sys_vmsplice relay: fixup kerneldoc comment relay: fix bogus cast in subbuf_splice_actor()
2007-07-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: [PATCH] sched: small topology.h cleanup [PATCH] sched: fix show_task()/show_tasks() output [PATCH] sched: remove stale version info from kernel/sched_debug.c [PATCH] sched: allow larger granularity [PATCH] sched: fix prio_to_wmult[] for nice 1 [ I re-did the commits to get rid of some bogus merge commit that Ingo had. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13[PATCH] sched: fix show_task()/show_tasks() outputIngo Molnar
fix show_task()/show_tasks() output: - there's no sibling info anymore - the fields were not aligned properly with the description - get rid of the lazy-TLB output: it's been quite some time since we last had a bug there, and when we had a bug it wasnt helped a bit by this debug output. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13[PATCH] sched: remove stale version info from kernel/sched_debug.cIngo Molnar
kernel/sched_debug.c referred to CFS -v20, but there's no CFS versioning needed within the upstream kernel. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13[PATCH] sched: allow larger granularityIngo Molnar
Allow granularity up to 100 msecs, instead of 10 msecs. (needed on larger boxes) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13[PATCH] sched: fix prio_to_wmult[] for nice 1Mike Galbraith
There's a typo in the values in prio_to_wmult[] for nice level 1. While it did not cause bad CPU distribution, but caused more rescheduling between nice-0 and nice-1 tasks than necessary. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13relay: fixup kerneldoc commentTom Zanussi
Change comment from kerneldoc to normal. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-13relay: fix bogus cast in subbuf_splice_actor()Tom Zanussi
The current code that sets the read position in subbuf_splice_actor may give erroneous results if the buffer size isn't a power of 2. This patch fixes the problem. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: security: unexport mmap_min_addr SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel security: Protection for exploiting null dereference using mmap SELinux: Use %lu for inode->i_no when printing avc SELinux: allow preemption between transition permission checks selinux: introduce schedule points in policydb_destroy() selinux: add selinuxfs structure for object class discovery selinux: change sel_make_dir() to specify inode counter. selinux: rename sel_remove_bools() for more general usage. selinux: add support for querying object classes and permissions from the running policy
2007-07-11security: Protection for exploiting null dereference using mmapEric Paris
Add a new security check on mmap operations to see if the user is attempting to mmap to low area of the address space. The amount of space protected is indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to 0, preserving existing behavior. This patch uses a new SELinux security class "memprotect." Policy already contains a number of allow rules like a_t self:process * (unconfined_t being one of them) which mean that putting this check in the process class (its best current fit) would make it useless as all user processes, which we also want to protect against, would be allowed. By taking the memprotect name of the new class it will also make it possible for us to move some of the other memory protect permissions out of 'process' and into the new class next time we bump the policy version number (which I also think is a good future idea) Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2007-07-11sysfs: kill unnecessary attribute->ownerTejun Heo
sysfs is now completely out of driver/module lifetime game. After deletion, a sysfs node doesn't access anything outside sysfs proper, so there's no reason to hold onto the attribute owners. Note that often the wrong modules were accounted for as owners leading to accessing removed modules. This patch kills now unnecessary attribute->owner. Note that with this change, userland holding a sysfs node does not prevent the backing module from being unloaded. For more info regarding lifetime rule cleanup, please read the following message. http://article.gmane.org/gmane.linux.kernel/510293 (tweaked by Greg to not delete the field just yet, to make it easier to merge things properly.) Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-10pipe: change the ->pin() operation to ->confirm()Jens Axboe
The name 'pin' was badly chosen, it doesn't pin a pipe buffer in the most commonly used sense in the kernel. So change the name to 'confirm', after debating this issue with Hugh Dickins a bit. A good return from ->confirm() means that the buffer is really there, and that the contents are good. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10relay: use splice_to_pipe() instead of open-coding the pipe loopJens Axboe
It cleans up the relay splice implementation a lot, and gets rid of a lot of internal pipe knowledge that should not be in there. Plus fixes for padding and partial first page (and lots more) from Tom Zanussi. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10splice: divorce the splice structure/function definitions from the pipe headerJens Axboe
We need to move even more stuff into the header so that folks can use the splice_to_pipe() implementation instead of open-coding a lot of pipe knowledge (see relay implementation), so move to our own header file finally. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10splice: relay supportTom Zanussi
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-09sched: add CFS creditsIngo Molnar
add credits for recent major scheduler contributions: Con Kolivas, for pioneering the fair-scheduling approach Peter Williams, for smpnice Mike Galbraith, for interactivity tuning of CFS Srivatsa Vaddagiri, for group scheduling enhancements Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: clean up sleep_on() APIsIngo Molnar
clean up the sleep_on() APIs: - do not use fastcall - replace fragile macro magic with proper inline functions Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: style cleanupsIngo Molnar
4 small style cleanups to sched.c: checkpatch.pl is now happy about the totality of sched.c [ignoring false positives] - yay! ;-) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: do not set softirqs to nice +19Ingo Molnar
do not set softirqs to nice +19. _If_ for whatever reason we missed to process some high-prio softirq and woke up ksoftirqd, we should give it a fair chance to actually get some work done, even if the system is under load. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: scheduler debugging, coreIngo Molnar
scheduler debugging core: implement /proc/sched_debug and /proc/<PID>/sched files for scheduler debugging. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: add CFS debug sysctlsIngo Molnar
add CFS debug sysctls: only tweakable if SCHED_DEBUG is enabled. This allows for faster debugging of scheduler problems. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: remove unused rq types from sched.cIngo Molnar
remove unused rq types from sched.c, now that we switched over to CFS. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: remove interactivity typesIngo Molnar
remove now unused interactivity-heuristics related defined and types of the old scheduler. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: clean up include files in sched.cIngo Molnar
clean up include files in sched.c, they were still old-style <asm/>. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: update delay-accounting to use CFS's precise statsBalbir Singh
update delay-accounting to use CFS's precise stats. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: turn on the use of unstable eventsIngo Molnar
make use of sched-clock-unstable events. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: x86, track TSC-unstable eventsIngo Molnar
track TSC-unstable events and propagate it to the scheduler code. Also allow sched_clock() to be used when the TSC is unstable, the rq_clock() wrapper creates a reliable clock out of it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: cfs core codeIngo Molnar
apply the CFS core code. this change switches over the scheduler core to CFS's modular design and makes use of kernel/sched_fair/rt/idletask.c to implement Linux's scheduling policies. thanks to Andrew Morton and Thomas Gleixner for lots of detailed review feedback and for fixlets. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
2007-07-09sched: remove the sleep-bonus interactivity codeIngo Molnar
remove the sleep-bonus interactivity code from the core scheduler. scheduling policy is implemented in the policy modules, and CFS does not need such type of heuristics. Signed-off-by: Ingo Molnar <mingo@elte.hu>