aboutsummaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)Author
2008-02-03Move Kconfig.instrumentation to arch/Kconfig and init/KconfigMathieu Desnoyers
Move the instrumentation Kconfig to arch/Kconfig for architecture dependent options - oprofile - kprobes and init/Kconfig for architecture independent options - profiling - markers Remove the "Instrumentation Support" menu. Everything moves to "General setup". Delete the kernel/Kconfig.instrumentation file. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03Create arch/KconfigMathieu Desnoyers
Puts the content of arch/Kconfig in the "General setup" menu. Linus: > Should it come with a re-duplication of it's content into each > architecture, which was the case previously ? The oprofile and kprobes > menu entries were litteraly cut and pasted from one architecture to > another. Should we put its content in init/Kconfig then ? I don't think it's a good idea to go back to making it per-architecture, although that extensive "depends on <list-of-archiectures-here>" might indicate that there certainly is room for cleanup there. And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I just think it's wrong to (a) lump the code together when it really doesn't necessarily need to and (b) show it to users as some kind of choice that is tied together (whether it then has common code or not). On the per-architecture side, I do think it would be better to *not* have internal architecture knowledge in a generic file, and as such a line like depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32 really shouldn't exist in a file like kernel/Kconfig.instrumentation. It would be much better to do depends on ARCH_SUPPORTS_KPROBES in that generic file, and then architectures that do support it would just have a bool ARCH_SUPPORTS_KPROBES default y in *their* architecture files. That would seem to be much more logical, and is readable both for arch maintainers *and* for people who have no clue - and don't care - about which architecture is supposed to support which interface... Sam Ravnborg: Stuff it into a new file: arch/Kconfig We can then extend this file to include all the 'trailing' Kconfig things that are anyway equal for all ARCHs. But it should be kept clean - so if we introduce such a file then we should use ARCH_HAS_whatever in the arch specific Kconfig files to enable stuff that is not shared. [...] The above suggestion is actually not exactly the best way to do it... First the naming.. A quick grep shows following usage today (in Kconfig files) ARCH_HAS 51 ARCH_SUPPORTS 4 HAVE_ARCH 7 ARCH_HAS is the clear winner. In the common Kconfig file do: config FOO depends on ARCH_HAS_FOO bool "bla bla" config ARCH_HAS_FOO def_bool n In the arch specific Kconfig file in a suitable place do: config SUITABLE_OPTION select ARCH_HAS_FOO The naming of ARCH_HAS_ is fixed and shall be: ARCH_HAS_<config option it will enable> Only a single line added pr. architecture. And we will end up with a (maybe even commented) list of trivial selects. - Yet another update : Moving to HAVE_* now. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-31RCU: add help text for "RCU implementation type"Paul E. McKenney
This patch supplies help text for the "RCU implementation type" kernel configuration choice. Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: do not PSE on CONFIG_DEBUG_PAGEALLOC=yIngo Molnar
get more testing of the c_p_a() code done by not turning off PSE on DEBUG_PAGEALLOC. this simplifies the early pagetable setup code, and tests the largepage-splitup code quite heavily. In the end, all the largepages will be split up pretty quickly, so there's no difference to how DEBUG_PAGEALLOC worked before. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86/non-x86: percpu, node ids, apic ids x86.git fixupMike Travis
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: optimize lock prefix switching to run less frequentlyAndi Kleen
On VMs implemented using JITs that cache translated code changing the lock prefixes is a quite costly operation that forces the JIT to throw away and retranslate a lot of code. Previously a SMP kernel would rewrite the locks once for each CPU which is quite unnecessary. This patch changes the code to never switch at boot in the normal case (SMP kernel booting with >1 CPU) or only once for SMP kernel on UP. This makes a significant difference in boot up performance on AMD SimNow! Also I expect it to be a little faster on native systems too because a smp switch does a lot of text_poke()s which each synchronize the pipeline. v1->v2: Rename max_cpus v1->v2: Fix off by one in UP check (Thomas Gleixner) Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30percpu: use a kconfig variable to signal arch specific percpu setuptravis@sgi.com
The use of the __GENERIC_PERCPU is a bit problematic since arches may want to run their own percpu setup while using the generic percpu definitions. Replace it through a kconfig variable. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-28kconfig: use environment optionRoman Zippel
Use the environment option to provide the ARCH symbol and the KERNELVERSION symbol. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28sh: syscall audit support.Yuichi Nakamura
Support syscall auditing.. Signed-off-by: Yuichi Nakamura <ynakam@hitachisoft.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-01-25Preempt-RCU: implementationPaul E. McKenney
This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25cpu-hotplug: refcount based cpu hotplugGautham R Shenoy
This patch implements a Refcount + Waitqueue based model for cpu-hotplug. Now, a thread which wants to prevent cpu-hotplug, will bump up a global refcount and the thread which wants to perform a cpu-hotplug operation will block till the global refcount goes to zero. The readers, if any, during an ongoing cpu-hotplug operation are blocked until the cpu-hotplug operation is over. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: selinux: make mls_compute_sid always polyinstantiate security/selinux: constify function pointer tables and fields security: add a secctx_to_secid() hook security: call security_file_permission from rw_verify_area security: remove security_sb_post_mountroot hook Security: remove security.h include from mm.h Security: remove security_file_mmap hook sparse-warnings (NULL as 0). Security: add get, set, and cloning of superblock security information security/selinux: Add missing "space"
2008-01-24sysfs: make SYSFS_DEPRECATED depend on SYSFSRandy Dunlap
Make SYSFS_DEPRECATED depend on SYSFS since files that check CONFIG_SYSFS_DEPRECATED don't check for CONFIG_SYSFS first. Also don't prompt user about SYSFS_DEPRECATED if SYSFS=n. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24Driver core: convert block from raw kobjects to core devicesKay Sievers
This moves the block devices to /sys/class/block. It will create a flat list of all block devices, with the disks and partitions in one directory. For compatibility /sys/block is created and contains symlinks to the disks. /sys/class/block |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1 |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10 |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5 |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6 |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7 |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8 |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9 `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 /sys/block/ |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-25security: remove security_sb_post_mountroot hookH. Peter Anvin
The security_sb_post_mountroot() hook is long-since obsolete, and is fundamentally broken: it is never invoked if someone uses initramfs. This is particularly damaging, because the existence of this hook has been used as motivation for not using initramfs. Stephen Smalley confirmed on 2007-07-19 that this hook was originally used by SELinux but can now be safely removed: http://marc.info/?l=linux-kernel&m=118485683612916&w=2 Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Eric Paris <eparis@parisplace.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-01-02Unify /proc/slabinfo configurationLinus Torvalds
Both SLUB and SLAB really did almost exactly the same thing for /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's. This just creates a common CONFIG_SLABINFO that is enabled by both SLUB and SLAB, and shares all the setup code. Maybe SLOB will want this some day too. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-06Pull bugzilla-9345 into release branchLen Brown
2007-12-02sched: cpu accounting controller (V2)Srivatsa Vaddagiri
Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purpose and doesnt really want to exercise any control over their cpu consumption. The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences: - Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions. - Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes - Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c) The patch also modifies the cpu controller not to provide the same accounting information. Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-20Freezer: Fix s2disk resume from initrdRafael J. Wysocki
Add appropriate freezer annotations to handle_initrd(), so that it's possible to resume from disk from an initrd. http://bugzilla.kernel.org/show_bug.cgi?id=9345 Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Chris Friedhoff <chris@friedhoff.org> Signed-off-by: Len Brown <len.brown@intel.com>
2007-11-23Blackfin arch: punt CONFIG_BFIN -- we already have CONFIG_BLACKFINMike Frysinger
Signed-off-by: Mike Frysinger <michael.frysinger@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-11-14pidns: Place under CONFIG_EXPERIMENTALEric W. Biederman
This is my trivial patch to swat innumerable little bugs with a single blow. After some intensive review (my apologies for not having gotten to this sooner) what we have looks like a good base to build on with the current pid namespace code but it is not complete, and it is still much to simple to find issues where the kernel does the wrong thing outside of the initial pid namespace. Until the dust settles and we are certain we have the ABI and the implementation is as correct as humanly possible let's keep process ID namespaces behind CONFIG_EXPERIMENTAL. Allowing us the option of fixing any ABI or other bugs we find as long as they are minor. Allowing users of the kernel to avoid those bugs simply by ensuring their kernel does not have support for multiple pid namespaces. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Adrian Bunk <bunk@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Kir Kolyshkin <kir@swsoft.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14revert "Task Control Groups: example CPU accounting subsystem"Andrew Morton
Revert 62d0df64065e7c135d0002f069444fbdfc64768f. This was originally intended as a simple initial example of how to create a control groups subsystem; it wasn't intended for mainline, but I didn't make this clear enough to Andrew. The CFS cgroup subsystem now has better functionality for the per-cgroup usage accounting (based directly on CFS stats) than the "usage" status file in this patch, and the "load" status file is rather simplistic - although having a per-cgroup load average report would be a useful feature, I don't believe this patch actually provides it. If it gets into the final 2.6.24 we'd probably have to support this interface for ever. Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-09sched: proper prototype for kernel/sched.c:migration_init()Adrian Bunk
This patch adds a proper prototype for migration_init() in include/linux/sched.h Since there's no point in always returning 0 to a caller that doesn't check the return value it also changes the function to return void. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24sched: mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTALIngo Molnar
mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTAL. All bugs have been fixed and it's perfect ;-) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-21[PATCH] audit: watching subtreesAl Viro
New kind of audit rule predicates: "object is visible in given subtree". The part that can be sanely implemented, that is. Limitations: * if you have hardlink from outside of tree, you'd better watch it too (or just watch the object itself, obviously) * if you mount something under a watched tree, tell audit that new chunk should be added to watched subtrees * if you umount something in a watched tree and it's still mounted elsewhere, you will get matches on events happening there. New command tells audit to recalculate the trees, trimming such sources of false positives. Note that it's _not_ about path - if something mounted in several places (multiple mount, bindings, different namespaces, etc.), the match does _not_ depend on which one we are using for access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-10-20spelling fixes: init/Simon Arlott
Spelling fix in init/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19Drop the superfluous test for an old version of gcc.Robert P. J. Day
The header file <linux/compiler.h> already enforces a suitably recent version of gcc, so there's no point checking for that again. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19Hook up group scheduler with control groupsSrivatsa Vaddagiri
Enable "cgroup" (formerly containers) based fair group scheduling. This will let administrator create arbitrary groups of tasks (using "cgroup" pseudo filesystem) and control their cpu bandwidth usage. [akpm@linux-foundation.org: fix cpp condition] Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19cgroups: implement namespace tracking subsystemSerge E. Hallyn
When a task enters a new namespace via a clone() or unshare(), a new cgroup is created and the task moves into it. This version names cgroups which are automatically created using cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or cloned process. (Thanks Pavel for the idea) This is safe because if the process unshares again, it will create /cgroups/(...)/node_<pid>/node_<pid> The only possibilities (AFAICT) for a -EEXIST on unshare are 1. pid wraparound 2. a process fails an unshare, then tries again. Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare() succeed. Changelog: Name cloned cgroups as "node_<pid>". [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: simple task cgroup debug info subsystemPaul Menage
This example subsystem exports debugging information as an aid to diagnosing refcount leaks, etc, in the cgroup framework. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: example CPU accounting subsystemPaul Menage
This example demonstrates how to use the generic cgroup subsystem for a simple resource tracker that counts, for the processes in a cgroup, the total CPU time used and the %CPU used in the last complete 10 second interval. Portions contributed by Balbir Singh <balbir@in.ibm.com> Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: make cpusets a client of cgroupsPaul Menage
Remove the filesystem support logic from the cpusets system and makes cpusets a cgroup subsystem The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get passed through to the cgroup filesystem with the appropriate options to emulate the old cpuset filesystem behaviour. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: basic task cgroup frameworkPaul Menage
Generic Process Control Groups -------------------------- There have recently been various proposals floating around for resource management/accounting and other task grouping subsystems in the kernel, including ResGroups, User BeanCounters, NSProxy cgroups, and others. These all need the basic abstraction of being able to group together multiple processes in an aggregate, in order to track/limit the resources permitted to those processes, or control other behaviour of the processes, and all implement this grouping in different ways. This patchset provides a framework for tracking and grouping processes into arbitrary "cgroups" and assigning arbitrary state to those groupings, in order to control the behaviour of the cgroup as an aggregate. The intention is that the various resource management and virtualization/cgroup efforts can also become task cgroup clients, with the result that: - the userspace APIs are (somewhat) normalised - it's easier to test e.g. the ResGroups CPU controller in conjunction with the BeanCounters memory controller, or use either of them as the resource-control portion of a virtual server system. - the additional kernel footprint of any of the competing resource management systems is substantially reduced, since it doesn't need to provide process grouping/containment, hence improving their chances of getting into the kernel This patch: Add the main task cgroups framework - the cgroup filesystem, and the basic structures for tracking membership and associating subsystem state objects to tasks. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sparse pointer use of zero as nullStephen Hemminger
Get rid of sparse related warnings from places that use integer as NULL pointer. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Ian Kent <raven@themaw.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Move PREEMPT_NOTIFIERS into an always-included KconfigAvi Kivity
Kconfig.preempt is not included on some archs (for example, m68k). On those archs, the Kconfig machinery complains that KVM selects an undefined symbol PREEMPT_NOTIFIERS (which lives in Kconfig.preempt). So move the offending symbol into a Kconfig file which is included by everyone. Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuildLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits) kbuild: introduce ccflags-y, asflags-y and ldflags-y kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP kbuild: enable use of AFLAGS and CFLAGS on commandline kbuild: enable 'make AFLAGS=...' to add additional options to AS kbuild: fix AFLAGS use in h8300 and m68knommu kbuild: check for wrong use of CFLAGS kbuild: enable 'make CFLAGS=...' to add additional options to CC kbuild: fix up CFLAGS usage kbuild: make modpost detect unterminated device id lists kbuild: call export_report from the Makefile kbuild: move Kai Germaschewski to CREDITS kconfig/menuconfig: distinguish between selected-by-another options and comments kconfig: tristate choices with mixed tristate and boolean values include/linux/Kbuild: remove duplicate entries kbuild: kill backward compatibility checks kbuild: kill EXTRA_ARFLAGS kbuild: fix documentation in makefiles.txt kbuild: call make once for all targets when O=.. is used kbuild: pass -g to assembler under CONFIG_DEBUG_INFO kbuild: update _shipped files for kconfig syntax cleanup ... Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
2007-10-16remove PAGE_GROUP_BY_MOBILITYMel Gorman
Grouping pages by mobility can be disabled at compile-time. This was considered undesirable by a number of people. However, in the current stack of patches, it is not a simple case of just dropping the configurable patch as it would cause merge conflicts. This patch backs out the configuration option. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Add a configure option to group pages by mobilityMel Gorman
The grouping mechanism has some memory overhead and a more complex allocation path. This patch allows the strategy to be disabled for small memory systems or if it is known the workload is suffering because of the strategy. It also acts to show where the page groupings strategy interacts with the standard buddy allocator. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16slow down printk during bootRandy Dunlap
Optionally add a boot delay after each kernel printk() call, crudely measured in milliseconds, with a maximum delay of 10 seconds per printk. Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.): "lpj=loops_per_jiffy boot_delay=100" to the kernel command line. It has been useful in cases like "during boot, my machine just reboots or the screen goes black" by slowing down printk, (and adding initcall_debug), we can usually see the last thing that happened before the lights went out which is usually a valuable clue. [akpm@linux-foundation.org: not all architectures implement CONFIG_HZ] [akpm@linux-foundation.org: fix lots of stuff] [bunk@stusta.de: kernel/printk.c: make 2 variables static] [heiko.carstens@de.ibm.com: fix slow down printk on boot compile error] Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15sched: group scheduler, fix coding style issuesSrivatsa Vaddagiri
Fix coding style issues reported by Randy Dunlap and others Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: enable CONFIG_FAIR_GROUP_SCHED=y by defaultIngo Molnar
enable CONFIG_FAIR_GROUP_SCHED=y by default. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: fair-group sched, cleanupsIngo Molnar
fair-group sched, cleanups. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: add fair-user schedulerSrivatsa Vaddagiri
Enable user-id based fair group scheduling. This is useful for anyone who wants to test the group scheduler w/o having to enable CONFIG_CGROUPS. A separate scheduling group (i.e struct task_grp) is automatically created for every new user added to the system. Upon uid change for a task, it is made to move to the corresponding scheduling group. A /proc tunable (/proc/root_user_share) is also provided to tune root user's quota of cpu bandwidth. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: clean up code under CONFIG_FAIR_GROUP_SCHEDSrivatsa Vaddagiri
With the view of supporting user-id based fair scheduling (and not just container-based fair scheduling), this patch renames several functions and makes them independent of whether they are being used for container or user-id based fair scheduling. Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating less-sized array for tg->cfs_rq[] and tf->se[]). Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group-scheduler coreSrivatsa Vaddagiri
Add interface to control cpu bandwidth allocation to task-groups. (not yet configurable, due to missing CONFIG_CONTAINERS) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-14kbuild: enable 'make CFLAGS=...' to add additional options to CCSam Ravnborg
The variable CFLAGS is a wellknown variable and the usage by kbuild may result in unexpected behaviour. On top of that several people over time has asked for a way to pass in additional flags to gcc. This patch replace use of CFLAGS with KBUILD_CFLAGS all over the tree and enabling one to use: make CFLAGS=... to specify additional gcc commandline options. One usecase is when trying to find gcc bugs but other use cases has been requested too. Patch was tested on following architectures: alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k Test was simple to do a defconfig build, apply the patch and check that nothing got rebuild. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-09-19disable sys_timerfd() for 2.6.23Andrew Morton
There is still some confusion and disagreement over what this interface should actually do. So it is best that we disable it in 2.6.23 until we get that fully sorted out. (sys_timerfd() was present in 2.6.22 but it was apparently broken, so here we assume that nobody is using it yet). Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Davide Libenzi <davidel@xmailserver.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19Fix failure to resume from initrdsNigel Cunningham
Commit 831441862956fffa17b9801db37e6ea1650b0f69 (Freezer: make kernel threads nonfreezable by default) breaks freezing when attempting to resume from an initrd, because the init (which is freezeable) spins while waiting for another thread to run /linuxrc, but doesn't check whether it has been told to enter the refrigerator. The original patch replaced a call to try_to_freeze() with a call to yield(). I believe a simple reversion is wrong because if !CONFIG_PM_SLEEP, try_to_freeze() is a noop. It should still yield. Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-30fix maxcpus=1 oops in show_stat()Hugh Dickins
Alexey Dobriyan reports that maxcpus=1 is still broken in 2.6.23-rc4: if CONFIG_HOTPLUG_CPU is not set, x86_64 bootup oopses in show_stat() - for_each_possible_cpu accesses a per-cpu area which was never set up. Alexey identified commit 61ec7567db103d537329b0db9a887db570431ff4 (ACPI: boot correctly with "nosmp" or "maxcpus=0") as the origin; but it's not really to blame, just exposes a bug in 2.6.23-rc1's commit 8b3b295502444340dd0701855ac422fbf32e161d (Especially when !CONFIG_HOTPLUG_CPU, avoid needlessy allocating resources for CPUs that can never become available). rc1's test for max_cpus < 2 in start_kernel() wasn't working because max_cpus was still NR_CPUS at that point: until rc4 moved the maxcpus parsing earlier. Now it sets cpu_possible_map to 1 before allocating all possible per-cpu areas; then smp_init() expands cpu_possible_map to cpu_present_map (0xf in my case) later on. rc1's commit has good intentions, but expects cpu_present_map to be limited by maxcpus, which is only the case on i386. cpus_and(possible, possible,present) might be good, but needs an audit of cpu_present_map uses - there may well be assumptions that any cpu present is possible. So stay safe for now and just revert those #ifndef CONFIG_HOTPLUG_CPU optimizations in rc1's commit. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Len Brown <lenb@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-27fix maxcpus=N parsingHugh Dickins
Commit 61ec7567db103d537329b0db9a887db570431ff4 ('ACPI: boot correctly with "nosmp" or "maxcpus=0"') broke 'maxcpus=' handling on x86[-64]. maxcpus=N is now having no effect on x86_64, and freezing bootup on i386 (because of inconsistency with the separate maxcpus parsing down in arch/i386, I guess). That's because early_param parsing is a little different from __setup parsing, and needs the "=" omitted: then it seems to work as the original commit intended (no mention of IO-APIC in /proc/interrupts when maxcpus=0). Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Len Brown <len.brown@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>