From c6d57f3312a6619d47c5557b5f6154a74d04ff80 Mon Sep 17 00:00:00 2001 From: Paul Menage Date: Wed, 23 Sep 2009 15:56:19 -0700 Subject: cgroups: support named cgroups hierarchies To simplify referring to cgroup hierarchies in mount statements, and to allow disambiguation in the presence of empty hierarchies and multiply-bindable subsystems this patch adds support for naming a new cgroup hierarchy via the "name=" mount option A pre-existing hierarchy may be specified by either name or by subsystems; a hierarchy's name cannot be changed by a remount operation. Example usage: # To create a hierarchy called "foo" containing the "cpu" subsystem mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1 # To mount the "foo" hierarchy on a second location mount -t cgroup -oname=foo cgroup /mnt/cgroup2 Signed-off-by: Paul Menage Reviewed-by: Li Zefan Cc: KAMEZAWA Hiroyuki Cc: Balbir Singh Cc: Dhaval Giani Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 6eb1a97e88c..4bccfc19196 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -408,6 +408,26 @@ You can attach the current shell task by echoing 0: # echo 0 > tasks +2.3 Mounting hierarchies by name +-------------------------------- + +Passing the name= option when mounting a cgroups hierarchy +associates the given name with the hierarchy. This can be used when +mounting a pre-existing hierarchy, in order to refer to it by name +rather than by its set of active subsystems. Each hierarchy is either +nameless, or has a unique name. + +The name should match [\w.-]+ + +When passing a name= option for a new hierarchy, you need to +specify subsystems manually; the legacy behaviour of mounting all +subsystems when none are explicitly specified is not supported when +you give a subsystem a name. + +The name of the subsystem appears as part of the hierarchy description +in /proc/mounts and /proc//cgroups. + + 3. Kernel API ============= -- cgit v1.2.3 From be367d09927023d081f9199665c8500f69f14d22 Mon Sep 17 00:00:00 2001 From: Ben Blum Date: Wed, 23 Sep 2009 15:56:31 -0700 Subject: cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time Alter the ss->can_attach and ss->attach functions to be able to deal with a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a pre-patch to cgroup-procs-writable.patch.) Currently, new mode of the attach function can only tell the subsystem about the old cgroup of the threadgroup leader. No subsystem currently needs that information for each thread that's being moved, but if one were to be added (for example, one that counts tasks within a group) this bit would need to be reworked a bit to tell the subsystem the right information. [hidave.darkstar@gmail.com: fix build] Signed-off-by: Ben Blum Signed-off-by: Paul Menage Acked-by: Li Zefan Reviewed-by: Matt Helsley Cc: "Eric W. Biederman" Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Young Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 4bccfc19196..455d4e6d346 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -521,7 +521,7 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task) + struct task_struct *task, bool threadgroup) (cgroup_mutex held by caller) Called prior to moving a task into a cgroup; if the subsystem @@ -529,14 +529,20 @@ returns an error, this will abort the attach operation. If a NULL task is passed, then a successful result indicates that *any* unspecified task can be moved into the cgroup. Note that this isn't called on a fork. If this method returns 0 (success) then this should -remain valid while the caller holds cgroup_mutex. +remain valid while the caller holds cgroup_mutex. If threadgroup is +true, then a successful result indicates that all threads in the given +thread's threadgroup can be moved together. void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *task) + struct cgroup *old_cgrp, struct task_struct *task, + bool threadgroup) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. +If threadgroup is true, the subsystem should take care of all threads +in the specified thread's threadgroup. Currently does not support any +subsystem that might need the old_cgrp for every thread in the group. void fork(struct cgroup_subsy *ss, struct task_struct *task) -- cgit v1.2.3 From 4b3bde4c983de36c59e6c1a24701f6fe816f9f55 Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Wed, 23 Sep 2009 15:56:32 -0700 Subject: memcg: remove the overhead associated with the root cgroup Change the memory cgroup to remove the overhead associated with accounting all pages in the root cgroup. As a side-effect, we can no longer set a memory hard limit in the root cgroup. A new flag to track whether the page has been accounted or not has been added as well. Flags are now set atomically for page_cgroup, pcg_default_flags is now obsolete and removed. [akpm@linux-foundation.org: fix a few documentation glitches] Signed-off-by: Balbir Singh Signed-off-by: Daisuke Nishimura Reviewed-by: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura Cc: Li Zefan Cc: Paul Menage Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 23d1262c077..ab0a02172cf 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -179,6 +179,9 @@ The reclaim algorithm has not been modified for cgroups, except that pages that are selected for reclaiming come from the per cgroup LRU list. +NOTE: Reclaim does not work for the root cgroup, since we cannot set any +limits on the root cgroup. + 2. Locking The memory controller uses the following hierarchy @@ -210,6 +213,7 @@ We can alter the memory limit: NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes. NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). +NOTE: We cannot set limits on the root cgroup any more. # cat /cgroups/0/memory.limit_in_bytes 4194304 -- cgit v1.2.3 From a6df63615b943dbef22df04c19f4506330fe835e Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Wed, 23 Sep 2009 15:56:34 -0700 Subject: memory controller: soft limit documentation Soft limits is a new feature for the memory resource controller, something similar has existed in the group scheduler in the form of shares. The CPU controllers interpretation of shares is very different though. Soft limits are the most useful feature to have for environments where the administrator wants to overcommit the system, such that only on memory contention do the limits become active. The current soft limits implementation provides a soft_limit_in_bytes interface for the memory controller and not for memory+swap controller. The implementation maintains an RB-Tree of groups that exceed their soft limit and starts reclaiming from the group that exceeds this limit by the maximum amount. This patch: Add documentation for soft limits Signed-off-by: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Li Zefan Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index ab0a02172cf..b871f2552b4 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -379,7 +379,42 @@ cgroups created below it. NOTE2: This feature can be enabled/disabled per subtree. -7. TODO +7. Soft limits + +Soft limits allow for greater sharing of memory. The idea behind soft limits +is to allow control groups to use as much of the memory as needed, provided + +a. There is no memory contention +b. They do not exceed their hard limit + +When the system detects memory contention or low memory control groups +are pushed back to their soft limits. If the soft limit of each control +group is very high, they are pushed back as much as possible to make +sure that one control group does not starve the others of memory. + +Please note that soft limits is a best effort feature, it comes with +no guarantees, but it does its best to make sure that when memory is +heavily contended for, memory is allocated based on the soft limit +hints/setup. Currently soft limit based reclaim is setup such that +it gets invoked from balance_pgdat (kswapd). + +7.1 Interface + +Soft limits can be setup by using the following commands (in this example we +assume a soft limit of 256 megabytes) + +# echo 256M > memory.soft_limit_in_bytes + +If we want to change this to 1G, we can at any time use + +# echo 1G > memory.soft_limit_in_bytes + +NOTE1: Soft limits take effect over a long period of time, since they involve + reclaiming memory for balancing between memory cgroups +NOTE2: It is recommended to set the soft limit always below the hard limit, + otherwise the hard limit will take precedence. + +8. TODO 1. Add support for accounting huge pages (as a separate controller) 2. Make per-cgroup scanner reclaim not-shared pages first -- cgit v1.2.3