path: root/include
Age Commit message (Author)
2008-04-29 ipc: define the slab_memory_callback priority as a constant (Nadia Derbey)
This is a trivial patch that defines the priority of slab_memory_callback in the callback chain as a constant. This is to prepare for next patch in the series. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 ipc: scale msgmni to the number of ipc namespaces (Nadia Derbey)
Since all the namespaces see the same amount of memory (the total one) this patch introduces a new variable that counts the ipc namespaces and divides msg_ctlmni by this counter. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 ipc: scale msgmni to the amount of lowmem (Nadia Derbey)
On large systems we'd like to allow a larger number of message queues. In some cases up to 32K. However simply setting MSGMNI to a larger value may cause problems for smaller systems. The first patch of this series introduces a default maximum number of message queue ids that scales with the amount of lowmem. Since msgmni is per namespace and there is no amount of memory dedicated to each namespace so far, the second patch of this series scales msgmni to the number of ipc namespaces too. Since msgmni depends on the amount of memory, it becomes necessary to recompute it upon memory add/remove. In the 4th patch, memory hotplug management is added: a notifier block is registered into the memory hotplug notifier chain for the ipc subsystem. Since the ipc namespaces are not linked together, they have their own notification chain: one notifier_block is defined per ipc namespace. Each time an ipc namespace is created (removed) it registers (unregisters) its notifier block in (from) the ipcns chain. The callback routine registered in the memory chain invokes the ipcns notifier chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the ipcns namespace, in turn, recomputes msgmni for the owning namespace. The 5th patch makes it possible to keep the memory hotplug notifier chain's lock for a lesser amount of time: instead of directly notifying the ipcns notifier chain upon memory add/remove, a work item is added to the global workqueue. When activated, this work item is the one who notifies the ipcns notifier chain. Since msgmni depends on the number of ipc namespaces, it becomes necessary to recompute it upon ipc namespace creation / removal. The 6th patch uses the ipc namespace notifier chain for that purpose: that chain is notified each time an ipc namespace is created or removed. This makes it possible to recompute msgmni for all the namespaces each time one of them is created or removed. 
When msgmni is explicitly set from userspace, we should avoid recomputing it upon memory add/remove or ipcns creation/removal. This is what the 7th patch does: it simply unregisters the ipcns callback routine as soon as msgmni has been changed from procfs or sysctl(). Even if msgmni is set by hand, it should be possible to have it automatically recomputed again upon memory add/remove or ipcns creation/removal. This is what is achieved in patch 8: if set to a negative value, msgmni is added back to the ipcns notifier chain, making it automatically recomputed again. This patch: Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is now set to make the message queues occupy 1/32 of the available lowmem. Some cleaning has also been done for the MSGPOOL constant: the msgctl man page says it's not used, but it also defines it as a size in bytes (the code expresses it in Kbytes). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
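The scaling rule described in this series (queues may occupy 1/32 of lowmem, and the result is shared between ipc namespaces) can be sketched in userspace C as follows. This is only an illustration, not the kernel code: compute_msgmni() is a hypothetical helper, and the per-queue size, the floor of 16 and the ceiling of 32768 are assumed values chosen to match the "up to 32K" remark above.

```c
/* Assumed constants for illustration; the commit text only states the
 * 1/32 ratio and mentions "up to 32K" queues on large systems. */
#define MODEL_MEM_SCALE 32ULL      /* queues may occupy 1/32 of lowmem */
#define MODEL_MSGMNB    16384ULL   /* assumed maximum bytes per queue */
#define MODEL_MSGMNI    16ULL      /* assumed floor (small systems) */
#define MODEL_IPCMNI    32768ULL   /* assumed ceiling (large systems) */

/* Hypothetical helper: number of message queue ids for one ipc namespace,
 * given the amount of lowmem in bytes and the number of namespaces. */
unsigned long long compute_msgmni(unsigned long long lowmem_bytes,
                                  unsigned long long nr_ipc_ns)
{
    unsigned long long allowed;

    if (nr_ipc_ns == 0)
        nr_ipc_ns = 1;                     /* avoid dividing by zero */

    /* Share of lowmem reserved for queues, expressed in queues. */
    allowed = (lowmem_bytes / MODEL_MEM_SCALE) / MODEL_MSGMNB;
    /* All namespaces see the same memory, so split it between them. */
    allowed /= nr_ipc_ns;

    if (allowed < MODEL_MSGMNI)
        return MODEL_MSGMNI;
    if (allowed > MODEL_IPCMNI / nr_ipc_ns)
        return MODEL_IPCMNI / nr_ipc_ns;
    return allowed;
}
```

Under these assumptions, 512 MiB of lowmem with a single namespace yields 1024 queue ids, and the same memory split across two namespaces yields 512 each.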
2008-04-29 IB: expand ib_umem_get() prototype (Arthur Kepner)
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1 when mapping user-allocated CQs with ib_umem_get(). Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 dma/ia64: update ia64 machvecs, swiotlb.c (Arthur Kepner)
Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces. Implement the old dma_*map_*() interfaces in terms of the corresponding new interfaces. For ia64/sn, make use of one dma attribute, DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions. Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 dma: add dma_*map*_attrs() interfaces (Arthur Kepner)
Introduce new interfaces, dma_*map*_attrs(), for passing architecture-specific attributes when memory is mapped and unmapped for DMA. Give the interfaces default implementations which ignore attributes. Also introduce the dma_{set|get}_attr() interfaces for setting and retrieving individual attributes. Define one attribute, DMA_ATTR_WRITE_BARRIER, in anticipation of its use by ia64/sn. Select whether architectures implement arch-specific versions of the dma_*map*_attrs() interfaces via HAVE_DMA_ATTRS in Kconfig. [markn@au1.ibm.com: dma_{set,get}_attr() have to be static inline] Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
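The idea of setting and querying individual attributes, with a default mapping routine that ignores them, can be modelled in a few lines of userspace C. The sketch below is an assumption-laden illustration: the struct layout, the model_* names and their signatures are hypothetical and are not taken from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute list; only the first name appears in the commit. */
enum model_dma_attr {
    MODEL_DMA_ATTR_WRITE_BARRIER,
    MODEL_DMA_ATTR_MAX
};

/* Assumed layout: one bit per attribute. */
struct model_dma_attrs {
    unsigned long flags;
};

/* Set one attribute in the set. */
static inline void model_dma_set_attr(enum model_dma_attr attr,
                                      struct model_dma_attrs *attrs)
{
    if (attrs != NULL && attr < MODEL_DMA_ATTR_MAX)
        attrs->flags |= 1UL << attr;
}

/* Query one attribute; a NULL set means "no attributes". */
static inline bool model_dma_get_attr(enum model_dma_attr attr,
                                      const struct model_dma_attrs *attrs)
{
    return attrs != NULL && attr < MODEL_DMA_ATTR_MAX &&
           (attrs->flags & (1UL << attr)) != 0;
}

/* Default mapping routine: accepts attributes but ignores them, which is
 * how the commit describes the generic implementations. */
static inline unsigned long model_map_single_attrs(unsigned long cpu_addr,
                                                   const struct model_dma_attrs *attrs)
{
    (void)attrs;
    return cpu_addr;   /* identity mapping, purely for illustration */
}
```

An architecture-specific implementation would replace the last function and act on the attribute, while callers stay unchanged.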
2008-04-29 memcgroup: implement failcounter reset (Pavel Emelyanov)
This is a very common requirement from people using the resource accounting facilities (not only memcgroup but also OpenVZ beancounters). They want to put the cgroup in an initial state without re-creating it. For example, after re-configuring a group people want to observe how this new configuration fits the group needs without saving the previous failcnt value. Merge two resets into one mem_cgroup_reset() function to demonstrate how multiplexing works. Besides, I have plans to move the files that correspond to res_counter to the res_counter.c file and somehow "import" them into the controller. I don't know how to do it gracefully yet, but merging resets of max_usage and failcnt in one function will be there for sure. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 memcgroups: add a document describing the resource counter abstraction (Pavel Emelyanov)
The resource counter is supposed to facilitate the resource accounting of an arbitrary resource (and it already does this for the memory controller). However, it is about to be used in other resource controllers (swap, kernel memory, networking, etc), so provide a doc describing how to work with it. This will eliminate all the possible future duplications in the appropriate controllers' docs. Fixed errors pointed out by Randy. [akpm@linux-foundation.org: fix documentation tpyo] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 memcgroup: add the max_usage member on the res_counter (Pavel Emelyanov)
This field is the maximal value of the usage one since the counter creation (or since the latest reset). To reset this to the usage value simply write anything to the appropriate cgroup file. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
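To make the semantics concrete, here is a small userspace model of a counter that tracks usage, its high-water mark and a failure count. It is a sketch under assumptions: the struct and the model_* functions are hypothetical names, and locking, which a real kernel counter would need, is left out.

```c
/* Hypothetical model of a resource counter; not the kernel structure. */
struct res_counter_model {
    unsigned long long usage;      /* current consumption */
    unsigned long long max_usage;  /* highest usage since creation or reset */
    unsigned long long limit;      /* usage may not exceed this */
    unsigned long long failcnt;    /* number of rejected charges */
};

/* Try to charge val units; returns 0 on success, -1 when over the limit. */
int model_charge(struct res_counter_model *c, unsigned long long val)
{
    if (c->usage > c->limit || val > c->limit - c->usage) {
        c->failcnt++;
        return -1;
    }
    c->usage += val;
    if (c->usage > c->max_usage)
        c->max_usage = c->usage;   /* record the new high-water mark */
    return 0;
}

/* Give back val units, never dropping below zero. */
void model_uncharge(struct res_counter_model *c, unsigned long long val)
{
    if (val > c->usage)
        val = c->usage;
    c->usage -= val;
}

/* Reset the high-water mark to the current usage, as described above. */
void model_reset_max(struct res_counter_model *c)
{
    c->max_usage = c->usage;
}
```

After a reset, max_usage equals the current usage rather than zero, so the next peak is measured from the present state of the group.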
2008-04-29 cgroups: add an owner to the mm_struct (Balbir Singh)
Remove the mem_cgroup member from mm_struct and instead add an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct to exist independent of the memory controller, i.e., without adding mem_cgroup's for each controller to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box; it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_set by cgroup_exit(), thus all future charges from running threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: David Rientjes <rientjes@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 cgroups: introduce cft->read_seq() (Serge E. Hallyn)
Introduce a read_seq() helper in cftype, which uses seq_file to print out lists. Use it in the devices cgroup. Also split devices.allow into two files, so now devices.deny and devices.allow are the ones to use to manipulate the whitelist, while devices.list outputs the cgroup's current whitelist. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 cgroups: remove the css_set linked-list (Li Zefan)
Now we can run through the hash table instead of running through the linked-list. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 cgroups: use a hash table for css_set finding (Li Zefan)
When we attach a process to a different cgroup, the css_set linked-list will be run through to find a suitable existing css_set to use. This patch implements a hash table for better performance. The following benchmarks have been tested: For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping task in each, and then move an additional task through each cgroup in turn. Here is a test result:

   N     Loop    orig - Time(s)    hash - Time(s)
-------------------------------------------------
   1    10000    1.201231728       1.196311177
   5     2000    1.065743872       1.040566424
  10     1000    0.991054735       0.986876440
  50      200    0.976554203       0.969608733
 100      100    0.998504680       0.969218270
 500       20    1.157347764       0.962602963
1000       10    1.619521852       1.085140172

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 cgroups: implement device whitelist (Serge E. Hallyn)
Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a *:* mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
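The four-field entry format described above ('type major:minor access') can be illustrated with a small userspace parser and matcher. This is a sketch, not the kernel implementation: struct wl_entry, wl_parse() and wl_allows() are hypothetical names, and treating a bare "a" as full rwm access to everything is an assumption drawn from the 'a *:* mrw' default mentioned in the message.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ACC_READ  1
#define ACC_WRITE 2
#define ACC_MKNOD 4
#define WL_ANY    (-1)   /* stands for '*' in major or minor */

struct wl_entry { char type; int major; int minor; int access; };

/* Parse "*" or a non-negative decimal number. */
static bool parse_num(const char *s, int *out)
{
    char *end;
    long v;

    if (strcmp(s, "*") == 0) { *out = WL_ANY; return true; }
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0)
        return false;
    *out = (int)v;
    return true;
}

/* Parse entries such as "c 1:3 mr", "b 8:* rw" or "a". */
bool wl_parse(const char *s, struct wl_entry *e)
{
    char type, maj[16], min[16], acc[8];
    int n = sscanf(s, " %c %15[^:]:%15s %7s", &type, maj, min, acc);

    if (n < 1 || (type != 'a' && type != 'b' && type != 'c'))
        return false;
    e->type = type;
    if (type == 'a') {               /* assumed: 'a' means everything, rwm */
        e->major = e->minor = WL_ANY;
        e->access = ACC_READ | ACC_WRITE | ACC_MKNOD;
        return true;
    }
    if (n != 4 || !parse_num(maj, &e->major) || !parse_num(min, &e->minor))
        return false;
    e->access = 0;
    for (const char *p = acc; *p != '\0'; p++) {
        switch (*p) {
        case 'r': e->access |= ACC_READ;  break;
        case 'w': e->access |= ACC_WRITE; break;
        case 'm': e->access |= ACC_MKNOD; break;
        default:  return false;
        }
    }
    return true;
}

/* Does this entry grant every requested access bit on the given device? */
bool wl_allows(const struct wl_entry *e, char type, int major, int minor, int access)
{
    if (e->type != 'a' && e->type != type)
        return false;
    if (e->major != WL_ANY && e->major != major)
        return false;
    if (e->minor != WL_ANY && e->minor != minor)
        return false;
    return (e->access & access) == access;
}
```

With the entry from the message, 'c 1:3 mr', a read of char device 1:3 is granted while a write is not.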
2008-04-29 cgroups: add the trigger callback to struct cftype (Pavel Emelyanov)
Trigger callback can be used to receive a kick-up from the user space. The string written is ignored. The cftype->private is used for multiplexing events. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 CGroups _s64 files: add cgroups read_s64/write_s64 file methods (Paul Menage)
These patches add cgroups read_s64 and write_s64 control file methods (the signed equivalent of read_u64/write_u64) and use them to implement the cpu.rt_runtime_us control file in the CFS cgroup subsystem. This patch: These are the signed equivalents of the read_u64/write_u64 methods Signed-off-by: Paul Menage <menage@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 CGroup API files: move "releasable" to cgroup_debug subsystem (Paul Menage)
The "releasable" control file provided by the cgroup framework exports the state of a per-cgroup flag that's related to the notify-on-release feature. This isn't really generally useful, unless you're trying to debug this particular feature of cgroups. This patch moves the "releasable" file to the cgroup_debug subsystem. Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 CGroup API files: add cgroup map data type (Paul Menage)
Adds a new type of supported control file representation, a map from strings to u64 values. Each map entry is printed as a line in a similar format to /proc/vmstat, i.e. "$key $value\n" Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
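The "$key $value\n" output format is simple enough to show directly. The following userspace sketch formats an array of string-to-u64 pairs into a buffer; struct map_entry and format_map() are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* One map entry: a string key and an unsigned 64-bit value. */
struct map_entry {
    const char *key;
    unsigned long long value;
};

/* Write each entry as "key value\n" into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the buffer is too small. */
int format_map(char *buf, size_t size, const struct map_entry *entries, size_t n)
{
    size_t off = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int len = snprintf(buf + off, size - off, "%s %llu\n",
                           entries[i].key, entries[i].value);
        if (len < 0 || (size_t)len >= size - off)
            return -1;             /* output would have been truncated */
        off += (size_t)len;
    }
    return (int)off;
}
```

Because every entry sits on its own line with a single space separator, the output can be consumed with the same tools that already parse /proc/vmstat.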
2008-04-29 CGroup API files: add res_counter_read_u64() (Paul Menage)
Adds a function for returning the value of a resource counter member, in a form suitable for use in a cgroup read_u64 control file method. Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 CGroup API files: rename read/write_uint methods to read_write_u64 (Paul Menage)
Several people have justifiably complained that the "_uint" suffix is inappropriate for functions that handle u64 values, so this patch just renames all these functions and their users to have the suffix _u64. [peterz@infradead.org: build fix] Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 x86: olpc: add One Laptop Per Child architecture support (Andres Salomon)
This adds support for OLPC XO hardware. Open Firmware on XOs doesn't contain the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also adds functionality for running EC commands, and a CONFIG_OLPC. A number of OLPC drivers depend upon CONFIG_OLPC. olpc_ec_timeout is a hack to work around Embedded Controller bugs. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: geode_has_vsa build fix] [akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist] Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Cc: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 vt: fix background color on line feed (Jan Engelhardt)
A command that causes a line feed while a background color is active, such as perl -e 'print "x" x 60, "\e[44m", "x" x 40, "\e[0m\n"' and perl -e 'print "x" x 40, "\e[44m\n", "x" x 40, "\e[0m\n"' causes the line that was started as a result of the line feed to be completely filled with the currently active background color instead of the default color. When scrolling, part of the current screen is memcpy'd/memmove'd to the new region, and the new line(s) that will appear as a result are cleared using memset. However, the lines are cleared with vc->vc_video_erase_char, causing them to be colored with the currently active background color. This is different from X11 terminal emulators, which always paint the new lines with the default background color (e.g. `xterm -bg black`). The clear operations (\e[1J and \e[2J) also use vc_video_erase_char, so a new vc->vc_scrl_erase_char is introduced which contains the erase character used for scrolling, which is built from vc->vc_def_color instead of vc->vc_color. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 isolate ratelimit from printk.c for other use (Dave Young)
Because the rcupreempt.h WARN_ON was triggered, I got a 2G syslog file. For some serious kernel complaints we need to repeat the warnings, so here I isolate the ratelimit part of printk.c into a standalone file. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
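The commit message does not spell out how the rate limiting works, so the following is only one plausible shape for a reusable limiter: a token bucket that allows a burst of messages and then one message per interval. All names here (struct ratelimit_model, ratelimit_init(), ratelimit_allow()) are hypothetical, and the caller passes in the current time so the sketch stays deterministic.

```c
/* Hypothetical token-bucket limiter; time is measured in abstract ticks. */
struct ratelimit_model {
    unsigned long interval;  /* ticks that one message costs */
    unsigned long burst;     /* messages allowed back to back */
    unsigned long tokens;    /* accumulated credit, in ticks */
    unsigned long last;      /* time of the previous call */
    unsigned long missed;    /* messages suppressed so far */
};

/* Start with a full bucket so an initial burst is allowed. */
void ratelimit_init(struct ratelimit_model *rl, unsigned long interval,
                    unsigned long burst, unsigned long now)
{
    rl->interval = interval;
    rl->burst = burst;
    rl->tokens = interval * burst;
    rl->last = now;
    rl->missed = 0;
}

/* Returns 1 if the caller may emit a message at time now, 0 otherwise. */
int ratelimit_allow(struct ratelimit_model *rl, unsigned long now)
{
    unsigned long cap = rl->interval * rl->burst;

    rl->tokens += now - rl->last;    /* earn credit for elapsed time */
    rl->last = now;
    if (rl->tokens > cap)
        rl->tokens = cap;            /* never store more than one burst */

    if (rl->tokens >= rl->interval) {
        rl->tokens -= rl->interval;  /* pay for this message */
        return 1;
    }
    rl->missed++;
    return 0;
}
```

Keeping the state in a caller-owned struct, rather than in static variables, is what lets code outside printk reuse the same logic with its own limits.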
2008-04-29 xattr: add missing consts to function arguments (David Howells)
Add missing consts to xattr function arguments. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 smb.h: uses struct timespec but didn't include linux/time.h (Ilpo Järvinen)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 lists: add "const" qualifier to first arg of list_splice() operations (Robert P. J. Day)
Since neither the list_splice() nor __list_splice() routines modify their first argument, might as well declare them "const". [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
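Why the first argument can be const is easiest to see in a standalone model of a circular doubly linked list. The code below is a userspace sketch with hypothetical *_model names, not the kernel header: splicing rewrites pointers in the spliced nodes and in the destination head, but never writes through the source head itself.

```c
/* Minimal circular doubly linked list, modelled on the usual kernel idiom. */
struct list_head_model {
    struct list_head_model *next, *prev;
};

static inline void list_init_model(struct list_head_model *head)
{
    head->next = head;
    head->prev = head;
}

static inline void list_add_tail_model(struct list_head_model *item,
                                       struct list_head_model *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Insert all nodes of list right after head.
 * list is only read (list->next, list->prev), hence the const qualifier;
 * afterwards its pointers are stale and it must be re-initialised
 * before being used again. */
static inline void list_splice_model(const struct list_head_model *list,
                                     struct list_head_model *head)
{
    if (list->next != list) {                 /* source is not empty */
        struct list_head_model *first = list->next;
        struct list_head_model *last = list->prev;
        struct list_head_model *at = head->next;

        first->prev = head;
        head->next = first;
        last->next = at;
        at->prev = last;
    }
}
```

The const qualifier documents this read-only use and lets callers pass a list head they only hold a const pointer to.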
2008-04-29 kbuild: move files that don't check __KERNEL__ (Robert P. J. Day)
Move files that don't check __KERNEL__ from unifdef-y to header-y. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 kbuild: remove duplicate, conflicting entry for oom.h (Robert P. J. Day)
oom.h is already tagged for unifdef'ing, so its entry as a simple exportable header should be deleted. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 Remove superfluous include of string.h from percpu.h (Robert P. J. Day)
There's nothing in percpu.h that requires an explicit inclusion of string.h. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 binfmt_misc.c: avoid potential kernel stack overflow (Pavel Emelyanov)
This can be triggered with root help only, but... Register the ':text:E::txt::/root/cat.txt:' rule in binfmt_misc (by root) and try launching the cat.txt file (by anyone) :) The result is endless recursion in the load_misc_binary -> open_exec -> load_misc_binary chain and a stack overflow. There's a similar problem with binfmt_script, and there's a sh_bang member on the linux_binprm structure to handle this, but simply raising this in binfmt_misc may break some setups when the interpreter of some misc binaries is a script. So the proposal is to turn sh_bang into a bit, add a new one (the misc_bang) and raise it in load_misc_binary. After this, even if we set up the misc -> script -> misc loop for binfmts one of them will step on its own bang and exit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
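The "step on its own bang" idea can be shown with a toy model in which each handler sets a one-bit flag before recursing into the interpreter and refuses to run if its flag is already set. Everything here is hypothetical (the struct names, exec_model(), the MODEL_ENOEXEC value); it illustrates the guard, not the kernel's loader code.

```c
#define MODEL_ENOEXEC (-8)   /* assumed error value for "cannot execute" */

/* One bit per handler type, as the commit proposes for sh_bang/misc_bang. */
struct binprm_model {
    unsigned int sh_bang:1;
    unsigned int misc_bang:1;
};

enum file_kind { KIND_NATIVE, KIND_SCRIPT, KIND_MISC };

/* A file is either directly executable or names an interpreter file. */
struct file_model {
    enum file_kind kind;
    const struct file_model *interp;
};

/* Resolve a file to something executable; each handler may run only once. */
int exec_model(struct binprm_model *bprm, const struct file_model *f)
{
    switch (f->kind) {
    case KIND_NATIVE:
        return 0;                         /* found a real binary */
    case KIND_SCRIPT:
        if (bprm->sh_bang)
            return MODEL_ENOEXEC;         /* script handler already used */
        bprm->sh_bang = 1;
        return exec_model(bprm, f->interp);
    case KIND_MISC:
        if (bprm->misc_bang)
            return MODEL_ENOEXEC;         /* misc handler already used */
        bprm->misc_bang = 1;
        return exec_model(bprm, f->interp);
    }
    return MODEL_ENOEXEC;
}
```

With two separate bits, a misc binary whose interpreter is a script still works, while a self-referencing misc rule or a misc -> script -> misc loop terminates after at most two steps.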
2008-04-29 proper extern for late_time_init (Adrian Bunk)
Add a proper extern for late_time_init in include/linux/init.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 exec: remove argv_len from struct linux_binprm (Tetsuo Handa)
I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve(). But it doesn't update bprm->argv_len after "remove_arg_zero() + copy_strings_kernel()" at load_script() etc. audit_bprm() is called from search_binary_handler() and search_binary_handler() is called from load_script() etc. Thus, I think the condition check if (bprm->argv_len > (audit_argv_kb << 10)) return -E2BIG; in audit_bprm() might return a wrong result when strlen(removed_arg) != strlen(spliced_args). Why not update bprm->argv_len at load_script() etc.? By the way, 2.6.25-rc3 seems not to do the condition check. Is the field bprm->argv_len no longer needed? Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Ollie Wild <aaw@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 Remove the macro get_personality (WANG Cong)
Remove the macro get_personality, use ->personality instead. Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Bryan Wu <bryan.wu@analog.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 Misc: phantom, consistent whitespace (jan sonnek)
Make it consistent with the rest of the header. Signed-off-by: jan sonnek <xsonnek@gmail.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 Misc: phantom, add compat ioctl (Jiri Slaby)
Openhaptics uses pointers in _IOC() macros, implement compat for them. Also add _IOC alternatives which are not 32/64 bit dependent (structures passed through aren't yet) -- libphantom will use them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 proper __do_softirq() prototype (Adrian Bunk)
Add a proper prototype for __do_softirq() in include/linux/interrupt.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 remove mca_is_adapter_used() (Adrian Bunk)
Remove the no longer used mca_is_adapter_used(). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 remove generic_commit_write() (Adrian Bunk)
Remove the obsolete and no longer used generic_commit_write(). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 fs/aio.c: make 3 functions static (Adrian Bunk)
Make the following needlessly global functions static: - __put_ioctx() - lookup_ioctx() - io_submit_one() Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 fs/drop_caches.c: make 2 functions static (Adrian Bunk)
Make the following needlessly global functions static: - drop_pagecache() - drop_slab() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 fs/fs-writeback.c: make 2 functions static (Adrian Bunk)
Make the following needlessly global functions static: - writeback_acquire() - writeback_release() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 make vfs_ioctl() static (Adrian Bunk)
Make the needlessly global vfs_ioctl() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 make __put_super() static (Adrian Bunk)
Make the needlessly global __put_super() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 cpu: fix section mismatch warnings in hotcpu_register (Sam Ravnborg)
Fix the following warnings:
WARNING: vmlinux.o(.data+0x5020): Section mismatch in reference from the variable cpu_vsyscall_notifier_nb.12876 to the function .cpuinit.text:cpu_vsyscall_notifier()
WARNING: vmlinux.o(.data+0x9ce0): Section mismatch in reference from the variable profile_cpu_callback_nb.17654 to the function .devinit.text:profile_cpu_callback()
WARNING: vmlinux.o(.data+0xd380): Section mismatch in reference from the variable workqueue_cpu_callback_nb.15004 to the function .devinit.text:workqueue_cpu_callback()
WARNING: vmlinux.o(.data+0x11d00): Section mismatch in reference from the variable relay_hotcpu_callback_nb.19626 to the function .cpuinit.text:relay_hotcpu_callback()
WARNING: vmlinux.o(.data+0x12970): Section mismatch in reference from the variable cpu_callback_nb.24694 to the function .devinit.text:cpu_callback()
WARNING: vmlinux.o(.data+0x3fee0): Section mismatch in reference from the variable percpu_counter_hotcpu_callback_nb.10903 to the function .cpuinit.text:percpu_counter_hotcpu_callback()
WARNING: vmlinux.o(.data+0x74ce0): Section mismatch in reference from the variable topology_cpu_callback_nb.12506 to the function .cpuinit.text:topology_cpu_callback()
Functions used as arguments are by definition only used in HOTPLUG_CPU situations, so they are annotated __cpuinit. Annotate the static variable used by hotcpu_register with __cpuinitdata to match this definition. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 add RUSAGE_THREAD Sripathi Kodi
Add the RUSAGE_THREAD option for the getrusage system call. This is essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589, but the RUSAGE_LWP line has been removed, as suggested by Ulrich and Christoph. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
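As a userspace illustration of the option this commit adds, the following minimal sketch queries the CPU time of the calling thread only. The helper name thread_cpu_seconds() is hypothetical and not part of the patch. glibc exposes RUSAGE_THREAD only when _GNU_SOURCE is defined before the first system header, so the fallback definition (value 1 on Linux) is included as a safety net.

```c
#define _GNU_SOURCE           /* glibc exposes RUSAGE_THREAD only with this */
#include <sys/time.h>
#include <sys/resource.h>

/* Fallback in case _GNU_SOURCE took effect too late; 1 is the Linux value. */
#ifndef RUSAGE_THREAD
#define RUSAGE_THREAD 1
#endif

/*
 * Hypothetical helper (not from the patch): user + system CPU time, in
 * seconds, consumed by the calling thread. Returns -1.0 on failure.
 */
double thread_cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_THREAD, &ru) != 0)
        return -1.0;

    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}
```

Unlike RUSAGE_SELF, which aggregates over all threads of the process, this call reports figures for the calling thread alone.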
2008-04-29 Taint kernel after WARN_ON(condition) Nur Hussein
The kernel is marked as tainted within the warn_on_slowpath() function: whenever a warning occurs, the new taint flag 'W' is set. Preserving the warning as a flag in the taint state makes it possible to tell whether a warning occurred before a BUG. This does not work on architectures where WARN_ON has its own definition. These archs are: 1. s390 2. superh 3. avr32 4. parisc The maintainers of these architectures have been added in the Cc: list in this email to alert them to the situation. The documentation in oops-tracing.txt has been updated to include the new flag. Signed-off-by: Nur Hussein <nurhussein@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
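A rough userspace sketch of how the 'W' flag could be checked follows. It assumes the taint state is read as a decimal bitmask from /proc/sys/kernel/tainted and that 'W' corresponds to bit 9 (value 512), as the kernel's taint documentation describes. Neither detail is stated in the commit message, and both helper names are hypothetical.

```c
#include <stdio.h>

/* Assumption: the 'W' (warning) taint flag is bit 9 of the taint mask. */
#define TAINT_WARN_BIT 9

/* Hypothetical helper: returns 1 if the mask has the 'W' bit set, else 0. */
int taint_mask_has_warning(unsigned long mask)
{
    return (int)((mask >> TAINT_WARN_BIT) & 1UL);
}

/*
 * Hypothetical helper: returns 1 if the running kernel has issued a
 * warning, 0 if it has not, and -1 if the taint mask cannot be read.
 */
int kernel_has_warned(void)
{
    unsigned long mask = 0;
    int ok;
    FILE *f = fopen("/proc/sys/kernel/tainted", "r");

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", &mask) == 1);
    fclose(f);

    return ok ? taint_mask_has_warning(mask) : -1;
}
```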
2008-04-29 fs/coda: remove static inline forward declarations Ilpo Järvinen
They're defined later on in the same file with bodies and nothing in between needs them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 Avoid divides in BITS_TO_LONGS Eric Dumazet
BITS_PER_LONG is a signed value (32 or 64). DIV_ROUND_UP(nr, BITS_PER_LONG) performs signed arithmetic if "nr" is signed too. Converting BITS_TO_LONGS(nr) to DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) makes sure the compiler can perform a right shift, even if "nr" is a signed value, instead of an expensive integer divide. Applying this patch saves 141 bytes on x86 when CONFIG_CC_OPTIMIZE_FOR_SIZE=y and speeds up bitmap operations. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
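The difference between the two forms can be sketched in plain C. The macro bodies below are simplified stand-ins written for illustration (BITS_TO_LONGS_OLD, BITS_PER_LONG_SIGNED and the two wrapper functions are hypothetical names), not copies of the kernel headers. The point is that sizeof(long) has the unsigned type size_t, so a signed "nr" is converted to unsigned before the division, and a division of an unsigned value by a power of two can be compiled to a shift.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old form: the divisor is a signed int, so a signed nr is divided signed. */
#define BITS_PER_LONG_SIGNED ((int)(CHAR_BIT * sizeof(long)))
#define BITS_TO_LONGS_OLD(nr) DIV_ROUND_UP(nr, BITS_PER_LONG_SIGNED)

/* New form: sizeof() yields size_t, so the whole expression is unsigned. */
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))

/* Number of longs needed to hold nr bits, new form (nr must be >= 0). */
size_t longs_for_bits(int nr)
{
    return BITS_TO_LONGS(nr);
}

/* Same computation with the old signed form, for comparison. */
int longs_for_bits_old(int nr)
{
    return BITS_TO_LONGS_OLD(nr);
}
```

For non-negative bit counts both forms give the same result; only the generated code differs.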
2008-04-29 mm: fix misleading __GFP_REPEAT related comments Nishanth Aravamudan
The definition and use of __GFP_REPEAT, __GFP_NOFAIL and __GFP_NORETRY in the core VM have somewhat differing comments as to their actual semantics. Annoyingly, the flags definition has inline and header comments, which might be interpreted as not being equivalent. Just add references to the header comments in the inline ones so they don't go out of sync in the future. In their use in __alloc_pages() clarify that the current implementation treats low-order allocations and __GFP_REPEAT allocations as distinct cases. To clarify, the flags' semantics are:
__GFP_NORETRY means try no harder than one run through __alloc_pages
__GFP_REPEAT means __GFP_NOFAIL
__GFP_NOFAIL means repeat forever
order <= PAGE_ALLOC_COSTLY_ORDER means __GFP_NOFAIL
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
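The semantics listed in this commit message can be modelled as a small decision function. This is only a sketch: the flag values and the names SKETCH_GFP_*, SKETCH_COSTLY_ORDER and allocation_keeps_retrying() are made up for illustration and are not the kernel's definitions, and giving the no-retry flag precedence over the others is an assumption the message does not spell out.

```c
#include <stdbool.h>

/* Illustrative flag bits only; these are NOT the kernel's real values. */
#define SKETCH_GFP_NORETRY 0x1u
#define SKETCH_GFP_REPEAT  0x2u
#define SKETCH_GFP_NOFAIL  0x4u

/* Stand-in for PAGE_ALLOC_COSTLY_ORDER; the value 3 is an assumption. */
#define SKETCH_COSTLY_ORDER 3u

/*
 * Hypothetical model of the listed semantics: returns true if a failed
 * allocation attempt would be retried rather than given up on.
 */
bool allocation_keeps_retrying(unsigned int flags, unsigned int order)
{
    /* NORETRY: try no harder than one run through the allocator. */
    if (flags & SKETCH_GFP_NORETRY)
        return false;

    /* NOFAIL repeats forever; REPEAT is currently treated the same way. */
    if (flags & (SKETCH_GFP_NOFAIL | SKETCH_GFP_REPEAT))
        return true;

    /* Low-order allocations also behave like NOFAIL. */
    return order <= SKETCH_COSTLY_ORDER;
}
```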
2008-04-28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (35 commits)
siimage: coding style cleanup (take 2)
ide-cd: clean up cdrom_analyze_sense_data()
ide-cd: fix test unsigned var < 0
ide: add TSSTcorp CDDVDW SH-S202H to ivb_list[]
piix: add Asus Eee 701 controller to short cable list
ARM: always select HAVE_IDE
remove the broken ETRAX_IDE driver
ide: remove ->dma_prdtable field from ide_hwif_t
ide: remove ->dma_vendor{1,3} fields from ide_hwif_t
scc_pata: add ->dma_host_set and ->dma_start methods
ide: skip "VLB sync" if host uses MMIO
ide: add ide_pad_transfer() helper
ide: remove ->INW and ->OUTW methods
ide: use IDE I/O helpers directly in ide_tf_{load,read}()
ns87415: add ->tf_read method
scc_pata: add ->tf_{load,read} methods
ide-h8300: add ->tf_{load,read} methods
ide-cris: add ->tf_{load,read} methods
ide: add ->tf_load and ->tf_read methods
ide: move ide_tf_{load,read} to ide-iops.c
...