aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2010-01-29tracing/kprobe: Cleanup unused return value of tracing functionsXiao Guangrong
The return values of the kprobe's tracing functions are meaningless, lets remove these. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <4B60E9A3.2040505@cn.fujitsu.com> [fweisbec@gmail: whitespace fixes, drop useless void returns in end of functions] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29perf: Factorize trace events raw sample buffer operationsXiao Guangrong
Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to gather the common code that operates on raw events sampling buffer. This cleans up redundant code between regular trace events, syscall events and kprobe events. Changelog v1->v2: - Rename function name as per Masami and Frederic's suggestion - Add __kprobes for ftrace_perf_buf_prepare() and make ftrace_perf_buf_submit() inline as per Masami's suggestion - Export ftrace_perf_buf_prepare since modules will use it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <4B60E92D.9000808@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-27perf: Reimplement frequency driven samplingPeter Zijlstra
There was a bug in the old period code that caused intel_pmu_enable_all() or native_write_msr_safe() to show up quite high in the profiles. In staring at that code it made my head hurt, so I rewrote it in a hopefully simpler fashion. Its now fully symetric between tick and overflow driven adjustments and uses less data to boot. The only complication is that it basically wants to do a u128 division. The code approximates that in a rather simple truncate until it fits fashion, taking care to balance the terms while truncating. This version does not generate that sampling artefact. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-18Merge branch 'perf/scheduling' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
2010-01-17perf: Better order flexible and pinned schedulingFrederic Weisbecker
When a task gets scheduled in. We don't touch the cpu bound events so the priority order becomes: cpu pinned, cpu flexible, task pinned, task flexible. So schedule out cpu flexibles when a new task context gets in and correctly order the groups to schedule in: task pinned, cpu flexible, task flexible. Cpu pinned groups don't need to be touched at this time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17perf: Don't schedule out/in pinned events on task tickFrederic Weisbecker
We don't need to schedule in/out pinned events on task tick, now that pinned and flexible groups can be scheduled separately. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17perf: Allow pinned and flexible groups to be scheduled separatelyFrederic Weisbecker
Tune the scheduling helpers so that we can choose to schedule either pinned and/or flexible groups from a context. And while at it, refactor a bit the naming of these helpers to make these more consistent and flexible. There is no (intended) change in scheduling behaviour in this patch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17perf: Make __perf_event_sched_out staticFrederic Weisbecker
__perf_event_sched_out doesn't need to be globally available, make it static. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17tracing/kprobe: Update kprobe tracing self test for new syntaxMasami Hiramatsu
Update kprobe tracing self test for new syntax (it supports deleting individual probes, and drops $argN support) and behavior change (new probes are disabled in default). This selftest includes the following checks: - Adding function-entry probe and return probe with arguments. - Enabling these probes. - Deleting it individually. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100114051211.7814.29436.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-16perf: Export software-only event group characteristic as a flagFrederic Weisbecker
Before scheduling an event group, we first check if a group can go on. We first check if the group is made of software only events first, in which case it is enough to know if the group can be scheduled in. For that purpose, we iterate through the whole group, which is wasteful as we could do this check when we add/delete an event to a group. So we create a group_flags field in perf event that can host characteristics from a group of events, starting with a first PERF_GROUP_SOFTWARE flag that reduces the check on the fast path. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-16perf: Round robin flexible groups of events using list_rotate_left()Frederic Weisbecker
This is more proper that doing it through a list_for_each_entry() that breaks after the first entry. v2: Don't rotate pinned groups as its not needed to time share them. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-16perf/core: Split context's event group list into pinned and non-pinned listsFrederic Weisbecker
Split-up struct perf_event_context::group_list into pinned_groups and flexible_groups (non-pinned). This first appears to be useless as it duplicates various loops around the group list handlings. But it scales better in the fast-path in perf_sched_in(). We don't anymore iterate twice through the entire list to separate pinned and non-pinned scheduling. Instead we interate through two distinct lists. The another desired effect is that it makes easier to define distinct scheduling rules on both. Changes in v2: - Respectively rename pinned_grp_list and volatile_grp_list into pinned_groups and flexible_groups as per Ingo suggestion. - Various cleanups Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-13sched/perf: Make sure irqs are disabled for perf_event_task_sched_in()Jamie Iles
perf_event_task_sched_in() expects interrupts to be disabled, but on architectures with __ARCH_WANT_INTERRUPTS_ON_CTXSW defined, this isn't true. If this is defined, disable irqs around the call in finish_task_switch(). Signed-off-by: Jamie Iles <jamie.iles@picochip.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> LKML-Reference: <1262964453-27370-1-git-send-email-jamie.iles@picochip.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13tracing/kprobe: Drop function argument access syntaxMasami Hiramatsu
Drop function argument access syntax, because the function arguments depend on not only architecture but also compile-options and function API. And now, we have perf-probe for finding register/memory assigned to each argument. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Neuling <mikey@neuling.org> Cc: linuxppc-dev@ozlabs.org LKML-Reference: <20100105224648.19431.52309.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: queue up dependent patch, update to -rc4 Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-11kernel/signal.c: fix kernel information leak with print-fatal-signals=1Andi Kleen
When print-fatal-signals is enabled it's possible to dump any memory reachable by the kernel to the log by simply jumping to that address from user space. Or crash the system if there's some hardware with read side effects. The fatal signals handler will dump 16 bytes at the execution address, which is fully controlled by ring 3. In addition when something jumps to a unmapped address there will be up to 16 additional useless page faults, which might be potentially slow (and at least is not very efficient) Fortunately this option is off by default and only there on i386. But fix it by checking for kernel addresses and also stopping when there's a page fault. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11cgroups: fix 2.6.32 regression causing BUG_ON() in cgroup_diput()Dave Anderson
The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!" here in cgroup_diput(): /* * if we're getting rid of the cgroup, refcount should ensure * that there are no pidlists left. */ BUG_ON(!list_empty(&cgrp->pidlists)); The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused when pidlist_array_load() calls cgroup_pidlist_find(): (1) if a matching cgroup_pidlist is found, it down_write's the mutex of the pre-existing cgroup_pidlist, and increments its use_count. (2) if no matching cgroup_pidlist is found, then a new one is allocated, it down_write's its mutex, and the use_count is set to 0. (3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(), which increments its use_count -- regardless whether new or pre-existing -- and up_write's the mutex. So if a matching list is ever encountered by cgroup_pidlist_find() during the life of a cgroup directory, it results in an inflated use_count value, preventing it from ever getting released by cgroup_release_pid_array(). Then if the directory is subsequently removed, cgroup_diput() hits the BUG_ON() when it finds that the directory's cgroup is still populated with a pidlist. The patch simply removes the use_count increment when a matching pidlist is found by cgroup_pidlist_find(), because it gets bumped by the calling pidlist_array_load() function while still protected by the list's mutex. Signed-off-by: Dave Anderson <anderson@redhat.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Ben Blum <bblum@andrew.cmu.edu> Cc: Paul Menage <menage@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11kmod: fix resource leak in call_usermodehelper_pipe()Masami Hiramatsu
Fix resource (write-pipe file) leak in call_usermodehelper_pipe(). When call_usermodehelper_exec() fails, write-pipe file is opened and call_usermodehelper_pipe() just returns an error. Since it is hard for caller to determine whether the error occured when opening the pipe or executing the helper, the caller cannot close the pipe by themselves. I've found this resoruce leak when testing coredump. You can check how the resource leaks as below; $ echo "|nocommand" > /proc/sys/kernel/core_pattern $ ulimit -c unlimited $ while [ 1 ]; do ./segv; done &> /dev/null & $ cat /proc/meminfo (<- repeat it) where segv.c is; //----- int main () { char *p = 0; *p = 1; } //----- This patch closes write-pipe file if call_usermodehelper_exec() failed. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-06modules: Skip empty sections when exporting section notesBen Hutchings
Commit 35dead4 "modules: don't export section names of empty sections via sysfs" changed the set of sections that have attributes, but did not change the iteration over these attributes in add_notes_attrs(). This can lead to add_notes_attrs() creating attributes with the wrong names or with null name pointers. Introduce a sect_empty() function and use it in both add_sect_attrs() and add_notes_attrs(). Reported-by: Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Martin Michlmayr <tbm@cyrius.com> Cc: stable@kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-31Merge branch 'perf-fixes-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix NULL deref in inheritance code perf: Pass appropriate frame pointer to dump_trace()
2009-12-31Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf kmem: Fix statistics typo kprobes: Fix distinct type warning perf: Rename perf_event_hw_event in design document perf tools: Add missing header files to LIB_H Makefile variable perf record: We should fork only if a program was specified to run perf diff: Fix usage array, it must end with a NULL entry
2009-12-31Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix sign fields in ftrace_define_fields_##call() tracing/syscalls: Fix typo in SYSCALL_DEFINE0 tracing/kprobe: Show sign of fields in trace_kprobe format files ksym_tracer: Remove trace_stat ksym_tracer: Fix race when incrementing count ksym_tracer: Fix to allow writing newline to ksym_trace_filter ksym_tracer: Fix to make the tracer work tracing: Kconfig spelling fixes and cleanups tracing: Fix setting tracer specific options Documentation: Update ftrace-design.txt Documentation: Update tracepoint-analysis.txt Documentation: Update mmiotrace.txt
2009-12-31perf: Fix NULL deref in inheritance codePeter Zijlstra
Liming found a NULL deref when a task has a perf context but no counters when it forks. This can occur in two cases, a race during construction where the fork hits after installing the context but before the first counter gets inserted, or more reproducably, a fork after the last counter is closed (which leaves the context around). Reported-by: Wang Liming <liming.wang@windriver.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> CC: <stable@kernel.org> LKML-Reference: <1262185684.7135.222.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30tracing: Fix sign fields in ftrace_define_fields_##call()Lai Jiangshan
Add is_signed_type() call to trace_define_field() in ftrace macros. The code previously just passed in 0 (false), disregarding whether or not the field was actually a signed type. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4B273D3A.6020007@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-30tracing/kprobe: Show sign of fields in trace_kprobe format filesLai Jiangshan
The format files of trace_kprobe do not show the sign of the fields. The other format files show the field signed type of the fields and this patch makes the trace_kprobe formats consistent with the others. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4B273D27.5040009@cn.fujitsu.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-30ksym_tracer: Remove trace_statLi Zefan
trace_stat is problematic. Don't use it, use seqfile instead. This fixes a race that reading the stat file is not protected by any lock, which can lead to use after free. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4B3AF203.40200@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30ksym_tracer: Fix race when incrementing countLi Zefan
We are under rcu read section but not holding the write lock, so count++ is not atomic. Use atomic64_t instead. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4B3AF1EC.9010608@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30ksym_tracer: Fix to allow writing newline to ksym_trace_filterLi Zefan
It used to work, but now doesn't: # echo > ksym_filter bash: echo: write error: Invalid argument It's caused by d954fbf0ff6b5fdfb32350e85a2f15d3db976506 ("tracing: Fix wrong usage of strstrip in trace_ksyms"). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4B3AF1D7.5040400@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30ksym_tracer: Fix to make the tracer workLi Zefan
ksym tracer doesn't work: # echo tasklist_lock:rw- > ksym_trace_filter -bash: echo: write error: No such device It's because we pass to perf_event_create_kernel_counter() a cpu number which is not present. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4B3AF19E.1010201@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28tracing: Kconfig spelling fixes and cleanupsRandy Dunlap
Fix filename reference (ftrace-implementation.txt -> ftrace-design.txt). Fix spelling, punctuation, grammar. Fix help text indentation and line lengths to reduce need for horizontal scrolling or larger window sizes. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091221120117.3fb49cdc.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28perf events: Remove CONFIG_EVENT_PROFILELi Zefan
Quoted from Ingo: | This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE - | it's an unnecessary Kconfig complication. If both PERF_EVENTS and | EVENT_TRACING is enabled we should expose generic tracepoints. | | Nor is it limited to event 'profiling', so it has become a misnomer as | well. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <4B2F1557.2050705@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28kprobes: Fix distinct type warningHeiko Carstens
Every time I see this: kernel/kprobes.c: In function 'register_kretprobe': kernel/kprobes.c:1038: warning: comparison of distinct pointer types lacks a cast I'm wondering if something changed in common code and we need to do something for s390. Apparently that's not the case. Let's get rid of this annoying warning. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <20091221120224.GA4471@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28perf events: Remove arg from perf sched hooksPeter Zijlstra
Since we only ever schedule the local cpu, there is no need to pass the cpu number to the perf sched hooks. This micro-optimizes things a bit. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-24Merge branch 'sysctl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6 * 'sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6: SYSCTL: Add a mutex to the page_alloc zone order sysctl SYSCTL: Print binary sysctl warnings (nearly) only once
2009-12-23SYSCTL: Print binary sysctl warnings (nearly) only onceAndi Kleen
When printing legacy sysctls print the warning message for each of them only once. This way there is a guarantee the syslog won't be flooded for any sane program. The original attempt at this made the tables non const and stored the flag inline. Linus suggested using a separate hash table for this, this is based on a code snippet from him. The hash implies this is not exact and can sometimes not print a new sysctl due to a hash collision, but in practice this should not be a problem I used a FNV32 hash over the binary string with a 32byte bitmap. This gives relatively little collisions when all the predefined binary sysctls are hashed: size 256 bucket length number 0: [25] 1: [67] 2: [88] 3: [47] 4: [22] 5: [6] 6: [1] The worst case is a single collision of 6 hash values. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-23Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Revert 738d2be, simplify set_task_cpu()
2009-12-23sched: Revert 738d2be, simplify set_task_cpu()Peter Zijlstra
Effectively reverts 738d2be4301007f054541c5c4bf7fb6a361c9b3a. As demonstrated by Eric, we really need to call __set_task_cpu() early in the fork() path to properly initialize the various task state -- specifically the cgroup state through set_task_rq(). [ we could probably fix this by explicitly calling __set_task_cpu() from sched_fork(), but lets try that for the next cycle and simply revert to the old behaviour for now. ] Reported-by: Eric Paris <eparis@redhat.com> Tested-by: Eric Paris <eparis@redhat.com>, Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: efault@gmx.de LKML-Reference: <1261492999.4937.36.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: jfs: Fix 32bit build warning Remove obsolete comment in fs.h Sanitize f_flags helpers Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3) anonfd: Allow making anon files read-only fs/compat_ioctl.c: fix build error when !BLOCK pohmelfs needs I_LOCK alloc_file(): simplify handling of mnt_clone_write() errors
2009-12-22kfifo: add record handling functionsStefani Seibold
Add kfifo_in_rec() - puts some record data into the FIFO Add kfifo_out_rec() - gets some record data from the FIFO Add kfifo_from_user_rec() - puts some data from user space into the FIFO Add kfifo_to_user_rec() - gets data from the FIFO and write it to user space Add kfifo_peek_rec() - gets the size of the next FIFO record field Add kfifo_skip_rec() - skip the next fifo out record Add kfifo_avail_rec() - determinate the number of bytes available in a record FIFO Signed-off-by: Stefani Seibold <stefani@seibold.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22kfifo: add kfifo_skip, kfifo_from_user and kfifo_to_userStefani Seibold
Add kfifo_reset_out() for save lockless discard the fifo output Add kfifo_skip() to skip a number of output bytes Add kfifo_from_user() to copy user space data into the fifo Add kfifo_to_user() to copy fifo data to user space Signed-off-by: Stefani Seibold <stefani@seibold.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22kfifo: rename kfifo_put... into kfifo_in... and kfifo_get... into kfifo_out...Stefani Seibold
rename kfifo_put... into kfifo_in... to prevent miss use of old non in kernel-tree drivers ditto for kfifo_get... -> kfifo_out... Improve the prototypes of kfifo_in and kfifo_out to make the kerneldoc annotations more readable. Add mini "howto porting to the new API" in kfifo.h Signed-off-by: Stefani Seibold <stefani@seibold.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22kfifo: cleanup namespaceStefani Seibold
change name of __kfifo_* functions to kfifo_*, because the prefix __kfifo should be reserved for internal functions only. Signed-off-by: Stefani Seibold <stefani@seibold.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22kfifo: move out spinlockStefani Seibold
Move the pointer to the spinlock out of struct kfifo. Most users in tree do not actually use a spinlock, so the few exceptions now have to call kfifo_{get,put}_locked, which takes an extra argument to a spinlock. Signed-off-by: Stefani Seibold <stefani@seibold.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22kfifo: move struct kfifo in placeStefani Seibold
This is a new generic kernel FIFO implementation. The current kernel fifo API is not very widely used, because it has to many constrains. Only 17 files in the current 2.6.31-rc5 used it. FIFO's are like list's a very basic thing and a kfifo API which handles the most use case would save a lot of development time and memory resources. I think this are the reasons why kfifo is not in use: - The API is to simple, important functions are missing - A fifo can be only allocated dynamically - There is a requirement of a spinlock whether you need it or not - There is no support for data records inside a fifo So I decided to extend the kfifo in a more generic way without blowing up the API to much. The new API has the following benefits: - Generic usage: For kernel internal use and/or device driver. - Provide an API for the most use case. - Slim API: The whole API provides 25 functions. - Linux style habit. - DECLARE_KFIFO, DEFINE_KFIFO and INIT_KFIFO Macros - Direct copy_to_user from the fifo and copy_from_user into the fifo. - The kfifo itself is an in place member of the using data structure, this save an indirection access and does not waste the kernel allocator. - Lockless access: if only one reader and one writer is active on the fifo, which is the common use case, no additional locking is necessary. - Remove spinlock - give the user the freedom of choice what kind of locking to use if one is required. - Ability to handle records. Three type of records are supported: - Variable length records between 0-255 bytes, with a record size field of 1 bytes. - Variable length records between 0-65535 bytes, with a record size field of 2 bytes. - Fixed size records, which no record size field. - Preserve memory resource. - Performance! - Easy to use! This patch: Since most users want to have the kfifo as part of another object, reorganize the code to allow including struct kfifo in another data structure. This requires changing the kfifo_alloc and kfifo_init prototypes so that we pass an existing kfifo pointer into them. This patch changes the implementation and all existing users. [akpm@linux-foundation.org: fix warning] Signed-off-by: Stefani Seibold <stefani@seibold.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22Revert "time: Remove xtime_cache"Linus Torvalds
This reverts commit 7bc7d637452383d56ba4368d4336b0dde1bb476d, as requested by John Stultz. Quoting John: "Petr Titěra reported an issue where he saw odd atime regressions with 2.6.33 where there were a full second worth of nanoseconds in the nanoseconds field. He also reviewed the time code and narrowed down the problem: unhandled overflow of the nanosecond field caused by rounding up the sub-nanosecond accumulated time. Details: * At the end of update_wall_time(), we currently round up the sub-nanosecond portion of accumulated time when storing it into xtime. This was added to avoid time inconsistencies caused when the sub-nanosecond portion was truncated when storing into xtime. Unfortunately we don't handle the possible second overflow caused by that rounding. * Previously the xtime_cache code hid this overflow by normalizing the xtime value when storing into the xtime_cache. * We could try to handle the second overflow after the rounding up, but since this affects the timekeeping's internal state, this would further complicate the next accumulation cycle, causing small errors in ntp steering. As much as I'd like to get rid of it, the xtime_cache code is known to work. * The correct fix is really to include the sub-nanosecond portion in the timekeeping accessor function, so we don't need to round up at during accumulation. This would greatly simplify the accumulation code. Unfortunately, we can't do this safely until the last three non-GENERIC_TIME arches (sparc32, arm, cris) are converted (those patches are in -mm) and we kill off the spots where arches set xtime directly. This is all 2.6.34 material, so I think reverting the xtime_cache change is the best approach for now. Many thanks to Petr for both reporting and finding the issue!" Reported-by: Petr Titěra <P.Titera@century.cz> Requested-by: john stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22Sanitize f_flags helpersAl Viro
* pull ACC_MODE to fs.h; we have several copies all over the place * nightmarish expression calculating f_mode by f_flags deserves a helper too (OPEN_FMODE(flags)) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22anonfd: Allow making anon files read-onlyRoland Dreier
It seems a couple places such as arch/ia64/kernel/perfmon.c and drivers/infiniband/core/uverbs_main.c could use anon_inode_getfile() instead of a private pseudo-fs + alloc_file(), if only there were a way to get a read-only file. So provide this by having anon_inode_getfile() create a read-only file if we pass O_RDONLY in flags. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-21tracing: Fix setting tracer specific optionsSteven Rostedt
The function __set_tracer_option() takes as its last parameter a "neg" value. If set it should negate the value of the option. The trace_options_write() passed the value written to the file which is what the new value needs to be set as. But since this is not the negative, it never sets the value. Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-21resources: fix call to alignf() in allocate_resource()Dominik Brodowski
The second parameter to alignf() in allocate_resource() must reflect what new resource is attempted to be allocated, else functions like pcibios_align_resource() (at least on x86) or pcmcia_align() can't work correctly. Commit 1e5ad9679016275d422e36b12a98b0927d76f556 broke this by setting the "new" resource until we're about to return success. To keep the resource untouched when allocate_resource() fails, a "tmp" resource is introduced. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-20sched: Fix hotplug hangPeter Zijlstra
The hot-unplug kstopmachine usage does a wakeup after deactivating the cpu, hence we cannot use cpu_active() here but must rely on the good olde online. Reported-by: Sachin Sant <sachinp@in.ibm.com> Reported-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Jens Axboe <jens.axboe@oracle.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1261326987.4314.24.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>