path: root/kernel
Age  Commit message  Author
2009-12-09  sched: Sanitize fork() handling  Peter Zijlstra
Currently we try to do task placement in wake_up_new_task() after we do the load-balance pass in sched_fork(). This yields complicated semantics in that we have to deal with tasks on different RQs and the set_task_cpu() calls in copy_process() and sched_fork(). Rename ->task_new() to ->task_fork() and call it from sched_fork() before the balancing; this gives the policy a clear point to place the task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09  sched: Clean up ttwu() rq locking  Peter Zijlstra
Since set_task_clock() doesn't rely on rq->clock anymore, we can simplify the mess in ttwu(). Optimize things a bit by not fiddling with the IRQ state there.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09  sched: Remove rq->clock coupling from set_task_cpu()  Peter Zijlstra
set_task_cpu() should be rq-invariant and only touch task state. It currently fails to do so, which opens up a few races, since not all callers hold both rq->locks. Remove the reliance on rq->clock: any site calling set_task_cpu() should also do a remote clock update, which should ensure the observed time between these two CPUs is monotonic, as per kernel/sched_clock.c:sched_clock_remote(). Therefore we can simply remove the clock_offset bits and be happy.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09  sched: Consolidate select_task_rq() callers  Peter Zijlstra
Small cleanup.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
[ v2: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09  sched: Remove sysctl.sched_features  Peter Zijlstra
Since we've had a much saner debugfs interface to this, remove the sysctl one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
[ v2: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09  sched: Protect sched_rr_get_param() access to task->sched_class  Thomas Gleixner
sched_rr_get_param() calls task->sched_class->get_rr_interval(task) without protection against a concurrent sched_setscheduler() call which modifies task->sched_class. Serialize the access with task_rq_lock(task) and hand the rq pointer into get_rr_interval(), as it's needed at least in the sched_fair implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09  sched: Protect task->cpus_allowed access in sched_getaffinity()  Thomas Gleixner
sched_getaffinity() is not protected against a concurrent modification of the task's affinity. Serialize the access with task_rq_lock(task).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091208202026.769251187@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
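The two entries above serialize a reader against a concurrent writer by taking the same lock. A minimal userspace sketch of that idea, assuming a pthread mutex as a stand-in for the runqueue lock and a two-word bitmask as a stand-in for a cpumask (all names here are hypothetical, not kernel APIs):

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a task. The mutex plays the role of the
 * runqueue lock taken by task_rq_lock(), and the mask spans two 64-bit
 * words, like a cpumask on a machine with more than 64 CPUs. */
#define MASK_WORDS 2

struct fake_task {
	pthread_mutex_t lock;
	uint64_t cpus_allowed[MASK_WORDS];
};

/* Writer side, in the spirit of a concurrent sched_setaffinity(). */
void fake_set_affinity(struct fake_task *t, const uint64_t *mask)
{
	pthread_mutex_lock(&t->lock);
	memcpy(t->cpus_allowed, mask, sizeof(t->cpus_allowed));
	pthread_mutex_unlock(&t->lock);
}

/* Reader side, in the spirit of sched_getaffinity(): copying under the
 * same lock means the result is either the old mask or the new one,
 * never a mix of words from both. */
void fake_get_affinity(struct fake_task *t, uint64_t *out)
{
	pthread_mutex_lock(&t->lock);
	memcpy(out, t->cpus_allowed, sizeof(t->cpus_allowed));
	pthread_mutex_unlock(&t->lock);
}
```

Without the lock, a reader could copy word 0 before a writer's update and word 1 after it, returning a mask that was never set.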
2009-12-06  sched: Fix balance vs hotplug race  Peter Zijlstra
Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment) we have cpu_active_mask, which is supposed to rule scheduler migration and load-balancing, except it never (fully) did. The particular problem being solved here is a crash in try_to_wake_up() where select_task_rq() ends up selecting an offline cpu because select_task_rq_fair() trusts the sched_domain tree to reflect the current state of affairs; similarly, select_task_rq_rt() trusts the root_domain. However, the sched_domains are updated from CPU_DEAD, which is after the cpu is taken offline and after stop_machine is done. Therefore it can race perfectly well with code assuming the domains are right. Cure this by building the domains from cpu_active_mask on CPU_DOWN_PREPARE.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
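A toy model of the ordering this fix relies on: a CPU stops being a placement candidate as soon as it leaves the active mask at the CPU_DOWN_PREPARE stage, well before it is actually offline. This sketch assumes a single 64-bit mask and a "lowest allowed and active CPU" policy, both hypothetical; the real fix rebuilds the sched domains rather than filtering at selection time.

```c
#include <stdint.h>

/* Hypothetical model: bit n set means CPU n is active (CPUs 0-3). */
static uint64_t cpu_active = 0xf;

/* Mirrors the early unplug stage: clear the CPU from the active mask
 * before any teardown, so placement stops choosing it right away. */
void cpu_down_prepare(int cpu)
{
	cpu_active &= ~(UINT64_C(1) << cpu);
}

/* Pick the lowest-numbered CPU that is both allowed for the task and
 * still active; return -1 if there is none. */
int pick_cpu(uint64_t allowed)
{
	uint64_t candidates = allowed & cpu_active;

	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With CPU 2 marked inactive, a task allowed on CPUs 2 and 3 is placed on CPU 3, and a task allowed only on CPU 2 gets no candidate instead of an offline CPU.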
2009-12-06  cpumask: Fix generate_sched_domains() for UP  Geert Uytterhoeven
Commit acc3f5d7cabbfd6cec71f0c1f9900621fa2d6ae7 ("cpumask: Partition_sched_domains takes array of cpumask_var_t") changed the function signature of generate_sched_domains() for the CONFIG_SMP=y case, but forgot to update the corresponding function for the CONFIG_SMP=n case, causing:

kernel/cpuset.c:2073: warning: passing argument 1 of 'generate_sched_domains' from incompatible pointer type

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.DEB.2.00.0912062038070.5693@ayla.of.borg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-05  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
  sched, cputime: Introduce thread_group_times()
  sched, cputime: Cleanups related to task_times()
  Revert "sched, x86: Optimize branch hint in __switch_to()"
  sched: Fix isolcpus boot option
  sched: Revert 498657a478c60be092208422fefa9c7b248729c2
  sched, time: Define nsecs_to_jiffies()
  sched: Remove task_{u,s,g}time()
  sched: Introduce task_times() to replace task_{u,s}time() pair
  sched: Limit the number of scheduler debug messages
  sched.c: Call debug_show_all_locks() when dumping all tasks
  sched, x86: Optimize branch hint in __switch_to()
  sched: Optimize branch hint in context_switch()
  sched: Optimize branch hint in pick_next_task_fair()
  sched_feat_write(): Update ppos instead of file->f_pos
  sched: Sched_rt_periodic_timer vs cpu hotplug
  sched, kvm: Fix race condition involving sched_in_preempt_notifers
  sched: More generic WAKE_AFFINE vs select_idle_sibling()
  sched: Cleanup select_task_rq_fair()
  sched: Fix granularity of task_u/stime()
  sched: Fix/add missing update_rq_clock() calls
  ...
2009-12-05  Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
  x86: Fix comments of register/stack access functions
  perf tools: Replace %m with %a in sscanf
  hw-breakpoints: Keep track of user disabled breakpoints
  tracing/syscalls: Make syscall events print callbacks static
  tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
  perf: Don't free perf_mmap_data until work has been done
  perf_event: Fix compile error
  perf tools: Fix _GNU_SOURCE macro related strndup() build error
  trace_syscalls: Remove unused syscall_name_to_nr()
  trace_syscalls: Simplify syscall profile
  trace_syscalls: Remove duplicate init_enter_##sname()
  trace_syscalls: Add syscall_nr field to struct syscall_metadata
  trace_syscalls: Remove enter_id exit_id
  trace_syscalls: Set event_enter_##sname->data to its metadata
  trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
  perf_event: Initialize data.period in perf_swevent_hrtimer()
  perf probe: Simplify event naming
  perf probe: Add --list option for listing current probe events
  perf probe: Add argv_split() from lib/argv_split.c
  perf probe: Move probe event utility functions to probe-event.c
  ...
2009-12-05  Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
  tracing: Separate raw syscall from syscall tracer
  ring-buffer-benchmark: Add parameters to set produce/consumer priorities
  tracing, function tracer: Clean up strstrip() usage
  ring-buffer benchmark: Run producer/consumer threads at nice +19
  tracing: Remove the stale include/trace/power.h
  tracing: Only print objcopy version warning once from recordmcount
  tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
  ring-buffer: Move access to commit_page up into function used
  tracing: do not disable interrupts for trace_clock_local
  ring-buffer: Add multiple iterations between benchmark timestamps
  kprobes: Sanitize struct kretprobe_instance allocations
  tracing: Fix to use __always_unused attribute
  compiler: Introduce __always_unused
  tracing: Exit with error if a weak function is used in recordmcount.pl
  tracing: Move conditional into update_funcs() in recordmcount.pl
  tracing: Add regex for weak functions in recordmcount.pl
  tracing: Move mcount section search to front of loop in recordmcount.pl
  tracing: Fix objcopy revision check in recordmcount.pl
  tracing: Check absolute path of input file in recordmcount.pl
  tracing: Correct the check for number of arguments in recordmcount.pl
  ...
2009-12-05  Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix trace_marker output
  tracing: Fix event format export
  tracing: Fix return value of tracing_stats_read()
2009-12-05  Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix spurious irq seqfile conversion
  genirq: switch /proc/irq/*/spurious to seq_file
  irq: Do not attempt to create subdirectories if /proc/irq/<irq> failed
  irq: Remove unused debug_poll_all_shared_irqs()
  irq: Fix docbook comments
  irq: trivial: Fix typo in comment for #endif
2009-12-05  Merge branch 'core-softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'core-softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softlockup: Fix hung_task_check_count sysctl
2009-12-05  Merge branch 'core-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'core-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  signal: Print warning message when dropping signals
  signal: Fix alternate signal stack check
2009-12-05  Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  rcu: Make RCU's CPU-stall detector be default
  rcu: Add expedited grace-period support for preemptible RCU
  rcu: Enable fourth level of TREE_RCU hierarchy
  rcu: Rename "quiet" functions
  rcu: Re-arrange code to reduce #ifdef pain
  rcu: Eliminate unneeded function wrapping
  rcu: Fix grace-period-stall bug on large systems with CPU hotplug
  rcu: Eliminate __rcu_pending() false positives
  rcu: Further cleanups of use of lastcomp
  rcu: Simplify association of forced quiescent states with grace periods
  rcu: Accelerate callback processing on CPUs not detecting GP end
  rcu: Mark init-time-only rcu_bootup_announce() as __init
  rcu: Simplify association of quiescent states with grace periods
  rcu: Rename dynticks_completed to completed_fqs
  rcu: Enable synchronize_sched_expedited() fastpath
  rcu: Remove inline from forward-referenced functions
  rcu: Fix note_new_gpnum() uses of ->gpnum
  rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
  rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
  rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
  ...
2009-12-05  Merge branch 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ratelimit: Make suppressed output messages more useful
  printk: Remove ratelimit.h from kernel.h
  ratelimit: Fix/allow use in atomic contexts
  ratelimit: Use per ratelimit context locking
2009-12-05  Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mutex: Fix missing conditions to build mutex_spin_on_owner()
  mutex: Better control mutex adaptive spinning config
  locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
  locking: Use __[SPIN|RW]_LOCK_UNLOCKED in [spin|rw]_lock_init()
  locking: Remove unused prototype
  locking: Reduce ifdefs in kernel/spinlock.c
  locking: Make inlining decision Kconfig based
2009-12-05  Merge branch 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic-ipi: Add smp_call_function_any()
  generic-ipi: Fix misleading smp_call_function*() description
2009-12-03  mutex: Fix missing conditions to build mutex_spin_on_owner()  Frederic Weisbecker
We don't need to build mutex_spin_on_owner() if we have CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES, as it won't be used under such configs. Use CONFIG_MUTEX_SPIN_ON_OWNER, as it gathers all the necessary checks, before building it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
2009-12-03  mutex: Better control mutex adaptive spinning config  Frederic Weisbecker
Introduce CONFIG_MUTEX_SPIN_ON_OWNER so that we can centralize in a single place the conditions that determine its definition and use.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
2009-12-03  rcu: Add expedited grace-period support for preemptible RCU  Paul E. McKenney
Implement a synchronize_rcu_expedited() for preemptible RCU that actually is expedited. This uses synchronize_sched_expedited() to force all threads currently running in a preemptible-RCU read-side critical section onto the appropriate ->blocked_tasks[] list, then takes a snapshot of all of these lists and waits for them to drain.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1259784616158-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03  rcu: Enable fourth level of TREE_RCU hierarchy  Paul E. McKenney
Enable a fourth level of rcu_node hierarchy for TREE_RCU and TREE_PREEMPT_RCU. This is for stress-testing and experimental purposes only, although in theory this would enable 16,777,216 CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit systems. Experimental use of this fourth level will normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system, though the more adventurous (and more fortunate) experimenters may wish to choose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even CONFIG_RCU_FANOUT=4 for 256-CPU systems.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846161257-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
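The CPU counts quoted above follow from the tree shape: with every rcu_node holding `fanout` children, a tree of `levels` levels covers fanout^levels CPUs. The sketch below checks that arithmetic, assuming default fanouts of 64 on 64-bit and 32 on 32-bit systems (an assumption that reproduces the quoted figures, not something stated in the entry):

```c
#include <stdint.h>

/* Maximum number of CPUs a tree of the given depth can cover when every
 * node has `fanout` children: fanout raised to the power `levels`. */
uint64_t rcu_max_cpus(unsigned int fanout, unsigned int levels)
{
	uint64_t n = 1;

	while (levels--)
		n *= fanout;
	return n;
}
```

For example, rcu_max_cpus(64, 4) gives 16,777,216 and rcu_max_cpus(3, 4) gives 81, matching the CONFIG_RCU_FANOUT=3 case.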
2009-12-03  rcu: Rename "quiet" functions  Paul E. McKenney
The number of "quiet" functions has grown recently, and the names are no longer very descriptive. The point of all of these functions is to do some portion of the task of reporting a quiescent state, so rename them accordingly:

o cpu_quiet() becomes rcu_report_qs_rdp(), which reports a quiescent state to the per-CPU rcu_data structure. If this turns out to be a new quiescent state for this grace period, then rcu_report_qs_rnp() will be invoked to propagate the quiescent state up the rcu_node hierarchy.

o cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports a quiescent state for a given CPU (or possibly a set of CPUs) up the rcu_node hierarchy.

o cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which reports a full set of quiescent states to the global rcu_state structure.

o task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports a quiescent state due to a task exiting an RCU read-side critical section that had previously blocked in that same critical section. As indicated by the new name, this type of quiescent state is reported up the rcu_node hierarchy (using rcu_report_qs_rnp() to do so).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846163698-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03  Merge branch 'master' into next  James Morris
2009-12-02  modules: don't export section names of empty sections via sysfs  Helge Deller
On the parisc architecture, every loaded kernel module triggers this kernel "badness warning":

  sysfs: cannot create duplicate filename '/module/ac97_bus/sections/.text'
  Badness at fs/sysfs/dir.c:487

The reason is that on parisc all kernel modules have multiple .text sections, due to the -ffunction-sections compiler flag, which is needed to reach all jump targets on this platform. An objdump of such a kernel module gives:

Sections:
Idx Name                 Size      VMA       LMA       File off  Algn
  0 .note.gnu.build-id   00000024  00000000  00000000  00000034  2**2
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text                00000000  00000000  00000000  00000058  2**0
                         CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .text.ac97_bus_match 0000001c  00000000  00000000  00000058  2**2
                         CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .text                00000000  00000000  00000000  000000d4  2**0
                         CONTENTS, ALLOC, LOAD, READONLY, CODE
...

Since the .text sections are empty (size of 0 bytes) and won't be loaded by the kernel module loader anyway, I don't see a reason why such sections need to be listed under /sys/module/<module_name>/sections/<section_name> either. The attached patch solves this issue by not exporting the names of empty sections.

This fixes bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=14703

Signed-off-by: Helge Deller <deller@gmx.de>
CC: rusty@rustcorp.com.au
CC: akpm@linux-foundation.org
CC: James.Bottomley@HansenPartnership.com
CC: roland@redhat.com
CC: dave@hiauly1.hia.nrc.ca
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
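A minimal sketch of the filtering described above, assuming a simplified section record with only a name and a size (hypothetical; the real loader works on ELF section headers and may apply further checks this sketch omits). It only models the "skip zero-sized sections" rule:

```c
#include <stddef.h>

/* Hypothetical, simplified section record: just a name and a size. */
struct sect {
	const char *name;
	size_t size;
};

/* Collect the names that would be exported as sysfs attributes,
 * skipping zero-sized sections such as the empty ".text" entries
 * produced by -ffunction-sections. Returns the number collected. */
size_t collect_exported(const struct sect *s, size_t n,
			const char **out, size_t max)
{
	size_t count = 0;

	for (size_t i = 0; i < n && count < max; i++) {
		if (s[i].size == 0)
			continue; /* empty: nothing loaded, nothing to export */
		out[count++] = s[i].name;
	}
	return count;
}
```

Fed the four sections from the objdump listing, this keeps only .note.gnu.build-id and .text.ac97_bus_match, so the duplicate ".text" name is never created.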
2009-12-02  sched, cputime: Introduce thread_group_times()  Hidetoshi Seto
This is a real fix for the problem of decreasing utime/stime values described in the thread:

  http://lkml.org/lkml/2009/11/3/522

Currently cputime is accounted in the following way:

 - {u,s}time in task_struct are increased every time the thread is interrupted by a tick (timer interrupt).

 - When a thread exits, its {u,s}time are added to signal->{u,s}time, after being adjusted by task_times().

 - When all threads in a thread_group exit, the accumulated {u,s}time (and also c{u,s}time) in the signal struct are added to c{u,s}time in the signal struct of the group's parent.

So {u,s}time in the task struct are "raw" tick counts, while {u,s}time and c{u,s}time in the signal struct are "adjusted" values.

The accounted values are used by:

 - task_times(), to get the cputime of a thread: this function returns adjusted values that originate from the raw {u,s}time, scaled by the sum_exec_runtime accounted by CFS.

 - thread_group_cputime(), to get the cputime of a thread group: this function returns the sum of all {u,s}time of living threads in the group, plus the {u,s}time in the signal struct, which is the sum of the adjusted cputimes of all exited threads that belonged to the group.

The problem is the return value of thread_group_cputime(), because it is a mixed sum of "raw" and "adjusted" values:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity. Suppose a thread has raw values greater than its adjusted values (e.g. it was interrupted by 1000Hz ticks 50 times but only ran for 45ms); when it exits, cputime will decrease (e.g. by 5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it to every thread should be avoided.

This patch fixes the above problem in the following way:

 - Modify the thread's exit (= __exit_signal()) not to use task_times(). This means {u,s}time in the signal struct accumulates raw values instead of adjusted values. As a result, thread_group_cputime() returns a pure sum of "raw" values.

 - Introduce a new function thread_group_times(*task, *utime, *stime) that converts the "raw" values of thread_group_cputime() to "adjusted" values, using the same calculation as task_times().

 - Modify the group's exit (= wait_task_zombie()) to use the new thread_group_times(). This makes c{u,s}time in the signal struct hold adjusted values, as before this patch.

 - Replace some thread_group_cputime() calls with thread_group_times(). These replacements are applied only where the "adjusted" cputime is conveyed to users and where task_times() is already used nearby (i.e. sys_times(), getrusage(), and /proc/<PID>/stat).

This patch has a positive side effect:

 - Before this patch, if a group contained many short-lived threads (e.g. each running 0.9ms and never interrupted by a tick), the group's cputime could be invisible, since each thread's cputime was accumulated after being adjusted: writing the adjustment function as adj(ticks, runtime), {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0. After this patch this will not happen, because the adjustment is applied after accumulation.

v2:
 - remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
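The two effects described above (a 5ms drop when a 50-tick, 45ms thread exits, and short-lived threads vanishing when each is adjusted separately) can be reproduced with a toy model. It assumes HZ=1000, so one tick is 1ms, cputime kept in whole ticks, and a proportional split of the precise runtime; the names are hypothetical and this is not the kernel's task_times() code:

```c
#include <stdint.h>

/* Toy model, assuming HZ=1000: one tick is 1 ms and cputime is kept in
 * whole ticks. None of these names are kernel APIs. */
#define NSEC_PER_TICK 1000000ULL

struct times {
	uint64_t utime, stime; /* cputime in ticks */
};

/* Scale raw tick counts so their sum matches the precise runtime
 * (rtime_ns), keeping the user/system ratio. Converting to ticks
 * truncates, as a tick-granular cputime would. */
struct times adjust(struct times raw, uint64_t rtime_ns)
{
	uint64_t rtime = rtime_ns / NSEC_PER_TICK;
	uint64_t total = raw.utime + raw.stime;
	struct times adj;

	adj.utime = total ? rtime * raw.utime / total : rtime;
	adj.stime = rtime - adj.utime;
	return adj;
}

/* Old scheme: adjust each exited thread, then add up the results. */
uint64_t adjust_then_sum(const struct times *raw, const uint64_t *rtime_ns,
			 int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++) {
		struct times a = adjust(raw[i], rtime_ns[i]);
		sum += a.utime + a.stime;
	}
	return sum;
}

/* New scheme: add up raw ticks and runtimes, then adjust once. */
uint64_t sum_then_adjust(const struct times *raw, const uint64_t *rtime_ns,
			 int n)
{
	struct times total = { 0, 0 };
	uint64_t rtime = 0;

	for (int i = 0; i < n; i++) {
		total.utime += raw[i].utime;
		total.stime += raw[i].stime;
		rtime += rtime_ns[i];
	}
	struct times a = adjust(total, rtime);
	return a.utime + a.stime;
}
```

With ten threads of 0.9ms each and no ticks, adjust_then_sum() reports 0 ticks while sum_then_adjust() reports 9, the group-level behaviour the patch aims for.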
2009-12-02  sched, cputime: Cleanups related to task_times()  Hidetoshi Seto
 - Remove if({u,s}t)s because no one calls it with NULL now.
 - Use cputime_{add,sub}().
 - Add ifndef-endif for prev_{u,s}time since they are used only when !VIRT_CPU_ACCOUNTING.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02  sched: Fix isolcpus boot option  Rusty Russell
Anton Blanchard wrote:

> We allocate and zero cpu_isolated_map after the isolcpus
> __setup option has run. This means cpu_isolated_map always
> ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
> cpumask that hasn't been allocated.

I introduced this regression in 49557e620339cb13 (sched: Fix boot crash by zalloc()ing most of the cpu masks).

Use the bootmem allocator if they set isolcpus=, otherwise allocate and zero like normal.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: peterz@infradead.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Anton Blanchard <anton@samba.org>
2009-12-02  sched: Revert 498657a478c60be092208422fefa9c7b248729c2  Tejun Heo
498657a478c60be092208422fefa9c7b248729c2 incorrectly assumed that preempt wasn't disabled around context_switch() and thus was fixing an imaginary problem. It also broke KVM, because KVM depended on ->sched_in() being called with irqs enabled so that it could do smp calls from there.

Revert the incorrect commit and add a comment describing the different contexts under which the two callbacks are invoked.

Avi: spotted transposed in/out in the added comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: peterz@infradead.org
Cc: efault@gmx.de
Cc: rusty@rustcorp.com.au
LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02  perf: Don't free perf_mmap_data until work has been done  Kristian Høgsberg
In the CONFIG_PERF_USE_VMALLOC case, perf_mmap_data_free() only schedules the cleanup of the perf_mmap_data struct. In that case we have to wait until the work has been done before we free data.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1259697901-1747-1-git-send-email-krh@bitplanet.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  trace_syscalls: Remove unused syscall_name_to_nr()  Lai Jiangshan
After the duplicates are removed, syscall_name_to_nr() is unused.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A6.6060803@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  trace_syscalls: Simplify syscall profile  Lai Jiangshan
use only one prof_sysenter_enable() instead of prof_sysenter_enable_##sname()
use only one prof_sysenter_disable() instead of prof_sysenter_disable_##sname()
use only one prof_sysexit_enable() instead of prof_sysexit_enable_##sname()
use only one prof_sysexit_disable() instead of prof_sysexit_disable_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A1.8060304@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
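The idea behind replacing the per-syscall prof_sysenter_enable_##sname() copies with one function is that a single routine can act on any syscall once it can read the syscall number from the event's metadata. A hypothetical sketch of that shape, assuming a per-syscall bitmap and a syscall_nr field like the one struct syscall_metadata gains in this series (the toy_ names, table size, and bitmap layout are illustrative only):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: one generic enable/disable pair driven by
 * per-syscall metadata. TOY_NR_SYSCALLS and the bitmap are assumptions. */
#define TOY_NR_SYSCALLS 512
#define BITS_PER_WORD 64

struct toy_syscall_meta {
	const char *name;
	int syscall_nr;	/* plays the role of the syscall_nr field */
};

static uint64_t enter_enabled[TOY_NR_SYSCALLS / BITS_PER_WORD];

static bool nr_valid(int nr)
{
	return nr >= 0 && nr < TOY_NR_SYSCALLS;
}

/* One function serves every syscall: no macro-generated copies. */
int toy_enter_enable(const struct toy_syscall_meta *meta)
{
	int nr = meta->syscall_nr;

	if (!nr_valid(nr))
		return -1;
	enter_enabled[nr / BITS_PER_WORD] |= UINT64_C(1) << (nr % BITS_PER_WORD);
	return 0;
}

void toy_enter_disable(const struct toy_syscall_meta *meta)
{
	int nr = meta->syscall_nr;

	if (nr_valid(nr))
		enter_enabled[nr / BITS_PER_WORD] &=
			~(UINT64_C(1) << (nr % BITS_PER_WORD));
}

/* What a syscall-entry hook would test before recording a sample. */
bool toy_enter_is_enabled(int nr)
{
	return nr_valid(nr) &&
	       ((enter_enabled[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1);
}
```

The design trade-off is one bit test per syscall entry in exchange for not generating four functions per traced syscall.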
2009-12-01  trace_syscalls: Remove duplicate init_enter_##sname()  Lai Jiangshan
use only one init_syscall_trace instead of many init_enter_##sname()/init_exit_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D29B.6090708@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  trace_syscalls: Add syscall_nr field to struct syscall_metadata  Lai Jiangshan
Add a syscall_nr field to struct syscall_metadata; it makes it easier to get the syscall number.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D293.6090800@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  trace_syscalls: Remove enter_id exit_id  Lai Jiangshan
use ->enter_event->id instead of ->enter_id
use ->exit_event->id instead of ->exit_id
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D288.7030001@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  trace_syscalls: Set event_enter_##sname->data to its metadata  Lai Jiangshan
Set event_enter_##sname->data to its metadata; this makes the code simpler.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D282.7050709@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit  Lai Jiangshan
fix event_enter_##sname->event
fix event_exit_##sname->event
remove unused event_syscall_enter and event_syscall_exit
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D278.4090209@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01  SLOW_WORK: Move slow_work's proc file to debugfs  David Howells
Move slow_work's debugging proc file to debugfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Requested-and-acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01  SLOW_WORK: Fix the CONFIG_MODULES=n case  David Howells
Commit 3d7a641 ("SLOW_WORK: Wait for outstanding work items belonging to a module to clear") introduced some code to make sure that all of a module's slow-work items were complete before that module was removed, and commit 3bde31a ("SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed") further extended that, breaking it in the process if CONFIG_MODULES=n:

  CC      kernel/slow-work.o
kernel/slow-work.c: In function 'slow_work_execute':
kernel/slow-work.c:313: error: 'slow_work_thread_processing' undeclared (first use in this function)
kernel/slow-work.c:313: error: (Each undeclared identifier is reported only once
kernel/slow-work.c:313: error: for each function it appears in.)
kernel/slow-work.c: In function 'slow_work_wait_for_items':
kernel/slow-work.c:950: error: 'slow_work_unreg_sync_lock' undeclared (first use in this function)
kernel/slow-work.c:951: error: 'slow_work_unreg_wq' undeclared (first use in this function)
kernel/slow-work.c:961: error: 'slow_work_unreg_work_item' undeclared (first use in this function)
kernel/slow-work.c:974: error: 'slow_work_unreg_module' undeclared (first use in this function)
kernel/slow-work.c:977: error: 'slow_work_thread_processing' undeclared (first use in this function)
make[1]: *** [kernel/slow-work.o] Error 1

Fix this by:

 (1) Extracting the bits of slow_work_execute() that are contingent on CONFIG_MODULES, and the bits that should be, into inline functions, placing them in the #ifdef'd section that defines the relevant variables, and adding stubs for moduleless kernels. This allows the removal of some #ifdefs.

 (2) #ifdef'ing out the contents of slow_work_wait_for_items() in moduleless kernels.

The four functions related to handling module unloading synchronisation (and their associated variables) could be offloaded into a separate .c file, but each function is only used once and three of them are tiny, so doing so would prevent them from being inlined.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
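Step (1) above is the common "inline helpers plus empty stubs" pattern. A minimal sketch, assuming CONFIG_MODULES is a plain preprocessor symbol; the helper names are hypothetical, far simpler than slow-work.c's, and locking is omitted:

```c
/* Hypothetical helpers; only the #ifdef structure mirrors the fix. */
#ifdef CONFIG_MODULES
static int unreg_items_busy;	/* exists only when modules can unload */

static inline void mark_item_started(void)  { unreg_items_busy++; }
static inline void mark_item_finished(void) { unreg_items_busy--; }
static inline int  items_in_flight(void)    { return unreg_items_busy; }
#else
/* Moduleless kernels: the variable does not exist, so the helpers are
 * empty stubs and the caller below needs no #ifdefs of its own. */
static inline void mark_item_started(void)  { }
static inline void mark_item_finished(void) { }
static inline int  items_in_flight(void)    { return 0; }
#endif

/* Common path: calls the helpers unconditionally and builds either way.
 * The counter pointed to by work_done stands in for the item's payload. */
int execute_item(int *work_done)
{
	mark_item_started();
	(*work_done)++;
	mark_item_finished();
	return items_in_flight();
}
```

Because the stubs are static inline and empty, a moduleless build compiles them away entirely, which is the same reason the entry gives for not moving the helpers into a separate .c file.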
2009-12-01  perf_event: Initialize data.period in perf_swevent_hrtimer()  Xiao Guangrong
In the current code in perf_swevent_hrtimer(), data.period is not initialized. The result is obviously wrong:

# ./perf record -f -e cpu-clock make
# ./perf report
# Samples: 1740
#
# Overhead  Command  ......
# ........  ........  ..........................................
#
1025422183050275328.00%  sh    libc-2.9.90.so  ...
1025422183050275328.00%  perl  libperl.so      ...
1025422168240043264.00%  perl  [kernel]        ...
1025422030011210752.00%  perl  [kernel]        ...

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <4B14E220.2050107@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 | trace_kprobes: Fix a memory leak bug and check kstrdup() return value | Masami Hiramatsu
Fix a memory leak in create_trace_probe(). When an argument is too long (> MAX_ARGSTR_LEN), the code jumps straight to the error path, and tp->args[i].name is never released.

This also adds the missing check of kstrdup()'s return value.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20091201001919.10235.56455.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
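Both fixes can be illustrated with a userspace sketch: check the duplicated string before using it, and release it on the "argument too long" path instead of jumping away. The names, the demo_strdup() stand-in for kstrdup(), and the length limit are hypothetical; this is not the trace_kprobe.c code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit standing in for MAX_ARGSTR_LEN. */
#define DEMO_MAX_ARGSTR_LEN 63

struct demo_arg {
	char *name;
};

/* Userspace analogue of kstrdup(): returns NULL when allocation fails. */
static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/* Returns 0 on success, -1 on error; never leaks arg->name on failure. */
static int demo_parse_arg(struct demo_arg *arg, const char *name, const char *body)
{
	arg->name = demo_strdup(name);
	if (!arg->name)                     /* check the allocation result */
		return -1;

	if (strlen(body) > DEMO_MAX_ARGSTR_LEN) {
		free(arg->name);            /* without this, the name leaks */
		arg->name = NULL;
		return -1;
	}
	return 0;
}
```

Resetting arg->name to NULL after freeing keeps any later generic cleanup from freeing the same pointer twice.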
2009-11-27 | trace_kprobes: Don't output zero offset | Lai Jiangshan
"symbol_name+0" is not very friendly; it only makes the output longer.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0CEBCB.7080309@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
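The change amounts to printing the "+offset" suffix only when the offset is non-zero. A minimal sketch, with a hypothetical helper name and a decimal offset format assumed for illustration:

```c
#include <stdio.h>

/*
 * Format a probe location as "symbol" or "symbol+offset".
 * Hypothetical helper; returns the length snprintf() would have written.
 */
static int demo_format_location(char *buf, size_t size,
				const char *symbol, unsigned long offset)
{
	if (offset)
		return snprintf(buf, size, "%s+%lu", symbol, offset);
	return snprintf(buf, size, "%s", symbol);   /* no redundant "+0" */
}
```

For example, offset 0 gives "vfs_read" and offset 16 gives "vfs_read+16".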
2009-11-27 | trace_kprobes: Always show group name | Lai Jiangshan
Sometimes the group name is not "kprobes", so it is better if we can read it back from tracing/kprobe_events:

    # echo 'r:laijs/vfs_read vfs_read %ax' > kprobe_events
    # cat kprobe_events
    r:laijs/vfs_read vfs_read %ax=%ax

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0CEBAF.6000104@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
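Emitting the group unconditionally means the output line always has the "type:group/event symbol" shape seen in the example above. A hypothetical sketch of such a formatter (argument printing omitted; not the kernel's probes_seq_show()):

```c
#include <stdio.h>

/*
 * Hypothetical: format the head of a kprobe_events line, always including
 * the group.  'r' marks a return probe, 'p' a plain probe.
 */
static int demo_format_probe(char *buf, size_t size, int is_return,
			     const char *group, const char *event,
			     const char *symbol)
{
	return snprintf(buf, size, "%c:%s/%s %s",
			is_return ? 'r' : 'p', group, event, symbol);
}
```

Called with ('r', "laijs", "vfs_read", "vfs_read") it produces "r:laijs/vfs_read vfs_read", so a reader can recreate the probe in the same group.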
2009-11-27 | trace_kprobes: Fix memory leak | Lai Jiangshan
tp->nr_args is not set before we "goto error", which causes a memory leak, because free_trace_probe() uses tp->nr_args to decide how many args to free.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0CEB95.2060107@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
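The general rule is that a count-driven cleanup routine only works if the count is kept in step with the allocations. One way to do that is sketched below; the structure, the fail_at parameter used to simulate an error, and all names are hypothetical, not the actual create_trace_probe() change.

```c
#include <stdlib.h>

#define DEMO_MAX_ARGS 8

struct demo_probe {
	int nr_args;
	char *args[DEMO_MAX_ARGS];
};

/* Generic cleanup: frees exactly nr_args entries, so nr_args must be accurate. */
static void demo_free_probe(struct demo_probe *tp)
{
	for (int i = 0; i < tp->nr_args; i++)
		free(tp->args[i]);
	tp->nr_args = 0;
}

/* Allocate argc args; fail_at simulates a parse error at that index (-1: none). */
static int demo_create_probe(struct demo_probe *tp, int argc, int fail_at)
{
	tp->nr_args = 0;
	for (int i = 0; i < argc && i < DEMO_MAX_ARGS; i++) {
		if (i == fail_at)
			goto error;
		tp->args[i] = malloc(16);
		if (!tp->args[i])
			goto error;
		tp->nr_args++;          /* count each arg as soon as it is owned */
	}
	return 0;
error:
	demo_free_probe(tp);            /* releases every arg counted so far */
	return -1;
}
```

If nr_args were only assigned after the loop, as in the bug, an early "goto error" would reach demo_free_probe() with a count of zero and leak everything allocated so far.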
2009-11-27 | trace_syscalls: Add syscall nr field | Lai Jiangshan
The syscall number field is missing from syscall_enter_define_fields()/syscall_exit_define_fields(). The syscall number is also needed by the event filter and other users.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B0E330D.1070206@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
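Why a filter needs the field: a predicate can only test what the record exposes. The sketch below is a hypothetical record and a hand-written "nr == N" match, standing in for what the event filter would evaluate; it is not the ftrace field-definition API.

```c
/* Hypothetical syscall-entry record; nr is the field the commit exposes. */
struct demo_syscall_enter {
	int nr;                    /* syscall number */
	unsigned long args[6];     /* raw syscall arguments */
};

/* Count records matching "nr == want_nr", as an event filter might. */
static int demo_count_matching(const struct demo_syscall_enter *recs, int n,
			       int want_nr)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (recs[i].nr == want_nr)
			count++;
	return count;
}
```

Without nr among the defined fields, records from different syscalls sharing an event format could not be told apart by such a predicate.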
2009-11-27 | hw-breakpoints: Use struct perf_event_attr to define kernel breakpoints | Frederic Weisbecker
Kernel breakpoints are created using functions to which we pass breakpoint parameters as individual variables: address, length and type. Although this fits x86 well, it does not scale to other architectures that may support this API later, as they may have more or different needs. Pass in a perf_event_attr structure instead, since it is meant to evolve as much as possible into a generic hardware breakpoint parameter structure.

Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-27 | hw-breakpoints: Use struct perf_event_attr to define user breakpoints | Frederic Weisbecker
In-kernel user breakpoints are created using functions to which we pass breakpoint parameters as individual variables: address, length and type. Although this fits x86 well, it does not scale to other architectures that may support this API later, as they may have more or different needs. Pass in a perf_event_attr structure instead, since it is meant to evolve as much as possible into a generic hardware breakpoint parameter structure.

Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
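The design argument in these two commits, bundling breakpoint parameters into one structure so the interface can grow without changing function signatures, can be sketched with a standalone mock. The field names mirror the bp_addr, bp_len and bp_type members of mainline struct perf_event_attr, but demo_bp_attr, the DEMO_BP_* flags and the x86-style validation rules are hypothetical, not the kernel's registration API.

```c
#include <stdint.h>

/* Standalone mock, NOT the real struct perf_event_attr. */
struct demo_bp_attr {
	uint64_t bp_addr;
	uint64_t bp_len;
	uint32_t bp_type;
	/* an architecture with extra needs adds fields here; callers'
	 * function signatures stay the same */
};

enum { DEMO_BP_R = 1, DEMO_BP_W = 2, DEMO_BP_X = 4 };

/* Validate a breakpoint request; 0 if acceptable, -1 otherwise. */
static int demo_register_bp(const struct demo_bp_attr *attr)
{
	uint64_t len = attr->bp_len;

	if (len != 1 && len != 2 && len != 4 && len != 8)
		return -1;              /* unsupported length in this sketch */
	if (attr->bp_addr & (len - 1))
		return -1;              /* address must be aligned to the length */
	if (!(attr->bp_type & (DEMO_BP_R | DEMO_BP_W | DEMO_BP_X)))
		return -1;              /* no access type requested */
	return 0;
}
```

A caller fills in, say, { .bp_addr = 0x1000, .bp_len = 4, .bp_type = DEMO_BP_W } and passes a pointer; if another architecture later needs, for example, an address mask, only the structure and the architecture's checks change.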
2009-11-27 | softlockup: Fix hung_task_check_count sysctl | Anton Blanchard
I'm seeing spikes of up to 0.5ms in khungtaskd on a large machine. To reduce this source of jitter I tried setting hung_task_check_count to 0:

    # echo 0 > /proc/sys/kernel/hung_task_check_count

which didn't have the intended effect. Change to a post-decrement of max_count, so a value of 0 means check 0 tasks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: msb@google.com
LKML-Reference: <20091127022820.GU32182@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
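The difference between testing a budget with a pre-decrement and with a post-decrement can be shown with a small userspace model of the scan loop. This is a sketch with hypothetical function names, modelling the budget check placed before each task is examined; it is not the kernel/hung_task.c source.

```c
/* Old shape: pre-decrement.  A budget of 0 becomes -1 and never hits zero. */
static int demo_scan_pre(int max_count, int nr_tasks)
{
	int checked = 0;

	for (int t = 0; t < nr_tasks; t++) {
		if (!--max_count)
			break;
		checked++;              /* stand-in for checking one task */
	}
	return checked;
}

/* New shape: post-decrement.  The budget is tested before it is reduced. */
static int demo_scan_post(int max_count, int nr_tasks)
{
	int checked = 0;

	for (int t = 0; t < nr_tasks; t++) {
		if (!max_count--)
			break;
		checked++;
	}
	return checked;
}
```

With 5 tasks and a budget of 0, the pre-decrement version checks all 5 while the post-decrement version checks none. The pre-decrement form is also off by one for positive budgets: a budget of 3 checks only 2 tasks, whereas the post-decrement form checks 3.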