aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2008-11-24sched: remove any_online_cpu()Rusty Russell
Impact: use new API any_online_cpu() is a good name, but it takes a cpumask_t, not a pointer. There are several places where any_online_cpu() doesn't really want a mask arg at all. Replace all callers with cpumask_any() and cpumask_any_and(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: get rid of boutique sched.c allocations, use cpumask_var_t.Rusty Russell
Impact: use new general API Using lots of allocs rather than one big alloc is less efficient, but who cares for this setup function? Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: convert sched.c from for_each_cpu_mask to for_each_cpu.Rusty Russell
Impact: trivial API conversion This is a simple conversion, but note that for_each_cpu() terminates with i >= nr_cpu_ids, not i == NR_CPUS like for_each_cpu_mask() did. I don't convert all of them: sd->span changes in a later patch, so change those iterators there rather than here. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: reduce stack size requirements in kernel/sched.cMike Travis
Impact: cleanup * use node_to_cpumask_ptr in place of node_to_cpumask to reduce stack requirements in sched.c Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24Merge branches 'sched/core', 'core/core' and 'tracing/core' into cpus4096Ingo Molnar
2008-11-24Merge branches 'tracing/branch-tracer', 'tracing/fastboot', ↵Ingo Molnar
'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/power-tracer', 'tracing/powerpc', 'tracing/ring-buffer', 'tracing/stack-tracer' and 'tracing/urgent' into tracing/core
2008-11-24Merge branches 'core/debug', 'core/futexes', 'core/locking', 'core/rcu', ↵Ingo Molnar
'core/signal', 'core/urgent' and 'core/xen' into core/core
2008-11-24Merge branch 'sched/rt' into sched/coreIngo Molnar
2008-11-24mutex: __used is needed for function referenced only from inline asmTörök Edwin
Impact: fix build failure on llvm-gcc-4.2 According to the gcc manual, the 'used' attribute should be applied to functions referenced only from inline assembly. This fixes a build failure with llvm-gcc-4.2, which deleted __mutex_lock_slowpath, __mutex_unlock_slowpath. Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/function-return-tracer: free the return stack on free_task()Frederic Weisbecker
Impact: avoid losing some traces when a task is freed do_exit() is not the last function called when a task finishes. There are still some functions which are to be called such as ree_task(). So we delay the freeing of the return stack to the last moment. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23x86, mmiotrace: fix buffer overrun detectionPekka Paalanen
Impact: fix mmiotrace overrun tracing When ftrace framework moved to use the ring buffer facility, the buffer overrun detection was broken after 2.6.27 by commit | commit 3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c | Author: Steven Rostedt <rostedt@goodmis.org> | Date: Mon Sep 29 23:02:41 2008 -0400 | | ftrace: make work with new ring buffer | | This patch ports ftrace over to the new ring buffer. The detection is now fixed by using the ring buffer API. When mmiotrace detects a buffer overrun, it will report the number of lost events. People reading an mmiotrace log must know if something was missed, otherwise the data may not make sense. Signed-off-by: Pekka Paalanen <pq@iki.fi> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/function-return-tracer: don't trace kfree while it frees the return ↵Frederic Weisbecker
stack Impact: fix a crash While I killed the cat process, I got sometimes the following (but rare) crash: [ 65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727 [ 65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1 [ 65.689027] EIP is at 0x0 [ 65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780 [ 65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24 [ 65.689027] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000) [ 65.689027] Stack: [ 65.689027] 00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c [ 65.689027] f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc [ 65.689027] f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6 [ 65.689027] Call Trace: [ 65.689027] [<c012c853>] ? do_exit+0x603/0x850 [ 65.689027] [<c0133cc9>] ? next_signal+0x9/0x40 [ 65.689027] [<c0135dbc>] ? dequeue_signal+0x8c/0x180 [ 65.689027] [<c012cad6>] ? do_group_exit+0x36/0x90 [ 65.689027] [<c013709c>] ? get_signal_to_deliver+0x20c/0x390 [ 65.689027] [<c0102b69>] ? do_notify_resume+0x99/0x8b0 [ 65.689027] [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80 [ 65.689027] [<c014db9b>] ? trace_hardirqs_on+0xb/0x10 [ 65.689027] [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80 [ 65.689027] [<c02e39b0>] ? n_tty_write+0x0/0x340 [ 65.689027] [<c02e1812>] ? redirected_tty_write+0x82/0x90 [ 65.689027] [<c019ee99>] ? vfs_write+0x99/0xd0 [ 65.689027] [<c02e1790>] ? redirected_tty_write+0x0/0x90 [ 65.689027] [<c019f342>] ? sys_write+0x42/0x70 [ 65.689027] [<c01035ca>] ? work_notifysig+0x13/0x19 [ 65.689027] Code: Bad EIP value. [ 65.689027] EIP: [<00000000>] 0x0 SS:ESP 0068:f630be24 This is because on do_exit(), kfree is called to free the return addresses stack but kfree is traced and stored its return address in this stack. This patch fixes it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/stack-tracer: avoid races accessing fileTörök Edwin
Impact: fix race vma->vm_file reference is only stable while holding the mmap_sem, so move usage of it to within the critical section. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORTTörök Edwin
Impact: cleanup User stack tracing is just implemented for x86, but it is not x86 specific. Introduce a generic config flag, that is currently enabled only for x86. When other arches implement it, they will have to SELECT USER_STACKTRACE_SUPPORT. Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/stack-tracer: fix locking and refcountsTörök Edwin
Impact: fix refcounting/object-access bug Hold mmap_sem while looking up/accessing vma. Hold the RCU lock while using the task we looked up. Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/stack-tracer: fix style issuesTörök Edwin
Impact: cleanup Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23trace: fix compiler warning in branch profilerSteven Rostedt
Impact: fix compiler warning The ftrace_pointers used in the branch profiler are constant values. They should never change. But the compiler complains when they are passed into the debugfs_create_file as a data pointer, because the function discards the qualifier. This patch typecasts the parameter to debugfs_create_file back to a void pointer. To remind the callbacks that they are pointing to a constant value, I also modified the callback local pointers to be const struct ftrace_pointer * as well. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23ftrace: add ftrace_off_permanentSteven Rostedt
Impact: add new API to disable all of ftrace on anomalies It case of a serious anomaly being detected (like something caught by lockdep) it is a good idea to disable all tracing immediately, without grabing any locks. This patch adds ftrace_off_permanent that disables the tracers, function tracing and ring buffers without a way to enable them again. This should only be used when something serious has been detected. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23ring-buffer: add tracing_off_permanentSteven Rostedt
Impact: feature to permanently disable ring buffer This patch adds a API to the ring buffer code that will permanently disable the ring buffer from ever recording. This should only be called when some serious anomaly is detected, and the system may be in an unstable state. When that happens, shutting down the recording to the ring buffers may be appropriate. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23trace: profile all if conditionalsSteven Rostedt
Impact: feature to profile if statements This patch adds a branch profiler for all if () statements. The results will be found in: /debugfs/tracing/profile_branch For example: miss hit % Function File Line ------- --------- - -------- ---- ---- 0 1 100 x86_64_start_reservations head64.c 127 0 1 100 copy_bootdata head64.c 69 1 0 0 x86_64_start_kernel head64.c 111 32 0 0 set_intr_gate desc.h 319 1 0 0 reserve_ebda_region head.c 51 1 0 0 reserve_ebda_region head.c 47 0 1 100 reserve_ebda_region head.c 42 0 0 X maxcpus main.c 165 Miss means the branch was not taken. Hit means the branch was taken. The percent is the percentage the branch was taken. This adds a significant amount of overhead and should only be used by those analyzing their system. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23trace: branch profiling should not print percent without dataSteven Rostedt
Impact: cleanup on output of branch profiler When a branch has not been taken, it does not make sense to show a percentage incorrect or hit. This patch changes the behaviour to print out a 'X' when the branch has not been executed yet. For example: correct incorrect % Function File Line ------- --------- - -------- ---- ---- 2096 0 0 do_arch_prctl process_64.c 832 0 0 X do_arch_prctl process_64.c 804 2604 0 0 IS_ERR err.h 34 130228 5765 4 __switch_to process_64.c 673 0 0 X enable_TSC process_64.c 448 0 0 X disable_TSC process_64.c 431 Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23trace: consolidate unlikely and likely profilerSteven Rostedt
Impact: clean up to make one profiler of like and unlikely tracer The likely and unlikely profiler prints out the file and line numbers of the annotated branches that it is profiling. It shows the number of times it was correct or incorrect in its guess. Having two different files or sections for that matter to tell us if it was a likely or unlikely is pretty pointless. We really only care if it was correct or not. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing: allow tracing of suspend/resume & hibernation code againIngo Molnar
Impact: widen function-tracing to suspend+resume (and hibernation) sequences Now that the ftrace kernel thread is gone, we can allow tracing during suspend/resume again. So revert these two commits: f42ac38c5 "ftrace: disable tracing for suspend to ram" 41108eb10 "ftrace: disable tracing for hibernation" This should be tested very carefully, as it could interact with altneratives instruction patching, etc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing: identify which executable object the userspace address belongs toTörök Edwin
Impact: modify+improve the userstacktrace tracing visualization feature Store thread group leader id, and use it to lookup the address in the process's map. We could have looked up the address on thread's map, but the thread might not exist by the time we are called. The process might not exist either, but if you are reading trace_pipe, that is unlikely. Example usage: mount -t debugfs nodev /sys/kernel/debug cd /sys/kernel/debug/tracing echo userstacktrace >iter_ctrl echo sym-userobj >iter_ctrl echo sched_switch >current_tracer echo 1 >tracing_enabled cat trace_pipe >/tmp/trace& .... run application ... echo 0 >tracing_enabled cat /tmp/trace You'll see stack entries like: /lib/libpthread-2.7.so[+0xd370] You can convert them to function/line using: addr2line -fie /lib/libpthread-2.7.so 0xd370 Or: addr2line -fie /usr/lib/debug/libpthread-2.7.so 0xd370 For non-PIC/PIE executables this won't work: a.out[+0x73b] You need to run the following: addr2line -fie a.out 0x40073b (where 0x400000 is the default load address of a.out) Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing: add support for userspace stacktraces in tracing/iter_ctrlTörök Edwin
Impact: add new (default-off) tracing visualization feature Usage example: mount -t debugfs nodev /sys/kernel/debug cd /sys/kernel/debug/tracing echo userstacktrace >iter_ctrl echo sched_switch >current_tracer echo 1 >tracing_enabled .... run application ... echo 0 >tracing_enabled Then read one of 'trace','latency_trace','trace_pipe'. To get the best output you can compile your userspace programs with frame pointers (at least glibc + the app you are tracing). Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/function-return-tracer: clean up task start/exit callbacksIngo Molnar
Impact: cleanup Eliminate #ifdefs in core code by using empty inline functions. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/function-return-tracer: store return stack into task_struct and ↵Frederic Weisbecker
allocate it dynamically Impact: use deeper function tracing depth safely Some tests showed that function return tracing needed a more deeper depth of function calls. But it could be unsafe to store these return addresses to the stack. So these arrays will now be allocated dynamically into task_struct of current only when the tracer is activated. Typical scheme when tracer is activated: - allocate a return stack for each task in global list. - fork: allocate the return stack for the newly created task - exit: free return stack of current - idle init: same as fork I chose a default depth of 50. I don't have overruns anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23Merge branches 'tracing/profiling', 'tracing/options' and 'tracing/urgent' ↵Ingo Molnar
into tracing/core
2008-11-21lockdep: consistent alignement for lockdep infoLi Zefan
Impact: prettify /proc/lockdep_info Just feel odd that not all lines of lockdep info are aligned. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21sched: update comment for move_task_off_dead_cpuVegard Nossum
Impact: cleanup This commit: commit f7b4cddcc5aca03e80e357360c9424dfba5056c2 Author: Oleg Nesterov <oleg@tv-sign.ru> Date: Tue Oct 16 23:30:56 2007 -0700 do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(ta Currently move_task_off_dead_cpu() is called under write_lock_irq(tasklist). This means it can't use task_lock() which is needed to improve migrating to take task's ->cpuset into account. Change the code to call move_task_off_dead_cpu() with irqs enabled, and change migrate_live_tasks() to use read_lock(tasklist). ...forgot to update the comment in front of move_task_off_dead_cpu. Reference: http://lkml.org/lkml/2008/6/23/135 Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21Merge commit 'v2.6.28-rc6' into sched/coreIngo Molnar
2008-11-21function tracing: fix wrong position computing of stack_traceLiming Wang
Impact: make output of stack_trace complete if buffer overruns When read buffer overruns, the output of stack_trace isn't complete. When printing records with seq_printf in t_show, if the read buffer has overruned by the current record, then this record won't be printed to user space through read buffer, it will just be dropped in this printing. When next printing, t_start should return the "*pos"th record, which is the one dropped by previous printing, but it just returns (m->private + *pos)th record. Here we use a more sane method to implement seq_operations which can be found in kernel code. Thus we needn't initialize m->private. About testing, it's not easy to overrun read buffer, but we can use seq_printf to print more padding bytes in t_show, then it's easy to check whether or not records are lost. This commit has been tested on both condition of overrun and non overrun. Signed-off-by: Liming Wang <liming.wang@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: fix dyn ftrace filter selection ftrace: make filtered functions effective on setting ftrace: fix set_ftrace_filter trace: introduce missing mutex_unlock() tracing: kernel/trace/trace.c: introduce missing kfree()
2008-11-19cgroups: fix a serious bug in cgroupstatsLi Zefan
Try this, and you'll get oops immediately: # cd Documentation/accounting/ # gcc -o getdelays getdelays.c # mount -t cgroup -o debug xxx /mnt # ./getdelays -C /mnt/tasks Because a normal file's dentry->d_fsdata is a pointer to struct cftype, not struct cgroup. After the patch, it returns EINVAL if we try to get cgroupstats from a normal file. Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19sprint_symbol(): use less stackHugh Dickins
sprint_symbol(), itself used when dumping stacks, has been wasting 128 bytes of stack: lookup the symbol directly into the buffer supplied by the caller, instead of using a locally declared namebuf. I believe the name != buffer strcpy() is obsolete: the design here dates from when module symbol lookup pointed into a supposedly const but sadly volatile table; nowadays it copies, but an uncalled strcpy() looks better here than the risk of a recursive BUG_ON(). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19cgroup: fix potential deadlock in pre_destroyKAMEZAWA Hiroyuki
As Balbir pointed out, memcg's pre_destroy handler has potential deadlock. It has following lock sequence. cgroup_mutex (cgroup_rmdir) -> pre_destroy -> mem_cgroup_pre_destroy-> force_empty -> cpu_hotplug.lock. (lru_add_drain_all-> schedule_work-> get_online_cpus) But, cpuset has following. cpu_hotplug.lock (call notifier) -> cgroup_mutex. (within notifier) Then, this lock sequence should be fixed. Considering how pre_destroy works, it's not necessary to holding cgroup_mutex() while calling it. As a side effect, we don't have to wait at this mutex while memcg's force_empty works.(it can be long when there are tons of pages.) Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19cpuset: update top cpuset's mems after adding a nodeMiao Xie
After adding a node into the machine, top cpuset's mems isn't updated. By reviewing the code, we found that the update function cpuset_track_online_nodes() was invoked after node_states[N_ONLINE] changes. It is wrong because N_ONLINE just means node has pgdat, and if node has/added memory, we use N_HIGH_MEMORY. So, We should invoke the update function after node_states[N_HIGH_MEMORY] changes, just like its commit says. This patch fixes it. And we use notifier of memory hotplug instead of direct calling of cpuset_track_online_nodes(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19reintroduce accept4Ulrich Drepper
Introduce a new accept4() system call. The addition of this system call matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(), inotify_init1(), epoll_create1(), pipe2()) which added new system calls that differed from analogous traditional system calls in adding a flags argument that can be used to access additional functionality. The accept4() system call is exactly the same as accept(), except that it adds a flags bit-mask argument. Two flags are initially implemented. (Most of the new system calls in 2.6.27 also had both of these flags.) SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled for the new file descriptor returned by accept4(). This is a useful security feature to avoid leaking information in a multithreaded program where one thread is doing an accept() at the same time as another thread is doing a fork() plus exec(). More details here: http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling", Ulrich Drepper). The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag to be enabled on the new open file description created by accept4(). (This flag is merely a convenience, saving the use of additional calls fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result. Here's a test program. Works on x86-32. Should work on x86-64, but I (mtk) don't have a system to hand to test with. It tests accept4() with each of the four possible combinations of SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies that the appropriate flags are set on the file descriptor/open file description returned by accept4(). I tested Ulrich's patch in this thread by applying against 2.6.28-rc2, and it passes according to my test program. /* test_accept4.c Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk <mtk.manpages@gmail.com> Licensed under the GNU GPLv2 or later. */ #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdlib.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #define PORT_NUM 33333 #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) /**********************************************************************/ /* The following is what we need until glibc gets a wrapper for accept4() */ /* Flags for socket(), socketpair(), accept4() */ #ifndef SOCK_CLOEXEC #define SOCK_CLOEXEC O_CLOEXEC #endif #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif #ifdef __x86_64__ #define SYS_accept4 288 #elif __i386__ #define USE_SOCKETCALL 1 #define SYS_ACCEPT4 18 #else #error "Sorry -- don't know the syscall # on this architecture" #endif static int accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags) { printf("Calling accept4(): flags = %x", flags); if (flags != 0) { printf(" ("); if (flags & SOCK_CLOEXEC) printf("SOCK_CLOEXEC"); if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK)) printf(" "); if (flags & SOCK_NONBLOCK) printf("SOCK_NONBLOCK"); printf(")"); } printf("\n"); #if USE_SOCKETCALL long args[6]; args[0] = fd; args[1] = (long) sockaddr; args[2] = (long) addrlen; args[3] = flags; return syscall(SYS_socketcall, SYS_ACCEPT4, args); #else return syscall(SYS_accept4, fd, sockaddr, addrlen, flags); #endif } /**********************************************************************/ static int do_test(int lfd, struct sockaddr_in *conn_addr, int closeonexec_flag, int nonblock_flag) { int connfd, acceptfd; int fdf, flf, fdf_pass, flf_pass; struct sockaddr_in claddr; socklen_t addrlen; printf("=======================================\n"); connfd = socket(AF_INET, SOCK_STREAM, 0); if (connfd == -1) die("socket"); if (connect(connfd, (struct sockaddr *) conn_addr, sizeof(struct sockaddr_in)) == -1) die("connect"); addrlen = sizeof(struct sockaddr_in); acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen, closeonexec_flag | nonblock_flag); if (acceptfd == -1) { perror("accept4()"); close(connfd); return 0; } fdf = fcntl(acceptfd, F_GETFD); if (fdf == -1) die("fcntl:F_GETFD"); fdf_pass = ((fdf & FD_CLOEXEC) != 0) == ((closeonexec_flag & SOCK_CLOEXEC) != 0); printf("Close-on-exec flag is %sset (%s); ", (fdf & FD_CLOEXEC) ? "" : "not ", fdf_pass ? "OK" : "failed"); flf = fcntl(acceptfd, F_GETFL); if (flf == -1) die("fcntl:F_GETFD"); flf_pass = ((flf & O_NONBLOCK) != 0) == ((nonblock_flag & SOCK_NONBLOCK) !=0); printf("nonblock flag is %sset (%s)\n", (flf & O_NONBLOCK) ? "" : "not ", flf_pass ? "OK" : "failed"); close(acceptfd); close(connfd); printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL"); return fdf_pass && flf_pass; } static int create_listening_socket(int port_num) { struct sockaddr_in svaddr; int lfd; int optval; memset(&svaddr, 0, sizeof(struct sockaddr_in)); svaddr.sin_family = AF_INET; svaddr.sin_addr.s_addr = htonl(INADDR_ANY); svaddr.sin_port = htons(port_num); lfd = socket(AF_INET, SOCK_STREAM, 0); if (lfd == -1) die("socket"); optval = 1; if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) == -1) die("setsockopt"); if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in)) == -1) die("bind"); if (listen(lfd, 5) == -1) die("listen"); return lfd; } int main(int argc, char *argv[]) { struct sockaddr_in conn_addr; int lfd; int port_num; int passed; passed = 1; port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM; memset(&conn_addr, 0, sizeof(struct sockaddr_in)); conn_addr.sin_family = AF_INET; conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); conn_addr.sin_port = htons(port_num); lfd = create_listening_socket(port_num); if (!do_test(lfd, &conn_addr, 0, 0)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0)) passed = 0; if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK)) passed = 0; close(lfd); exit(passed ? EXIT_SUCCESS : EXIT_FAILURE); } [mtk.manpages@gmail.com: rewrote changelog, updated test program] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19sched: fix inconsistency when redistribute per-cpu tg->cfs_rq sharesKen Chen
Impact: make load-balancing more consistent In the update_shares() path leading to tg_shares_up(), the calculation of per-cpu cfs_rq shares is rather erratic even under moderate task wake up rate. The problem is that the per-cpu tg->cfs_rq load weight used in the sd_rq_weight aggregation and actual redistribution of the cfs_rq->shares are collected at different time. Under moderate system load, we've seen quite a bit of variation on the cfs_rq->shares and ultimately wildly affects sched_entity's load weight. This patch caches the result of initial per-cpu load weight when doing the sum calculation, and then pass it down to update_group_shares_cpu() for redistributing per-cpu cfs_rq shares. This allows consistent total cfs_rq shares across all CPUs. It also simplifies the rounding and zero load weight check. Signed-off-by: Ken Chen <kenchen@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19profiling: clean up profile_nop()Andrew Morton
Impact: cleanup No point in inlining this. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/coreIngo Molnar
Conflicts: kernel/trace/ftrace.c [ We conflicted here because we backported a few fixes to tracing/urgent - which has different internal APIs. ]
2008-11-19ftrace: fix selftest lockingIngo Molnar
Impact: fix self-test boot crash Self-test failure forgot to re-lock the BKL - crashing the next initcall: Testing tracer irqsoff: .. no entries found ..FAILED! initcall init_irqsoff_tracer+0x0/0x11 returned 0 after 3906 usecs calling init_mmio_trace+0x0/0xf @ 1 ------------[ cut here ]------------ Kernel BUG at c0c0a915 [verbose debug info unavailable] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC last sysfs file: Pid: 1, comm: swapper Not tainted (2.6.28-rc5-tip #53704) EIP: 0060:[<c0c0a915>] EFLAGS: 00010286 CPU: 1 EIP is at unlock_kernel+0x10/0x2b EAX: ffffffff EBX: 00000000 ECX: 00000000 EDX: f7030000 ESI: c12da19c EDI: 00000000 EBP: f7039f54 ESP: f7039f54 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 1, ti=f7038000 task=f7030000 task.ti=f7038000) Stack: f7039f6c c0164d30 c013fed8 a7d8d7b4 00000000 00000000 f7039f74 c12fb78a f7039fd0 c0101132 c12fb77d 00000000 6f727200 6f632072 2d206564 c1002031 0000000f f7039fa2 f7039fb0 3531b171 00000000 00000000 0000002f c12ca480 Call Trace: [<c0164d30>] ? register_tracer+0x66/0x13f [<c013fed8>] ? ktime_get+0x19/0x1b [<c12fb78a>] ? init_mmio_trace+0xd/0xf [<c0101132>] ? do_one_initcall+0x4a/0x111 [<c12fb77d>] ? init_mmio_trace+0x0/0xf [<c015c7e6>] ? init_irq_proc+0x46/0x59 [<c12e851d>] ? kernel_init+0x104/0x152 [<c12e8419>] ? kernel_init+0x0/0x152 [<c01038b7>] ? kernel_thread_helper+0x7/0x10 Code: 58 14 43 75 0a b8 00 9b 2d c1 e8 51 43 7a ff 64 a1 00 a0 37 c1 89 58 14 5b 5d c3 55 64 8b 15 00 a0 37 c1 83 7a 14 00 89 e5 79 04 <0f> 0b eb fe 8b 42 14 48 85 c0 89 42 14 79 0a b8 00 9b 2d c1 e8 EIP: [<c0c0a915>] unlock_kernel+0x10/0x2b SS:ESP 0068:f7039f54 ---[ end trace a7919e7f17c0a725 ]--- Kernel panic - not syncing: Attempted to kill init! So clean up the flow a bit. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19Merge branch 'linus' into sched/coreIngo Molnar
Conflicts: kernel/Makefile
2008-11-19Merge branch 'tip/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
2008-11-19ftrace: fix dyn ftrace filter selectionSteven Rostedt
Impact: clean up and fix for dyn ftrace filter selection The previous logic of the dynamic ftrace selection of enabling or disabling functions was complex and incorrect. This patch simplifies the code and corrects the usage. This simplification also makes the code more robust. Here is the correct logic: Given a function that can be traced by dynamic ftrace: If the function is not to be traced, disable it if it was enabled. (this is if the function is in the set_ftrace_notrace file) (filter is on if there exists any functions in set_ftrace_filter file) If the filter is on, and we are enabling functions: If the function is in set_ftrace_filter, enable it if it is not already enabled. If the function is not in set_ftrace_filter, disable it if it is not already disabled. Otherwise, if the filter is off and we are enabling function tracing: Enable the function if it is not already enabled. Otherwise, if we are disabling function tracing: Disable the function if it is not already disabled. This code now sets or clears the ENABLED flag in the record, and at the end it will enable the function if the flag is set, or disable the function if the flag is cleared. The parameters for the function that does the above logic is also simplified. Instead of passing in confusing "new" and "old" where they might be swapped if the "enabled" flag is not set. The old logic even had one of the above always NULL and had to be filled in. The new logic simply passes in one parameter called "nop". A "call" is calculated in the code, and at the end of the logic, when we know we need to either disable or enable the function, we can then use the "nop" and "call" properly. This code is more robust than the previous version. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19ftrace: make filtered functions effective on settingSteven Rostedt
Impact: fix filter selection to apply when set It can be confusing when the set_filter_functions is set (or cleared) and the functions being recorded by the dynamic tracer does not match. This patch causes the code to be updated if the function tracer is enabled and the filter is changed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19ftrace: fix set_ftrace_filterSteven Rostedt
Impact: fix of output of set_ftrace_filter The commit "ftrace: do not show freed records in available_filter_functions" Removed a bit too much from the set_ftrace_filter code, where we now see all functions in the set_ftrace_filter file even when we set a filter. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18ftrace: preemptoff selftest not workingHeiko Carstens
Impact: fix preemptoff and preemptirqsoff tracer self-tests I was wondering why the preemptoff and preemptirqsoff tracer selftests don't work on s390. After all its just that they get called from non-preemptible context: kernel_init() will execute all initcalls, however the first line in kernel_init() is lock_kernel(), which causes the preempt_count to be increased. Any later calls to add_preempt_count() (especially those from the selftests) will therefore not result in a call to trace_preempt_off() since the check below in add_preempt_count() will be false: if (preempt_count() == val) trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1)); Hence the trace buffer will be empty. Fix this by releasing the BKL during the self-tests. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18trace: introduce missing mutex_unlock()Vegard Nossum
Impact: fix tracing buffer mutex leak in case of allocation failure This error was spotted by this semantic patch: http://www.emn.fr/x-info/coccinelle/mut.html It looks correct as far as I can tell. Please review. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18Merge branch 'linus' into tracing/urgentIngo Molnar