From 84c6f88fc8265d7a712d7d6ed8fc1a878dfc84d1 Mon Sep 17 00:00:00 2001 From: Hitoshi Mitake Date: Thu, 4 Feb 2010 16:08:15 +0900 Subject: perf lock: Fix and add misc documentally things I've forgot to add 'perf lock' line to command-list.txt, so users of perf could not find perf lock when they type 'perf'. Fixing command-list.txt requires document (tools/perf/Documentation/perf-lock.txt). But perf lock is too much "under construction" to write a stable document, so this is something like pseudo document for now. And I wrote description of perf lock at help section of CONFIG_LOCK_STAT, this will navigate users of lock trace events. Signed-off-by: Hitoshi Mitake Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo LKML-Reference: <1265267295-8388-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Frederic Weisbecker --- tools/perf/Documentation/perf-lock.txt | 29 +++++++++++++++++++++++++++++ tools/perf/command-list.txt | 1 + 2 files changed, 30 insertions(+) create mode 100644 tools/perf/Documentation/perf-lock.txt (limited to 'tools') diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt new file mode 100644 index 00000000000..b317102138c --- /dev/null +++ b/tools/perf/Documentation/perf-lock.txt @@ -0,0 +1,29 @@ +perf-lock(1) +============ + +NAME +---- +perf-lock - Analyze lock events + +SYNOPSIS +-------- +[verse] +'perf lock' {record|report|trace} + +DESCRIPTION +----------- +You can analyze various lock behaviours +and statistics with this 'perf lock' command. + + 'perf lock record ' records lock events + between start and end . And this command + produces the file "perf.data" which contains tracing + results of lock events. + + 'perf lock trace' shows raw lock events. + + 'perf lock report' reports statistical data. + +SEE ALSO +-------- +linkperf:perf[1] diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt index 9afcff2e3ae..db6ee94d4a8 100644 --- a/tools/perf/command-list.txt +++ b/tools/perf/command-list.txt @@ -18,3 +18,4 @@ perf-top mainporcelain common perf-trace mainporcelain common perf-probe mainporcelain common perf-kmem mainporcelain common +perf-lock mainporcelain common -- cgit v1.2.3 From b67577dfb45580c498bfdb1bc76c00c3b2ad6310 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 3 Feb 2010 09:09:33 +0100 Subject: perf lock: Drop the buffers multiplexing dependency We need to deal with time ordered events to build a correct state machine of lock events. This is why we multiplex the lock events buffers. But the ordering is done from the kernel, on the tracing fast path, leading to high contention between cpus. Without multiplexing, the events appears in a weak order. If we have four events, each split per cpu, perf record will read the events buffers in the following order: [ CPU0 ev0, CPU0 ev1, CPU0 ev3, CPU0 ev4, CPU1 ev0, CPU1 ev0....] To handle a post processing reordering, we could just read and sort the whole in memory, but it just doesn't scale with high amounts of events: lock events can fill huge amounts in few times. Basically we need to sort in memory and find a "grace period" point when we know that a given slice of previously sorted events can be committed for post-processing, so that we can unload the memory usage step by step and keep a scalable sorting list. There is no strong rules about how to define such "grace period". What does this patch is: We define a FLUSH_PERIOD value that defines a grace period in seconds. We want to have a slice of events covering 2 * FLUSH_PERIOD in our sorted list. If FLUSH_PERIOD is big enough, it ensures every events that occured in the first half of the timeslice have all been buffered and there are none remaining and there won't be further to put inside this first timeslice. Then once we reach the 2 * FLUSH_PERIOD timeslice, we flush the first half to be gentle with the memory (the second half can still get new events in the middle, so wait another period to flush it) FLUSH_PERIOD is defined to 5 seconds. Say the first event started on time t0. We can safely assume that at the time we are processing events of t0 + 10 seconds, ther won't be anymore events to read from perf.data that occured between t0 and t0 + 5 seconds. Hence we can safely flush the first half. To point out funky bugs, we have a guardian that checks a new event timestamp is not below the last event's timestamp flushed and that displays a warning in this case. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Li Zefan Cc: Lai Jiangshan Cc: Masami Hiramatsu Cc: Jens Axboe --- tools/perf/builtin-lock.c | 148 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 146 insertions(+), 2 deletions(-) (limited to 'tools') diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c index fb9ab2ad3f9..e12c844df1e 100644 --- a/tools/perf/builtin-lock.c +++ b/tools/perf/builtin-lock.c @@ -460,6 +460,150 @@ process_raw_event(void *data, int cpu, process_lock_release_event(data, event, cpu, timestamp, thread); } +struct raw_event_queue { + u64 timestamp; + int cpu; + void *data; + struct thread *thread; + struct list_head list; +}; + +static LIST_HEAD(raw_event_head); + +#define FLUSH_PERIOD (5 * NSEC_PER_SEC) + +static u64 flush_limit = ULLONG_MAX; +static u64 last_flush = 0; +struct raw_event_queue *last_inserted; + +static void flush_raw_event_queue(u64 limit) +{ + struct raw_event_queue *tmp, *iter; + + list_for_each_entry_safe(iter, tmp, &raw_event_head, list) { + if (iter->timestamp > limit) + return; + + if (iter == last_inserted) + last_inserted = NULL; + + process_raw_event(iter->data, iter->cpu, iter->timestamp, + iter->thread); + + last_flush = iter->timestamp; + list_del(&iter->list); + free(iter->data); + free(iter); + } +} + +static void __queue_raw_event_end(struct raw_event_queue *new) +{ + struct raw_event_queue *iter; + + list_for_each_entry_reverse(iter, &raw_event_head, list) { + if (iter->timestamp < new->timestamp) { + list_add(&new->list, &iter->list); + return; + } + } + + list_add(&new->list, &raw_event_head); +} + +static void __queue_raw_event_before(struct raw_event_queue *new, + struct raw_event_queue *iter) +{ + list_for_each_entry_continue_reverse(iter, &raw_event_head, list) { + if (iter->timestamp < new->timestamp) { + list_add(&new->list, &iter->list); + return; + } + } + + list_add(&new->list, &raw_event_head); +} + +static void __queue_raw_event_after(struct raw_event_queue *new, + struct raw_event_queue *iter) +{ + list_for_each_entry_continue(iter, &raw_event_head, list) { + if (iter->timestamp > new->timestamp) { + list_add_tail(&new->list, &iter->list); + return; + } + } + list_add_tail(&new->list, &raw_event_head); +} + +/* The queue is ordered by time */ +static void __queue_raw_event(struct raw_event_queue *new) +{ + if (!last_inserted) { + __queue_raw_event_end(new); + return; + } + + /* + * Most of the time the current event has a timestamp + * very close to the last event inserted, unless we just switched + * to another event buffer. Having a sorting based on a list and + * on the last inserted event that is close to the current one is + * probably more efficient than an rbtree based sorting. + */ + if (last_inserted->timestamp >= new->timestamp) + __queue_raw_event_before(new, last_inserted); + else + __queue_raw_event_after(new, last_inserted); +} + +static void queue_raw_event(void *data, int raw_size, int cpu, + u64 timestamp, struct thread *thread) +{ + struct raw_event_queue *new; + + if (flush_limit == ULLONG_MAX) + flush_limit = timestamp + FLUSH_PERIOD; + + if (timestamp < last_flush) { + printf("Warning: Timestamp below last timeslice flush\n"); + return; + } + + new = malloc(sizeof(*new)); + if (!new) + die("Not enough memory\n"); + + new->timestamp = timestamp; + new->cpu = cpu; + new->thread = thread; + + new->data = malloc(raw_size); + if (!new->data) + die("Not enough memory\n"); + + memcpy(new->data, data, raw_size); + + __queue_raw_event(new); + last_inserted = new; + + /* + * We want to have a slice of events covering 2 * FLUSH_PERIOD + * If FLUSH_PERIOD is big enough, it ensures every events that occured + * in the first half of the timeslice have all been buffered and there + * are none remaining (we need that because of the weakly ordered + * event recording we have). Then once we reach the 2 * FLUSH_PERIOD + * timeslice, we flush the first half to be gentle with the memory + * (the second half can still get new events in the middle, so wait + * another period to flush it) + */ + if (new->timestamp > flush_limit && + new->timestamp - flush_limit > FLUSH_PERIOD) { + flush_limit += FLUSH_PERIOD; + flush_raw_event_queue(flush_limit); + } +} + static int process_sample_event(event_t *event, struct perf_session *session) { struct thread *thread; @@ -480,7 +624,7 @@ static int process_sample_event(event_t *event, struct perf_session *session) if (profile_cpu != -1 && profile_cpu != (int) data.cpu) return 0; - process_raw_event(data.raw_data, data.cpu, data.time, thread); + queue_raw_event(data.raw_data, data.raw_size, data.cpu, data.time, thread); return 0; } @@ -576,6 +720,7 @@ static void __cmd_report(void) setup_pager(); select_key(); read_events(); + flush_raw_event_queue(ULLONG_MAX); sort_result(); print_result(); } @@ -608,7 +753,6 @@ static const char *record_args[] = { "record", "-a", "-R", - "-M", "-f", "-m", "1024", "-c", "1", -- cgit v1.2.3 From 6630125419ef37ff8781713c5e9d416f2a4ba357 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 2 Mar 2010 15:25:38 -0300 Subject: perf archive: Don't try to collect files without a build-id MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To avoid these error: [root@doppio ~]# perf archive tar: .build-id/00/00000000000000000000000000000000000000: Cannot stat: No such file or directory tar: .build-id/00/00000000000000000000000000000000000000: Cannot stat: No such file or directory tar: .build-id/00/00000000000000000000000000000000000000: Cannot stat: No such file or directory tar: .build-id/00/00000000000000000000000000000000000000: Cannot stat: No such file or directory tar: Exiting with failure status due to previous errors [root@doppio ~]# More work is needed to support archiving symtabs for binaries without a build-id, perhaps creating a perf.data UUID + adding build-ids for the binaries copied into the cache and then have this perf.data session UUID be a directory with symlinks to the by now calculated build-id of the files inside it. Or just do an extra pass and insert the calculated build-ids in the perf.data header. Reported-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Ingo Molnar --- tools/perf/perf-archive.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'tools') diff --git a/tools/perf/perf-archive.sh b/tools/perf/perf-archive.sh index 45fbe2f07b1..910468e6e01 100644 --- a/tools/perf/perf-archive.sh +++ b/tools/perf/perf-archive.sh @@ -9,8 +9,9 @@ fi DEBUGDIR=~/.debug/ BUILDIDS=$(mktemp /tmp/perf-archive-buildids.XXXXXX) +NOBUILDID=0000000000000000000000000000000000000000 -perf buildid-list -i $PERF_DATA --with-hits > $BUILDIDS +perf buildid-list -i $PERF_DATA --with-hits | grep -v "^$NOBUILDID " > $BUILDIDS if [ ! -s $BUILDIDS ] ; then echo "perf archive: no build-ids found" rm -f $BUILDIDS -- cgit v1.2.3 From da7196e1f986c846ffa8b2ec385223fad38e8518 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 3 Mar 2010 11:47:58 +0000 Subject: perf, ARM: Modify kuser rmb() call to compile for Thumb-2 The Thumb-2 instruction set does not provide an encoding for sub pc, r0, #95 as present in the rmb() definition used by perf. This results in compilation failure when using a compiler targetting an instruction set other than ARM. This patch redefines rmb() for ARM by casting the address of the kuser helper to a function pointer, therefore getting the compiler to take care of making the call. Patch taken against tip/master. Signed-off-by: Will Deacon Cc: Russell King - ARM Linux Cc: Jamie Iles Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker LKML-Reference: <1267616878-2154-1-git-send-email-will.deacon@arm.com> Signed-off-by: Ingo Molnar --- tools/perf/perf.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'tools') diff --git a/tools/perf/perf.h b/tools/perf/perf.h index 75f941bfba9..6fb379bc1d1 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -65,9 +65,7 @@ * Use the __kuser_memory_barrier helper in the CPU helper page. See * arch/arm/kernel/entry-armv.S in the kernel source for details. */ -#define rmb() asm volatile("mov r0, #0xffff0fff; mov lr, pc;" \ - "sub pc, r0, #95" ::: "r0", "lr", "cc", \ - "memory") +#define rmb() ((void(*)(void))0xffff0fa0)() #define cpu_relax() asm volatile("":::"memory") #endif -- cgit v1.2.3 From 10c95f4f41889daaa8130e0bd12209825dbe8d39 Mon Sep 17 00:00:00 2001 From: Tom Zanussi Date: Wed, 3 Mar 2010 01:04:32 -0600 Subject: perf trace/scripting: Remove extraneous header read perf_header__read() is already done in perf_session__open(), so remove it from the script gen case. Signed-off-by: Tom Zanussi Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org LKML-Reference: <1267599873-8193-2-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar --- tools/perf/builtin-trace.c | 1 - 1 file changed, 1 deletion(-) (limited to 'tools') diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index 5db687fc13d..b8153db5a9e 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -608,7 +608,6 @@ int cmd_trace(int argc, const char **argv, const char *prefix __used) return -1; } - perf_header__read(&session->header, input); err = scripting_ops->generate_script("perf-trace"); goto out; } -- cgit v1.2.3 From cf4fee50282312528e1f8adf73b1831d1d6ae389 Mon Sep 17 00:00:00 2001 From: Tom Zanussi Date: Wed, 3 Mar 2010 01:04:33 -0600 Subject: perf trace: Don't use pager if scripting It's useful for paging through raw traces, but just gets in the way when scripting. Signed-off-by: Tom Zanussi Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org LKML-Reference: <1267599873-8193-3-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar --- tools/perf/builtin-trace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'tools') diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index b8153db5a9e..407041d20de 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -573,7 +573,8 @@ int cmd_trace(int argc, const char **argv, const char *prefix __used) if (symbol__init() < 0) return -1; - setup_pager(); + if (!script_name) + setup_pager(); session = perf_session__new(input_name, O_RDONLY, 0); if (session == NULL) -- cgit v1.2.3