diff options
author | Eric Dumazet <dada1@cosmosbay.com> | 2006-01-17 02:54:36 -0800 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2006-01-17 02:54:36 -0800 |
commit | 2f970d83576cf4938fd75551c465050f6a16c33c (patch) | |
tree | 7bb43aabfbd8afdab49549e4d56c0a15015d5995 /net/sched | |
parent | 8243126c5e29030bf1a3fb75187a513966dcba62 (diff) |
[IPV4]: rt_cache_stat can be statically defined
Using __get_cpu_var(obj) is slightly faster than per_cpu_ptr(obj,
raw_smp_processor_id()).
1) Smaller code and memory use
For static and small objects, DEFINE_PER_CPU(type, object) is preferred over a
alloc_percpu() : Better and smaller code to access them, and no extra memory
(storing the pointer, and the percpu array of pointers)
x86_64 code before patch
mov 1237577(%rip),%rax # ffffffff803e5990 <rt_cache_stat>
not %rax # part of per_cpu machinery
mov %gs:0x3c,%edx # get cpu number
movslq %edx,%rdx # extend 32 bits cpu number to 64 bits
mov (%rax,%rdx,8),%rax # get the pointer for this cpu
incl 0x38(%rax)
x86_64 code after patch
mov $per_cpu__rt_cache_stat,%rdx
mov %gs:0x48,%rax # get percpu data offset
incl 0x38(%rax,%rdx,1)
2) False sharing avoidance for SMP :
For a small NR_CPUS, the array of per cpu pointers allocated in alloc_percpu()
can be <= 32 bytes. This let slab code gives a part of a cache line. If the
other part of this 64 bytes (or 128 bytes) cache line is used by a mostly
written object, we can have false sharing and expensive per_cpu_ptr() operations.
Size of rt_cache_stat is 64 bytes, so this patch is not a danger of a too big
increase of bss (in UP mode) or static per_cpu data for SMP
(PERCPU_ENOUGH_ROOM is currently 32768 bytes)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/sched')
0 files changed, 0 insertions, 0 deletions