path: root/net/core
Age | Commit message | Author
2009-08-13 | net: skb ftracer - add tracepoint to skb_copy_datagram_iovec (v3) | Neil Horman
skb allocation / consumption tracer - Add consumption tracepoint. This patch adds a tracepoint to skb_copy_datagram_iovec, which is called each time a userspace process copies a frame from a socket receive queue to a userspace buffer. It allows us to hook in and examine each sk_buff that the system receives on a per-socket basis, and can be used to compile a list of which skbs were received by which processes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
 include/trace/events/skb.h | 20 ++++++++++++++++++++
 net/core/datagram.c        |  3 +++
 2 files changed, 23 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-12 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts: arch/microblaze/include/asm/socket.h
2009-08-06 | net: Avoid enqueuing skb for default qdiscs | Krishna Kumar
dev_queue_xmit enqueues an skb and calls qdisc_run, which dequeues the skb and transmits it. In most cases, the skb that is enqueued is the same one that is dequeued (unless the queue gets stopped, or multiple CPUs write to the same queue and race with qdisc_run). For default qdiscs, we can remove the redundant enqueue/dequeue and simply transmit the skb, since the default qdisc is work-conserving. The patch uses a new flag, TCQ_F_CAN_BYPASS, to identify the default fast queue. The controversial part of the patch is incrementing qlen when an skb is requeued; this is to avoid checks like the second line below:
 +  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
 >>            !q->gso_skb &&
 +             !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
Results of 2 hours of testing with multiple netperf sessions (1, 2, 4, 8 and 12 sessions on a 4-CPU System-X box). The BW numbers are aggregate Mb/s across iterations, tested with this version on System-X boxes with Chelsio 10 Gbps cards:
 ----------------------------------
 Size | ORG BW    NEW BW |
 ----------------------------------
 128K | 156964    159381 |
 256K | 158650    162042 |
 ----------------------------------
Changes from ver1:
 1. Move sch_direct_xmit declaration from sch_generic.h to pkt_sched.h
 2. Update qdisc basic statistics for direct xmit path.
 3. Set qlen to zero in qdisc_reset.
 4. Changed some function names to more meaningful ones.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
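A simplified userspace model of the bypass decision described above may help. It assumes a hypothetical `model_qdisc` struct (not the kernel's `struct Qdisc`): when the qdisc is flagged as bypassable, is empty, and nobody else is running it, the packet goes straight to the driver; otherwise it takes the normal enqueue path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a work-conserving default qdisc; field names
 * loosely mirror the commit message but this is not kernel code. */
struct model_qdisc {
    bool can_bypass;      /* stands in for TCQ_F_CAN_BYPASS */
    bool running;         /* stands in for __QDISC_STATE_RUNNING */
    size_t qlen;          /* packets currently queued */
    size_t direct_xmits;  /* packets sent without enqueue/dequeue */
    size_t enqueued;      /* packets that took the normal path */
};

/* Returns true if the packet was transmitted directly. */
static bool model_xmit(struct model_qdisc *q)
{
    if (q->can_bypass && q->qlen == 0 && !q->running) {
        q->running = true;      /* like test_and_set_bit() succeeding */
        q->direct_xmits++;      /* skip enqueue + dequeue entirely */
        q->running = false;
        return true;
    }
    q->qlen++;                  /* normal path: queue for qdisc_run() */
    q->enqueued++;
    return false;
}
```

The bypass is only taken when the queue is empty, so packet ordering is preserved: anything already queued must drain first.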
2009-08-05 | net: implement a SO_DOMAIN getsockoption | Jan Engelhardt
This sockopt goes in line with SO_TYPE and SO_PROTOCOL. It makes it possible for userspace programs to pass around file descriptors (I am referring to arguments-to-functions, but it may even work for fd passing over UNIX sockets) without needing to also pass the auxiliary information (PF_INET6/IPPROTO_TCP). Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 | net: implement a SO_PROTOCOL getsockoption | Jan Engelhardt
Similar to SO_TYPE returning the socket type, SO_PROTOCOL makes it possible to retrieve the protocol used with a given socket. I am not quite sure why we have that many copies of socket.h, and why the values are not the same on all arches either, but where hex numbers dominate, I use 0x1029 for SO_PROTOCOL, as that seems to be the next free number across a bunch of operating systems, or so Google results make me want to believe. SO_PROTOCOL for others just uses the next free Linux number, 38. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 | net: mark read-only arrays as const | Jan Engelhardt
String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 | net: Fix spinlock use in alloc_netdev_mq() | Ingo Molnar
-tip testing found this lockdep warning:
[ 2.272010] calling net_dev_init+0x0/0x164 @ 1
[ 2.276033] device class 'net': registering
[ 2.280191] INFO: trying to register non-static key.
[ 2.284005] the code is fine but needs lockdep annotation.
[ 2.284005] turning off the locking correctness validator.
[ 2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145
[ 2.284005] Call Trace:
[ 2.284005] [<7958eb4e>] ? printk+0xf/0x11
[ 2.284005] [<7904f83c>] __lock_acquire+0x11b/0x622
[ 2.284005] [<7908c9b7>] ? alloc_debug_processing+0xf9/0x144
[ 2.284005] [<7904e2be>] ? mark_held_locks+0x3a/0x52
[ 2.284005] [<7908dbc4>] ? kmem_cache_alloc+0xa8/0x13f
[ 2.284005] [<7904e475>] ? trace_hardirqs_on_caller+0xa2/0xc3
[ 2.284005] [<7904fdf6>] lock_acquire+0xb3/0xd0
[ 2.284005] [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
[ 2.284005] [<79591514>] _spin_lock_bh+0x2d/0x5d
[ 2.284005] [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
[ 2.284005] [<79489678>] alloc_netdev_mq+0xf5/0x1ad
[ 2.284005] [<793a38f2>] ? loopback_setup+0x0/0x74
[ 2.284005] [<798eecd0>] loopback_net_init+0x20/0x5d
[ 2.284005] [<79483efb>] register_pernet_device+0x23/0x4b
[ 2.284005] [<798f5c9f>] net_dev_init+0x115/0x164
[ 2.284005] [<7900104f>] do_one_initcall+0x4a/0x11a
[ 2.284005] [<798f5b8a>] ? net_dev_init+0x0/0x164
[ 2.284005] [<79066f6d>] ? register_irq_proc+0x8c/0xa8
[ 2.284005] [<798cc29a>] do_basic_setup+0x42/0x52
[ 2.284005] [<798cc30a>] kernel_init+0x60/0xa1
[ 2.284005] [<798cc2aa>] ? kernel_init+0x0/0xa1
[ 2.284005] [<79003e03>] kernel_thread_helper+0x7/0x10
[ 2.284078] device: 'lo': device_add
[ 2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs
[ 2.292010] calling neigh_init+0x0/0x66 @ 1
[ 2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs
It's using a zero-initialized spinlock. This is a side effect of:
        dev_unicast_init(dev);
in alloc_netdev_mq() making use of dev->addr_list_lock.
The device has just been freshly allocated and is not accessible anywhere yet, so no locking is needed at all. In fact, it's wrong to lock it here (the lock isn't initialized yet). This bug was introduced via:
| commit a6ac65db2329e7685299666f5f7b6093c7b0f3a0
| Date:   Thu Jul 30 01:06:12 2009 +0000
|
|     net: restore the original spinlock to protect unicast list
Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jiri Pirko <jpirko@redhat.com> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 | neigh: Convert garbage collection from softirq to workqueue | Eric Dumazet
The current neigh_periodic_timer() function is fired by a timer IRQ and scans one hash bucket each round (very little work, in fact). As we are supposed to scan the whole hash table in 15 seconds, neigh_periodic_timer() can be fired very often, depending on the number of concurrent hash entries stored in this table. Converting this to a workqueue permits scanning the whole table at once, minimizing icache pollution, and firing this work every 15 seconds, independently of hash table size. This 15-second delay is not a hard number, as the work is deferrable. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 | net: restore the original spinlock to protect unicast list | Jiri Pirko
There is a path on which an assertion in dev_unicast_sync() is triggered:
 igmp6_group_added -> dev_mc_add -> __dev_set_rx_mode -> vlan_dev_set_rx_mode -> dev_unicast_sync
Therefore we cannot protect this list with rtnl. This patch restores the original protection of this list with a spinlock. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 | net: net_assign_generic() fix | Eric Dumazet
memcpy() should take into account the size of the pointers, not only the number of pointers to copy. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
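The bug class is easy to reproduce in userspace. Below is a minimal sketch, assuming a hypothetical growable pointer array (`struct ptr_array`, not the kernel's net_generic structure), where the memcpy() byte count is the number of slots multiplied by `sizeof(void *)`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of opaque pointers, loosely modelled on
 * the idea behind net_assign_generic(); not kernel code. */
struct ptr_array {
    size_t len;      /* number of pointer slots */
    void **ptr;      /* slot storage */
};

/* Grow to new_len slots, preserving existing pointers. Returns 0 on
 * success, -1 on allocation failure or if new_len would shrink. */
static int ptr_array_grow(struct ptr_array *a, size_t new_len)
{
    void **fresh;

    if (new_len < a->len)
        return -1;
    fresh = calloc(new_len, sizeof(*fresh));
    if (!fresh)
        return -1;
    /* Byte count = slots * sizeof(void *). Passing just a->len would
     * copy only a->len bytes, far less than the pointers occupy. */
    if (a->len)
        memcpy(fresh, a->ptr, a->len * sizeof(*a->ptr));
    free(a->ptr);
    a->ptr = fresh;
    a->len = new_len;
    return 0;
}
```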
2009-07-30 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 | David S. Miller
2009-07-27 | cfg80211: make aware of net namespaces | Johannes Berg
In order to make cfg80211/nl80211 aware of network namespaces, we have to do the following things:
 * del_virtual_intf method takes an interface index rather than a netdev pointer - simply change this
 * nl80211 uses init_net a lot, it changes to use the sender's network namespace
 * scan requests use the interface index, hold a netdev pointer and reference instead
 * we want a wiphy and its associated virtual interfaces to be in one netns together, so
   - we need to be able to change ns for a given interface, so export dev_change_net_namespace()
   - for each virtual interface set the NETIF_F_NETNS_LOCAL flag, and clear that flag only when the wiphy changes ns, to disallow breaking this invariant
 * when a network namespace goes away, we need to reparent the wiphy to init_net
 * cfg80211 users that support creating virtual interfaces must create them in the wiphy's namespace, currently this affects only mac80211
The end result is that you can now switch an entire wiphy into a different network namespace with the new command
   iw phy#<idx> set netns <pid>
and all virtual interfaces will follow (or the operation fails). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-27 | net: ethtool_op_get_rx_csum() should be public and exported | Eric Dumazet
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-26 | ethtool: device independent rx_csum and get_flags routines | Sridhar Samudrala
This helps avoid error messages with ethtool -k on devices that don't provide device specific routines. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> ------------------------------------------------------------------ Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-24 | net: remove unused skb->do_not_encrypt | Johannes Berg
mac80211 required this due to the master netdev, but now it can put all information into skb->cb and this can go. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24 | net: export __dev_addr_sync/__dev_addr_unsync | Johannes Berg
For mac80211, with the master netdev removal, we need to be able to sync a multicast address list onto another list that is not tracked within a netdev, so we need access to the functions doing that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-23 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts:
 drivers/net/wireless/iwmc3200wifi/netdev.c
 net/wireless/scan.c
2009-07-20 | Fix error return for setsockopt(SO_TIMESTAMPING) | Rémi Denis-Courmont
I guess it should be -EINVAL rather than EINVAL. I have not checked when the bug came in. Perhaps a candidate for -stable? Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
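The sign matters because kernel handlers return 0 on success and a negative errno on failure, and callers test for `err < 0`; a positive EINVAL slips through that test. A minimal sketch of the convention, using a hypothetical handler `set_flags_option` and a hypothetical "valid bits" mask:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical option handler following the kernel convention of
 * returning 0 on success and a negative errno value on failure. */
static int set_flags_option(int val, int *store)
{
    if (val & ~0xff)        /* hypothetical mask of valid flag bits */
        return -EINVAL;     /* negative, so "err < 0" checks catch it */
    *store = val;
    return 0;
}
```

Returning plain `EINVAL` here would make an `if (err < 0)` caller treat the rejected value as success.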
2009-07-16 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts: drivers/net/wireless/orinoco/main.c
2009-07-16 | net: sock_copy() fixes | Eric Dumazet
Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1 (net: sk_prot_alloc() should not blindly overwrite memory) took care of not zeroing the whole new socket at allocation time. sock_copy() is another spot where we should be very careful. We should not set refcnt to a non-zero value until we are sure the other fields are correctly set up, or a lockless reader could catch this socket by mistake while it is not fully (re)initialized. This patch puts sk_node and sk_refcnt at the very beginning of struct sock to ease the job of sock_copy() and sk_prot_alloc(). We add an appropriate smp_wmb() before sk_refcnt initializations to match our RCU requirements (changes to sock keys should be committed to memory before sk_refcnt is set). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
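The "initialise first, then make refcnt non-zero" ordering can be sketched in userspace with C11 release/acquire atomics. This is an analogue of the smp_wmb() described above, not the kernel implementation; `struct obj`, `obj_publish` and `obj_lookup` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object that lockless readers may use once refcnt != 0. */
struct obj {
    int key;              /* field readers match on */
    int value;
    atomic_int refcnt;    /* 0 means "not yet usable" */
};

static void obj_publish(struct obj *o, int key, int value)
{
    o->key = key;         /* 1. fully initialise the fields ... */
    o->value = value;
    /* 2. ... then make refcnt non-zero with release ordering, so any
     * reader that observes refcnt != 0 also observes the fields. */
    atomic_store_explicit(&o->refcnt, 1, memory_order_release);
}

/* Reader side: only trust the fields after seeing a live refcnt. */
static bool obj_lookup(struct obj *o, int key, int *value_out)
{
    if (atomic_load_explicit(&o->refcnt, memory_order_acquire) == 0)
        return false;
    if (o->key != key)
        return false;
    *value_out = o->value;
    return true;
}
```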
2009-07-13 | net: Rename lookup_neigh_params function | Tobias Klauser
Rename lookup_neigh_params to lookup_neigh_parms as the struct is named neigh_parms and all other functions dealing with the struct carry neigh_parms in their names. Signed-off-by: Tobias Klauser <klto@zhaw.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 | net: move and export get_net_ns_by_pid | Johannes Berg
The function get_net_ns_by_pid(), to get a network namespace from a pid_t, will be required in cfg80211 as well. Therefore, let's move it to net_namespace.c and export it. We can't make it a static inline in the !NETNS case because it needs to verify that the given pid even exists (and return -ESRCH). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 | net: make namespace iteration possible under RCU | Johannes Berg
All we need to take care of is using proper RCU list add/del primitives and inserting a synchronize_rcu() at one place to make sure the exit notifiers are run after everybody has stopped iterating the list. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-11 | net: sk_prot_alloc() should not blindly overwrite memory | Eric Dumazet
Some sockets use SLAB_DESTROY_BY_RCU, and the correctness of our RCU code depends on sk->sk_nulls_node.next always being valid. A NULL value is not allowed, as it might fault a lockless reader. The current sk_prot_alloc() implementation doesn't respect this assumption, calling kmem_cache_alloc() with __GFP_ZERO. Just call memset() around the forbidden field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
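"Calling memset() around the forbidden field" means clearing the bytes before it and the bytes after it while leaving the field itself alone. A minimal sketch using `offsetof`, assuming a hypothetical `struct node_obj` whose `next` member stands in for `sk_nulls_node.next`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object whose 'next' field must never be overwritten. */
struct node_obj {
    int a;
    void *next;      /* must stay valid for lockless readers */
    int b[4];
};

/* Zero everything except 'next': one memset for the bytes before the
 * field and one for the bytes after it. */
static void zero_except_next(struct node_obj *o)
{
    size_t off = offsetof(struct node_obj, next);
    size_t end = off + sizeof(o->next);

    memset(o, 0, off);
    memset((char *)o + end, 0, sizeof(*o) - end);
}
```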
2009-07-09 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
2009-07-09 | net: adding memory barrier to the poll and receive callbacks | Jiri Olsa
Add a memory barrier after the poll_wait function, paired with the receive callbacks. Add the functions sock_poll_wait and sk_has_sleeper to wrap the memory barrier. Without the memory barrier, the following race can happen. The race fires when the following code paths meet and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches.
 CPU1                         CPU2
 sys_select                   receive packet
   ...                        ...
   __add_wait_queue           update tp->rcv_nxt
   ...                        ...
   tp->rcv_nxt check          sock_def_readable
   ...                        {
   schedule                      ...
                                 if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                         wake_up_interruptible(sk->sk_sleep)
                                 ...
                              }
If there were no caches, the code would work OK, since the wait_queue and rcv_nxt updates are opposite to each other. This means that once tp->rcv_nxt is updated by CPU2, CPU1 has either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and return with a new data mask. In both cases the process (CPU1) has been added to the wait queue, so the waitqueue_active (CPU2) call cannot miss it and will wake up CPU1. The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on the CPU2 side. CPU1 will then end up calling schedule and sleeping forever if there is no more data on the socket. Calls to poll_wait in the following modules were omitted:
 net/bluetooth/af_bluetooth.c
 net/irda/af_irda.c
 net/irda/irnet/irnet_ppp.c
 net/mac80211/rc80211_pid_debugfs.c
 net/phonet/socket.c
 net/rds/af_rds.c
 net/rfkill/core.c
 net/sunrpc/cache.c
 net/sunrpc/rpc_pipe.c
 net/tipc/socket.c
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
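The race is the classic "store, then load the other side's flag" pattern, and the fix pairs a full barrier on each side. Below is a C11 analogue under stated assumptions: `waiter_queued` and `data_ready` are hypothetical flags standing in for `__add_wait_queue()` and the `tp->rcv_nxt` update, and `atomic_thread_fence(memory_order_seq_cst)` plays the role of the kernel barrier. With a seq_cst fence on both sides, it is impossible for the poller to sleep while the receiver also misses the waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool waiter_queued;   /* CPU1: "I am on the wait queue" */
static atomic_bool data_ready;      /* CPU2: "new data has arrived" */

/* CPU1 (poll side): publish the waiter, full barrier, re-check data.
 * Returns true if the poller would go to sleep. */
static bool poller_should_sleep(void)
{
    atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* sock_poll_wait side */
    return !atomic_load_explicit(&data_ready, memory_order_relaxed);
}

/* CPU2 (receive side): publish data, full barrier, check for waiters.
 * Returns true if a wakeup must be issued. */
static bool receiver_should_wake(void)
{
    atomic_store_explicit(&data_ready, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* sk_has_sleeper side */
    return atomic_load_explicit(&waiter_queued, memory_order_relaxed);
}
```

Whichever order the two paths run in, either the poller sees the data or the receiver sees the waiter.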
2009-07-08 | netpoll: Fix carrier detection for drivers that are using phylib | Anton Vorontsov
Using early netconsole and the gianfar driver, this error pops up:
  netconsole: timeout waiting for carrier
It appears that net/core/netpoll.c:netpoll_setup() is using cond_resched() in a loop waiting for a carrier. The thing is that cond_resched() is a no-op when system_state != SYSTEM_RUNNING, so drivers/net/phy/phy.c's state_queue is never scheduled, and therefore link detection doesn't work. I believe that the main problem is in cond_resched()[1], but regardless of how the cond_resched() story ends, it might be a good idea to call msleep(1) instead of cond_resched(), as suggested by Andrew Morton.
[1] http://lkml.org/lkml/2009/7/7/463
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-08 | netpoll: Introduce netpoll_carrier_timeout kernel option | Anton Vorontsov
Some PHYs require longer timeouts for carrier detection, and the auto-negotiation process may take an indefinite amount of time. It may be inconvenient to force longer timeouts for sane PHYs, so let's introduce a kernel command line option. Since we're using module_param(), the option can also be changed at runtime. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
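Together with the preceding msleep(1) fix, the carrier wait becomes "poll, really sleep about 1 ms, give up after a configurable timeout". Below is a userspace sketch, assuming a hypothetical `carrier_timeout_sec` variable that mirrors the role of netpoll_carrier_timeout and uses `nanosleep()` as the analogue of msleep(1). The iteration count only approximates elapsed time, since each sleep may last longer than 1 ms.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical tunable in seconds; not the kernel variable itself. */
static unsigned int carrier_timeout_sec = 4;

/* Poll carrier_ok() until it returns true or the timeout expires,
 * sleeping ~1 ms per iteration so other work (such as a PHY state
 * machine) can make progress, unlike a no-op yield. */
static bool wait_for_carrier(bool (*carrier_ok)(void *), void *ctx)
{
    const struct timespec one_ms = { 0, 1000000L };
    unsigned long max_iters = carrier_timeout_sec * 1000UL;

    for (unsigned long i = 0; i < max_iters; i++) {
        if (carrier_ok(ctx))
            return true;
        nanosleep(&one_ms, NULL);
    }
    return carrier_ok(ctx);
}

/* Example check: reports carrier once *ctx earlier polls have failed. */
static bool carrier_after_n_polls(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```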
2009-07-05 | net: convert remaining non-symbolic return values in ndo_start_xmit() functions | Patrick McHardy
This patch converts the remaining occurrences of raw return values to their symbolic counterparts in ndo_start_xmit() functions that were missed by the previous automatic conversion. Additionally, code that assumed the symbolic value of NETDEV_TX_OK to be zero is changed to explicitly use NETDEV_TX_OK. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26 | gro: Flush GRO packets in napi_disable_pending path | Herbert Xu
When NAPI is disabled while we're in net_rx_action, we end up calling __napi_complete without flushing GRO packets. This is a bug, as it would cause the GRO packets to linger; of course, it also literally BUGs to catch errors like this :) This patch changes it to napi_complete, with the obligatory IRQ reenabling. This should be safe because we've only just disabled IRQs and it does not materially affect the test conditions in between. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-24 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6:
  bnx2: Fix the behavior of ethtool when ONBOOT=no
  qla3xxx: Don't sleep while holding lock.
  qla3xxx: Give the PHY time to come out of reset.
  ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off
  net: Move rx skb_orphan call to where needed
  ipv6: Use correct data types for ICMPv6 type and code
  net: let KS8842 driver depend on HAS_IOMEM
  can: let SJA1000 driver depend on HAS_IOMEM
  netxen: fix firmware init handshake
  netxen: fix build with without CONFIG_PM
  netfilter: xt_rateest: fix comparison with self
  netfilter: xt_quota: fix incomplete initialization
  netfilter: nf_log: fix direct userspace memory access in proc handler
  netfilter: fix some sparse endianess warnings
  netfilter: nf_conntrack: fix conntrack lookup race
  netfilter: nf_conntrack: fix confirmation race condition
  netfilter: nf_conntrack: death_by_timeout() fix
2009-06-23 | net: Move rx skb_orphan call to where needed | Herbert Xu
In order to get the tun driver to account packets, we need to be able to receive packets with destructors set. To be on the safe side, I added an skb_orphan call for all protocols by default, since some of them (IP in particular) cannot handle receiving packets with destructors properly. Now it seems that at least one protocol (CAN) expects to be able to pass skb->sk through the rx path without it getting clobbered. So this patch attempts to fix this properly by moving the skb_orphan call to where it's actually needed. In particular, I've added it to skb_set_owner_[rw], which is what most users of skb->destructor call. This is actually an improvement for tun too, since it means that we only give back the amount charged to the socket when the skb is passed to another socket that will also be charged accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Oliver Hartkopp <olver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
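Moving the orphan step into the set-owner helper means the old owner is uncharged at exactly the moment a new owner is charged. Below is a minimal sketch of that shape, assuming hypothetical `struct buf` and `struct owner` types (not the sk_buff API). `buf_orphan` and `buf_set_owner_r` only mirror the roles of skb_orphan() and skb_set_owner_r().

```c
#include <assert.h>
#include <stddef.h>

struct owner { int rmem; };          /* bytes charged to this owner */

struct buf {
    struct owner *owner;
    void (*destructor)(struct buf *);
    int truesize;
};

/* Destructor: return the charge to the current owner. */
static void rfree(struct buf *b)
{
    b->owner->rmem -= b->truesize;
}

/* Drop the current owner, running its destructor (cf. skb_orphan()). */
static void buf_orphan(struct buf *b)
{
    if (b->destructor)
        b->destructor(b);
    b->destructor = NULL;
    b->owner = NULL;
}

/* Charge b to a new owner, orphaning first (cf. skb_set_owner_r()). */
static void buf_set_owner_r(struct buf *b, struct owner *o)
{
    buf_orphan(b);
    b->owner = o;
    b->destructor = rfree;
    o->rmem += b->truesize;
}
```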
2009-06-18 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
  netxen: fix tx ring accounting
  netxen: fix detection of cut-thru firmware mode
  forcedeth: fix dma api mismatches
  atm: sk_wmem_alloc initial value is one
  net: correct off-by-one write allocations reports
  via-velocity : fix no link detection on boot
  Net / e100: Fix suspend of devices that cannot be power managed
  TI DaVinci EMAC : Fix rmmod error
  net: group address list and its count
  ipv4: Fix fib_trie rebalancing, part 2
  pkt_sched: Update drops stats in act_police
  sky2: version 1.23
  sky2: add GRO support
  sky2: skb recycling
  sky2: reduce default transmit ring
  sky2: receive counter update
  sky2: fix shutdown synchronization
  sky2: PCI irq issues
  sky2: more receive shutdown
  sky2: turn off pause during shutdown
  ...
Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
2009-06-18 | net: group address list and its count | Jiri Pirko
This patch is inspired by a patch recently posted by Johannes Berg. Basically, my patch groups the list and the count of addresses into a newly introduced structure, netdev_hw_addr_list. This brings us two benefits:
 1) struct net_device becomes a bit nicer.
 2) In the future it will be possible to operate on lists independently of netdevices (with the right functions exported).
I wanted to introduce this patch before posting a multicast lists conversion. Signed-off-by: Jiri Pirko <jpirko@redhat.com>
 drivers/net/bnx2.c              |    4 +-
 drivers/net/e1000/e1000_main.c  |    4 +-
 drivers/net/ixgbe/ixgbe_main.c  |    6 +-
 drivers/net/mv643xx_eth.c       |    2 +-
 drivers/net/niu.c               |    4 +-
 drivers/net/virtio_net.c        |   10 ++--
 drivers/s390/net/qeth_l2_main.c |    2 +-
 include/linux/netdevice.h       |   17 +++--
 net/core/dev.c                  |  130 ++++++++++++++++++--------------------
 9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
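Grouping a list head with its count into one structure lets helpers keep the two in sync and operate on lists that are not embedded in a netdevice. A minimal sketch of the idea, assuming hypothetical `struct hw_addr` and `struct hw_addr_list` types (in the spirit of netdev_hw_addr_list, not its actual definition):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct hw_addr {
    struct hw_addr *next;
    unsigned char addr[6];
};

/* List head and element count travel together. */
struct hw_addr_list {
    struct hw_addr *head;
    int count;            /* kept in sync by the helpers below */
};

static int hw_addr_list_add(struct hw_addr_list *list,
                            const unsigned char *addr)
{
    struct hw_addr *ha = malloc(sizeof(*ha));

    if (!ha)
        return -1;
    memcpy(ha->addr, addr, sizeof(ha->addr));
    ha->next = list->head;   /* push at the front */
    list->head = ha;
    list->count++;
    return 0;
}

static void hw_addr_list_flush(struct hw_addr_list *list)
{
    while (list->head) {
        struct hw_addr *ha = list->head;
        list->head = ha->next;
        free(ha);
    }
    list->count = 0;
}
```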
2009-06-17 | skbuff: don't corrupt mac_header on skb expansion | Stephen Hemminger
The skb mac_header field is sometimes NULL (or ~0u) as a sentinel value. The places where skb is expanded add an offset which would change this flag into an invalid pointer (or offset). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 | skbuff: skb_mac_header_was_set is always true on >32 bit | Stephen Hemminger
Looking at the crash in log_martians(), one suspect is that the check for mac header being set is not correct. The value of mac_header defaults to 0 on allocation, therefore skb_mac_header_was_set will always be true on platforms using NET_SKBUFF_USES_OFFSET. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
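Both skbuff fixes above revolve around a "not set" sentinel for an offset-based header field: 0 cannot serve as the sentinel because it is a valid offset, and the sentinel must not be shifted when the buffer grows. A minimal sketch, assuming a hypothetical `struct pkt` and using `~0U` as the sentinel, as mentioned in the commit message:

```c
#include <assert.h>
#include <stdbool.h>

#define MAC_HEADER_UNSET (~0U)   /* sentinel: offset not set */

struct pkt {
    unsigned int mac_header;     /* offset from head, or the sentinel */
};

static void pkt_init(struct pkt *p)
{
    p->mac_header = MAC_HEADER_UNSET;   /* 0 would be a valid offset */
}

static bool mac_header_was_set(const struct pkt *p)
{
    return p->mac_header != MAC_HEADER_UNSET;
}

/* When headroom grows by delta bytes, shift a real offset but leave the
 * sentinel alone; adding delta to ~0U would wrap to a bogus offset. */
static void pkt_expand_head(struct pkt *p, unsigned int delta)
{
    if (mac_header_was_set(p))
        p->mac_header += delta;
}
```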
2009-06-16 | Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck | Linus Torvalds
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
2009-06-15 | net: annotate struct sock bitfield | Vegard Nossum
2009/2/24 Ingo Molnar <mingo@elte.hu>:
> ok, this is the last warning i have from today's overnight -tip
> testruns - a 32-bit system warning in sock_init_data():
>
> [ 2.610389] NET: Registered protocol family 16
> [ 2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
> [ 2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
> [ 2.624002] 010000000200000000000000604990c000000000000000000000000000000000
> [ 2.634076] i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
> [ 2.641038] ^
> [ 2.643376]
> [ 2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
> [ 2.648003] EIP: 0060:[<c07141a1>] EFLAGS: 00010282 CPU: 0
> [ 2.652008] EIP is at sock_init_data+0xa1/0x190
> [ 2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
> [ 2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
> [ 2.664003] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
> [ 2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 2.676003] DR6: ffff4ff0 DR7: 00000400
> [ 2.680002] [<c07423e5>] __netlink_create+0x35/0xa0
> [ 2.684002] [<c07443cc>] netlink_kernel_create+0x4c/0x140
> [ 2.688002] [<c072755e>] rtnetlink_net_init+0x1e/0x40
> [ 2.696002] [<c071b601>] register_pernet_operations+0x11/0x30
> [ 2.700002] [<c071b72c>] register_pernet_subsys+0x1c/0x30
> [ 2.704002] [<c0bf3c8c>] rtnetlink_init+0x4c/0x100
> [ 2.708002] [<c0bf4669>] netlink_proto_init+0x159/0x170
> [ 2.712002] [<c0101124>] do_one_initcall+0x24/0x150
> [ 2.716002] [<c0bbf3c7>] do_initcalls+0x27/0x40
> [ 2.723201] [<c0bbf3fc>] do_basic_setup+0x1c/0x20
> [ 2.728002] [<c0bbfb8a>] kernel_init+0x5a/0xa0
> [ 2.732002] [<c0103e47>] kernel_thread_helper+0x7/0x10
> [ 2.736002] [<ffffffff>] 0xffffffff
We fix this false positive by annotating the bitfield in struct sock.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 | net: use kmemcheck bitfields API for skbuff | Vegard Nossum
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 | David S. Miller
Conflicts:
 Documentation/feature-removal-schedule.txt
 drivers/scsi/fcoe/fcoe.c
 net/core/drop_monitor.c
 net/core/net-traces.c
2009-06-11 | bridge: Simplify interface for ATM LANE | Michał Mirosław
This patch changes the FDB entry check for ATM LANE bridge integration. There's no point in holding an FDB entry around SKB building. The br_fdb_get()/br_fdb_put() pair is changed into a single br_fdb_test_addr() hook that checks whether the address has an FDB entry pointing to a port other than the one the request arrived on. FDB entry refcounting is removed, as it's not used anywhere else. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 | [PATCH] net core: Some interface flags not returned by SIOCGIFFLAGS | John Dykstra
Commit b00055aacdb172c05067612278ba27265fcd05ce " [NET] core: add RFC2863 operstate" defined new interface flag values. Its documentation specified that these flags could be accessed from user space via SIOCGIFFLAGS. However, this does not work because the new flags do not fit in that ioctl's argument width. Change the documentation to match the code's behavior. Also change the source to explicitly show the truncation. This _should_ have no effect on executable code, and did not with gcc 4.2.4 generating x86 code. A new ioctl could be defined to return all interface flags to user space. However, since this has been broken for three years with no one complaining, there doesn't seem much need. They are still accessible via netlink. Reported-by: "Fredrik Arnerup" <fredrik.arnerup@edgeware.tv> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
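The truncation is plain arithmetic: SIOCGIFFLAGS reports flags through `ifr_flags`, a `short`, so any flag at bit 16 or above cannot be represented. A small sketch, assuming local copies of two flag values (IFF_UP is 0x1 and IFF_LOWER_UP is `1 << 16` in `<linux/if.h>`). The copies avoid userspace header conflicts, and `as_ifr_flags` is an illustrative helper.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IFF_UP        0x1U       /* value of IFF_UP */
#define SKETCH_IFF_LOWER_UP  0x10000U   /* value of IFF_LOWER_UP, 1 << 16 */

/* Model of squeezing the device flags into the 16-bit ifr_flags. */
static unsigned short as_ifr_flags(unsigned int dev_flags)
{
    return (unsigned short)(dev_flags & 0xFFFFU);   /* explicit truncation */
}

/* True if 'flag' is still visible after the ioctl truncation. */
static bool survives_ioctl(unsigned int flag)
{
    return (as_ifr_flags(flag) & flag) == flag;
}
```

Netlink, which the commit points to as the alternative, carries interface flags in a 32-bit field, so these flags remain visible there.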
2009-06-11 | neigh: fix state transition INCOMPLETE->FAILED via Netlink request | Timo Teras
The current code errors out the INCOMPLETE neigh entry's skb queue only from the timer, if the maximum number of probes has been attempted and there has been no reply. This also causes the transition to the FAILED state. However, the neigh entry can also be updated via Netlink to indicate that the address is unavailable. Currently, neigh_update() just stops the timers and leaves the pending skbs unreleased. As a result, the clean-up code in the timer callback is never called, which also prevents proper garbage collection. This fixes neigh_update() to process the pending skb queue immediately if an INCOMPLETE -> FAILED state transition occurs due to a Netlink request. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
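The essence of the fix: because the update path stops the timer, it must also do the queue clean-up that only the timer used to do. A heavily simplified sketch, assuming a hypothetical `struct neigh_model` with just two states and counters in place of real skb queues:

```c
#include <assert.h>
#include <stddef.h>

enum nstate { N_INCOMPLETE, N_FAILED };

struct neigh_model {
    enum nstate state;
    size_t pending;      /* packets waiting for address resolution */
    size_t dropped;      /* packets released with an error */
    int timer_armed;
};

/* Administrative update, e.g. triggered by a Netlink request. */
static void neigh_model_update(struct neigh_model *n, enum nstate new_state)
{
    n->timer_armed = 0;                  /* timers are stopped ... */
    if (n->state == N_INCOMPLETE && new_state == N_FAILED) {
        /* ... so flush the queue here; the timer callback that would
         * otherwise release these packets will never run. */
        n->dropped += n->pending;
        n->pending = 0;
    }
    n->state = new_state;
}
```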
2009-06-11 | net: No more expensive sock_hold()/sock_put() on each tx | Eric Dumazet
One of the problems with sock memory accounting is that it uses a pair of sock_hold()/sock_put() calls for each transmitted packet. This slows down bidirectional flows, because the receive path also needs to take a refcount on the socket and might use a different CPU than the transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways, for example), where sock_wfree() can be among the top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement the initial offset and atomically check whether any packets are in flight. skb_set_owner_w() doesn't call sock_hold() anymore. sock_wfree() doesn't call sock_put() anymore, but checks whether sk_wmem_alloc reached 0 to perform the final freeing. The drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP: tbench, for example, goes from 2691 MB/s to 2711 MB/s on my 8-CPU dev machine, even though tbench was not really hitting the sk_refcnt contention point. 5% speedup on a UDP transmit workload (depending on the number of flows), lowering TX completion CPU usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
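The "offset by one" scheme above can be modelled with C11 atomics. In this userspace sketch, `struct sock_model` and its helpers are hypothetical and only mirror the roles of sk_free(), skb_set_owner_w() and sock_wfree(). wmem_alloc starts at 1, the owner's release drops that unit, and whichever decrement reaches zero performs the final free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sock_model {
    atomic_int wmem_alloc;   /* bytes in flight + 1 while owner holds it */
    bool freed;              /* set by final_free(); single-thread demo */
};

static void final_free(struct sock_model *sk) { sk->freed = true; }

static void sock_model_init(struct sock_model *sk)
{
    atomic_init(&sk->wmem_alloc, 1);   /* the offset unit */
    sk->freed = false;
}

/* Charge a transmitted buffer (cf. skb_set_owner_w(), no sock_hold()). */
static void sock_model_charge(struct sock_model *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion (cf. sock_wfree(), no sock_put()). fetch_sub returns
 * the old value, so old == truesize means we brought it to 0. */
static void sock_model_wfree(struct sock_model *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        final_free(sk);
}

/* Owner releases the socket (cf. sk_free()): drop the offset unit. */
static void sock_model_free(struct sock_model *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        final_free(sk);
}
```

As the commit notes, a mismatched truesize between charge and completion would break this accounting, leaving the count stuck above zero or reaching zero too early.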
2009-06-10 | Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
  Revert "x86, bts: reenable ptrace branch trace support"
  tracing: do not translate event helper macros in print format
  ftrace/documentation: fix typo in function grapher name
  tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
  tracing: add protection around module events unload
  tracing: add trace_seq_vprint interface
  tracing: fix the block trace points print size
  tracing/events: convert block trace points to TRACE_EVENT()
  ring-buffer: fix ret in rb_add_time_stamp
  ring-buffer: pass in lockdep class key for reader_lock
  tracing: add annotation to what type of stack trace is recorded
  tracing: fix multiple use of __print_flags and __print_symbolic
  tracing/events: fix output format of user stack
  tracing/events: fix output format of kernel stack
  tracing/trace_stack: fix the number of entries in the header
  ring-buffer: discard timestamps that are at the start of the buffer
  ring-buffer: try to discard unneeded timestamps
  ring-buffer: fix bug in ring_buffer_discard_commit
  ftrace: do not profile functions when disabled
  tracing: make trace pipe recognize latency format flag
  ...
2009-06-10 | mac80211: do not pass PS frames out of mac80211 again | Johannes Berg
In order to handle powersave frames properly we had needed to pass these out to the device queues again, and introduce the skb->requeue bit. This, however, also has unnecessary overhead by needing to 'clean up' already tried frames, and this clean-up code is also buggy when software encryption is used. Instead of sending the frames via the master netdev queue again, simply put them into the pending queue. This also fixes a problem where frames for that particular station could be reordered when some were still on the software queues and older ones are re-injected into the software queue after them. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-09 | Add constants for the ieee 802.15.4 stack | Sergey Lapin
IEEE 802.15.4 stack requires several constants to be defined/adjusted. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Sergey Lapin <slapin@ossfans.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 | net: dev_addr_init() fix | Eric Dumazet
Commit f001fde5eadd915f4858d22ed70d7040f48767cf (net: introduce a list of device addresses dev_addr_list (v6)) introduced a regression that Vegard Nossum found in his testing. With kmemcheck's help, Vegard found that some uninitialized memory was read and reported to the user, potentially leaking kernel data (the thread can be found at http://lkml.org/lkml/2009/5/30/177). dev_addr_init() incorrectly uses the sizeof() operator: we were initializing one byte instead of MAX_ADDR_LEN bytes. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
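One common way sizeof() ends up covering a single byte is taking the size of the pointed-to element of an `unsigned char *` buffer. The sketch below illustrates that pattern. It is not claimed to be the exact original line in dev_addr_init(), and the helper names are hypothetical. MAX_ADDR_LEN is defined locally with the kernel's value of 32.

```c
#include <assert.h>
#include <string.h>

#define MAX_ADDR_LEN 32   /* same value as the kernel's MAX_ADDR_LEN */

/* Buggy pattern: sizeof(*addr) is sizeof(unsigned char) == 1, so only
 * the first byte is cleared and the rest keeps stale contents. */
static void clear_addr_buggy(unsigned char *addr)
{
    memset(addr, 0, sizeof(*addr));
}

/* Fixed: clear the full address buffer length explicitly. */
static void clear_addr_fixed(unsigned char *addr)
{
    memset(addr, 0, MAX_ADDR_LEN);
}
```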
2009-06-09 | net/core/user_dma.c: Use frag list abstraction interfaces. | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 | net/core/skbuff.c: Use frag list abstraction interfaces. | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>