aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2009-06-11Merge branch 'master' of ↵Patrick McHardy
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
2009-06-11net: No more expensive sock_hold()/sock_put() on each txEric Dumazet
One of the problem with sock memory accounting is it uses a pair of sock_hold()/sock_put() for each transmitted packet. This slows down bidirectional flows because the receive path also needs to take a refcount on socket and might use a different cpu than transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement initial offset and atomicaly check if any packets are in flight. skb_set_owner_w() doesnt call sock_hold() anymore sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc reached 0 to perform the final freeing. Drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt contention point. 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09netfilter: Use frag list abstraction interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09ipv6: Use frag list abstraction interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08netfilter: nf_ct_icmp: keep the ICMP ct entries longerJan Kasprzak
Current conntrack code kills the ICMP conntrack entry as soon as the first reply is received. This is incorrect, as we then see only the first ICMP echo reply out of several possible duplicates as ESTABLISHED, while the rest will be INVALID. Also this unnecessarily increases the conntrackd traffic on H-A firewalls. Make all the ICMP conntrack entries (including the replied ones) last for the default of nf_conntrack_icmp{,v6}_timeout seconds. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-04netfilter: x_tables: added hook number into match extension parameter structure.Evgeniy Polyakov
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-03net: skb->dst accessorsEric Dumazet
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02netfilter: conntrack: simplify event caching systemPablo Neira Ayuso
This patch simplifies the conntrack event caching system by removing several events: * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted since the have no clients. * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter days. * IPCT_REFRESH which is not of any use since we always include the timeout in the messages. After this patch, the existing events are: * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify addition and deletion of entries. * IPCT_STATUS, that notes that the status bits have changes, eg. IPS_SEEN_REPLY and IPS_ASSURED. * IPCT_PROTOINFO, that reports that internal protocol information has changed, eg. the TCP, DCCP and SCTP protocol state. * IPCT_HELPER, that a helper has been assigned or unassigned to this entry. * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this covers the case when a mark is set to zero. * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence adjustment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02Merge branch 'master' of git://dev.medozas.de/linuxPatrick McHardy
2009-06-02IPv6: Print error value when skb allocation failsBrian Haley
Print-out the error value when sock_alloc_send_skb() fails in the IPv6 neighbor discovery code - can be useful for debugging. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-01IPv6: Add 'autoconf' and 'disable_ipv6' module parametersBrian Haley
Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module. The first controls if IPv6 addresses are autoconfigured from prefixes received in Router Advertisements. The IPv6 loopback (::1) and link-local addresses are still configured. The second controls if IPv6 addresses are desired at all. No IPv6 addresses will be added to any interfaces. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Avoid unnecessary comparison after skb_gro_headerHerbert Xu
For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-05-22tcp: Unexport TCPv6 GRO functionsHerbert Xu
Sinec the TCPv6 GRO functions are used in the same file where they are defined, we do not need to export them. This was a cut-n-paste from the IPv4 code which does need to export them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20IPv6: set RTPROT_KERNEL to initial routeJean-Mickael Guerin
The use of unspecified protocol in IPv6 initial route prevents quagga to install IPv6 default route: # show ipv6 route S ::/0 [1/0] via fe80::1, eth1_0 K>* ::/0 is directly connected, lo, rej C>* ::1/128 is directly connected, lo C>* fe80::/64 is directly connected, eth1_0 # ip -6 route fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit -1 ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1 unreachable default dev lo proto none metric -1 error -101 hoplimit 255 The attached patch ensures RTPROT_KERNEL to the default initial route and fixes the problem for quagga. This is similar to "ipv6: protocol for address routes" f410a1fba7afa79d2992620e874a343fdba28332. # show ipv6 route S>* ::/0 [1/0] via fe80::1, eth1_0 C>* ::1/128 is directly connected, lo C>* fe80::/64 is directly connected, eth1_0 # ip -6 route fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit -1 fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit -1 ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1 default via fe80::1 dev eth1_0 proto zebra metric 1024 mtu 1500 advmss 1440 hoplimit -1 unreachable default dev lo proto kernel metric -1 error -101 hoplimit 255 Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20net: Remove unused parameter from fill method in fib_rules_ops.Rami Rosen
The netlink message header (struct nlmsghdr) is an unused parameter in fill method of fib_rules_ops struct. This patch removes this parameter from this method and fixes the places where this method is called. (include/net/fib_rules.h) Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19sit: stateless autoconf for isatapSascha Hlusiak
be sent periodically. The rs_delay can be speficied when adding the PRL entry and defaults to 15 minutes. The RS is sent from every link local adress that's assigned to the tunnel interface. It's directed to the (guessed) linklocal address of the router and is sent through the tunnel. Better: send to ff02::2 encapsuled in unicast directed to router-v4. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19addrconf: refuse isatap eui64 for INADDR_ANYSascha Hlusiak
A tunnel with no local ipv4 endpoint would otherwise use the ISATAP linklocal address fe80::5efe:0:0, which is invalid. Rather not add a linklocal address at all. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19sit: ipip6_tunnel_del_prl: return errSascha Hlusiak
Typo. When deleting a PRL entry, return status to userspace instead of success. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19sit: strictly restrict incoming traffic to tunnel link deviceSascha Hlusiak
Check link device when looking up a tunnel. When a tunnel is linked to a interface, traffic from a different interface must not reach the tunnel. This also allows creating of multiple tunnels with the same endpoints, if the link device differs. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19sit: Fail to create tunnel, if it already existsSascha Hlusiak
When locating the tunnel, do not continue if it is found. Otherwise a different tunnel with similar configuration would be returned and parts could be overwritten. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net: FIX ipv6_forward sysctl restartEric W. Biederman
Just returning -ERESTARTSYS without a signal pending is not good that will just leak it to userspace. We need return -ERESTARTNOINTR so we always restart and set signal pending so that we fall of the fast path of syscall return and setup the system call restart. So use restart_syscall() which does all of this for us. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17net: remove needless (now buggy) & from dev->dev_addrJiri Pirko
Patch fixes issues with dev->dev_addr changing from array to pointer. Hopefully there are no others. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17ipv4: remove an unused parameter from configure method of fib_rules_ops.Rami Rosen
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/net/tcp.h
2009-05-08netfilter: xtables: consolidate comefrom debug cast accessJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: remove another level of indentJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: remove some gotoJan Engelhardt
Combining two ifs, and goto is easily gone. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: reduce indent level by oneJan Engelhardt
Cosmetic only. Transformation applied: -if (foo) { long block; } else { short block; } +if (!foo) { short block; continue; } long block; Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: consolidate open-coded logicJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: fix const inconsistencyJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: remove redundant castsJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: use NFPROTO_ in standard targetsJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: queue: use NFPROTO_ for queue callsitesJan Engelhardt
af is an nfproto. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08netfilter: xtables: use NFPROTO_ for xt_proto_init callsitesJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-05Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2009-05-05netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONEChristoph Paasch
As packets ending with NEXTHDR_NONE don't have a last extension header, the check for the length needs to be after the check for NEXTHDR_NONE. Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-29Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/isdn/00-INDEX drivers/net/wireless/iwlwifi/iwl-scan.c drivers/net/wireless/rndis_wlan.c net/mac80211/main.c
2009-04-28netfilter: revised locking for x_tablesStephen Hemminger
The x_tables are organized with a table structure and a per-cpu copies of the counters and rules. On older kernels there was a reader/writer lock per table which was a performance bottleneck. In 2.6.30-rc, this was converted to use RCU and the counters/rules which solved the performance problems for do_table but made replacing rules much slower because of the necessary RCU grace period. This version uses a per-cpu set of spinlocks and counters to allow to table processing to proceed without the cache thrashing of a global reader lock and keeps the same performance for table updates. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27gro: Fix COMPLETE checksum handlingHerbert Xu
On a brand new GRO skb, we cannot call ip_hdr since the header may lie in the non-linear area. This patch adds the helper skb_gro_network_header to handle this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27snmp: add missing counters for RFC 4293Neil Horman
The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20syncookies: remove last_synq_overflow from struct tcp_sockFlorian Westphal
last_synq_overflow eats 4 or 8 bytes in struct tcp_sock, even though it is only used when a listening sockets syn queue is full. We can (ab)use rx_opt.ts_recent_stamp to store the same information; it is not used otherwise as long as a socket is in listen state. Move linger2 around to avoid splitting struct mtu_probe across cacheline boundary on 32 bit arches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-14ipv6:remove useless checkYang Hongyang
After switch (rthdr->type) {...},the check below is completely useless.Because: if the type is 2,then hdrlen must be 2 and segments_left must be 1,clearly the check is redundant;if the type is not 2,then goto sticky_done,the check is useless too. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-11ipv6: Fix NULL pointer dereference with time-wait socketsVlad Yasevich
Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6 (ipv6: Fix conflict resolutions during ipv6 binding) introduced a regression where time-wait sockets were not treated correctly. This resulted in the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000062 IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70 ... Call Trace: [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6] [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6] [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400 [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6] [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0 [<ffffffff8056ed49>] sys_bind+0x89/0x100 [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b Tested-by: Brian Haley <brian.haley@hp.com> Tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2009-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: b44: Use kernel DMA addresses for the kernel DMA API forcedeth: Fix resume from hibernation regression. xfrm: fix fragmentation on inter family tunnels ibm_newemac: Fix dangerous struct assumption gigaset: documentation update gigaset: in file ops, check for device disconnect before anything else bas_gigaset: use tasklet_hi_schedule for timing critical tasklets net/802/fddi.c: add MODULE_LICENSE smsc911x: remove unused #include <linux/version.h> axnet_cs: fix phy_id detection for bogus Asix chip. bnx2: Use request_firmware() b44: Fix sizes passed to b44_sync_dma_desc_for_{device,cpu}() socket: use percpu_add() while updating sockets_in_use virtio_net: Set the mac config only when VIRITO_NET_F_MAC myri_sbus: use request_firmware e1000: fix loss of multicast packets vxge: should include tcp.h Conflict in firmware/WHENCE (SCSI vs net firmware)
2009-04-06xfrm: fix fragmentation on inter family tunnelsSteffen Klassert
If an ipv4 packet (not locally generated with IP_DF flag not set) bigger than mtu size is supposed to go via a xfrm ipv6 tunnel, the packetsize check in xfrm4_tunnel_check_size() is omited and ipv6 drops the packet without sending a notice to the original sender of the ipv4 packet. Another issue is that ipv4 connection tracking does reassembling of incomming fragmented packets. If such a reassembled packet is supposed to go via a xfrm ipv6 tunnel it will be droped, even if the original sender did proper fragmentation. According to RFC 2473 (section 7) tunnel ipv6 packets resulting from the encapsulation of an original packet are considered as locally generated packets. If such a packet passed the checks in xfrm{4,6}_tunnel_check_size() fragmentation is allowed according to RFC 2473 (section 7.1/7.2). This patch sets skb->local_df in xfrm6_prepare_output() to achieve fragmentation in this case. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-06netfilter: ip6tables regression fixEric Dumazet
Commit 7845447 (netfilter: iptables: lock free counters) broke ip6_tables by unconditionally returning ENOMEM in alloc_counters(), Reported-by: Graham Murray <graham@gmurray.org.uk> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits) trivial: Update my email address trivial: NULL noise: drivers/mtd/tests/mtd_*test.c trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h trivial: Fix misspelling of "Celsius". trivial: remove unused variable 'path' in alloc_file() trivial: fix a pdlfush -> pdflush typo in comment trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL trivial: wusb: Storage class should be before const qualifier trivial: drivers/char/bsr.c: Storage class should be before const qualifier trivial: h8300: Storage class should be before const qualifier trivial: fix where cgroup documentation is not correctly referred to trivial: Give the right path in Documentation example trivial: MTD: remove EOL from MODULE_DESCRIPTION trivial: Fix typo in bio_split()'s documentation trivial: PWM: fix of #endif comment trivial: fix typos/grammar errors in Kconfig texts trivial: Fix misspelling of firmware trivial: cgroups: documentation typo and spelling corrections trivial: Update contact info for Jochen Hein trivial: fix typo "resgister" -> "register" ...
2009-04-02netfilter: use rcu_read_bh() in ipt_do_table()Eric Dumazet
Commit 784544739a25c30637397ace5489eeb6e15d7d49 (netfilter: iptables: lock free counters) forgot to disable BH in arpt_do_table(), ipt_do_table() and ip6t_do_table() Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem. Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>