aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2007-04-24IPv6: fix Routing Header Type 0 handling thinkoYOSHIFUJI Hideaki
Oops, thinko. The test for accempting a RH0 was exatly the wrong way around. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-24[IPV6]: Disallow RH0 by default.YOSHIFUJI Hideaki
A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-13[IPV6] SNMP: Fix {In,Out}NoRoutes statistics.YOSHIFUJI Hideaki
A packet which is being discarded because of no routes in the forwarding path should not be counted as OutNoRoutes but as InNoRoutes. Additionally, on this occasion, a packet whose destinaion is not valid should be counted as InAddrErrors separately. Based on patch from Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-06[IPV6]: Revert recent change to rt6_check_dev().David S. Miller
This reverts a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f It causes pings to link-local addresses to fail. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-04[IPv6]: Exclude truncated packets from InHdrErrors statisticsMitsuru Chinen
Incoming trancated packets are counted as not only InTruncatedPkts but also InHdrErrors. They should be counted as InTruncatedPkts only. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-02[IPv6]: Fix incorrect length check in rawv6_sendmsg()YOSHIFUJI Hideaki
In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says: > From: Sridhar Samudrala <sri@us.ibm.com> > Date: Thu, 29 Mar 2007 14:17:28 -0700 > > > The check for length in rawv6_sendmsg() is incorrect. > > As len is an unsigned int, (len < 0) will never be TRUE. > > I think checking for IPV6_MAXPLEN(65535) is better. > > > > Is it possible to send ipv6 jumbo packets using raw > > sockets? If so, we can remove this check. > > I don't see why such a limitation against jumbo would exist, > does anyone else? > > Thanks for catching this Sridhar. A good compiler should simply > fail to compile "if (x < 0)" when 'x' is an unsigned type, don't > you think :-) Dave, we use "int" for returning value, so we should fix this anyway, IMHO; we should not allow len > INT_MAX. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-27[IPV6]: Set IF_READY if the device is up and has carrierHerbert Xu
We still need to set the IF_READY flag in ipv6_add_dev for the case where all addresses (including the link-local) are deleted and then recreated. In that case the IPv6 device too will be destroyed and then recreated. In order to prevent the original problem, we simply ensure that the device is up before setting IF_READY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25[IPV6]: Fix routing round-robin locking.David S. Miller
As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons: 1) The round-robin list manipulation, as coded, goes like this (with read lock held): walk routes finding head and tail spin_lock(); rotate list using head and tail spin_unlock(); While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later onthat many people have been hitting. 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way. So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code when in. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25[NET]: Fix fib_rules compatibility breakageThomas Graf
Based upon a patch from Patrick McHardy. The fib_rules netlink attribute policy introduced in 2.6.19 broke userspace compatibilty. When specifying a rule with "from all" or "to all", iproute adds a zero byte long netlink attribute, but the policy requires all addresses to have a size equal to sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a validation error. Check attribute length of FRA_SRC/FRA_DST in the generic framework by letting the family specific rules implementation provide the length of an address. Report an error if address length is non zero but no address attribute is provided. Fix actual bug by checking address length for non-zero instead of relying on availability of attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-22[NET]: fix up misplaced inlines.Dave Jones
Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-16[IPV6]: ipv6_fl_socklist is inadvertently shared.Masayuki Nakagawa
The ipv6_fl_socklist from listening socket is inadvertently shared with new socket created for connection. This leads to a variety of interesting, but fatal, bugs. For example, removing one of the sockets may lead to the other socket's encountering a page fault when the now freed list is referenced. The fix is to not share the flow label list with the new socket. Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-09[IPV6] fix ipv6_getsockopt_sticky copy_to_user leakChris Wright
User supplied len < 0 can cause leak of kernel memory. Use unsigned compare instead. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-09[IPV6]: Fix for ipv6_setsockopt NULL dereferenceOlaf Kirch
I came across this bug in http://bugzilla.kernel.org/show_bug.cgi?id=8155 Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07[IPV6]: Do not set IF_READY if device is downHerbert Xu
Now that we add the IPv6 device at registration time we don't need to set IF_READY in ipv6_add_dev anymore because we will always get a NETDEV_UP event later on should the device ever become ready. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07[IPV6]: Handle np->opt being NULL in ipv6_getsockopt_sticky().David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07[NETFILTER]: nf_conntrack_ipv6: fix incorrect classification of IPv6 ↵Patrick McHardy
fragments as ESTABLISHED The individual fragments of a packet reassembled by conntrack have the conntrack reference from the reassembled packet attached, but nfctinfo is not copied. This leaves it initialized to 0, which unfortunately is the value of IP_CT_ESTABLISHED. The result is that all IPv6 fragments are tracked as ESTABLISHED, allowing them to bypass a usual ruleset which accepts ESTABLISHED packets early. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05[NETFILTER]: ip6_route_me_harder should take into account markYasuyuki Kozakai
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05[NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefsPatrick McHardy
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK, but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or CONFIG_NF_CONNTRACK_NETLINK for ifdefs. Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28[IPV6]: /proc/net/anycast6 unbalanced inet6_dev refcntDavid Stevens
Reading /proc/net/anycast6 when there is no anycast address on an interface results in an ever-increasing inet6_dev reference count, as well as a reference to the netdevice you can't get rid of. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28[IPV6]: anycast refcnt fixMichal Wrobel
This patch fixes a bug in Linux IPv6 stack which caused anycast address to be added to a device prior DAD has been completed. This led to incorrect reference count which resulted in infinite wait for unregister_netdevice completion on interface removal. Signed-off-by: Michal Wrobel <xmxwx@asn.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[IPV6]: Fix __ipv6_addr_type() export in correct place.David S. Miller
It needs to be in net/ipv6/addrconf_core.c Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[IPV6] ADDRCONF: Register inet6_dev earlier.YOSHIFUJI Hideaki
Allocate inet6_dev earlier to allow users to set up per-interface variables. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-02-26[IPV6] ADDRCONF: Manage prefix route corresponding to address manually added.YOSHIFUJI Hideaki
It is more natural to manage prefix routes corresponding to address which is being added manually. With help from Masafumi Aramoto <aramoto@linux-ipv6.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-02-26[IPV6] IP6TUNNEL: Use update_pmtu() of dst on xmit.Yasuyuki Kozakai
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-02-26[IPV6] ADDRCONF: Statically link __ipv6_addr_type() for sunrpc subsystem.YOSHIFUJI Hideaki
Link __ipv6_addr_type() statically for sunrpc code even if IPv6 is built as module. Signed-off-by: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>
2007-02-26[IPV6]: Adjust inet6_exit() cleanup sequence against inet6_init()Joe Jin
This patch for adjust inet6_exit() to inverse sequence to inet6_init(). At ipv6_init, it first create proc_root/net/dev_snmp6 entry by call ipv6_misc_proc_init(), then call addrconf_init() to create the corresponding device entry at this directory, but at inet6_exit, ipv6_misc_proc_exit() called first, then call addrconf_init(). Signed-off-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[IPSEC]: More fix is needed for __xfrm6_bundle_create().Noriaki TAKAMIYA
Fixed to set fl_tunnel.fl6_src correctly in xfrm6_bundle_create(). Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-14[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tablesEric W. Biederman
It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14[PATCH] sysctl: remove insert_at_head from register_sysctlEric W. Biederman
The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14[PATCH] remove many unneeded #includes of sched.hTim Schmielau
After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-13[IPSEC]: changing API of xfrm6_tunnel_registerKazunori MIYAZAWA
This patch changes xfrm6_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. There is no device which conflicts with IPv4 over IPv6 IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13[IPSEC]: make sit use the xfrm4_tunnel_registerKazunori MIYAZAWA
This patch makes sit use xfrm4_tunnel_register instead of inet_add_protocol. It solves conflict of sit device with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[IPV6] HASHTABLES: Use appropriate seed for caluculating ehash index.YOSHIFUJI Hideaki
Tetsuo Handa <handat@pm.nttdata.co.jp> told me that connect(2) with TCPv6 socket almost always took a few minutes to return when we did not have any ports available in the range of net.ipv4.ip_local_port_range. The reason was that we used incorrect seed for calculating index of hash when we check established sockets in __inet6_check_established(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: ip6t_mh: drop piggyback payload packet on MH packetsMasahide NAKAMURA
Regarding RFC3775, MH payload proto field should be IPPROTO_NONE. Otherwise it must be discarded (and the receiver should send ICMP error). We assume filter should drop such piggyback everytime to disallow slipping through firewall rules, even the final receiver will discard it. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: Kconfig: improve dependency handlingPatrick McHardy
Instead of depending on internally needed options and letting users figure out what is needed, select them when needed: - IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select NETFILTER_XTABLES - NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK - NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_conntrack: fix invalid conntrack statistics RCU assumptionPatrick McHardy
NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding nf_conntrack_lock. Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_conntrack: properly use RCU API for ↵Patrick McHardy
nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_log: minor cleanupsPatrick McHardy
- rename nf_logging to nf_loggers since its an array of registered loggers - rename nf_log_unregister_logger() to nf_log_unregister() to make it symetrical to nf_log_register() and convert all users Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[PATCH] mark struct file_operations const 7Arjan van de Ven
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [IPV4]: Restore multipath routing after rt_next changes. [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. [NET]: Reorder fields of struct dst_entry [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer [NET]: Introduce union in struct dst_entry to hold 'next' pointer [DECNET]: fix misannotation of linkinfo_dn [DECNET]: FRA_{DST,SRC} are le16 for decnet [UDP]: UDP can use sk_hash to speedup lookups [NET]: Fix whitespace errors. [NET] XFRM: Fix whitespace errors. [NET] X25: Fix whitespace errors. [NET] WANROUTER: Fix whitespace errors. [NET] UNIX: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] SCHED: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. ...
2007-02-11[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().Robert P. J. Day
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-10[XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel ↵Masahide NAKAMURA
patch. It seems to miss RO mode path by IPv6 over IPv4 IPsec tunnel patch when it changed semantics to check the mode from "xfrm[i]->props.mode != XFRM_MODE_TRANSPORT" to "xfrm[i]->props.mode == XFRM_MODE_TUNNEL" before changing address. It also makes two incline functions __xfrm6_bundle_addr_{remote,local} are used by nobody. This patch fixes it. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointerEric Dumazet
This patch removes the next pointer from 'struct rt6_info.u' union, and renames u.next to u.dst.rt6_next. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[UDP]: UDP can use sk_hash to speedup lookupsEric Dumazet
In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash) to let tcp lookups use one cache line per unmatched entry instead of two. We can also use sk_hash to speedup UDP part as well. We store in sk_hash the hnum value, and use sk->sk_hash (same cache line than 'next' pointer), instead of inet->num (different cache line) Note : We still have a false sharing problem for SMP machines, because sock_hold(sock) dirties the cache line containing the 'next' pointer. Not counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET] IPV6: Fix whitespace errors.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NET]: change layout of ehash tableEric Dumazet
ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: ip6_tables: remove redundant structure definitionsPatrick McHardy
Move ip6t_standard/ip6t_error_target/ip6t_error definitions to ip6_tables.h instead of defining them in each table individually. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: ip6_tables: support MH matchMasahide NAKAMURA
This introduces match for Mobility Header (MH) described by Mobile IPv6 specification (RFC3775). User can specify the MH type or its range to be matched. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: Yasuyuki Kozakai <kozakai@linux-ipv6.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined ↵Jan Engelhardt
structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functionsJan Engelhardt
Use the x_tables functions directly to make it better visible which parts are shared between ip_tables and ip6_tables. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>