aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2007-03-06Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [DCCP]: Set RTO for newly created child socket [DCCP]: Correctly split CCID half connections [NET]: Fix compat_sock_common_getsockopt typo. [NET]: Revert incorrect accept queue backlog changes. [INET]: twcal_jiffie should be unsigned long, not int [GIANFAR]: Fix compile error in latest git [PPPOE]: Use ifindex instead of device pointer in key lookups. [NETFILTER]: ip6_route_me_harder should take into account mark [NETFILTER]: nfnetlink_log: fix reference counting [NETFILTER]: nfnetlink_log: fix module reference counting [NETFILTER]: nfnetlink_log: fix possible NULL pointer dereference [NETFILTER]: nfnetlink_log: fix NULL pointer dereference [NETFILTER]: nfnetlink_log: fix use after free [NETFILTER]: nfnetlink_log: fix reference leak [NETFILTER]: tcp conntrack: accept SYN|URG as valid [NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
2007-03-06bonding: Improve IGMP join processingJay Vosburgh
In active-backup mode, the current bonding code duplicates IGMP traffic to all slaves, so that switches are up to date in case of a failover from an active to a backup interface. If bonding then fails back to the original active interface, it is likely that the "active slave" switch's IGMP forwarding for the port will be out of date until some event occurs to refresh the switch (e.g., a membership query). This patch alters the behavior of bonding to no longer flood IGMP to all ports, and to issue IGMP JOINs to the newly active port at the time of a failover. This insures that switches are kept up to date for all cases. "GOELLESCH Niels" <niels.goellesch@eurocontrol.int> originally reported this problem, and included a patch. His original patch was modified by Jay Vosburgh to additionally remove the existing IGMP flood behavior, use RCU, streamline code paths, fix trailing white space, and adjust for style. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-03-05[NETFILTER]: tcp conntrack: accept SYN|URG as validPatrick McHardy
Some stacks apparently send packets with SYN|URG set. Linux accepts these packets, so TCP conntrack should to. Pointed out by Martijn Posthuma <posthuma@sangine.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05[NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefsPatrick McHardy
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK, but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or CONFIG_NF_CONNTRACK_NETLINK for ifdefs. Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05[NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loopsPatrick McHardy
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling: - unconfirmed entries can not be killed manually, they are removed on confirmation or final destruction of the conntrack entry, which means we might iterate forever without making forward progress. This can happen in combination with the conntrack event cache, which holds a reference to the conntrack entry, which is only released when the packet makes it all the way through the stack or a different packet is handled. - taking references to an unconfirmed entry and using it outside the locked section doesn't work, the list entries are not refcounted and another CPU might already be waiting to destroy the entry What the code really wants to do is make sure the references of the hash table to the selected conntrack entries are released, so they will be destroyed once all references from skbs and the event cache are dropped. Since unconfirmed entries haven't even entered the hash yet, simply mark them as dying and skip confirmation based on that. Reported and tested by Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-02[NetLabel]: Verify sensitivity level has a valid CIPSO mappingPaul Moore
The current CIPSO engine has a problem where it does not verify that the given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is used. The end result is that bad packets are sent on the wire which should have never been sent in the first place. This patch corrects this problem by verifying the sensitivity level mapping similar to what is done with the category mapping. This patch also changes the returned error code in this case to -EPERM to better match what the category mapping verification code returns. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28[TCP]: Fix minisock tcp_create_openreq_child() typo.Arnaldo Carvalho de Melo
On 2/28/07, KOVACS Krisztian <hidden@balabit.hu> wrote: > > Hi, > > While reading TCP minisock code I've found this suspiciously looking > code fragment: > > - 8< - > struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb) > { > struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC); > > if (newsk != NULL) { > const struct inet_request_sock *ireq = inet_rsk(req); > struct tcp_request_sock *treq = tcp_rsk(req); > struct inet_connection_sock *newicsk = inet_csk(sk); > struct tcp_sock *newtp; > - 8< - > > The above code initializes newicsk to inet_csk(sk), isn't that supposed > to be inet_csk(newsk)? As far as I can tell this might leave > icsk_ack.last_seg_size zero even if we do have received data. Good catch! David, please apply the attached patch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[XFRM]: Fix oops in xfrm4_dst_destroy()Bernhard Walle
With 2.6.21-rc1, I get an oops when running 'ifdown eth0' and an IPsec connection is active. If I shut down the connection before running 'ifdown eth0', then there's no problem. The critical operation of this script is to kill dhcpd. The problem is probably caused by commit with git identifier 4337226228e1cfc1d70ee975789c6bd070fb597c (Linus tree) "[IPSEC]: IPv4 over IPv6 IPsec tunnel". This patch fixes that oops. I don't know the network code of the Linux kernel in deep, so if that fix is wrong, please change it. But please fix the oops. :) Signed-off-by: Bernhard Walle <bwalle@suse.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[XFRM_TUNNEL]: Reload header pointer after pskb_may_pull/pskb_expand_headArnaldo Carvalho de Melo
Please consider applying, this was found on your latest net-2.6 tree while playing around with that ip_hdr() + turn skb->nh/h/mac pointers as offsets on 64 bits idea :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[IPV4]: Use random32() in net/ipv4/multipathJoe Perches
Removed local random number generator function Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[IPV4] devinet: Register inetdev earlier.Herbert Xu
This patch allocates inetdev at registration for all devices in line with IPv6. This allows sysctl configuration on the devices to occur before they're brought up or addresses are added. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-02-26[IPV4]: Correct links in net/ipv4/KconfigBaruch Even
Correct dead/indirect links in net/ipv4/Kconfig Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26[TCP]: Fix MD5 signature pool locking.David S. Miller
The locking calls assumed that these code paths were only invoked in software interrupt context, but that isn't true. Therefore we need to use spin_{lock,unlock}_bh() throughout. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-17correct a dead URL in the IP_MULTICAST help textAdrian Bunk
Reported in kernel Bugzilla #6216. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17Various typo fixes.Robert P. J. Day
Correct mis-spellings of "algorithm", "appear", "consistent" and (shame, shame) "kernel". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tablesEric W. Biederman
It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14[PATCH] sysctl: remove insert_at_head from register_sysctlEric W. Biederman
The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14[PATCH] remove many unneeded #includes of sched.hTim Schmielau
After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-13[IPSEC]: Changing API of xfrm4_tunnel_register.Kazunori MIYAZAWA
This patch changes xfrm4_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13[TCP]: Prevent pseudo garbage in SYN's advertized windowIlpo Järvinen
TCP may advertize up to 16-bits window in SYN packets (no window scaling allowed). At the same time, TCP may have rcv_wnd (32-bits) that does not fit to 16-bits without window scaling resulting in pseudo garbage into advertized window from the low-order bits of rcv_wnd. This can happen at least when mss <= (1<<wscale) (see tcp_select_initial_window). This patch fixes the handling of SYN advertized windows (compile tested only). In worst case (which is unlikely to occur though), the receiver advertized window could be just couple of bytes. I'm not sure that such situation would be handled very well at all by the receiver!? Fortunately, the situation normalizes after the first non-SYN ACK is received because it has the correct, scaled window. Alternatively, tcp_select_initial_window could be changed to prevent too large rcv_wnd in the first place. [ tcp_make_synack() has the same bug, and I've added a fix for that to this patch -DaveM ] Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13[NETFILTER]: Clear GSO bits for TCP reset packetHerbert Xu
The TCP reset packet is copied from the original. This includes all the GSO bits which do not apply to the new packet. So we should clear those bits. Spotted by Patrick McHardy. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[XFRM]: Fix IPv4 tunnel mode decapsulation with IPV6=nPatrick McHardy
Add missing break when CONFIG_IPV6=n. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[TCP]: cleanup of htcp (resend)Stephen Hemminger
Minor non-invasive cleanups: * white space around operators and line wrapping * use const * use __read_mostly Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[TCP]: Use read mostly for CUBIC parameters.Stephen Hemminger
These module parameters should be in the read mostly area to avoid cache pollution. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: Kconfig: improve dependency handlingPatrick McHardy
Instead of depending on internally needed options and letting users figure out what is needed, select them when needed: - IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select NETFILTER_XTABLES - NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK - NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_conntrack: properly use RCU for nf_conntrack_destroyed callbackPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: ip_conntrack: properly use RCU for ip_conntrack_destroyed callbackPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: ip_conntrack: fix invalid conntrack statistics RCU assumptionPatrick McHardy
CONNTRACK_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding ip_conntrack_lock. Add CONNTRACK_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_conntrack: properly use RCU API for ↵Patrick McHardy
nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: ip_conntrack: properly use RCU API for ip_ct_protos arrayPatrick McHardy
Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_nat: properly use RCU API for nf_nat_protos arrayPatrick McHardy
Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in paths used outside of packet processing context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: ip_nat: properly use RCU API for ip_nat_protos arrayPatrick McHardy
Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in paths used outside of packet processing context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: nf_log: minor cleanupsPatrick McHardy
- rename nf_logging to nf_loggers since its an array of registered loggers - rename nf_log_unregister_logger() to nf_log_unregister() to make it symetrical to nf_log_register() and convert all users Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[NETFILTER]: Properly use RCU in nf_ct_attachPatrick McHardy
Use rcu_assign_pointer/rcu_dereference for ip_ct_attach pointer instead of self-made RCU and use rcu_read_lock to make sure the conntrack module doesn't disappear below us while calling it, since this function can be called from outside the netfilter hooks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[PATCH] mark struct file_operations const 7Arjan van de Ven
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [IPV4]: Restore multipath routing after rt_next changes. [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. [NET]: Reorder fields of struct dst_entry [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer [NET]: Introduce union in struct dst_entry to hold 'next' pointer [DECNET]: fix misannotation of linkinfo_dn [DECNET]: FRA_{DST,SRC} are le16 for decnet [UDP]: UDP can use sk_hash to speedup lookups [NET]: Fix whitespace errors. [NET] XFRM: Fix whitespace errors. [NET] X25: Fix whitespace errors. [NET] WANROUTER: Fix whitespace errors. [NET] UNIX: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] SCHED: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. ...
2007-02-11[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().Robert P. J. Day
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-10[IPV4]: Restore multipath routing after rt_next changes.Eric Dumazet
I forgot to test build this part of the networking code... Sorry guys. This patch renames u.rt_next to u.dst.rt_next Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointerEric Dumazet
This patch removes the rt_next pointer from 'struct rtable.u' union, and renames u.rt_next to u.dst_rt_next. It also moves 'struct flowi' right after 'struct dst_entry' to prepare the gain on lookups. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[UDP]: UDP can use sk_hash to speedup lookupsEric Dumazet
In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash) to let tcp lookups use one cache line per unmatched entry instead of two. We can also use sk_hash to speedup UDP part as well. We store in sk_hash the hnum value, and use sk->sk_hash (same cache line than 'next' pointer), instead of inet->num (different cache line) Note : We still have a false sharing problem for SMP machines, because sock_hold(sock) dirties the cache line containing the 'next' pointer. Not counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET] IPV4: Fix whitespace errors.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NET]: change layout of ehash tableEric Dumazet
ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined ↵Jan Engelhardt
structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functionsJan Engelhardt
Use the x_tables functions directly to make it better visible which parts are shared between ip_tables and ip6_tables. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: x_tables: fix return values for LOG/ULOGJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: NAT: optional source port randomization supportEric Leblond
This patch adds support to NAT to randomize source ports. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: add IPv6-capable TCPMSS targetPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NET]: Add UDPLITE support in a few missing spotsPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: nf_nat: remove broken HOOKNAME macroPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETFILTER]: tcp conntrack: do liberal tracking for picked up connectionsPatrick McHardy
Do liberal tracking (only RSTs need to be in-window) for connections picked up without seeing a SYN to deal with window scaling. Also change logging of invalid packets not to log packets accepted by liberal tracking to avoid spamming the logs. Based on suggestion from James Ralston <ralston@pobox.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>