aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2009-09-30sit: fix off-by-one in ipip6_tunnel_get_prlSascha Hlusiak
When requesting all prl entries (kprl.addr == INADDR_ANY) and there are more prl entries than there is space passed from userspace, the existing code would always copy cmax+1 entries, which is more than can be handled. This patch makes the kernel copy only exactly cmax entries. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30net: Make setsockopt() optlen be unsigned.David S. Miller
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-26Revert "sit: stateless autoconf for isatap"Sascha Hlusiak
This reverts commit 645069299a1c7358cf7330afe293f07552f11a5d. While the code does not actually break anything, it does not completely follow RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better implemented in userspace. The kernel should not send RS packages for ISATAP at all. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24tunnel: eliminate recursion fieldEric Dumazet
It seems recursion field from "struct ip_tunnel" is not anymore needed. recursion prevention is done at the upper level (in dev_queue_xmit()), since we use HARD_TX_LOCK protection for tunnels. This avoids a cache line ping pong on "struct ip_tunnel" : This structure should be now mostly read on xmit and receive paths. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23seq_file: constify seq_operationsJames Morris
Make all seq_operations structs const, to help mitigate against revectoring user-triggerable function pointers. This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits) be2net: fix some cmds to use mccq instead of mbox atl1e: fix 2.6.31-git4 -- ATL1E 0000:03:00.0: DMA-API: device driver frees DMA pkt_sched: Fix qstats.qlen updating in dump_stats ipv6: Log the affected address when DAD failure occurs wl12xx: Fix print_mac() conversion. af_iucv: fix race when queueing skbs on the backlog queue af_iucv: do not call iucv_sock_kill() twice af_iucv: handle non-accepted sockets after resuming from suspend af_iucv: fix race in __iucv_sock_wait() iucv: use correct output register in iucv_query_maxconn() iucv: fix iucv_buffer_cpumask check when calling IUCV functions iucv: suspend/resume error msg for left over pathes wl12xx: switch to %pM to print the mac address b44: the poll handler b44_poll must not enable IRQ unconditionally ipv6: Ignore route option with ROUTER_PREF_INVALID bonding: make ab_arp select active slaves as other modes cfg80211: fix SME connect rc80211_minstrel: fix contention window calculation ssb/sdio: fix printk format warnings p54usb: add Zcomax XG-705A usbid ...
2009-09-17ipv6: Log the affected address when DAD failure occursJens Rosenboom
If an interface has multiple addresses, the current message for DAD failure isn't really helpful, so this patch adds the address itself to the printk. Signed-off-by: Jens Rosenboom <me@jayr.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-16ipv6: Ignore route option with ROUTER_PREF_INVALIDJens Rosenboom
RFC4191 says that "If the Reserved (10) value is received, the Route Information Option MUST be ignored.", so this patch makes us conform to the RFC. This is different to the usage of the Default Router Preference, where an invalid value must indeed be treated as PREF_MEDIUM. Signed-off-by: Jens Rosenboom <me@jayr.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15bonding: remap muticast addresses without using dev_close() and dev_open()Moni Shoua
This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15tcp: fix ssthresh u16 leftoverIlpo Järvinen
It was once upon time so that snd_sthresh was a 16-bit quantity. ...That has not been true for long period of time. I run across some ancient compares which still seem to trust such legacy. Put all that magic into a single place, I hopefully found all of them. Compile tested, though linking of allyesconfig is ridiculous nowadays it seems. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-14net: constify struct inet6_protocolAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-11ipv6: Add IFA_F_DADFAILED flagBrian Haley
Add IFA_F_DADFAILED flag to denote an IPv6 address that has failed Duplicate Address Detection, that way tools like /sbin/ip can be more informative. 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:db8::1/64 scope global tentative dadfailed valid_lft forever preferred_lft forever Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-10Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2009-09-09headers: net/ipv[46]/protocol.c header trimAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03ipv6: Fix tcp_v6_send_response(): it didn't set skb transport headerCosmin Ratiu
Here is a patch which fixes an issue observed when using TCP over IPv6 and AH from IPsec. When a connection gets closed the 4-way method and the last ACK from the server gets dropped, the subsequent FINs from the client do not get ACKed because tcp_v6_send_response does not set the transport header pointer. This causes ah6_output to try to allocate a lot of memory, which typically fails, so the ACKs never make it out of the stack. I have reproduced the problem on kernel 2.6.7, but after looking at the latest kernel it seems the problem is still there. Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02tcp: replace hard coded GFP_KERNEL with sk_allocationWu Fengguang
This fixed a lockdep warning which appeared when doing stress memory tests over NFS: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock mount_root => nfs_root_data => tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim David raised a concern that if the allocation fails in tcp_send_fin(), and it's GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting for the allocation to succeed. But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could loop endlessly under memory pressure. CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: David S. Miller <davem@davemloft.net> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02ip: Report qdisc packet dropsEric Dumazet
Christoph Lameter pointed out that packet drops at qdisc level where not accounted in SNMP counters. Only if application sets IP_RECVERR, drops are reported to user (-ENOBUFS errors) and SNMP counters updated. IP_RECVERR is used to enable extended reliable error message passing, but these are not needed to update system wide SNMP stats. This patch changes things a bit to allow SNMP counters to be updated, regardless of IP_RECVERR being set or not on the socket. Example after an UDP tx flood # netstat -s ... IP: 1487048 outgoing packets dropped ... Udp: ... SndbufErrors: 1487048 send() syscalls, do however still return an OK status, to not break applications. Note : send() manual page explicitly says for -ENOBUFS error : "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.) " This is not true for IP_RECVERR enabled sockets : a send() syscall that hit a qdisc drop returns an ENOBUFS error. Many thanks to Christoph, David, and last but not least, Alexey ! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02net: file_operations should be constStephen Hemminger
All instances of file_operations should be const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02inet: inet_connection_sock_af_ops constStephen Hemminger
The function block inet_connect_sock_af_ops contains no data make it constant. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02tcp: MD5 operations should be constStephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02net: seq_operations should be constStephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/yellowfin.c
2009-09-01ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDSEric Dumazet
qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01net: make neigh_ops constantStephen Hemminger
These tables are never modified at runtime. Move to read-only section. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01netns: embed ip6_dst_ops directlyAlexey Dobriyan
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated, but there is no fundamental reason for it. Embed it directly into struct netns_ipv6. For that: * move struct dst_ops into separate header to fix circular dependencies I honestly tried not to, it's pretty impossible to do other way * drop dynamical allocation, allocate together with netns For a change, remove struct dst_ops::dst_net, it's deducible by using container_of() given dst_ops pointer. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01netdev: convert pseudo-devices to netdev_tx_tStephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-31netfilter: ip6t_eui: fix read outside array boundsPatrick McHardy
Use memcmp() instead of open coded comparison that reads one byte past the intended end. Based on patch from Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-29ipv6: Update Neighbor Cache when IPv6 RA is received on a routerDavid Ward
When processing a received IPv6 Router Advertisement, the kernel creates or updates an IPv6 Neighbor Cache entry for the sender -- but presently this does not occur if IPv6 forwarding is enabled (net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these cases processing of the Router Advertisement has already halted. This patch allows the Neighbor Cache to be updated in these cases, while still avoiding any modification to routes or link parameters. This continues to satisfy RFC 4861, since any entry created in the Neighbor Cache as the result of a received Router Advertisement is still placed in the STALE state. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28sit: allow ip fragmentation when using nopmtudisc to fix package lossSascha Hlusiak
if tunnel parameters have frag_off set to IP_DF, pmtudisc on the ipv4 link will be performed by deriving the mtu from the ipv4 link and setting the DF-Flag of the encapsulating IPv4 Header. If fragmentation is needed on the way, the IPv4 pmtu gets adjusted, the ipv6 package will be resent eventually, using the new and lower mtu and everyone is happy. If the frag_off parameter is unset, the mtu for the tunnel will be derived from the tunnel device or the ipv6 pmtu, which might be higher than the ipv4 pmtu. In that case we must allow the fragmentation of the IPv4 packet because the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in frequent icmp_frag_needed and package loss on the IPv6 layer. This patch allows fragmentation when tunnel was created with parameter nopmtudisc, like in ipip/gre tunnels. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-25netfilter: nf_conntrack: log packets dropped by helpersPatrick McHardy
Log packets dropped by helpers using the netfilter logging API. This is useful in combination with nfnetlink_log to analyze those packets in userspace for debugging. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24netfilter: xtables: mark initial tables constantJan Engelhardt
The inputted table is never modified, so should be considered const. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-23ipv6: Fix commit 63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make ↵Bruno Prémont
v4-mapped bindings consistent with IPv4) Commit 63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make v4-mapped bindings consistent with IPv4) changes behavior of inet6_bind() for v4-mapped addresses so it should behave the same way as inet_bind(). During this change setting of err to -EADDRNOTAVAIL got lost: af_inet.c:469 inet_bind() err = -EADDRNOTAVAIL; if (!sysctl_ip_nonlocal_bind && !(inet->freebind || inet->transparent) && addr->sin_addr.s_addr != htonl(INADDR_ANY) && chk_addr_ret != RTN_LOCAL && chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST) goto out; af_inet6.c:463 inet6_bind() if (addr_type == IPV6_ADDR_MAPPED) { int chk_addr_ret; /* Binding to v4-mapped address on a v6-only socket * makes no sense */ if (np->ipv6only) { err = -EINVAL; goto out; } /* Reproduce AF_INET checks to make the bindings consitant */ v4addr = addr->sin6_addr.s6_addr32[3]; chk_addr_ret = inet_addr_type(net, v4addr); if (!sysctl_ip_nonlocal_bind && !(inet->freebind || inet->transparent) && v4addr != htonl(INADDR_ANY) && chk_addr_ret != RTN_LOCAL && chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST) goto out; } else { Signed-off-by Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-14Merge branch 'percpu-for-linus' into percpu-for-nextTejun Heo
Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-13inet6: Set default traffic classGerrit Renker
This patch addresses: * assigning -1 to np->tclass as it is currently done is not very meaningful, since it turns into 0xff; * RFC 3542, 6.5 allows -1 for clearing the sticky IPV6_TCLASS option and specifies -1 to mean "use kernel default": - RFC 2460, 7. requires that the default traffic class must be zero for all 8 bits, - this is consistent with RFC 2474, 4.1 which recommends a default PHB of 0, in combination with a value of the ECN field of "non-ECT" (RFC 3168, 5.). This patch changes the meaning of -1 from assigning 255 to mean the RFC 2460 default, which at the same time allows to satisfy clearing the sticky TCLASS option as per RFC 3542, 6.5. (When passing -1 as ancillary data, the fallback remains np->tclass, which has either been set via socket options, or contains the default value.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13inet6: Conversion from u8 to intGerrit Renker
This replaces assignments of the type "int on LHS" = "u8 on RHS" with simpler code. The LHS can express all of the unsigned right hand side values, hence the assigned value can not be negative. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13ipv6: Log the explicit address that triggered DAD failureJens Rosenboom
If an interface has multiple addresses, the current message for DAD failure isn't really helpful, so this patch adds the address itself to the printk. Signed-off-by: Jens Rosenboom <jens@mcbone.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-10netfilter: xtables: check for standard verdicts in policiesJan Engelhardt
This adds the second check that Rusty wanted to have a long time ago. :-) Base chain policies must have absolute verdicts that cease processing in the table, otherwise rule execution may continue in an unexpected spurious fashion (e.g. next chain that follows in memory). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: check for unconditionality of policiesJan Engelhardt
This adds a check that iptables's original author Rusty set forth in a FIXME comment. Underflows in iptables are better known as chain policies, and are required to be unconditional or there would be a stochastical chance for the policy rule to be skipped if it does not match. If that were to happen, rule execution would continue in an unexpected spurious fashion. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: ignore unassigned hooks in check_entry_size_and_hooksJan Engelhardt
The "hook_entry" and "underflow" array contains values even for hooks not provided, such as PREROUTING in conjunction with the "filter" table. Usually, the values point to whatever the next rule is. For the upcoming unconditionality and underflow checking patches however, we must not inspect that arbitrary rule. Skipping unassigned hooks seems like a good idea, also because newinfo->hook_entry and newinfo->underflow will then continue to have the poison value for detecting abnormalities. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: use memcmp in unconditional checkJan Engelhardt
Instead of inspecting each u32/char open-coded, clean up and make use of memcmp. On some arches, memcmp is implemented as assembly or GCC's __builtin_memcmp which can possibly take advantages of known alignment. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: switch table AFs to nfprotoJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: switch hook PFs to nfprotoJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: conntrack: switch hook PFs to nfprotoJan Engelhardt
Simple substitution to indicate that the fields indeed use the NFPROTO_ space. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-05net: mark read-only arrays as constJan Engelhardt
String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-04xfrm6: Fix xfrm6_policy.c build when SYSCTL disabled.David S. Miller
Same as how Randy Dunlap fixed the ipv4 side of things. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02inet6: functions shadow global variableGerrit Renker
This renames away a variable clash: * ipv6_table[] is declared as a static global table; * ipv6_sysctl_net_init() uses ipv6_table to refer/destroy dynamic memory; * ipv6_sysctl_net_exit() also uses ipv6_table for the same purpose; * both the two last functions call kfree() on ipv6_table. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-30xfrm: select sane defaults for xfrm[4|6] gc_threshNeil Horman
Choose saner defaults for xfrm[4|6] gc_thresh values on init Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values (set to 1024). Given that the ipv4 and ipv6 routing caches are sized dynamically at boot time, the static selections can be non-sensical. This patch dynamically selects an appropriate gc threshold based on the corresponding main routing table size, using the assumption that we should in the worst case be able to handle as many connections as the routing table can. For ipv4, the maximum route cache size is 16 * the number of hash buckets in the route cache. Given that xfrm4 starts garbage collection at the gc_thresh and prevents new allocations at 2 * gc_thresh, we set gc_thresh to half the maximum route cache size. For ipv6, its a bit trickier. there is no maximum route cache size, but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems sane to select a simmilar gc_thresh for the xfrm6 code that is half the number of hash buckets in the v6 route cache times 16 (like the v4 code does). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-27xfrm: export xfrm garbage collector thresholds via sysctlNeil Horman
Export garbage collector thresholds for xfrm[4|6]_dst_ops Had a problem reported to me recently in which a high volume of ipsec connections on a system began reporting ENOBUFS for new connections eventually. It seemed that after about 2000 connections we started being unable to create more. A quick look revealed that the xfrm code used a dst_ops structure that limited the gc_thresh value to 1024, and always dropped route cache entries after 2x the gc_thresh. It seems the most direct solution is to export the gc_thresh values in the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so that higher volumes of connections can be supported. This patch has been tested and allows the reporter to increase their ipsec connection volume successfully. Reported-by: Joe Nall <joe@nall.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> ipv4/xfrm4_policy.c | 18 ++++++++++++++++++ ipv6/xfrm6_policy.c | 18 ++++++++++++++++++ 2 files changed, 36 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>