aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2010-02-19xfrm: Flushing empty SAD generates false eventsJamal Hadi Salim
To see the effect make sure you have an empty SAD. On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush" You get prompt back in window2 and you see the flush event on window1. With this fix, you still get prompt on window1 but no event on window2. Thanks to Alexey Dobriyan for finding a bug in earlier version when using pfkey to do the flushing. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-19pfkey: fix SA and SP flush sequenceJamal Hadi Salim
RFC 2367 says flushing behavior should be: 1) user space -> kernel: flush 2) kernel: flush 3) kernel -> user space: flush event to ALL listeners This is not realistic today in the presence of selinux policies which may reject the flush etc. So we make the sequence become: 1) user space -> kernel: flush 2) kernel: flush 3) kernel -> user space: flush response to originater from #1 4) if there were no errors then: kernel -> user space: flush event to ALL listeners Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-19nl80211: add power save commandsKalle Valo
The most needed command from nl80211, which Wireless Extensions had, is support for power save mode. Add a simple command to make it possible to enable and disable power save via nl80211. I was also planning about extending the interface, for example adding the timeout value, but after thinking more about this I decided not to do it. Basically there were three reasons: Firstly, the parameters for power save are very much hardware dependent. Trying to find a unified interface which would work with all hardware, and still make sense to users, will be very difficult. Secondly, IEEE 802.11 power save implementation in Linux is still in state of flux. We have a long way to still to go and there is no way to predict what kind of implementation we will have after few years. And because we need to support nl80211 interface a long time, practically forever, adding now parameters to nl80211 might create maintenance problems later on. Third issue are the users. Power save parameters are mostly used for debugging, so debugfs is better, more flexible, interface for this. For example, wpa_supplicant currently doesn't configure anything related to power save mode. It's better to strive that kernel can automatically optimise the power save parameters, like with help of pm qos network and other traffic parameters. Later on, when we have better understanding of power save, we can extend this command with more features, if there's a need for that. Signed-off-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-02-19netfilter: nf_conntrack_reasm: properly handle packets fragmented into a ↵Patrick McHardy
single fragment When an ICMPV6_PKT_TOOBIG message is received with a MTU below 1280, all further packets include a fragment header. Unlike regular defragmentation, conntrack also needs to "reassemble" those fragments in order to obtain a packet without the fragment header for connection tracking. Currently nf_conntrack_reasm checks whether a fragment has either IP6_MF set or an offset != 0, which makes it ignore those fragments. Remove the invalid check and make reassembly handle fragment queues containing only a single fragment. Reported-and-tested-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-19netfilter: nf_queue: fix NF_STOLEN skb leakEric Dumazet
commit 3bc38712e3a6e059 (handle NF_STOP and unknown verdicts in nf_reinject) was a partial fix to packet leaks. If user asks NF_STOLEN status, we must free the skb as well. Reported-by: Afi Gjermund <afigjermund@gmail.com> Signed-off-by: Eric DUmazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-19netfilter: ctnetlink: fix creation of conntrack with helpersPablo Neira Ayuso
This patch fixes a bug that triggers an assertion if you create a conntrack entry with a helper and netfilter debugging is enabled. Basically, we hit the assertion because the confirmation flag is set before the conntrack extensions are added. To fix this, we move the extension addition before the aforementioned flag is set. This patch also removes the possibility of setting a helper for existing conntracks. This operation would also trigger the assertion since we are not allowed to add new extensions for existing conntracks. We know noone that could benefit from this operation sanely. Thanks to Eric Dumazet for initial posting a preliminary patch to address this issue. Reported-by: David Ramblewski <David.Ramblewski@atosorigin.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-18xfrm: Introduce LINUX_MIB_XFRMFWDHDRERRORjamal
XFRMINHDRERROR counter is ambigous when validating forwarding path. It makes it tricky to debug when you have both in and fwd validation. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18net: TCP thin dupackAndreas Petlund
This patch enables fast retransmissions after one dupACK for TCP if the stream is identified as thin. This will reduce latencies for thin streams that are not able to trigger fast retransmissions due to high packet interarrival time. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18net: TCP thin linear timeoutsAndreas Petlund
This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin stream suffer because of exponential backoff. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin. A maximum of 6 linear timeouts is tried before exponential backoff is resumed. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18const: struct nla_policyAlexey Dobriyan
Make remaining netlink policies as const. Fixup coding style where needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18ipv6: drop unused "dev" arg of icmpv6_send()Alexey Dobriyan
Dunno, what was the idea, it wasn't used for a long time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18ipv6: use standard lists for FIB walksAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18ipv6: remove stale MIB definitionsAlexey Dobriyan
ICMP6 MIB statistics was per-netns for quite a time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18AF_UNIX: update locking commentStephen Hemminger
The lock used in unix_state_lock() is a spin_lock not reader-writer. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18netfilter: nf_defrag_ipv4: fix compilation error with NF_CONNTRACK=nPatrick McHardy
As reported by Randy Dunlap <randy.dunlap@oracle.com>, compilation of nf_defrag_ipv4 fails with: include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct' include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put' include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct' net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct' net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct' net/nf_conntrack.h must not be included with NF_CONNTRACK=n, add a few #ifdefs. Long term the header file should be fixed to be usable even with NF_CONNTRACK=n. Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-18ipvs: SCTP Trasport Loadbalancing SupportVenkata Mohan Reddy
Enhance IPVS to load balance SCTP transport protocol packets. This is done based on the SCTP rfc 4960. All possible control chunks have been taken care. The state machine used in this code looks some what lengthy. I tried to make the state machine easy to understand. Signed-off-by: Venkata Mohan Reddy Koppula <mohanreddykv@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-18Merge branch 'ebt_config_compat_v4' of git://git.breakpoint.cc/fw/nf-next-2.6Patrick McHardy
2010-02-17IPv6: convert mc_lock to spinlockStephen Hemminger
Only used for writing, so convert to spinlock Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17net: export attach/detach filter routinesMichael S. Tsirkin
Export sk_attach_filter/sk_detach_filter routines, so that tun module can use them. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17net: bug fix for vlan + gro issueAjit Khaparde
Traffic (tcp) doesnot start on a vlan interface when gro is enabled. Even the tcp handshake was not taking place. This is because, the eth_type_trans call before the netif_receive_skb in napi_gro_finish() resets the skb->dev to napi->dev from the previously set vlan netdev interface. This causes the ip_route_input to drop the incoming packet considering it as a packet coming from a martian source. I could repro this on 2.6.32.7 (stable) and 2.6.33-rc7. With this fix, the traffic starts and the test runs fine on both vlan and non-vlan interfaces. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17xfrm: Revert false event eliding commits.David S. Miller
As reported by Alexey Dobriyan: -------------------- setkey now takes several seconds to run this simple script and it spits "recv: Resource temporarily unavailable" messages. #!/usr/sbin/setkey -f flush; spdflush; add A B ipcomp 44 -m tunnel -C deflate; add B A ipcomp 45 -m tunnel -C deflate; spdadd A B any -P in ipsec ipcomp/tunnel/192.168.1.2-192.168.1.3/use; spdadd B A any -P out ipsec ipcomp/tunnel/192.168.1.3-192.168.1.2/use; -------------------- Obviously applications want the events even when the table is empty. So we cannot make this behavioral change. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17ethtool: Don't flush n-tuple list from ethtool_reset()Ben Hutchings
The n-tuple list should be flushed if and only if the ETH_RESET_FILTER flag is set and the driver is able to reset filtering/flow direction hardware without also resetting a component whose flag is not set. This test is best left to the driver. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17net: use kasprintf() for socket cache namesAlexey Dobriyan
kasprintf() makes code smaller. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17xt_hashlimit: fix lockingEric Dumazet
Commit 2eff25c18c3d332d3c4dd98f2ac9b7114e9771b0 (netfilter: xt_hashlimit: fix race condition and simplify locking) added a mutex deadlock : htable_create() is called with hashlimit_mutex already locked Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17ipmr: remove useless checks from ipmr_device_eventPavel Emelyanov
The net being checked there is dev_net(dev) and thus this if is always false. Fits both net and net-next trees. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17net: remove INIT_RCU_HEAD() usageAlexey Dobriyan
call_rcu() will unconditionally reinitialize RCU head anyway. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16percpu: add __percpu sparse annotations to netTejun Heo
Add __percpu sparse annotations to net. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_*() users which used to cast the argument to (void **) are updated to cast it to (void __percpu **). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-02-16xfrm: avoid spinlock in get_acqseq() used by xfrm userjamal
Eric's version fixed it for pfkey. This one is for xfrm user. I thought about amortizing those two get_acqseq()s but it seems reasonable to have two of these sequence spaces for the two different interfaces. cheers, jamal commit d5168d5addbc999c94aacda8f28a4a173756a72b Author: Jamal Hadi Salim <hadi@cyberus.ca> Date: Tue Feb 16 06:51:22 2010 -0500 xfrm: avoid spinlock in get_acqseq() used by xfrm user This is in the same spirit as commit 28aecb9d7728dc26bf03ce7925fe622023a83a2a by Eric Dumazet. Use atomic_inc_return() in get_acqseq() to avoid taking a spinlock Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) be2net: set proper value to version field in req hdr xfrm: Fix xfrm_state_clone leak ipcomp: Avoid duplicate calls to ipcomp_destroy ethtool: allow non-admin user to read GRO settings. ixgbe: fix WOL register setup for 82599 ixgbe: Fix - Do not allow Rx FC on 82598 at 1G due to errata sfc: Fix SFE4002 initialisation mac80211: fix handling of null-rate control in rate_control_get_rate inet: Remove bogus IGMPv3 report handling iwlwifi: fix AMSDU Rx after paged Rx patch tcp: fix ICMP-RTO war via-velocity: Fix races on shared interrupts via-velocity: Take spinlock on set coalesce via-velocity: Remove unused IRQ status parameter from rx_srv and tx_srv rtl8187: Add new device ID iwmc3200wifi: Test of wrong pointer after kzalloc in iwm_mlme_update_bss_table() ath9k: Fix sequence numbers for PAE frames mac80211: fix deferred hardware scan requests iwlwifi: Fix to set correct ht configuration mac80211: Fix probe request filtering in IBSS mode ...
2010-02-16net neigh: Decouple per interface neighbour table controls from binary sysctlsEric W. Biederman
Stop computing the number of neighbour table settings we have by counting the number of binary sysctls. This behaviour was silly and meant that we could not add another neighbour table setting without also adding another binary sysctl. Don't pass the binary sysctl path for neighour table entries into neigh_sysctl_register. These parameters are no longer used and so are just dead code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16net ipv4: Decouple ipv4 interface parameters from binary sysctl numbersEric W. Biederman
Stop using the binary sysctl enumeartion in sysctl.h as an index into a per interface array. This leads to unnecessary binary sysctl number allocation, and a fragility in data structure and implementation because of unnecessary coupling. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16tunnels: fix netns vs proto registration orderingAlexey Dobriyan
Same stuff as in ip_gre patch: receive hook can be called before netns setup is done, oopsing in net_generic(). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16gre: fix netns vs proto registration orderingAlexey Dobriyan
GRE protocol receive hook can be called right after protocol addition is done. If netns stuff is not yet initialized, we're going to oops in net_generic(). This is remotely oopsable if ip_gre is compiled as module and packet comes at unfortunate moment of module loading. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16xfrm: Fix xfrm_state_clone leakHerbert Xu
xfrm_state_clone calls kfree instead of xfrm_state_put to free a failed state. Depending on the state of the failed state, it can cause leaks to things like module references. All states should be freed by xfrm_state_put past the point of xfrm_init_state. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16ipcomp: Avoid duplicate calls to ipcomp_destroyHerbert Xu
When ipcomp_tunnel_attach fails we will call ipcomp_destroy twice. This may lead to double-frees on certain structures. As there is no reason to explicitly call ipcomp_destroy, this patch removes it from ipcomp*.c and lets the standard xfrm_state destruction take place. This is based on the discovery and patch by Alexey Dobriyan. Tested-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16ethtool: allow non-admin user to read GRO settings.stephen hemminger
Looks like an oversight in GRO design. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16mac80211: split ieee80211_drop_unencryptedJohannes Berg
Currently, ieee80211_drop_unencrypted is called from management and data frame context, and the different contexts pass different frames. This could lead to it processing an 802.3 frame as an 802.11 frame when MFP is enabled. Move the MFP part of ieee80211_drop_unencrypted into a new function that is only called for mgmt frames. Cc: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-16Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-02-16netfilter: ebtables: mark: add CONFIG_COMPAT supportFlorian Westphal
Add the required handlers to convert 32 bit ebtables mark match and match target structs to 64bit layout. Signed-off-by: Florian Westphal <fwestphal@astaro.com>
2010-02-16netfilter: ebt_limit: add CONFIG_COMPAT supportFlorian Westphal
ebt_limit structure is larger on 64 bit systems due to "long" type used in the (kernel-only) data section. Setting .compatsize is enough in this case, these values have no meaning in userspace. Signed-off-by: Florian Westphal <fwestphal@astaro.com>
2010-02-16netfilter: ebtables: try native set/getsockopt handlers, tooFlorian Westphal
ebtables can be compiled to perform userspace-side padding of structures. In that case, all the structures are already in the 'native' format expected by the kernel. This tries to determine what format the userspace program is using. For most set/getsockopts, this can be done by checking the len argument for sizeof(compat_ebt_replace) and re-trying the native handler on error. In case of EBT_SO_GET_ENTRIES, the native handler is tried first, it will error out early when checking the *len argument (the compat version has to defer this check until after iterating over the kernel data set once, to adjust for all the structure size differences). As this would cause error printks, remove those as well, as recommended by Bart de Schuymer. Signed-off-by: Florian Westphal <fw@strlen.de>
2010-02-16netfilter: ebtables: add CONFIG_COMPAT supportFlorian Westphal
Main code for 32 bit userland ebtables binary with 64 bit kernels support. Tested on x86_64 kernel only, using 64bit ebtables binary for output comparision. At least ebt_mark, m_mark and ebt_limit need CONFIG_COMPAT hooks, too. remaining problem: The ebtables userland makefile has: ifeq ($(shell uname -m),sparc64) CFLAGS+=-DEBT_MIN_ALIGN=8 -DKERNEL_64_USERSPACE_32 endif struct ebt_replace, ebt_entry_match etc. then contain userland-side padding, i.e. even if we are called from a 32 bit userland, the structures may already be in the right format. This problem is addressed in a follow-up patch. Signed-off-by: Florian Westphal <fwestphal@astaro.com>
2010-02-16netfilter: ebtables: split update_counters into two functionsFlorian Westphal
allows to call do_update_counters() from upcoming CONFIG_COMPAT code instead of copy&pasting the same code. Signed-off-by: Florian Westphal <fw@strlen.de>
2010-02-16netfilter: ebtables: split copy_everything_to_user into two functionsFlorian Westphal
once CONFIG_COMPAT support is added to ebtables, the new copy_counters_to_user function can be called instead of duplicating code. Also remove last use of MEMPRINT, as requested by Bart De Schuymer. Signed-off-by: Florian Westphal <fw@strlen.de>
2010-02-16netfilter: ebtables: split do_replace into two functionsFlorian Westphal
once CONFIG_COMPAT support is merged this allows to call do_replace_finish() after doing the CONFIG_COMPAT conversion instead of copy & pasting this. Signed-off-by: Florian Westphal <fw@strlen.de>
2010-02-15ethtool: reduce stack usageEric Dumazet
dev_ethtool() is currently using 604 bytes of stack, even with gcc-4.4.2 objdump -d vmlinux | scripts/checkstack.pl ... 0xc04bbc33 dev_ethtool [vmlinux]: 604 ... Adding noinline attributes to selected functions can reduce stack usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-15X25: Dont let x25_bind use addresses containing charactersandrew hendry
Addresses should be all digits. Stops x25_bind using addresses containing characters. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-15X25: Fix x25_create errors for bad protocol and ENOBUFSandrew hendry
alloc_socket failures should return -ENOBUFS a bad protocol should return -EINVAL Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>