aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2007-05-24[XFRM]: Allow packet drops during larval state resolution.David S. Miller
The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[NETFILTER]: nf_nat_h323: call set_h225_addr instead of set_h225_addr_hookJing Min Zhao
They're the same. Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[NETFILTER]: nf_conntrack_ftp: fix newline sequence number calculationPatrick McHardy
When the packet size is changed by the FTP NAT helper, the connection tracking helper adjusts the sequence number of the newline character by the size difference. This is wrong because NAT sequence number adjustment happens after helpers are called, so the unadjusted number is compared to the already adjusted one. Based on report by YU, Haitao <yuhaitao@tsinghua.org.cn> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[RTNETLINK]: Fix sending netlink message when replace route.Milan Kocian
When you replace route via ip r r command the netlink multicast message is not send. This patch corrects it. NL message is sent with NLM_F_REPLACE flag. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8320 Signed-off-by: Milan Kocian <milon@wq.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[IPVS]: Use menuconfig objects.Jan Engelhardt
Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19[IPV4]: icmp: fix crash with sysctl_icmp_errors_use_inbound_ifaddrPatrick McHardy
When icmp_send is called on the local output path before the packet hits ip_output, skb->dev is not set, causing a crash when sysctl_icmp_errors_use_inbound_ifaddr is set. This can happen with the netfilter REJECT target or IPsec tunnels. Let routing decide the ICMP source address in that case, since the packet is locally generated there is no inbound interface and the sysctl should not apply. The option actually seems to be unfixable broken, on the path after ip_output() skb->dev points to the outgoing device and we don't know the incoming device anymore, so its going to do the absolute wrong thing and pick the address of the outgoing interface. Add a comment about this. Reported by Curtis Doty <Curtis@GreenKey.net>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19[NETFILTER]: nf_conntrack_ipv4: fix incorrect #ifdef config namePatrick McHardy
The option is named CONFIG_NF_NAT not CONFIG_IP_NF_NAT. Remove the ifdef completely since helpers also expect defragmented packet even without NAT. Noticed by Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19[TCP] FRTO: Prevent state inconsistency in corner casesIlpo Järvinen
State could become inconsistent in two cases: 1) Userspace disabled FRTO by tuning sysctl when one of the TCP flows was in the middle of FRTO algorithm (and then RTO is again triggered) 2) SACK reneging occurs during FRTO algorithm A simple solution is just to abort the previous FRTO when such obscure condition occurs... Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19[TCP] FRTO: Add missing ECN CWR sending to one of the responsesIlpo Järvinen
The conservative spurious RTO response did not queue CWR even though the sending rate was lowered. Whenever reduction happens regardless of reason, CWR should be sent (forgetting to send it is not very fatal though). A better approach would be to queue CWR when one of the sending rate reducing responses (rate-halving one or this conservative response) is used already at RTO. Doing that would allow CWR to be sent along with the two new data segments that are sent during FRTO. However, it's a bit "racy" because userland could tune the response sysctl to a more aggressive one in between. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-18[IPV4]: Remove IPVS icmp hack from route.c for now.David S. Miller
Revert: 2d771cd86d4c3af26f34a7bcdc1b87696824cad9 This is dangerous if enabled and a better solution to the problem is being worked on. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-17[IPV4]: Correct rp_filter help text.Dave Jones
As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015 The helptext implies that this is on by default. This may be true on some distros (Fedora/RHEL have it enabled in /etc/sysctl.conf), but the kernel defaults to it off. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-17[TCP]: TCP_CONG_YEAH requires TCP_CONG_VEGASDavid S. Miller
These two congestion control modules share code. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-17[TCP] slow start: Make comments and code logic clearer.Stephen Hemminger
Add more comments to describe our version of tcp_slow_start(). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-14[IPV4] SNMP: Display new statistics at /proc/net/netstatMitsuru Chinen
This displays the statistics specified in the updated IP-MIB RFC (RFC4293) in /proc/net/netstat. The reason why these are not displayed in /proc/net/snmp is that some existing utilities are developed under the assumption which ipstat items in /proc/net/snmp is unchanged. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NETFILTER]: iptable_raw: ignore short packets sent by SOCK_RAW socketsPatrick McHardy
iptables matches and targets expect packets to have at least a full IP header and a valid header length. Ignore packets sent through raw sockets for which this isn't true as in the other tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NETFILTER]: iptable_{filter,mangle}: more descriptive "happy cracking" messagePatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NETFILTER]: nf_nat: remove unused argument of function allocating bindingYasuyuki Kozakai
nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed do not use the argument 'info', which is actually ct->nat.info. If they are necessary to access it again, we can use the argument 'ct' instead. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NETFILTER]: Clean up table initializationPatrick McHardy
- move arp_tables initial table structure definitions to arp_tables.h similar to ip_tables and ip6_tables - use C99 initializers - use initializer macros where possible Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[UDP]: Fix AF-specific references in AF-agnostic code.David S. Miller
__udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.
2007-05-09unify flush_work/flush_work_keventd and rename it to cancel_work_syncOleg Nesterov
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09ipvs: flush defense_work before module unloadOleg Nesterov
net/ipv4/ipvs/ip_vs_core.c module_exit ip_vs_cleanup ip_vs_control_cleanup cancel_rearming_delayed_work // done This is unsafe. The module may be unloaded and the memory may be freed while defense_work's handler is still running/preempted. Do flush_work(&defense_work.work) after cancel_rearming_delayed_work(). Alternatively, we could add flush_work() to cancel_rearming_delayed_work(), but note that we can't change cancel_delayed_work() in the same manner because it may be called from atomic context. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Fix occurrences of "the the "Michael Opdenacker
Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09Fix trivial typos in Kconfig* filesDavid Sterba
Fix several typos in help text in Kconfig* files. Signed-off-by: David Sterba <dave@jikos.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-08header cleaning: don't include smp_lock.h when not usedRandy Dunlap
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-03[TCP]: zero out rx_opt in tcp_disconnect()Srinivas Aji
When the server drops its connection, NFS client reconnects using the same socket after disconnecting. If the new connection's SYN,ACK doesn't contain the TCP timestamp option and the old connection's did, tp->tcp_header_len is recomputed assuming no timestamp header but tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options() adds in a timestamp option past the end of the allocated TCP header, overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[], overwriting skb_shinfo(skb) causing a crash soon after. (The issue was debugged from such a crash.) Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK packet but not reset on disconnect, since they are zeroed out at initialization. The patch zeroes out the entire tp->rx_opt struct in tcp_disconnect() to avoid this sort of problem. Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[NET]: Rework dev_base via list_head (v3)Pavel Emelianov
Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_startIlpo Järvinen
Reuse limited slow-start (RFC3742) included into tcp_cong instead of having another implementation in High Speed TCP. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[NETFILTER]: sip: Fix RTP address NATHerbert Xu
I needed to use this recently to talk to a Cisco server. In my case I only did SNAT while the Cisco server used a different address for RTP traffic than the one for SIP. I discovered that nf_nat_sip NATed the RTP address to the SIP one which was unnecessary but OK. However, in doing so it did not DNAT the destination address on the RTP traffic to the Cisco back to the original RTP address. This patch corrects this by noting down the RTP address and using it when the expectation fires. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NATJorge Boncompte
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack and nat modules to a 2.4.32 kernel I noticed that the gre_key function returns a wrong pointer to the GRE key of a version 0 packet thus corrupting the packet payload. The intended behaviour for GREv0 packets is to act like nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the offending functions (not used anymore) and modified the nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets. Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[NETFILTER]: ipt_DNAT: accept port randomization optionPatrick McHardy
Also accept the --random option for DNAT to allow randomly selecting a destination port from the given range. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[TCP]: Delete unused header file net/ipv4/tcp_yeah.h.Robert P. J. Day
Delete the apparently unused header file net/ipv4/tcp_yeah.h. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[UDP]: Do not allow specific bind when wildcard bind exists.David S. Miller
When allocating local ports, do not allow a bind to a port with a specific local address when a bind to that port with a wildcard local address already exists. Noticed by Linus. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] UDP: Fix endianness bugs in hashing changes.David S. Miller
I accidently applied an earlier version of Eric Dumazet's patch, from March 21st. His version from March 30th didn't have these bugs, so this just interdiffs to the correct patch. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support OutMcastPkts and OutBcastPktsMitsuru Chinen
A transmitted IP multicast datagram should be counted as OutMcastPkts. By the same token, a transmitted IP broadcast datagram should be counted as OutBcastPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support InMcastPkts and InBcastPktsMitsuru Chinen
A received IP multicast datagram should be counted as InMcastPkts. By the same token, a received IP broadcast datagram should be counted as InBcastPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support InTruncatedPktsMitsuru Chinen
An IP datagram which is being discarded because the datagram frame didn't carry enough data should be counted as InTruncatedPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support InNoRoutesMitsuru Chinen
An IP datagram which is being discarded because of no routes in the forwarding path should be counted as InNoRoutes. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP] FRTO: RFC4138 allows Nagle override when new data must be sentIlpo Järvinen
This is a corner case where less than MSS sized new data thingie is awaiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at certain point or F-RTO cannot be used at all. RFC4138 allows overriding of Nagle at that point. Implementation uses frto_counter states 2 and 3 to distinguish when Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP] FRTO: Delay skb available check until it's mandatoryIlpo Järvinen
No new data is needed until the first ACK comes, so no need to check for application limitedness until then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algoEric Dumazet
Some people want to have many UDP sockets, binded to a single port but many different addresses. We currently hash all those sockets into a single chain. Processing of incoming packets is very expensive, because the whole chain must be examined to find the best match. I chose in this patch to hash UDP sockets with a hash function that take into account both their port number and address : This has a drawback because we need two lookups : one with a given address, one with a wildcard (null) address. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-28[TCP]: Update references in two old commentsGerrit Renker
This updates references to drafts in comments which must be about 10 years old. Internet draft draft-ietf-tcpimpl-prob-03.txt expired in 1998 and was replaced by RFC 2525 in March 1999. Section 3.10 of the draft maps almost identically into section 2.17 of RFC 2525: both are entitled "Failure to RST on close with data pending", the differences in text body amount to a typo and minor sentence change. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: selinux: preserve boolean values across policy reloads selinux: change numbering of boolean directory inodes in selinuxfs selinux: remove unused enumeration constant from selinuxfs selinux: explicitly number all selinuxfs inodes selinux: export initial SID contexts via selinuxfs selinux: remove userland security class and permission definitions SELinux: move security_skb_extlbl_sid() out of the security server MAINTAINERS: update selinux entry SELinux: rename selinux_netlabel.h to netlabel.h SELinux: extract the NetLabel SELinux support from the security server NetLabel: convert a BUG_ON in the CIPSO code to a runtime check NetLabel: cleanup and document CIPSO constants
2007-04-27[IPV4] nl_fib_lookup: Initialise res.r before fib_res_put(&res)Sergey Vlasov
When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup() needs to initialize the res.r field before fib_res_put(&res) - unlike fib_lookup(), a direct call to ->tb_lookup does not set this field. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26NetLabel: convert a BUG_ON in the CIPSO code to a runtime checkPaul Moore
This patch changes a BUG_ON in the CIPSO code to a runtime check. It should also increase the readability of the code as it replaces an unexplained constant with a well defined macro. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2007-04-26NetLabel: cleanup and document CIPSO constantsPaul Moore
This patch collects all of the CIPSO constants and puts them in one place; it also documents each value explaining how the value is derived. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2007-04-25[IPV4] IP_GRE: Unify code path to get hash array index.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-25[IPV4] IPIP: Unify code path to get hash array index.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-25[IPV4]: Consolidate common SNMP codeHerbert Xu
This patch moves the SNMP code shared between IPv4/IPv6 from proc.c into net/ipv4/af_inet.c. This makes sense because these functions aren't specific to /proc. As a result we can again skip proc.o if /proc is disabled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[IPV4]: Fix build without procfs.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>