path: root/net/ipv4
Age  Commit message  Author
2008-03-06  [UDP]: Revert udplite and code split.  David S. Miller
This reverts commit db1ed684f6c430c4cdad67d058688b8a1b5e607c ("[IPV6] UDP: Rename IPv6 UDP files."), commit 8be8af8fa4405652e6c0797db5465a4be8afb998 ("[IPV4] UDP: Move IPv4-specific bits to other file.") and commit e898d4db2749c6052072e9bc4448e396cbdeb06a ("[UDP]: Allow users to configure UDP-Lite."). First, udplite is of such small cost, and it is a core protocol just like TCP and normal UDP are. We spent enormous amounts of effort to make udplite share as much code with core UDP as possible. All of that work is less valuable if we're just going to slap a config option on udplite support. It is also causing build failures, as reported on linux-next, showing that the changeset was not tested very well. In fact, this is the second build failure resulting from the udplite change. Finally, the config option provided was a bool instead of a modular option. This means the udplite code does not even get build tested by allmodconfig builds, and furthermore the user is not presented with a reasonable modular build option, which is particularly needed by distribution vendors. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05  net: replace remaining __FUNCTION__ occurrences  Harvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
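As a small illustration of the replacement described above: `__func__` is the predefined identifier standardized in C99, so it does not depend on a GCC extension the way `__FUNCTION__` does. The sketch below is not taken from the patch; `LOG_PREFIX` and `describe_caller` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logging macro: writes "<calling function>: " into buf.
 * __func__ is evaluated where the macro is expanded. */
#define LOG_PREFIX(buf, len) snprintf((buf), (len), "%s: ", __func__)

/* Hypothetical function, used only to show what __func__ evaluates to. */
int describe_caller(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return LOG_PREFIX(buf, len);    /* buf now holds "describe_caller: " */
}
```

Because the macro is expanded inside `describe_caller`, the prefix names that function rather than the macro.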
2008-03-05  [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts  Eric Dumazet
(Anonymous) unions can help us avoid ugly casts. A common cast is the (struct rtable *)skb->dst one. Defining a union like: union { struct dst_entry *dst; struct rtable *rtable; }; lets us use skb->rtable instead. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
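The aliasing idea can be sketched in plain C as follows. This is a simplified illustration, not the kernel's definitions: the `*_sketch` structures and `skb_route_flags` are hypothetical, and the layout (the `dst` entry embedded as the first member of the route structure) is an assumption made so that both pointers refer to the same address. Anonymous unions need C11 or a compiler extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layouts). */
struct dst_entry_sketch {
    int refcnt;
};

struct rtable_sketch {
    struct dst_entry_sketch dst;    /* first member, so both pointers share an address */
    unsigned int rt_flags;
};

struct sk_buff_sketch {
    union {                         /* anonymous: members are accessed directly */
        struct dst_entry_sketch *dst;
        struct rtable_sketch *rtable;
    };
};

/* Reads a routing field without writing (struct rtable_sketch *)skb->dst. */
unsigned int skb_route_flags(const struct sk_buff_sketch *skb)
{
    assert(skb != NULL && skb->rtable != NULL);
    return skb->rtable->rt_flags;
}
```

Code that only cares about the generic entry keeps using `skb->dst`, while routing code reads `skb->rtable` with no cast at the call site.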
2008-03-05  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  David S. Miller
Conflicts: net/mac80211/rc80211_pid_algo.c
2008-03-04  [IPCONFIG]: The kernel gets no IP from some DHCP servers  Stephen Hemminger
From: Stephen Hemminger <shemminger@linux-foundation.org> Based upon a patch by Marcel Wappler: This patch fixes a DHCP issue in the kernel: some DHCP servers (e.g. the one in the Linksys WRT54Gv5) are very strict about the contents of the DHCPDISCOVER packets they receive from clients. Table 5 in RFC 2131 (page 36) requires the fields 'ciaddr' and 'siaddr' to be set to '0'. These DHCP servers ignore the Linux kernel's DHCP discovery packets, which have these two fields set to '255.255.255.255' (in contrast to popular DHCP clients, such as 'dhclient' or 'udhcpc'). As a result, the system fails to boot. Signed-off-by: David S. Miller <davem@davemloft.net>
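The field requirement can be sketched as follows. This is a minimal illustration, not the kernel's ipconfig code: `bootp_fixed_sketch` and `init_discover` are hypothetical names, the structure covers only the fixed part of the message described in RFC 2131, and DHCP options are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BOOTREQUEST_OP  1   /* op code for messages sent by a client */
#define HTYPE_ETHERNET  1
#define ETH_ADDR_LEN    6

/* Fixed part of a BOOTP/DHCP message; the options field is omitted. */
struct bootp_fixed_sketch {
    uint8_t  op, htype, hlen, hops;
    uint32_t xid;
    uint16_t secs, flags;
    uint32_t ciaddr, yiaddr, siaddr, giaddr;
    uint8_t  chaddr[16];
    uint8_t  sname[64];
    uint8_t  file[128];
};

/* Fills in a discover message; xid is stored exactly as passed in. */
void init_discover(struct bootp_fixed_sketch *msg, uint32_t xid,
                   const uint8_t mac[ETH_ADDR_LEN])
{
    assert(msg != NULL && mac != NULL);
    memset(msg, 0, sizeof(*msg));   /* leaves ciaddr and siaddr at 0 */
    msg->op = BOOTREQUEST_OP;
    msg->htype = HTYPE_ETHERNET;
    msg->hlen = ETH_ADDR_LEN;
    msg->xid = xid;
    memcpy(msg->chaddr, mac, ETH_ADDR_LEN);
}
```

Zeroing the whole message first means no address field can carry a leftover value such as 255.255.255.255.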
2008-03-04  [ESP]: Add select on AUTHENC  Herbert Xu
Now that ESP uses the AEAD interface even for algorithms which are not combined mode, we need to select CONFIG_CRYPTO_AUTHENC, as otherwise only combined mode algorithms will work. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04  [TCP]: TCP cubic v2.2  Sangtae Ha
We have updated CUBIC to fix some issues with slow increase in large BDP networks. We also improved its convergence speed. The fix is in fact very simple -- the window increase limit of smax during the window probing phase (i.e., convex growth phase) is removed. We found that this does not affect TCP friendliness, but only improves its scalability. We have run some tests in our lab and also over the Internet path from NCSU to Japan. These results can be seen on the following pages: http://netsrv.csc.ncsu.edu/wiki/index.php/Intra_protocol_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/RTT_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_friendliness_testing_with_linux-2.6.23.9 Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04  [IPV4] UDP: Move IPv4-specific bits to other file.  YOSHIFUJI Hideaki
Move IPv4-specific UDP bits from net/ipv4/udp.c into (new) net/ipv4/udp_ipv4.c. Rename net/ipv4/udplite.c to net/ipv4/udplite_ipv4.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04  [UDP]: Allow users to configure UDP-Lite.  YOSHIFUJI Hideaki
Let's give users an option for disabling UDP-Lite (~4K).

old:
|   text    data     bss     dec     hex filename
| 286498   12432    6072  305002   4a76a net/ipv4/built-in.o
| 193830    8192    3204  205226   321aa net/ipv6/ipv6.o

new (without UDP-Lite):
|   text    data     bss     dec     hex filename
| 284086   12136    5432  301654   49a56 net/ipv4/built-in.o
| 191835    7832    3076  202743   317f7 net/ipv6/ipv6.o

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04  [TCP]: Add IPv6 support to TCP SYN cookies  Glenn Griffin
Updated to incorporate Eric's suggestion of using a per cpu buffer rather than allocating on the stack. Just a two line change, but will resend in its entirety. Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04  [TCP]: lower stack usage in cookie_hash() function  Eric Dumazet
400 bytes allocated on the stack might be a little bit too much. Using a per_cpu var is more friendly. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-03  [ARP]: Introduce the arp_hdr_len helper.  Pavel Emelyanov
There are some places that calculate the ARP header length. These calculations are correct, but they a) sometimes operate with "magic" constants, b) enlarge the code length (sometimes at the cost of coding style), and c) are not informative at first glance. The proposal is to introduce a helper that includes all the good sides of these calculations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
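The kind of calculation such a helper wraps can be sketched in userspace C. This is an illustration under stated assumptions rather than the helper from the patch: `arphdr_sketch` and `arp_hdr_len_sketch` are hypothetical names, and the hardware address length is passed in directly instead of being read from a network device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of an ARP header (8 bytes); simplified stand-in. */
struct arphdr_sketch {
    uint16_t ar_hrd;    /* hardware address format */
    uint16_t ar_pro;    /* protocol address format */
    uint8_t  ar_hln;    /* hardware address length */
    uint8_t  ar_pln;    /* protocol address length */
    uint16_t ar_op;     /* ARP opcode */
};

/* Fixed header plus the sender and target (hardware address, IPv4 address) pairs. */
size_t arp_hdr_len_sketch(size_t hw_addr_len)
{
    assert(hw_addr_len > 0);
    return sizeof(struct arphdr_sketch) + (hw_addr_len + sizeof(uint32_t)) * 2;
}
```

For Ethernet, with 6-byte hardware addresses, this gives 8 + (6 + 4) * 2 = 28 bytes, a number that would otherwise show up as a bare constant at each call site.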
2008-03-03  [TCP]: Must count fack_count also when skipping  Ilpo Järvinen
Not counting it makes fackets_out grow too slowly compared with the real write queue. This shouldn't cause those BUG_TRAP(packets <= tp->packets_out) to trigger, but who knows how such an inconsistent fackets_out affects things here and there around TCP, when everything nowadays assumes an accurate fackets_out. So let's see if this silences them all. Reported by Guillaume Chazarain <guichaz@gmail.com>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03  [TCP]: Merge exit paths in tcp_v4_conn_request.  Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03  [IPV4]: skb->dst can't be NULL in ip_options_echo.  Denis V. Lunev
ip_options_echo is called on the packet input path after the initial routing. The dst entry on the packet is cleared only in a few very specific places and is immediately assigned back (possibly to a new entry). Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [ICMP]: Section conflict between icmp_sk_init/icmp_sk_exit.  Denis V. Lunev
Functions from __exit section should not be called from ones in __init section. Fix this conflict. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.  Denis V. Lunev
It looks like dst parameter is used in this API due to historical reasons. Actually, it is really used in the direct call to tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack and remove dst from rtx_syn_ack. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate.  Pavel Emelyanov
Some netfilter and rxrpc code uses seq_open() to open a proc file, but seq_release_private() to release it. This is harmless, but ambiguous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [NETNS]: Make icmp_sk per namespace.  Denis V. Lunev
All preparations are done. Now just add a hook to perform an initialization on namespace startup and replace icmp_sk macro with proper inline call. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [NETNS]: icmp(v6)_sk should not pin a namespace.  Denis V. Lunev
So, change icmp(v6)_sk creation/disposal to the scheme used in netlink for rtnl, i.e. create a socket in the context of init_net and assign the namespace later without taking a reference. Also use sk_release_kernel instead of sock_release to properly destroy such sockets. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [ICMP]: Allocate data for __icmp(v6)_sk dynamically.  Denis V. Lunev
Each namespace should have its own __icmp(v6)_sk, so it should be allocated dynamically. However, alloc_percpu does not fit the case, as it implies an additional dereference for no benefit. Allocate data for the pointers just like __percpu_alloc_mask does and place pointers to struct sock into this array. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock.  Denis V. Lunev
We have to take the socket lock inside icmp(v6)_xmit_lock/unlock. The socket is currently taken from a global variable. Once this code becomes namespace-aware, one would have to pass a namespace and get the socket from it. That is unnecessary, though: the socket is available in the caller, so just pass it in. This saves a bit of code now and saves more later.

add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168)
function       old     new   delta
icmp_rcv       718     719      +1
icmpv6_rcv    2343    2303     -40
icmp_send     1566    1518     -48
icmp_reply     549     468     -81

Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [ICMP]: Store sock rather than socket for ICMP flow control.  Denis V. Lunev
Basically, it makes no difference whether we store socket or sock. However, sock looks better, as there will be one less dereference on the fast path. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [ICMP]: Optimize icmp_socket usage.  Denis V. Lunev
Use this macro only once in a function to save a bit of space.

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98)
function           old     new   delta
icmp_reply         562     561      -1
icmp_push_reply    305     258     -47
icmp_init          273     223     -50

Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [ICMP]: Add return code to icmp_init.  Denis V. Lunev
icmp_init could fail, and this is normal for namespaces other than the initial one. So, the panic should be triggered only on the init_net initialization path. Additionally, create a rollback path for icmp_init as a separate function. It will also be used later during namespace destruction. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29  [INET]: Remove struct net_proto_family* from _init calls.  Denis V. Lunev
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [TCP]: BIC web page link is corrected.  Sangtae Ha
Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Process inet_select_addr inside a namespace.  Denis V. Lunev
The context is available from a network device passed in. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Enable IPv4 address manipulations inside namespace.  Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Enable all routing manipulation via netlink inside namespace.  Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Process devinet ioctl in the correct namespace.  Denis V. Lunev
Add namespace parameter to devinet_ioctl and locate device inside it for state changes. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Register /proc/net/rt_cache for each namespace.  Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Process /proc/net/rt_cache inside a namespace.  Denis V. Lunev
Show routing cache for a particular namespace only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [IPV4]: rt_cache_get_next should take rt_genid into account.  Denis V. Lunev
Otherwise /proc/net/rt_cache will look inconsistent with respect to genid. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Process ip_rt_redirect in the correct namespace.  Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Enable inetdev_event notifier.  Denis V. Lunev
After all these preparations it is time to enable the main IPv4 device initialization routine inside a namespace. It is safe to do this now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETNS]: Disable multicasting configuration inside non-initial namespace.  Denis V. Lunev
Do not call hooks from device notifiers and disallow configuration from the ioctl/netlink layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [NETFILTER]: Consolidate masq_inet_event and masq_device_event.  Denis V. Lunev
They do exactly the same job. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [IPV4]: Use proc_create() to setup ->proc_fops first  Wang Chen
Use proc_create() to make sure that ->proc_fops is set up before gluing the PDE to the main tree. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28  [IPCOMP]: Disable BH on output when using shared tfm  Herbert Xu
Because we use shared tfm objects in order to conserve memory, (each tfm requires 128K of vmalloc memory), BH needs to be turned off on output as that can occur in process context. Previously this was done implicitly by the xfrm output code. That was lost when it became lockless. So we need to add the BH disabling to IPComp directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26  [INET]: Don't create tunnels with '%' in name.  Pavel Emelyanov
Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a pre-defined name for a device from userspace. Since these drivers call register_netdevice() (rtnl_lock is held), which does _not_ generate the device's name, this name may contain a '%' character. Not sure how bad it is to have a device with a '%' in its name, but all the other places either use register_netdev(), which calls dev_alloc_name(), or explicitly call dev_alloc_name() before registering, i.e. do not allow for such names. This had to be prior to the commit 34cc7b, but I forgot to number the patches and this one got lost, sorry. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
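The check at the heart of this can be sketched in a few lines of userspace C. This is only an illustration of detecting a template character in a user-supplied name; `name_is_template` is a hypothetical helper, and how a driver then resolves or refuses the name is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when the name still contains a '%' template character (as in
 * "gre%d"), so it must be resolved to a concrete name before the
 * device is registered. */
bool name_is_template(const char *name)
{
    assert(name != NULL);
    return strchr(name, '%') != NULL;
}
```

A name such as "tunl0" passes through unchanged, while "gre%d" is flagged as needing name allocation first.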
2008-02-26  [IPV4]: Reset scope when changing address  Bjorn Mork
This bug did bite at least one user, who had to resort to rebooting the system after an "ifconfig eth0 127.0.0.1" typo. Deleting the address and adding a new one is a less intrusive workaround. But I still believe this is a bug that should be fixed. Some way or another. Another possibility would be to remove the scope mangling based on address. This will always be incomplete (are 127/8 the only address space with host scope requirements?) We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured with a loopback address (127/8). The scope is never reset, and will remain set to RT_SCOPE_HOST after changing the address. This patch resets the scope if the address is changed again, to restore normal functionality. Signed-off-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23  [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.  Pavel Emelyanov
Use the added dev_alloc_name() call to create the tunnel device name, rather than iterating in a hand-made loop with an artificial limit. Thanks Patrick for noticing this. [ The way this works is, when the device is actually registered, the generic code notices the '%' in the name and invokes dev_alloc_name() to fully resolve the name. -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19  [NETFILTER]: Fix incorrect use of skb_make_writable  Joonwoo Park
http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19  [NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data  Patrick McHardy
As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>, inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue triggers a SKB_LINEAR_ASSERT in skb_put(). Going back through the git history, it seems this bug is present since at least 2.6.12-rc2, probably even since the removal of skb_linearize() for netfilter. Linearize non-linear skbs through skb_copy_expand() when enlarging them. Tested by Thomas, fixes bugzilla #9933. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19  ipv4/fib_hash.c: fix NULL dereference  Adrian Bunk
Unless I miss a guaranteed relation between "f" and "new_fa->fa_info", this patch is required for fixing a NULL dereference introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 ("[IPV4] FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the Coverity checker. Eric Dumazet says: Hum, you are right, kmem_cache_free() doesn't allow a NULL object, like kfree() does. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
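The distinction Eric points out has a rough userspace analogue, sketched below. Nothing here is kernel API: `pool_sketch`, `pool_alloc`, `pool_free` and `release_optional` are hypothetical, with `pool_free` standing in for a free routine that, unlike `free()`, must never be handed a NULL object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object pool that tracks how many objects are outstanding. */
struct pool_sketch {
    size_t outstanding;
};

void *pool_alloc(struct pool_sketch *pool, size_t size)
{
    void *obj = malloc(size);
    if (obj != NULL)
        pool->outstanding++;
    return obj;
}

/* Requires a real object: a NULL argument would corrupt the accounting. */
void pool_free(struct pool_sketch *pool, void *obj)
{
    assert(obj != NULL);
    pool->outstanding--;
    free(obj);
}

/* Cleanup path that may or may not hold an object, so it checks first. */
void release_optional(struct pool_sketch *pool, void *obj)
{
    if (obj != NULL)
        pool_free(pool, obj);
}
```

The guard in `release_optional` plays the role of the NULL check a caller needs before handing a possibly empty pointer to such a routine.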
2008-02-17  [TCP]: Fix tcp_v4_send_synack() comment  Kris Katterjohn
Signed-off-by: Kris Katterjohn <katterjohn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17  [IPV4]: fix alignment of IP-Config output  Uwe Kleine-Koenig
Make the indented lines aligned in the output (not in the code). Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17  Revert "[NDISC]: Fix race in generic address resolution"  David S. Miller
This reverts commit 69cc64d8d92bf852f933e90c888dfff083bd4fc9. It causes recursive locking in IPV6 because, unlike other neighbour layer clients, it even needs neighbour cache entries to send neighbour solicitation messages :-( We'll have to find another way to fix this race. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13  [INET]: Unexport inet_listen_wlock  Adrian Bunk
This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>