aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2008-01-28[NETNS][FRAGS]: Isolate the secret interval from namespaces.Pavel Emelyanov
Since we have one hashtable to lookup the fragment, having different secret_interval-s for hash rebuild doesn't make sense, so move this one to inet_frags. The inet_frags_ctl becomes empty after this, so remove it. The appropriate ctl table is kept read-only in namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Make thresholds work in namespaces.Pavel Emelyanov
This is the same as with the timeout variable. Currently, after exceeding the high threshold _all_ the fragments are evicted, but it will be fixed in later patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.Pavel Emelyanov
Move it to the netns_frags, adjust the usage and make the appropriate ctl table writable. Now fragment, that live in different namespaces can live for different times. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Duplicate sysctl tables for new namespaces.Pavel Emelyanov
Each namespace has to have own tables to tune their different parameters, so duplicate the tables and register them. All the tables in sub-namespaces are temporarily made read-only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Make the mem counter per-namespace.Pavel Emelyanov
This is also simple, but introduces more changes, since then mem counter is altered in more places. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Make the nqueues counter per-namespace.Pavel Emelyanov
This is simple - just move the variable from struct inet_frags to struct netns_frags and adjust the usage appropriately. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.Pavel Emelyanov
Since fragment management code is consolidated, we cannot have the pointer from inet_frag_queue to struct net, since we must know what king of fragment this is. So, I introduce the netns_frags structure. This one is currently empty, but will be eventually filled with per-namespace attributes. Each inet_frag_queue is tagged with this one. The conntrack_reasm is not "netns-izated", so it has one static netns_frags instance to keep working in init namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][FRAGS]: Move ctl tables around.Pavel Emelyanov
This is a preparation for sysctl netns-ization. Move the ctl tables to the files, where the tuning variables reside. Plus make the helpers to register the tables. This will simplify the later patches and will keep similar things closer to each other. ipv4, ipv6 and conntrack_reasm are patched differently, but the result is all the tables are in appropriate files. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void.YOSHIFUJI Hideaki
Fix following sparse warnings: | net/ipv4/udp.c:421:2: warning: returning void-valued expression | net/ipv4/udplite.c:38:2: warning: returning void-valued expression Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28[NETNS]: Pass correct namespace in ip_rt_get_source.Denis V. Lunev
ip_rt_get_source is the infamous place for which dst_ifdown kludges have been implemented. This means that rt->u.dst.dev can be safely dereferrenced obtain nd_net. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Pass correct namespace in ip_route_input_slow.Denis V. Lunev
The packet on the input path always has a referrence to an input network device it is passed from. Extract network namespace from it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Pass correct namespace in context fib_check_nh.Denis V. Lunev
Correct network namespace is already used in fib_check_nh. Re-work its usage for better readability and pass into fib_lookup & inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Pass correct namespace in fib_validate_source.Denis V. Lunev
Correct network namespace is available inside fib_validate_source. It can be obtained from the device passed in. The device is not NULL as in_device is obtained from it just above. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Add netns parameter to inetdev_by_index.Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Add netns parameter to fib_lookup.Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4]: ipmr sparse warningsStephen Hemminger
Get rid of some of the sparse warnings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4]: igmp sparse warningsStephen Hemminger
Partial sparse warning fix. The other conditional locking is too much for sparse to handle. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4]: Enable use of 240/4 address space.Jan Engelhardt
This short patch modifies the IPv4 networking to enable use of the 240.0.0.0/4 (aka "class-E") address space as propsed in the internet draft draft-fuller-240space-00.txt. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Process FIB rule action in the context of the namespace.Denis V. Lunev
Save namespace context on the fib rule at the rule creation time and call routing lookup in the correct namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: FIB rules API cleanup.Denis V. Lunev
Remove struct net from fib_rules_register(unregister)/notify_change paths and diet code size a bit. add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65) function old new delta notify_rule_change 273 280 +7 trie_show_stats 471 475 +4 fn_trie_delete 473 477 +4 fib_rules_unregister 144 148 +4 fib4_rule_compare 119 123 +4 resize 2842 2845 +3 fn_trie_select_default 515 518 +3 inet_sk_rebuild_header 836 838 +2 fib_trie_seq_show 764 766 +2 __devinet_sysctl_register 276 278 +2 fn_trie_lookup 1124 1123 -1 ip_fib_check_default 133 131 -2 devinet_conf_sysctl 223 221 -2 snmp_fold_field 126 123 -3 fn_trie_insert 2091 2086 -5 inet_create 876 870 -6 fib4_rules_init 197 191 -6 fib_sync_down 452 444 -8 inet_gso_send_check 334 325 -9 fib_create_info 3003 2991 -12 fib_nl_delrule 568 553 -15 fib_nl_newrule 883 852 -31 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[FIB]: Add netns to fib_rules_ops.Denis V. Lunev
The backward link from FIB rules operations to the network namespace will allow to simplify the API a bit. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Namespace stop vs 'ip r l' race.Denis V. Lunev
During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent namespace to stop, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The raplacement of the correct netns for init_ns solves the problem only partial as socket to be stoped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the referrence for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Consolidate kernel netlink socket destruction.Denis V. Lunev
Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Memory leak on network namespace stop.Denis V. Lunev
Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These sockets should be disposed properly, i.e. by sock_release. Plain sock_put is not enough. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][DST] dst: pass the dst_ops as parameter to the gc functionsDaniel Lezcano
The garbage collection function receive the dst_ops structure as parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (expect for the function signature), they do just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4] FIB_HASH: Reduce memory needs and speedup lookupsEric Dumazet
Currently, sizeof(struct fib_alias) is 24 or 48 bytes on 32/64 bits arches. Because of SLAB_HWCACHE_ALIGN requirement, these are rounded to 32 and 64 bytes respectively. This patch moves rcu to the end of fib_alias, and conditionally defines it only for CONFIG_IP_FIB_TRIE. We also remove SLAB_HWCACHE_ALIGN requirement for fib_alias and fib_node objects because it is not necessary. (BTW SLUB currently denies it for objects smaller than cache_line_size() / 2, but not SLAB) Finally, sizeof(fib_alias) go back to 16 and 32 bytes. Then, we can embed one fib_alias on each fib_node, to favor locality. Most of the time access to the fib_alias will be free because one cache line contains both the list head (fn_alias) and (one of) the list element. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[FIB]: Fix rcu_dereference() abuses in fib_trie.cEric Dumazet
node_parent() and tnode_get_child() currently use rcu_dereference(). These functions are called from both - readers only paths (where rcu_dereference() is needed), and - writer path (where rcu_dereference() is not needed) To make explicit where rcu_dereference() is really needed, I introduced new node_parent_rcu() and tnode_get_child_rcu() functions which use rcu_dereference(), while node_parent() and tnode_get_child() dont use it. Then I changed calling sites where rcu_dereference() was really needed to call the _rcu() variants. This should have no impact but for alpha architecture, and may help future sparse checks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protosPatrick McHardy
Allows to remove five empty implementations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_conntrack: remove print_conntrack function from l3protosPatrick McHardy
Its unused and unlikely to ever be used. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: kill nf_sysctl.cPatrick McHardy
Since there now is generic support for shared sysctl paths, the only remains are the net/netfilter and net/ipv4/netfilter paths. Move them to net/netfilter/core.c and net/ipv4/netfilter.c and kill nf_sysctl.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: ipt_REJECT: properly handle IP optionsDenys Vlasenko
The current TCP RST construction reuses the old packet and can't deal with IP options as a consequence of that. Construct the RST from scratch instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: {ip,ip6}_tables: remove some inlinesDenys Vlasenko
This patch removes inlines except those which are used by packet matching code and thus are performance-critical. Before: $ size */*/*/ip*tables*.o text data bss dec hex filename 6402 500 16 6918 1b06 net/ipv4/netfilter/ip_tables.o 7130 500 16 7646 1dde net/ipv6/netfilter/ip6_tables.o After: $ size */*/*/ip*tables*.o text data bss dec hex filename 6307 500 16 6823 1aa7 net/ipv4/netfilter/ip_tables.o 7010 500 16 7526 1d66 net/ipv6/netfilter/ip6_tables.o Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Rename ipt_iprange to xt_iprangeJan Engelhardt
This patch moves ipt_iprange to xt_iprange, in preparation for adding IPv6 support to xt_iprange. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Update modules' descriptionsJan Engelhardt
Updates the MODULE_DESCRIPTION() tags for all Netfilter modules, actually describing what the module does and not just "netfilter XYZ target". Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: remove ipt_TOS.cJan Engelhardt
Commit 88c85d81f74f92371745158aebc5cbf490412002 forgot to remove the old ipt_TOS file (whose code has been merged into xt_DSCP). Remove it now. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Remove some EXPERIMENTAL dependenciesPatrick McHardy
Most of the netfilter modules are not considered experimental anymore, the only ones I want to keep marked as EXPERIMENTAL are: - TCPOPTSTRIP target, which is brand new. - SANE helper, which is quite new. - CLUSTERIP target, which I believe hasn't had much testing despite being in the kernel for quite a long time. - SCTP match and conntrack protocol, which are a mess and need to be reviewed and cleaned up before I would trust them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4]: fib hash|trie initializationStephen Hemminger
Initialization of the slab cache's should be done when IP is initialized to make sure of available memory, and that code can be marked __init. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4] fib_trie: size and statisticsStephen Hemminger
Show number of entries in trie, the size field was being set but never used, but it only counted leaves, not all entries. Refactor the two cases in fib_triestat_seq_show into a single routine. Note: the stat structure was being malloc'd but the stack usage isn't so high (288 bytes) that it is worth the additional complexity. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[FIB]: Avoid using static variables without proper lockingEric Dumazet
fib_trie_seq_show() uses two helper functions, rtn_scope() and rtn_type() that can write to static storage without locking. Just pass to them a temporary buffer to avoid potential corruption (probably not triggerable but still...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Process inet_confirm_addr in the correct namespace.Denis V. Lunev
inet_confirm_addr can be called with NULL in_dev from arp_ignore iff scope is RT_SCOPE_LINK. Lets always pass the device and check for RT_SCOPE_LINK scope inside inet_confirm_addr. This let us take network namespace from in_device a need for an additional argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4]: Remove extra argument from arp_ignore.Denis V. Lunev
arp_ignore has two arguments: dev & in_dev. dev is used for inet_confirm_addr calling only. inet_confirm_addr, in turn, either gets in_dev from the device passed or iterates over all network devices if the device passed is NULL. It seems logical to directly pass in_dev into inet_confirm_addr. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Make arp code network namespace consistent.Denis V. Lunev
Some calls in the arp.c have network namespace as an argument. Getting init_net inside these functions is simply inconsistent. Fix this. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[ARP]: Move inet_addr_type call after simple error checks in arp_contructor.Denis V. Lunev
The neighbour entry will be destroyed in the case of error, so it is pointless to perform constly routing table lookup in this case. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][RAW]: Create the /proc/net/raw(6) in each namespace.Pavel Emelyanov
To do so, just register the proper subsystem and create files in ->init callbacks. No other special per-namespace handling for raw sockets is required. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][RAW]: Eliminate explicit init_net references.Pavel Emelyanov
Happily, in all the rest places (->bind callbacks only), that require the struct net, we have a socket, so get the net from it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list.Pavel Emelyanov
Pull the struct net pointer up to the showing functions to filter the sockets depending on their namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware.Pavel Emelyanov
This requires just to pass the appropriate struct net pointer into __raw_v[46]_lookup and skip sockets that do not belong to a needed namespace. The proper net is get from skb->dev in all the cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[FIB]: full_children & empty_children should be uint, not ushortEric Dumazet
If declared as unsigned short, these fields can overflow, and whole trie logic is broken. I could not make the machine crash, but some tnode can never be freed. Note for 64 bit arches : By reordering t_key and parent in [node, leaf, tnode] structures, we can use 32 bits hole after t_key so that sizeof(struct tnode) doesnt change after this patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4] fib_trie: removes a memset() call in tnode_new()Eric Dumazet
tnode_alloc() already clears allocated memory, using kcalloc() or alloc_pages(GFP_KERNEL|__GFP_ZERO, ...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4] FIB: Include nexthop device indexes in fib_info hashfn.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>