aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2008-01-28[NETNS]: Should build with CONFIG_SYSCTL=nEric Dumazet
Previous NETNS patches broke CONFIG_SYSCTL=n case Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[ICMP]: Avoid sparse warnings in net/ipv4/icmp.cEric Dumazet
CHECK net/ipv4/icmp.c net/ipv4/icmp.c:249:13: warning: context imbalance in 'icmp_xmit_unlock' - unexpected unlock net/ipv4/icmp.c:376:13: warning: context imbalance in 'icmp_reply' - different lock contexts for basic block net/ipv4/icmp.c:430:6: warning: context imbalance in 'icmp_send' - different lock contexts for basic block Solution is to declare both icmp_xmit_lock() and icmp_xmit_unlock() as inline Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: prot_inuse cleanups and optimizationsEric Dumazet
1) Cleanups (all functions are prefixed by sock_prot_inuse) sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1) sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1) sock_prot_inuse() -> sock_prot_inuse_get() New functions : sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use. 2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto", since nobody wants to read the inuse value. This saves 1372 bytes on i386/SMP and some cpu cycles. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Perform setting of common control fields in one placeIlpo Järvinen
In case of segments which are purely for control without any data (SYN/ACK/FIN/RST), many fields are set to common values in multiple places. i386 results: $ gcc --version gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) $ codiff tcp_output.o.old tcp_output.o.new net/ipv4/tcp_output.c: tcp_xmit_probe_skb | -48 tcp_send_ack | -56 tcp_retransmit_skb | -79 tcp_connect | -43 tcp_send_active_reset | -35 tcp_make_synack | -42 tcp_send_fin | -48 7 functions changed, 351 bytes removed net/ipv4/tcp_output.c: tcp_init_nondata_skb | +90 1 function changed, 90 bytes added tcp_output.o.mid: 8 functions changed, 90 bytes added, 351 bytes removed, diff: -261 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Urgent parameter effect can be simplified.Ilpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: cleanup tcp_parse_options deep indented switchIlpo Järvinen
Removed case indentation level & combined some nested ifs, mostly within 80 lines now. This is a leftover from indent patch, it just had to be done manually to avoid messing it up completely. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Add some acquires/releases sparse annotations.Eric Dumazet
Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Remove unnecessary local variableIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Code duplication removal, added tcp_bound_to_half_wnd()Ilpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: cleanup tcp_{in,out}put.c styleIlpo Järvinen
These were manually selected from indent's results which as is are too noisy to be of any use without human reason. In addition, some extra newlines between function and its comment were removed too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: reduce tcp_output's indentation levels a bitIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessaryIlpo Järvinen
The snd_up check should be enough. I suspect this has been there to provide a minor optimization in clean_rtx_queue which used to have a small if (!->sacked) block which could skip snd_up check among the other work. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Dropped unnecessary skb/sacked accessing in renegingIlpo Järvinen
SACK reneging can be precalculated to a FLAG in clean_rtx_queue which has the right skb looked up. This will help a bit in future because skb->sacked access will be changed eventually, changing it already won't hurt any. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Introduce tcp_wnd_end() to reduce line lengthsIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Rename update_send_head & include related increment to itIlpo Järvinen
There's very little need to have the packets_out incrementing in a separate function. Also name the combined function appropriately. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Make invariant check complain about invalid sacked_outIlpo Järvinen
Earlier resolution for NewReno's sacked_out should now keep it small enough for this to become invariant-like check. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[UDP]: Add memory accounting.Hideo Aoki
Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET] CORE: Introducing new memory accounting interface.Hideo Aoki
This patch introduces new memory accounting functions for each network protocol. Most of them are renamed from memory accounting functions for stream protocols. At the same time, some stream memory accounting functions are removed since other functions do same thing. Renaming: sk_stream_free_skb() -> sk_wmem_free_skb() __sk_stream_mem_reclaim() -> __sk_mem_reclaim() sk_stream_mem_reclaim() -> sk_mem_reclaim() sk_stream_mem_schedule -> __sk_mem_schedule() sk_stream_pages() -> sk_mem_pages() sk_stream_rmem_schedule() -> sk_rmem_schedule() sk_stream_wmem_schedule() -> sk_wmem_schedule() sk_charge_skb() -> sk_mem_charge() Removeing sk_stream_rfree(): consolidates into sock_rfree() sk_stream_set_owner_r(): consolidates into skb_set_owner_r() sk_stream_mem_schedule() The following functions are added. sk_has_account(): check if the protocol supports accounting sk_mem_uncharge(): do the opposite of sk_mem_charge() In addition, to achieve consolidation, updating sk_wmem_queued is removed from sk_mem_charge(). Next, to consolidate memory accounting functions, this patch adds memory accounting calls to network core functions. Moreover, present memory accounting call is renamed to new accounting call. Finally we replace present memory accounting calls with new interface in TCP and SCTP. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Move all calls to xfrm_audit_state_icvfail to xfrm_inputHerbert Xu
Let's nip the code duplication in the bud :) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Fix transport-mode async resume on intput without netfilterHerbert Xu
When netfilter is off the transport-mode async resumption doesn't work because we don't push back the IP header. This patch fixes that by moving most of the code outside of ifdef NETFILTER since the only part that's not common is the short-circuit in the protocol handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Remove seq_rtt ptr from clean_rtx_queue argsIlpo Järvinen
While checking Gavin's patch I noticed that the returned seq_rtt is not used by the caller. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Force TSO splits to MSS boundariesIlpo Järvinen
If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on odd boundary by the callers of tcp_window_allows. We try really hard to avoid unnecessary modulos. Therefore the old caller side check "if (skb->len < limit)" was too wide as well because limit is not bound in any way to skb->len and can cause spurious testing for trimming in the middle of the queue while we only wanted that to happen at the tail of the queue. A simple additional caller side check for tcp_write_queue_tail would likely have resulted 2 x modulos because the limit would have to be first calculated from window, however, doing that unnecessary modulo is not mandatory. After a minor change to the algorithm, simply determine first if the modulo is needed at all and at that point immediately decide also from which value it should be calculated from. This approach also kills some duplicated code. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Modify the neighbour table code so it handles multiple network ↵Eric W. Biederman
namespaces I'm actually surprised at how much was involved. At first glance it appears that the neighbour table data structures are already split by network device so all that should be needed is to modify the user interface commands to filter the set of neighbours by the network namespace of their devices. However a couple things turned up while I was reading through the code. The proxy neighbour table allows entries with no network device, and the neighbour parms are per network device (except for the defaults) so they now need a per network namespace default. So I updated the two structures (which surprised me) with their very own network namespace parameter. Updated the relevant lookup and destroy routines with a network namespace parameter and modified the code that interacts with users to filter out neighbour table entries for devices of other namespaces. I'm a little concerned that we can modify and display the global table configuration and from all network namespaces. But this appears good enough for now. I keep thinking modifying the neighbour table to have per network namespace instances of each table type would should be cleaner. The hash table is already dynamically sized so there are it is not a limiter. The default parameter would be straight forward to take care of. However when I look at the how the network table is built and used I still find some assumptions that there is only a single neighbour table for each type of table in the kernel. The netlink operations, neigh_seq_start, the non-core network users that call neigh_lookup. So while it might be doable it would require more refactoring than my current approach of just doing a little extra filtering in the code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[XFRM]: RFC4303 compliant auditingPaul Moore
This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4304 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Avoid two divides in __tcp_grow_window()Eric Dumazet
tcp_win_from_space() being signed, compiler might emit an integer divide to compute tcp_win_from_space()/2 . Using right shifts is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Avoid a divide in tcp_mtu_probing()Eric Dumazet
tcp_mtu_to_mss() being signed, compiler might emit an integer divide to compute tcp_mtu_to_mss()/2 . Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Move mss variable in tcp_mtu_probing()David S. Miller
Down into the only scope where it is used. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: tcp_write_timeout.c cleanupEric Dumazet
Before submiting a patch to change a divide to a right shift, I felt necessary to create a helper function tcp_mtu_probing() to reduce length of lines exceeding 100 chars in tcp_write_timeout(). Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[INET]: Avoid an integer divide in rt_garbage_collect()Eric Dumazet
Since 'goal' is a signed int, compiler may emit an integer divide to compute goal/2. Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Convert several length variable to unsigned.YOSHIFUJI Hideaki
Several length variables cannot be negative, so convert int to unsigned int. This also allows us to do sane shift operations on those variables. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP] Avoid two divides in tcp_output.cEric Dumazet
Because 'free_space' variable in __tcp_select_window() is signed, expression (free_space / 2) forces compiler to emit an integer divide. This can be changed to a plain right shift, less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[XFRM] IPv6: Fix dst/routing check at transformation.Masahide NAKAMURA
IPv6 specific thing is wrongly removed from transformation at net-2.6.25. This patch recovers it with current design. o Update "path" of xfrm_dst since IPv6 transformation should care about routing changes. It is required by MIPv6 and off-link destined IPsec. o Rename nfheader_len which is for non-fragment transformation used by MIPv6 to rt6i_nfheader_len as IPv6 name space. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[TCP]: Fix TSO deferringIlpo Järvinen
I'd say that most of what tcp_tso_should_defer had in between there was dead code because of this. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[INET]: Uninline the inet_twsk_put function.Pavel Emelyanov
This one is not that big, but is widely used: saves 1200 bytes from net/ipv4/built-in.o add/remove: 1/0 grow/shrink: 1/12 up/down: 97/-1300 (-1203) function old new delta inet_twsk_put - 87 +87 __inet_lookup_listener 274 284 +10 tcp_sacktag_write_queue 2255 2254 -1 tcp_time_wait 482 411 -71 __inet_check_established 796 722 -74 tcp_v4_err 973 898 -75 __inet_twsk_kill 230 154 -76 inet_twsk_deschedule 180 103 -77 tcp_v4_do_rcv 462 384 -78 inet_hash_connect 686 607 -79 inet_twdr_do_twkill_work 236 150 -86 inet_twdr_twcal_tick 395 307 -88 tcp_v4_rcv 1744 1480 -264 tcp_timewait_state_process 975 644 -331 Export it for ipv6 module. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[INET]: Uninline the __inet_lookup_established function.Pavel Emelyanov
This is -700 bytes from the net/ipv4/built-in.o add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700) function old new delta __inet_lookup_established - 339 +339 tcp_sacktag_write_queue 2254 2255 +1 tcp_v4_err 1304 973 -331 tcp_v4_rcv 2089 1744 -345 tcp_v4_do_rcv 826 462 -364 Exporting is for dccp module (used via e.g. inet_lookup). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[INET]: Uninline the __inet_hash function.Pavel Emelyanov
This one is used in quite many places in the networking code and seems to big to be inline. After the patch net/ipv4/build-in.o loses ~650 bytes: add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653) function old new delta __inet_hash_nolisten - 282 +282 __inet_hash - 179 +179 tcp_sacktag_write_queue 2255 2254 -1 __inet_lookup_listener 284 274 -10 tcp_v4_syn_recv_sock 755 493 -262 tcp_v4_hash 389 35 -354 inet_hash_connect 1086 599 -487 This version addresses the issue pointed by Eric, that while being inline this function was optimized by gcc in respect to the 'listen_possible' argument. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnelsHerbert Xu
It appears that I've managed to create two different functions both called xfrm6_tunnel_output. This is because we have the plain tunnel encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation which lives in the files xfrmX_mode_tunnel.c. This patch renames functions from the latter to use the xfrmX_mode_tunnel prefix to avoid name-space conflicts. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Add CONFIG_NETFILTER_ADVANCED optionPatrick McHardy
The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter options when disabled and provides defaults (M) that should allow to run a distribution firewall without further thinking. Defaults to 'y' to avoid breaking current configurations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: non-power-of-two jhash optimizationsPatrick McHardy
Apply Eric Dumazet's jhash optimizations where applicable. Quoting Eric: Thanks to jhash, hash value uses full 32 bits. Instead of returning hash % size (implying a divide) we return the high 32 bits of the (hash * size) that will give results between [0 and size-1] and same hash distribution. On most cpus, a multiply is less expensive than a divide, by an order of magnitude. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Parenthesize macro parametersJan Engelhardt
Parenthesize macro parameters. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Introduce nf_inet_addressJan Engelhardt
A few netfilter modules provide their own union of IPv4 and IPv6 address storage. Will unify that in this patch series. (1/4): Rename union nf_conntrack_address to union nf_inet_addr and move it to x_tables.h. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: x_tables: use %u format specifiersJan Engelhardt
Use %u format specifiers as ->family is unsigned. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_nat: properly use RCU for ip_nat_decode_sessionPatrick McHardy
We need to use rcu_assign_pointer/rcu_dereference to avoid races. Also remove an obsolete CONFIG_IP_NAT_NEEDED ifdef. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: constify nf_afinfoPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo argPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_log: move logging stuff to seperate headerPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_nat: pass manip type instead of hook to nf_nat_setup_infoPatrick McHardy
nf_nat_setup_info gets the hook number and translates that to the manip type to perform. This is a relict from the time when one manip per hook could exist, the exact hook number doesn't matter anymore, its converted to the manip type. Most callers already know what kind of NAT they want to perform, so pass the maniptype in directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_nat: sprinkle a few __read_mostlysPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_nat: mark NAT protocols constPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_nat_proto_gre: add missing module referencePatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>