aboutsummaryrefslogtreecommitdiff
path: root/net/dccp
AgeCommit message (Collapse)Author
2007-07-10loss_interval: Fix timeval initialisationIan McDonald
When compiling with EXTRA_CFLAGS=-W noticed that tstamp is not initialised correctly in dccp_li_calc_first_li. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2007-07-10Fix dccp_sum_coverageIan McDonald
When compiling with EXTRA_CFLAGS=-W notice that we have signed/unsigned issue in dccp.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2007-07-10ccid3: Update copyrightsIan McDonald
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-06-03[NET]: Fix comparisons of unsigned < 0.Bill Nottingham
Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0. Signed-off-by: Bill Nottingham <notting@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[XFRM]: Allow packet drops during larval state resolution.David S. Miller
The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[DCCP]: Fix build warning when debugging is disabled.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[DCCP]: Use menuconfig objects.Jan Engelhardt
Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[NET]: SPIN_LOCK_UNLOCKED cleanup in drivers/atm, netMilind Arun Choudhary
SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Debug statements for Elapsed Time optionGerrit Renker
This prints the value of the parsed Elapsed Time when received via a Timestamp Echo option [RFC 4342, 13.3]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Fix bug in the calculation of very low sending ratesGerrit Renker
This fixes an error in the calculation of t_ipi when X converges towards very low sending rates (between 1 and 64 bytes per second). Although this case may not sound likely, it can be reproduced by connecting, hitting enter (1 byte sent) and waiting for some time, during which the nofeedback timer halves the sending rate until finally it reaches the region 1..64 bytes/sec. Computing X is handled correctly (tested separately); but by dividing X _before_ entering the calculation of t_ipi, X becomes zero as a result. This in turn triggers a BUG condition caught in scaled_div(). Fixed by replacing with equivalent statement and explicit typecast for good measure. Calculation verified and effect of patch tested - reduced never below 1 byte per 64 seconds afterwards, i.e. not allowing divide-by-zero. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Use initial RTT sample from SYN exchangeGerrit Renker
The patch follows the following recommendation made in an erratum to RFC 4342: "Senders MAY additionally make use of other available RTT measurements, including those from the initial Request-Response packet exchange." It implements larger initial windows with regard to this inital RTT measurement, using the mechanism suggested in draft-ietf-dccp-rfc3448bis, section 4.2. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Sample RTT from SYN exchangeGerrit Renker
Function:
2007-04-25[CCID3]: Use function for RTT samplingGerrit Renker
This replaces the existing occurrences of RTT sampling with the use of the new function dccp_sample_rtt. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Provide function for RTT samplingGerrit Renker
A recurring problem, in particular in the CCID code, is that RTT samples from packets with timestamp echo and elapsed time options need to be taken. This service is provided via a new function dccp_sample_rtt in this patch. Furthermore, to protect against `insane' RTT samples, the sampled value is bounded between 100 microseconds and 4 seconds - for which u32 is sufficient. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Handle Idle and Application-Limited periodsGerrit Renker
This updates the code with regard to handling idle and application-limited periods as specified in [RFC 4342, 5.1]. Background:
2007-04-25[CCID3]: Wrap computation of RFC3390-initial rate into separate functionGerrit Renker
The CCID 3 and TFRC specs (RFC 4342, RFC 3448, draft-3448bis) make frequent reference to the computation of the RFC-3390 initial sending rate: 1. Initial sending rate when RTT is known (RFC 4342, p. 6) 2. Response to Idle/Application-Limited periods (RFC 4342, 5.1) This warrants putting the code into its own function, for later code reuse. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Remove build warnings for 64bitGerrit Renker
This clears the following sparc64 build warnings: 1) warning: format "%ld" expects type "long int", but argument 3 has type "suseconds_t" 2) warning: format "%llu" expects type "long long unsigned int", but argument 3 has type "__u64" Fixed by using typecast to unsigned. This is argued to be safe, since the quantities, after de-scaling (factor 2^6) fit all in u32. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: More to see in dccp_probeGerrit Renker
This adds a few more fields of interest to /proc/net/dccpprobe, the following output ensues: 1 2 3 4 5 6 7 8 9 10 11 sec.usec src:sport dst:dport size s rtt p X_calc X_recv X t_ipi Also made the formatting consistent. Scripts that go with this can be downloaded from http://139.133.210.30/users/gerrit/dccp/dccp_probe/ Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: More debug information for dccp_wait_for_ccidGerrit Renker
This adds more detail in the wait_for_ccid packet scheduling loop. In particular, it informs about (i) when delay is used and (ii) why a packet is discarded. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Always use debug-toggle parametersGerrit Renker
Currently debugging output (when configured) is automatically enabled when DCCP modules are compiled into the kernel rather than built as loadable modules. This is not necessary, since the module parameters in this case become kernel commandline parameters, e.g. DCCP or CCID3 debug output can be enabled for a static build by appending the following at the boot prompt: dccp.dccp_debug=1 dccp_ccid3.ccid3_debug=1 This patch therefore does away with the more complicated way of always enabling debug output for static builds Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Remove race condition and update t_ipi when `s' changesGerrit Renker
This: 1. removes a race condition in the access to the scheduled send time t_nom which results from allowing asynchronous r/w access to t_nom without locks; 2. updates the inter-packet interval t_ipi = s/X when `s' changes, following a suggestion by Ian McDonald. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: More verbose debuggingIan McDonald
This adds a few debugging statements to ccid3.c Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Fix use of invalid loss intervalsIan McDonald
This fixes a bug which uses an invalid comparison. The bug resulted in the use of invalid loss intervals. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Use MSS for larger initial windowsGerrit Renker
This improves the slow-start phase by using the MSS (as suggested in RFC 4342, sec. 5) instead of the packet size s. Also figured out that __u32 is ample resource enough. After applying, I got the following in the logs: ccid3_hc_tx_packet_recv: client(f7421700), s=6, MSS=1424, w_init=4380, R_sample=176us, X=24886363 Had the previous variant been used, w_init would have been as low as 24. Committer note: removed unneeded cast to unsigned long long that was causing a compiler warning on 64bit architectures. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Re-order CCID 3 source fileGerrit Renker
No code change at all. This splits ccid3.c into a RX and a TX section, so that the file has an organisation similar to the other ones (e.g. packet_history.{h,c}). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[CCID3]: Remove redundant `len' testGerrit Renker
Since CCID3 avoids sending 0-byte data packets (cf. ccid3_hc_tx_send_packet), testing for zero-payload length, as performed by ccid3_hc_tx_update_s, is redundant - hence removed by this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Remove ambiguity in the way before48 is usedGerrit Renker
This removes two ambiguities in employing the new definition of before48, following the analysis on http://www.mail-archive.com/dccp@vger.kernel.org/msg01295.html (1) Updating GSR when P.seqno >= S.SWL With the old definition we did not update when P.seqno and S.SWL are 2^47 apart. To ensure the same behaviour as with the old definition, this is replaced with the equivalent condition dccp_delta_seqno(S.SWL, P.seqno) >= 0 (2) Sending SYNC when P.seqno >= S.OSR Here it is debatable whether the new definition causes an ambiguity: the case is similar to (1); and to have consistency with the case (1), we use the equivalent condition dccp_delta_seqno(S.OSR, P.seqno) >= 0 Detailed Justification
2007-04-25[DCCP]: Fix for follows48Gerrit Renker
The follows48 relation identifies whether 48-bit sequence number x is the direct successor of y. Currently, it does not handle cases of the following type correctly: follows48(0x(prefix)10000LL, 0x(prefix)0FFFFLL) where prefix is an arbitrary hex sequence of up to 7 digits. This is fixed by reusing the new dccp_delta_seqno function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[DCCP]: Make `before' relation unambiguousGerrit Renker
Problem:
2007-04-25[DCCP]: Make dccp_delta_seqno return signed numbersGerrit Renker
Problem:
2007-04-25[DCCP]: 48-bit sequence number arithmeticGerrit Renker
This patch * organizes the sequence arithmetic functions into one corner of dccp.h * performs a small modification of dccp_set_seqno to make it more widely reusable (now it is safe to use any number, since it performs modulo-2^48 assignment) * adds functions and generic macros for 48-bit sequence arithmetic: --48 bit complement --modulo-48 addition and modulo-48 subtraction --dccp_inc_seqno now a special case of add48 Constants renamed following a suggestion by Arnaldo. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmphArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6hArnaldo Carvalho de Melo
Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iphArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce skb_network_header()Arnaldo Carvalho de Melo
For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[TCP/DCCP/RANDOM]: Remove unused exports.Adrian Bunk
This patch removes the following not or no longer used exports: - drivers/char/random.c: secure_tcp_sequence_number - net/dccp/options.c: sysctl_dccp_feat_sequence_window - net/netlink/af_netlink.c: netlink_set_err Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-28[DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOVArnaldo Carvalho de Melo
We were only checking if there was enough space to put the int, but left len as specified by the (malicious) user, sigh, fix it by setting len to sizeof(val) and transfering just one int worth of data, the one asked for. Also check for negative len values. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25[DCCP]: make dccp_write_xmit_timer() static againAdrian Bunk
dccp_write_xmit_timer() needlessly became global. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-09[DCCP]: Initialise write_xmit_timer also on passive socketsGerrit Renker
The TX CCID needs the write_xmit_timer for delaying packet sends. Previously this timer was only activated on active (connecting) sockets. This patch initialises the write_xmit_timer in sync with the other timers, i.e. the timer will be ready on any socket. This is used by applications with a listening socket which start to stream after receiving an initiation by the client. The write_xmit_timer is stopped when the application closes, as before. Was tested to work and to remove the timer bug reported on dccp@vger. Also moved timer initialisation into timer.c (static). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07[DCCP]: Revert patch which disables bidirectional modeGerrit Renker
This reverts an earlier patch which disabled bidirectional mode, meaning that a listening (passive) socket was not allowed to write to the other (active) end of the connection. This mode had been disabled when there were problems with CCID3, but it imposes a constraint on socket programming and thus hinders deployment. A change is included to ignore RX feedback received by the TX CCID3 module. Many thanks to Andre Noll for pointing out this issue. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-06[DCCP]: Set RTO for newly created child socketGerrit Renker
This mirrors a recent change in tcp_open_req_child, whereby the icsk_rto of the newly created child socket was not set (but rather on the parent socket). Same fix for DCCP. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-06[DCCP]: Correctly split CCID half connectionsGerrit Renker
This fixes a bug caused by a previous patch, which causes DCCP servers in LISTEN state to not receive packets. This patch changes the logic so that * servers in either LISTEN or OPEN state get the RX half connection packets * clients in OPEN state get the TX half connection packets Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28[NET]: Fix kfree(skb)Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-14[PATCH] sysctl: remove insert_at_head from register_sysctlEric W. Biederman
The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14[PATCH] sysctl: dccp: remove unnecessary insert_at_head flagEric W. Biederman
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] mark struct file_operations const 7Arjan van de Ven
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-10[NET] DCCP: Fix whitespace errors.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NET]: change layout of ehash tableEric Dumazet
ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[DCCP]: Warning fixes.Andrew Morton
net/dccp/ccids/ccid3.c: In function `ccid3_hc_rx_packet_recv': net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 3) net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 4) opaque types must be suitably cast for printing. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.David S. Miller
Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>