aboutsummaryrefslogtreecommitdiff
path: root/net/dccp/ipv4.c
AgeCommit message (Collapse)Author
2005-10-03[INET]: speedup inet (tcp/dccp) lookupsEric Dumazet
Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[DCCP]: Don't use necessarily the same CCID for tx and rxArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[DCCP]: Move the ack vector code to net/dccp/ackvec.[ch]Arnaldo Carvalho de Melo
Isolating it, that will be used when we introduce a CCID2 (TCP-Like) implementation. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16[DCCP]: Introduce DCCP_SOCKOPT_SERVICEArnaldo Carvalho de Melo
As discussed in the dccp@vger mailing list: Now applications have to use setsockopt(DCCP_SOCKOPT_SERVICE, service[s]), prior to calling listen() and connect(). An array of unsigned ints can be passed meaning that the listening sock accepts connection requests for several services. With this we can ditch struct sockaddr_dccp and use only sockaddr_in (and sockaddr_in6 in the future). Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16[DCCP]: More precisely set reset_code when sending RESET packetsArnaldo Carvalho de Melo
Moving the setting of DCCP_SKB_CB(skb)->dccpd_reset_code to the places where events happen that trigger sending a RESET packet. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-09[DCCP] Only call the HC _exit() routines in dccp_v4_destroy_sockArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09[DCCP] Introduce dccp_timestampArnaldo Carvalho de Melo
To start the timestamps with 0.0ms, easing the integer maths in the CCIDs, this probably will be reworked to use the to be introduced struct timeval_offset infrastructure out of skb_get_timestamp, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-08-29[CCID3]: Call sk->sk_write_space(sk) when receiving a feedback packetArnaldo Carvalho de Melo
This makes the send rate calculations behave way more closely to what is specified, with the jitter previously seen on x and x_recv disappearing completely on non lossy setups. This resembles the tcp_data_snd_check code, that possibly we'll end up using in DCCP as well, perhaps moving this code to inet_connection_sock. For now I'm doing the simplest implementation tho. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Call the HC exit routines at dccp_v4_destroy_sockArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Fix the ACK and SEQ window variables settingsArnaldo Carvalho de Melo
This is from a first audit, more eyeballs are more than welcome. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Fix seqno setting in dccp_v4_ctl_send_resetArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Use LIMIT_NETDEBUG in some debugging printksArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Send SYNCACK packets in response to SYNC packetsArnaldo Carvalho de Melo
Also fix step 6 when receiving SYNC or SYNCACK packets, i.e. we were not using the updated swl. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Fix sparse warningsArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Just reflow the source code to fit in 80 columnsArnaldo Carvalho de Melo
Andrew Morton should be happy now 8) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[TCPDIAG]: Implement cheapest way of supporting DCCPDIAG_GETSOCKArnaldo Carvalho de Melo
With ugly ifdefs, etc, but this actually: 1. keeps the existing ABI, i.e. no need to recompile the iproute2 utilities if not interested in DCCP. 2. Provides all the tcp_diag functionality in DCCP, with just a small patch that makes iproute2 support DCCP. Of course I'll get this cleaned-up in time, but for now I think its OK to be this way to quickly get this functionality. iproute2-ss050808 patch at: http://vger.kernel.org/~acme/iproute2-ss050808.dccp.patch Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Finish the TIMEWAIT minisock supportArnaldo Carvalho de Melo
Using most of the infrastructure TCP uses, with a dccp_death_row, etc. As per my current interpretation of the draft what we have with this changeset seems to be all we need (or very close to it 8)). Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Initialize icsk_rto in dccp_v4_init_sockArnaldo Carvalho de Melo
Fixes nasty bug related to the retransmit timer (yeah, DCCP does retransmits) firing too early. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Fix u64 printf format warnings.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Fix checksum routinesYoshifumi Nishida
Signed-off-by: Yoshifumi Nishida <nishida@csl.sony.co.jp> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[DCCP]: Initial implementationArnaldo Carvalho de Melo
Development to this point was done on a subversion repository at: http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/ This repository will be kept at this site for the foreseable future, so that interested parties can see the history of this code, attributions, etc. If I ever decide to take this offline I'll provide the full history at some other suitable place. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>