Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2007-10-10	[NET]: Move hardware header operations out of netdevice.	Stephen Hemminger
	Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Wrap hard_header_parse	Stephen Hemminger
	Wrap the hard_header_parse function to simplify next step of header_ops conversion. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Wrap netdevice hardware header creation.	Stephen Hemminger
	Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Make the loopback device per network namespace.	Eric W. Biederman
	This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_dev	Eric W. Biederman
	Now that multiple loopback devices are becoming possible it makes the code a little cleaner and more maintainable to test if a deivice is th a loopback device by testing dev->flags & IFF_LOOPBACK instead of dev == loopback_dev. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[IPV4]: Remove unnecessary test for the loopback device from inetdev_destroy	Eric W. Biederman
	Currently we never call unregister_netdev for the loopback device so it is impossible for us to reach inetdev_destroy with the loopback device. So the test in inetdev_destroy is unnecessary. Further when testing with my network namespace patches removing unregistering the loopback device and calling inetdev_destroy works fine so there appears to be no reason for avoiding unregistering the loopback device. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Add network namespace clone & unshare support.	Eric W. Biederman
	This patch allows you to create a new network namespace using sys_clone, or sys_unshare. As the network namespace is still experimental and under development clone and unshare support is only made available when CONFIG_NET_NS is selected at compile time. As this patch introduces network namespace support into code paths that exist when the CONFIG_NET is not selected there are a few additions made to net_namespace.h to allow a few more functions to be used when the networking stack is not compiled in. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Fix running without sysfs	Eric W. Biederman
	When sysfs support is compiled out the kernel still keeps and maintains the kobject tree. So it is not safe to skip our kobject reference counting or to avoid becoming members of the kobject tree. It is safe to not add the networking specific sysfs attributes. This patch removes the sysfs special cases from net/core/dev.c renames functions from netdev_sysfs_xxxx to netdev_kobject_xxxx and always compiles in net-sysfs.c net-sysfs.c is modified with a CONFIG_SYSFS guard around the parts that are actually sysfs specific. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[CCID3]: Remove ifdef surrounding BUG_ON	Arnaldo Carvalho de Melo
	As suggested by DaveM. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Reduce the number of writable states	Gerrit Renker
	Since DCCP requires to close both ends of a connection simultaneously, permission to write in state DCCP_CLOSING is removed in dccp_sendmsg(): * if the sending end closed, it would encounter a write error anyhow; * if the other end has closed the connection, it accepts no more data. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Factor out common code for generating Resets	Gerrit Renker
	This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function, and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6. It is useful to have this separate, since the following Reset codes will always be generated from the control socket rather than via dccp_send_reset: * Code 3, "No Connection", cf. 8.3.1; * Code 4, "Packet Error" (identification for Data 1 added); * Code 5, "Option Error" (identification for Data 1..3 added, will be used later); * Code 6, "Mandatory Error" (same as Option Error); * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?); * Code 8, "Bad Service Code"; * Code 9, "Too Busy"; * Code 10, "Bad Init Cookie" (not used). Code 0 is not recommended by the RFC, the following codes would be used in dccp_send_reset() instead, since they all relate to an established DCCP connection: * Code 1, "Closed"; * Code 2, "Aborted"; * Code 11, "Aggression Penalty" (12.3). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Sequence number wrap-around when sending reset	Gerrit Renker
	This replaces normal addition with mod-48 addition so that sequence number wraparound is respected. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Rate-limit DCCP-Syncs	Gerrit Renker
	This implements a SHOULD from RFC 4340, 7.5.4: "To protect against denial-of-service attacks, DCCP implementations SHOULD impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets, such as not more than eight DCCP-Syncs per second." The rate-limit is maintained on a per-socket basis. This is a more stringent policy than enforcing the rate-limit on a per-source-address basis and protects against attacks with forged source addresses. Moreover, the mechanism is deliberately kept simple. In contrast to xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets are not supported. This foils such attacks where the receipt of a Sync triggers further sequence-invalid packets. (I have tested this mechanism against xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.) In order to keep flexibility, the timeout parameter can be set via sysctl; and the whole mechanism can even be disabled (which is however not recommended). The algorithm in this patch has been improved with regard to wrapping issues thanks to a suggestion by Arnaldo. Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're sending a sync. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Remove duplicate code for Reset from connected socket	Gerrit Renker
	In this patch, duplicated code is removed for the case when a Reset packet is sent from a connected socket. This code duplication is between dccp_make_reset and dccp_transmit_skb, which already contained an (up to now entirely unused) switch statement to fill in the reset code from the DCCP_SKB_CB. The only thing that has been removed is the call to dst_clone(dst), since the queue_xmit functions use sk_dst_cache anyway. I wasn't sure which purpose inet_sk_rebuild_header served, so I left it in. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Add Support for Data 1 .. 3 fields of Reset packets	Gerrit Renker
	This adds fields to support the informational Data 1..3 fields of the DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes to documentation. Code which fills in these fields follows in subsequent patches, it is primarily used for reporting option-processing and feature-negotiation errors. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Add FIXME for send_delayed_ack	Gerrit Renker
	This adds a FIXME to signal that the function dccp_send_delayed_ack is nowhere used in the entire DCCP/CCID code. Using a delayed Ack timer is suggested in 11.3 of RFC 4340, but it has also rather subtle implications for the Ack-Ratio-accounting. CCID2 does not use this (maybe it should). I think leaving the function in is good, in case someone wants to implement this. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[CCID3]: Move NULL-protection into function	Gerrit Renker
	This moves several instances of testing against NULL into the function which is used to de-reference the CCID-private data. Committer note: Made the BUG_ON depend on having CONFIG_IP_DCCP_CCID3_DEBUG, as it is too much to have this on production code. Also made sure that the macro is used only after checking if sk_state is not LISTEN, to make it equivalent to what we had before. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[DCCP]: Send Reset upon Sync in state REQUEST	Gerrit Renker
	This fixes the code to correspond to RFC 4340, 7.5.4, which states the exception that a Sync received in state REQUEST generates a Reset (not a SyncAck). To achieve this, only a small change is required. Since dccp_rcv_request_sent_state_process() already uses the correct Reset Code number 4 ("Packet Error"), we only need to shift the if-statement a few lines further down. (To test this case: replace DCCP_PKT_RESPONSE with DCCP_PKT_SYNC in dccp_make_response.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10	[BLUETOOTH]: Make hidp_setup_input() return int	WANG Cong
	This patch: - makes hidp_setup_input() return int to indicate errors; - checks its return value to handle errors. And this time it is against -rc7-mm1 tree. Thanks to roel and Marcel Holtmann for comments. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP] MIB: Count FRTO's successfully detected spurious RTOs	Ilpo Järvinen
	Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Reordered ACK's (old) SACKs not included to discarded MIB	Ilpo Järvinen
	In case of ACK reordering, the SACK block might be valid in it's time but is already obsoleted since we've received another kind of confirmation about arrival of the segments through snd_una advancement of an earlier packet. I didn't bother to build distinguishing of valid and invalid SACK blocks but simply made reordered SACK blocks that are too old always not counted regardless of their "real" validity which could be determined by using the ack field of the reordered packet (won't be significant IMHO). DSACKs can very well be considered useful even in this situation, so won't do any of this for them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Re-place highest_sack check to a more robust position	Ilpo Järvinen
	I previously added checking to position that is rather poor as state has already been adjusted quite a bit. Re-placing it above all state changes should be more robust though the return should never ever get executed regardless of its place :-). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[DCCP]: Parameter renaming	Gerrit Renker
	The parameter `seq' of dccp_send_sync() is in fact an acknowledgement number and not a sequence number - thus renamed by this patch into `ackno'. Secondly, a `critical' warning is added when a Sync/SyncAck could not be sent. Sanity: I have checked all other functions that are called in dccp_transmit_skb, there are no clashes with the use of dccpd_ack_seq; no other function is using this slot at the same time. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[DCCP]: Fix Reset/Sync-Flood Bug	Gerrit Renker
	This updates sequence number checking with regard to RFC 4340, 7.5.4. Missing in the code was an exception for sequence-invalid Reset packets, which get a Sync acknowledging GSR, instead of (as usual) P.seqno. This can lead to an oscillating ping-pong flood of Reset packets. In fact, it has been observed on the wire as follows: 1. client establishes connection to server; 2. before server can write to client, client crashes without notifying the server (NB: now no longer possible due to ABORT function); 3. server sends DCCP-Data packet (has no ackno); 4. client generates Reset "No Connection", seqno=0, increments seqno; 5. server replies with Sync, using ackno = P.seqno; 6. client generates Reset "No Connection" with seqno = ackno + 1; 7. goto (5). The difference is that now in (5) the server uses GSR. This causes the Reset sent by the client in (6) to become sequence-valid, so that in (7) the vicious circle is broken; the Reset is then enqueued and causes the socket to enter TIMEWAIT state. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[DCCP]: Shorten variable names in dccp_check_seqno	Gerrit Renker
	This patch is in part required by the next patch; it * replaces 6 instances of `DCCP_SKB_CB(skb)->dccpd_seq' with `seqno'; * replaces 7 instances of `DCCP_SKB_CB(skb)->dccpd_ack_seq' with `ackno'; * replaces 1 use of dccp_inc_seqno() by unfolding `ADD48' macro in place. No changes in algorithm, all changes are text replacement/substitution. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[DCCP]: Simplify interface of dccp_sample_rtt	Gerrit Renker
	The third parameter of dccp_sample_rtt now becomes useless and is removed. Also combined the subtraction of the timestamp echo and the elapsed time. This is safe, since (a) presence of timestamp echo is tested first and (b) elapsed time is either present and non-zero or it is not set and equals 0 due to the memset in dccp_parse_options. To avoid measuring option-processing time, the timestamp for measuring the initial Request/Response RTT sample is taken directly when the function is called (the Linux implementation always adds a timestamp on the Request, so there is no loss in doing this). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[DCCP]: Provide 10s of microsecond timesource	Gerrit Renker
	This provides a timesource, conveniently used for DCCP timestamps, which returns the elapsed time in 10s of microseconds since initialisation. This makes for a wrap-around time of about 11.9 hours, which should be sufficient for most applications. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[DCCP]: Reuse ktime_get_real() calls again	Gerrit Renker
	This patch reduces the number of timestamps taken in the receive path for each packet. The ccid3_hc_tx_update_x() routine is called in * the receive path for each CCID3-controlled packet * for the nofeedback timer (if no feedback arrives during 4 RTT) Currently, when there is no loss, each packet gets timestamped twice. The patch resolves this by recycling the first timestamp taken on packet reception for RTT sampling. When the no_feedback_timer() is called, then the timestamp argument is simply set to NULL - so that ccid3_hc_tx_update_x() takes care of the logic. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: rename ieee80211_cfg.h to cfg.h	Michael Wu
	Might as well rename ieee80211_cfg.h to cfg.h to keep things consistent. Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: kill vlan_id	Johannes Berg
	Each station has a vlan_id that is useless. Remove it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: kill IE parse typedef	Johannes Berg
	The parse result typedef isn't needed. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: rename ieee80211_cfg.c to cfg.c	Johannes Berg
	It's just painful to have the extra ieee80211_ prefix. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: print out wiphy name instead of master device	Johannes Berg
	This makes mac80211 print out the wiphy name instead of the master device name where appropriate. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: fix warnings introduced by the doc patches	Johannes Berg
	This fixes a warning about NUM_IEEE80211_MODES missing in a switch statement. Intentionally do not add a default case so we get warnings at these places if we need to add new modes. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: remove key threshold stuff	Johannes Berg
	This patch removes the key threshold stuff from mac80211. I have patches for later that add it as a per-key setting to nl/cfg80211. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: allow drivers to indicate failed FCS/PLCP checksum	Johannes Berg
	This patch allows drivers to indicate bad FCS/PLCP CRC to the stack and have the stack drop packets like that except for monitor interfaces. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[MAC80211]: Add support for setting TX power and radio status	Michael Buesch
	This adds support for disabling the radio and setting the TXpower through wext. This also fixes the prism TXpower ioctl (It always overwrote the TXpower value in ieee80211_hw_config()) Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[IEEE80211]: Fix softmac lockdep reports.	Johannes Berg
	It seems I was actually able to hit this deadlock, on my quad G5 softmac locks up more often than not. This fixes it by using an own workqueue that can safely be flushed under RTNL. Not sure if the patch is correct with the workqueue naming. And don't think with the patch it doesn't continually lock up. It still does, just doesn't invoke lockdep warnings all the time. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET_SCHED]: explict hold dev tx lock	Jamal Hadi Salim
	For N cpus, with full throttle traffic on all N CPUs, funneling traffic to the same ethernet device, the devices queue lock is contended by all N CPUs constantly. The TX lock is only contended by a max of 2 CPUS. In the current mode of operation, after all the work of entering the dequeue region, we may endup aborting the path if we are unable to get the tx lock and go back to contend for the queue lock. As N goes up, this gets worse. The changes in this patch result in a small increase in performance with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3 showed similar behavior; Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Dynamically allocate the loopback device, part 1.	Daniel Lezcano
	This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NL80211]: add netlink interface to cfg80211	Johannes Berg
	Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Avoid clearing sacktag hint in trivial situations	Ilpo Järvinen
	There's no reason to clear the sacktag skb hint when small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so no need to discard SACK tag hint at all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Enable SACK enhanced FRTO (RFC4138) by default	Ilpo Järvinen
	Most of the description that follows comes from my mail to netdev (some editing done): Main obstacle to FRTO use is its deployment as it has to be on the sender side where as wireless link is often the receiver's access link. Take initiative on behalf of unlucky receivers and enable it by default in future Linux TCP senders. Also IETF seems to interested in advancing FRTO from experimental [1]. How does FRTO help? =================== FRTO detects spurious RTOs and avoids a number of unnecessary retransmissions and a couple of other problems that can arise due to incorrect guess made at RTO (i.e., that segments were lost when they actually got delayed which is likely to occur e.g. in wireless environments with link-layer retransmission). Though FRTO cannot prevent the first (potentially unnecessary) retransmission at RTO, I suspect that it won't cost that much even if you have to pay for each bit (won't be that high percentage out of all packets after all :-)). However, usually when you have a spurious RTO, not only the first segment unnecessarily retransmitted but the whole window. It goes like this: all cumulative ACKs got delayed due to in-order delivery, then TCP will actually send 1.5*original cwnd worth of data in the RTO's slow-start when the delayed ACKs arrive (basically the original cwnd worth of it unnecessarily). In case one is interested in minimizing unnecessary retransmissions e.g. due to cost, those rexmissions must never see daylight. Besides, in the worst case the generated burst overloads the bottleneck buffers which is likely to significantly delay the further progress of the flow. In case of ll rexmissions, ACK compression often occurs at the same time making the burst very "sharp edged" (in that case TCP often loses most of the segments above high_seq => very bad performance too). When FRTO is enabled, those unnecessary retransmissions are fully avoided except for the first segment and the cwnd behavior after detected spurious RTO is determined by the response (one can tune that by sysctl). Basic version (non-SACK enhanced one), FRTO can fail to detect spurious RTO as spurious and falls back to conservative behavior. ACK lossage is much less significant than reordering, usually the FRTO can detect spurious RTO if at least 2 cumulative ACKs from original window are preserved (excluding the ACK that advances to high_seq). With SACK-enhanced version, the detection is quite robust. FRTO should remove the need to set a high lower bound for the RTO estimator due to delay spikes that occur relatively common in some environments (esp. in wireless/cellular ones). [1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP] FRTO: Improve interoperability with other undo_marker users	Ilpo Järvinen
	Basically this change enables it, previously other undo_marker users were left with nothing. Reverse undo_marker logic completely to get it set right in CA_Loss. On the other hand, when spurious RTO is detected, clear it. Clearing might be too heavy for some scenarios but seems safe enough starting point for now and shouldn't have much effect except in majority of cases (if in any). By adding a new FLAG_ we avoid looping through write_queue when RTO occurs. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue	Ilpo Järvinen
	Implements following cleanups: - Comment re-placement (CodingStyle) - tcp_tso_acked() local (wrapper-like) variable removal (readability) - __-types removed (IMHO they make local variables jumpy looking and just was space) - acked -> flag (naming conventions elsewhere in TCP code) - linebreak adjustments (readability) - nested if()s combined (reduced indentation) - clarifying newlines added Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Move accounting from tso_acked to clean_rtx_queue	Ilpo Järvinen
	The accounting code is pretty much the same, so it's a shame we do it in two places. I'm not too sure if added fully_acked check in MTU probing is really what we want perhaps the added end_seq could be used in the after() comparison. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: clear_all_retrans_hints prefixed by tcp_	Ilpo Järvinen
	In addition, fix its function comment spacing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
2007-10-10	[TCP]: Make fackets_out accurate	Ilpo Järvinen
	Substraction for fackets_out is unconditional when snd_una advances, thus there's no need to do it inside the loop. Just make sure correct bounds are honored. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[TCP]: Maintain highest_sack accurately to the highest skb	Ilpo Järvinen
	In general, it should not be necessary to call tcp_fragment for already SACKed skbs, but it's better to be safe than sorry. And indeed, it can be called from sacktag when a DSACK arrives or some ACK (with SACK) reordering occurs (sacktag could be made to avoid the call in the latter case though I'm not sure if it's worth of the trouble and added complexity to cover such marginal case). The collapse case has return for SACKED_ACKED case earlier, so just WARN_ON if internal inconsistency is detected for some reason. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10	[NET]: Introduce and use print_mac() and DECLARE_MAC_BUF()	Joe Perches
	This is nicer than the MAC_FMT stuff. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>