aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2005-05-23[TCP]: Fix stretch ACK performance killer when doing ucopy.David S. Miller
When we are doing ucopy, we try to defer the ACK generation to cleanup_rbuf(). This works most of the time very well, but if the ucopy prequeue is large, this ACKing behavior kills performance. With TSO, it is possible to fill the prequeue so large that by the time the ACK is sent and gets back to the sender, most of the window has emptied of data and performance suffers significantly. This behavior does help in some cases, so we should think about re-enabling this trick in the future, using some kind of limit in order to avoid the bug case. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[NETFILTER]: Do not be clever about SKB ownership in ip_ct_gather_frags().David S. Miller
Just do an skb_orphan() and be done with it. Based upon discussions with Herbert Xu on netdev. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[IP_VS]: Remove extra __ip_vs_conn_put() for incoming ICMP.Julian Anastasov
Remove extra __ip_vs_conn_put for incoming ICMP in direct routing mode. Mark de Vries reports that IPVS connections are not leaked anymore. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-18[IPV4/IPV6] Ensure all frag_list members have NULL skHerbert Xu
Having frag_list members which holds wmem of an sk leads to nightmares with partially cloned frag skb's. The reason is that once you unleash a skb with a frag_list that has individual sk ownerships into the stack you can never undo those ownerships safely as they may have been cloned by things like netfilter. Since we have to undo them in order to make skb_linearize happy this approach leads to a dead-end. So let's go the other way and make this an invariant: For any skb on a frag_list, skb->sk must be NULL. That is, the socket ownership always belongs to the head skb. It turns out that the implementation is actually pretty simple. The above invariant is actually violated in the following patch for a short duration inside ip_fragment. This is OK because the offending frag_list member is either destroyed at the end of the slow path without being sent anywhere, or it is detached from the frag_list before being sent. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-05[PATCH] update Ross Biro bouncing email addressJesper Juhl
Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05[IPV4]: multipath_wrandom.c GPF fixesPatrick McHardy
multipath_wrandom needs to use GFP_ATOMIC. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[IPSEC]: Store idev entriesHerbert Xu
I found a bug that stopped IPsec/IPv6 from working. About a month ago IPv6 started using rt6i_idev->dev on the cached socket dst entries. If the cached socket dst entry is IPsec, then rt6i_idev will be NULL. Since we want to look at the rt6i_idev of the original route in this case, the easiest fix is to store rt6i_idev in the IPsec dst entry just as we do for a number of other IPv6 route attributes. Unfortunately this means that we need some new code to handle the references to rt6i_idev. That's why this patch is bigger than it would otherwise be. I've also done the same thing for IPv4 since it is conceivable that once these idev attributes start getting used for accounting, we probably need to dereference them for IPv4 IPsec entries too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Drop conntrack reference in ip_dev_loopback_xmit()Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETLINK]: Synchronous message processing.Herbert Xu
Let's recap the problem. The current asynchronous netlink kernel message processing is vulnerable to these attacks: 1) Hit and run: Attacker sends one or more messages and then exits before they're processed. This may confuse/disable the next netlink user that gets the netlink address of the attacker since it may receive the responses to the attacker's messages. Proposed solutions: a) Synchronous processing. b) Stream mode socket. c) Restrict/prohibit binding. 2) Starvation: Because various netlink rcv functions were written to not return until all messages have been processed on a socket, it is possible for these functions to execute for an arbitrarily long period of time. If this is successfully exploited it could also be used to hold rtnl forever. Proposed solutions: a) Synchronous processing. b) Stream mode socket. Firstly let's cross off solution c). It only solves the first problem and it has user-visible impacts. In particular, it'll break user space applications that expect to bind or communicate with specific netlink addresses (pid's). So we're left with a choice of synchronous processing versus SOCK_STREAM for netlink. For the moment I'm sticking with the synchronous approach as suggested by Alexey since it's simpler and I'd rather spend my time working on other things. However, it does have a number of deficiencies compared to the stream mode solution: 1) User-space to user-space netlink communication is still vulnerable. 2) Inefficient use of resources. This is especially true for rtnetlink since the lock is shared with other users such as networking drivers. The latter could hold the rtnl while communicating with hardware which causes the rtnetlink user to wait when it could be doing other things. 3) It is still possible to DoS all netlink users by flooding the kernel netlink receive queue. The attacker simply fills the receive socket with a single netlink message that fills up the entire queue. The attacker then continues to call sendmsg with the same message in a loop. Point 3) can be countered by retransmissions in user-space code, however it is pretty messy. In light of these problems (in particular, point 3), we should implement stream mode netlink at some point. In the mean time, here is a patch that implements synchronous processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[TCP]: Optimize check in port-allocation code.Folkert van Heusden
Signed-off-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[RTNETLINK] Cleanup rtnetlink_link tablesThomas Graf
Converts remaining rtnetlink_link tables to use c99 designated initializers to make greping a little bit easier. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Don't checksum CHECKSUM_UNNECESSARY skbs in TCP connection trackingPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Missing owner-field initialization in iptable_rawPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[NET]: /proc/net/stat/* header cleanupOlaf Rempel
Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[IPV4]: Incorrect permissions on route flush sysctlDave Jones
This has been brought up before.. http://lkml.org/lkml/2000/1/21/116 but didnt seem to get resolved. This morning I got someone file a bugzilla about it breaking sysctl(8). Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[NET]: kill gratitious includes of major.hAl Viro
A lot of places in there are including major.h for no reason whatsoever. Removed. And yes, it still builds. The history of that stuff is often amusing. E.g. for net/core/sock.c the story looks so, as far as I've been able to reconstruct it: we used to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of net/* had moved a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed... Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[TCP]: Trivial tcp_data_queue() cleanupJames Morris
This patch removes a superfluous intialization from tcp_data_queue(). Signed-off-by: James Morris <jmorris@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[NETFILTER]: Drop conntrack reference when packet leaves IPPatrick McHardy
In the event a raw socket is created for sending purposes only, the creator never bothers to check the socket's receive queue. But we continue to add skbs to its queue until it fills up. Unfortunately, if ip_conntrack is loaded on the box, each skb we add to the queue potentially holds a reference to a conntrack. If the user attempts to unload ip_conntrack, we will spin around forever since the queued skbs are pinned. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[NETFILTER]: Fix truncated sequence numbers in FTP helperYasuyuki KOZAKAI
Signed-off-by: Yasuyuki KOZAKAI <yasuyuki.kozkaai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24[TCP]: skb pcount with MTU discoveryDavid S. Miller
The problem is that when doing MTU discovery, the too-large segments in the write queue will be calculated as having a pcount of >1. When tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test when pcount > cwnd. The segments are eventually transmitted one at a time by keepalive, but this can take a long time. This patch checks if TSO is enabled when setting pcount. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24[NETFILTER]: Ignore PSH on SYN/ACK in TCP connection trackingPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24[NETFILTER]: Fix NAT sequence number adjustmentPatrick McHardy
The NAT changes in 2.6.11 changed the position where helpers are called and perform packet mangling. Before 2.6.11, a NAT helper was called before the packet was NATed and had its sequence number adjusted. Since 2.6.11, the helpers get packets with already adjusted sequence numbers. This breaks sequence number adjustment, adjust_tcp_sequence() needs the original sequence number to determine whether a packet was a retransmission and to store it for further corrections. It can't be reconstructed without more information than available, so this patch restores the old order by calling helpers from a new conntrack hook two priorities below ip_conntrack_confirm() and adjusting the sequence number from a new NAT hook one priority below ip_conntrack_confirm(). Tracked down by Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-19[IPSEC]: COW skb header in UDP decapHerbert Xu
The following patch just makes the header part of the skb writeable. This is needed since we modify the IP headers just a few lines below. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-19[NET]: skbuff: remove old NET_CALLER macroStephen Hemminger
Here is a revised alternative that uses BUG_ON/WARN_ON (as suggested by Herbert Xu) to eliminate NET_CALLER. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!