aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2008-04-02IPv6: do not create temporary adresses with too short preferred lifetimeBenoit Boissinot
From RFC341: A temporary address is created only if this calculated Preferred Lifetime is greater than REGEN_ADVANCE time units. In particular, an implementation must not create a temporary address with a zero Preferred Lifetime. Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-02IPv6: only update the lifetime of the relevant temporary addressBenoit Boissinot
When receiving a prefix information from a routeur, only update the lifetimes of the temporary address associated with that prefix. Otherwise if one deprecated prefix is advertized, all your temporary addresses will become deprecated. Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-01bluetooth : __rfcomm_dlc_close lock fixDave Young
Lockdep warning will be trigged while rfcomm connection closing. The locks taken in rfcomm_dev_add: rfcomm_dev_lock --> d->lock In __rfcomm_dlc_close: d->lock --> rfcomm_dev_lock (in rfcomm_dev_state_change) There's two way to fix it, one is in rfcomm_dev_add we first locking d->lock then the rfcomm_dev_lock The other (in this patch), remove the locking of d->lock for rfcomm_dev_state_change because just locking "d->state = BT_CLOSED;" is enough. [ 295.002046] ======================================================= [ 295.002046] [ INFO: possible circular locking dependency detected ] [ 295.002046] 2.6.25-rc7 #1 [ 295.002046] ------------------------------------------------------- [ 295.002046] krfcommd/2705 is trying to acquire lock: [ 295.002046] (rfcomm_dev_lock){-.--}, at: [<f89a090a>] rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [ 295.002046] but task is already holding lock: [ 295.002046] (&d->lock){--..}, at: [<f899c533>] __rfcomm_dlc_close+0x43/0xd0 [rfcomm] [ 295.002046] [ 295.002046] which lock already depends on the new lock. [ 295.002046] [ 295.002046] [ 295.002046] the existing dependency chain (in reverse order) is: [ 295.002046] [ 295.002046] -> #1 (&d->lock){--..}: [ 295.002046] [<c0149b23>] check_prev_add+0xd3/0x200 [ 295.002046] [<c0149ce5>] check_prevs_add+0x95/0xe0 [ 295.002046] [<c0149f6f>] validate_chain+0x23f/0x320 [ 295.002046] [<c014b7b1>] __lock_acquire+0x1c1/0x760 [ 295.002046] [<c014c349>] lock_acquire+0x79/0xb0 [ 295.002046] [<c03d6b99>] _spin_lock+0x39/0x80 [ 295.002046] [<f89a01c0>] rfcomm_dev_add+0x240/0x360 [rfcomm] [ 295.002046] [<f89a047e>] rfcomm_create_dev+0x6e/0xe0 [rfcomm] [ 295.002046] [<f89a0823>] rfcomm_dev_ioctl+0x33/0x60 [rfcomm] [ 295.002046] [<f899facc>] rfcomm_sock_ioctl+0x2c/0x50 [rfcomm] [ 295.002046] [<c0363d38>] sock_ioctl+0x118/0x240 [ 295.002046] [<c0194196>] vfs_ioctl+0x76/0x90 [ 295.002046] [<c0194446>] do_vfs_ioctl+0x56/0x140 [ 295.002046] [<c0194569>] sys_ioctl+0x39/0x60 [ 295.002046] [<c0104faa>] syscall_call+0x7/0xb [ 295.002046] [<ffffffff>] 0xffffffff [ 295.002046] [ 295.002046] -> #0 (rfcomm_dev_lock){-.--}: [ 295.002046] [<c0149a84>] check_prev_add+0x34/0x200 [ 295.002046] [<c0149ce5>] check_prevs_add+0x95/0xe0 [ 295.002046] [<c0149f6f>] validate_chain+0x23f/0x320 [ 295.002046] [<c014b7b1>] __lock_acquire+0x1c1/0x760 [ 295.002046] [<c014c349>] lock_acquire+0x79/0xb0 [ 295.002046] [<c03d6639>] _read_lock+0x39/0x80 [ 295.002046] [<f89a090a>] rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<f899c548>] __rfcomm_dlc_close+0x58/0xd0 [rfcomm] [ 295.002046] [<f899d44f>] rfcomm_recv_ua+0x6f/0x120 [rfcomm] [ 295.002046] [<f899e061>] rfcomm_recv_frame+0x171/0x1e0 [rfcomm] [ 295.002046] [<f899e357>] rfcomm_run+0xe7/0x550 [rfcomm] [ 295.002046] [<c013c18c>] kthread+0x5c/0xa0 [ 295.002046] [<c0105c07>] kernel_thread_helper+0x7/0x10 [ 295.002046] [<ffffffff>] 0xffffffff [ 295.002046] [ 295.002046] other info that might help us debug this: [ 295.002046] [ 295.002046] 2 locks held by krfcommd/2705: [ 295.002046] #0: (rfcomm_mutex){--..}, at: [<f899e2eb>] rfcomm_run+0x7b/0x550 [rfcomm] [ 295.002046] #1: (&d->lock){--..}, at: [<f899c533>] __rfcomm_dlc_close+0x43/0xd0 [rfcomm] [ 295.002046] [ 295.002046] stack backtrace: [ 295.002046] Pid: 2705, comm: krfcommd Not tainted 2.6.25-rc7 #1 [ 295.002046] [<c0128a38>] ? printk+0x18/0x20 [ 295.002046] [<c014927f>] print_circular_bug_tail+0x6f/0x80 [ 295.002046] [<c0149a84>] check_prev_add+0x34/0x200 [ 295.002046] [<c0149ce5>] check_prevs_add+0x95/0xe0 [ 295.002046] [<c0149f6f>] validate_chain+0x23f/0x320 [ 295.002046] [<c014b7b1>] __lock_acquire+0x1c1/0x760 [ 295.002046] [<c014c349>] lock_acquire+0x79/0xb0 [ 295.002046] [<f89a090a>] ? rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<c03d6639>] _read_lock+0x39/0x80 [ 295.002046] [<f89a090a>] ? rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<f89a090a>] rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<f899c548>] __rfcomm_dlc_close+0x58/0xd0 [rfcomm] [ 295.002046] [<f899d44f>] rfcomm_recv_ua+0x6f/0x120 [rfcomm] [ 295.002046] [<f899e061>] rfcomm_recv_frame+0x171/0x1e0 [rfcomm] [ 295.002046] [<c014abd9>] ? trace_hardirqs_on+0xb9/0x130 [ 295.002046] [<c03d6e89>] ? _spin_unlock_irqrestore+0x39/0x70 [ 295.002046] [<f899e357>] rfcomm_run+0xe7/0x550 [rfcomm] [ 295.002046] [<c03d4559>] ? __sched_text_start+0x229/0x4c0 [ 295.002046] [<c0120000>] ? cpu_avg_load_per_task+0x20/0x30 [ 295.002046] [<f899e270>] ? rfcomm_run+0x0/0x550 [rfcomm] [ 295.002046] [<c013c18c>] kthread+0x5c/0xa0 [ 295.002046] [<c013c130>] ? kthread+0x0/0xa0 [ 295.002046] [<c0105c07>] kernel_thread_helper+0x7/0x10 [ 295.002046] ======================= Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-01bluetooth : use lockdep sub-classes for diffrent bluetooth protocolDave Young
'rfcomm connect' will trigger lockdep warnings which is caused by locking diffrent kinds of bluetooth sockets at the same time. So using sub-classes per AF_BLUETOOTH sub-type for lockdep. Thanks for the hints from dave jones. --- > From: Dave Jones <davej@codemonkey.org.uk> > Date: Thu, 27 Mar 2008 12:21:56 -0400 > > > Mar 27 08:10:57 localhost kernel: Pid: 3611, comm: obex-data-serve Not tainted 2.6.25-0.121.rc5.git4.fc9 #1 > > Mar 27 08:10:57 localhost kernel: [__lock_acquire+2287/3089] __lock_acquire+0x8ef/0xc11 > > Mar 27 08:10:57 localhost kernel: [sched_clock+8/11] ? sched_clock+0x8/0xb > > Mar 27 08:10:57 localhost kernel: [lock_acquire+106/144] lock_acquire+0x6a/0x90 > > Mar 27 08:10:57 localhost kernel: [<f8bd9321>] ? l2cap_sock_bind+0x29/0x108 [l2cap] > > Mar 27 08:10:57 localhost kernel: [lock_sock_nested+182/198] lock_sock_nested+0xb6/0xc6 > > Mar 27 08:10:57 localhost kernel: [<f8bd9321>] ? l2cap_sock_bind+0x29/0x108 [l2cap] > > Mar 27 08:10:57 localhost kernel: [security_socket_post_create+22/27] ? security_socket_post_create+0x16/0x1b > > Mar 27 08:10:57 localhost kernel: [__sock_create+388/472] ? __sock_create+0x184/0x1d8 > > Mar 27 08:10:57 localhost kernel: [<f8bd9321>] l2cap_sock_bind+0x29/0x108 [l2cap] > > Mar 27 08:10:57 localhost kernel: [kernel_bind+10/13] kernel_bind+0xa/0xd > > Mar 27 08:10:57 localhost kernel: [<f8dad3d7>] rfcomm_dlc_open+0xc8/0x294 [rfcomm] > > Mar 27 08:10:57 localhost kernel: [lock_sock_nested+187/198] ? lock_sock_nested+0xbb/0xc6 > > Mar 27 08:10:57 localhost kernel: [<f8dae18c>] rfcomm_sock_connect+0x8b/0xc2 [rfcomm] > > Mar 27 08:10:57 localhost kernel: [sys_connect+96/125] sys_connect+0x60/0x7d > > Mar 27 08:10:57 localhost kernel: [__lock_acquire+1370/3089] ? __lock_acquire+0x55a/0xc11 > > Mar 27 08:10:57 localhost kernel: [sys_socketcall+140/392] sys_socketcall+0x8c/0x188 > > Mar 27 08:10:57 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb --- Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-01[ROSE/AX25] af_rose: rose_release() fixJarek Poplawski
rose_release() doesn't release sockets properly, e.g. it skips sock_orphan(), so OOPSes are triggered in sock_def_write_space(), which was observed especially while ROSE skbs were kfreed from ax25_frames_acked(). There is also sock_hold() and lock_sock() added - similarly to ax25_release(). Thanks to Bernard Pidoux for substantial help in debugging this problem. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Reported-and-tested-by: Bernard Pidoux <bpidoux@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-01mac80211: correct use_short_preamble handlingVladimir Koutny
ERP IE bit for preamble mode is 0 for short and 1 for long, not the other way around. This fixes the value reported to the driver via bss_conf->use_short_preamble field. Signed-off-by: Vladimir Koutny <vlado@ksp.sk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-01mac80211: trigger ieee80211_sta_work after opening interfaceJan Niehusmann
ieee80211_sta_work is disabled while network interface is down. Therefore, if you configure wireless parameters before bringing the interface up, these configurations are not yet effective and association fails. A workaround from userspace is calling a command like 'iwconfig wlan0 ap any' after the interface is brought up. To fix this behaviour, trigger execution of ieee80211_sta_work from ieee80211_open when in STA or IBSS mode. Signed-off-by: Jan Niehusmann <jan@gondor.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-31[LLC]: skb allocation size for responsesJoonwoo Park
Allocate the skb for llc responses with the received packet size by using the size adjustable llc_frame_alloc. Don't allocate useless extra payload. Cleanup magic numbers. So, this fixes oops. Reported by Jim Westfall: kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0 kernel: ------------[ cut here ]------------ kernel: kernel BUG at net/core/skbuff.c:95! Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31[IP] UDP: Use SEQ_START_TOKEN.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31[IPV6] MCAST: Ensure to check multicast listener(s).YOSHIFUJI Hideaki
In ip6_mc_input(), we need to check whether we have listener(s) for the packet. After commit ae7bf20a6316272acfcaef5d265b18aaa54b41e4, all packets for multicast destinations are delivered to upper layer if IFF_PROMISC or IFF_ALLMULTI is set. In fact, bug was rather ancient; the original (before the commit) intent of the dev->flags check was to skip the ipv6_chk_mcast_addr() call, assuming L2 filters packets appropriately, but it was even not true. Let's explicitly check our multicast list. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[LLC]: Kill llc_station_mac_sa symbol export.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[INET]: inet_frag_evictor() must run with BH disabledDavid S. Miller
Based upon a lockdep trace from Dave Jones. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[LLC]: station source mac addressJoonwoo Park
kill unnecessary llc_station_mac_sa. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[LLC]: bogus llc packet lengthJoonwoo Park
discard llc packet which has bogus packet length. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[NET]: Add preemption point in qdisc_runHerbert Xu
The qdisc_run loop is currently unbounded and runs entirely in a softirq. This is bad as it may create an unbounded softirq run. This patch fixes this by calling need_resched and breaking out if necessary. It also adds a break out if the jiffies value changes since that would indicate we've been transmitting for too long which starves other softirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[NET]: Don't send ICMP_FRAG_NEEDED for GSO packetsRusty Russell
Commit 9af3912ec9e30509b76cb376abb65a4d8af27df3 ("[NET] Move DF check to ip_forward") added a new check to send ICMP fragmentation needed for large packets. Unlike the check in ip_finish_output(), it doesn't check for GSO. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28bluetooth: replace deprecated RW_LOCK_UNLOCKED macrosRobert P. J. Day
The older RW_LOCK_UNLOCKED macros defeat lockdep state tracing so replace them with the newer __RW_LOCK_UNLOCKED macros. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27[LLC]: Restrict LLC sockets to rootPatrick McHardy
LLC currently allows users to inject raw frames, including IP packets encapsulated in SNAP. While Linux doesn't handle IP over SNAP, other systems do. Restrict LLC sockets to root similar to packet sockets. [ Modified Patrick's patch to use CAP_NEW_RAW --DaveM ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27[NETFILTER]: Replate direct proc_fops assignment with proc_create call.Denis V. Lunev
This elliminates infamous race during module loading when one could lookup proc entry without proc_fops assigned. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27[ESP]: Ensure IV is in linear part of the skb to avoid BUG() due to OOB accessThomas Graf
ESP does not account for the IV size when calling pskb_may_pull() to ensure everything it accesses directly is within the linear part of a potential fragment. This results in a BUG() being triggered when the both the IPv4 and IPv6 ESP stack is fed with an skb where the first fragment ends between the end of the esp header and the end of the IV. This bug was found by Dirk Nehring <dnehring@gmx.net> . Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[IPSEC]: Fix BEET outputHerbert Xu
The IPv6 BEET output function is incorrectly including the inner header in the payload to be protected. This causes a crash as the packet doesn't actually have that many bytes for a second header. The IPv4 BEET output on the other hand is broken when it comes to handling an inner IPv6 header since it always assumes an inner IPv4 header. This patch fixes both by making sure that neither BEET output function touches the inner header at all. All access is now done through the protocol-independent cb structure. Two new attributes are added to make this work, the IP header length and the IPv4 option length. They're filled in by the inner mode's output function. Thanks to Joakim Koskela for finding this problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[ICMP]: Dst entry leak in icmp_send host re-lookup code (v2).Pavel Emelyanov
Commit 8b7817f3a959ed99d7443afc12f78a7e1fcc2063 ([IPSEC]: Add ICMP host relookup support) introduced some dst leaks on error paths: the rt pointer can be forgotten to be put. Fix it bu going to a proper label. Found after net namespace's lo refused to unregister :) Many thanks to Den for valuable help during debugging. Herbert pointed out, that xfrm_lookup() will put the rtable in case of error itself, so the first goto fix is redundant. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[AX25]: Remove obsolete references to BKL from TODO file.Robert P. J. Day
Given that there are no apparent calls to lock_kernel() or unlock_kernel() under net/ax25, delete the TODO reference related to that. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[NET]: Fix multicast device ioctl checksPatrick McHardy
SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list method to determine whether it supports multicast. Drivers implementing secondary unicast support use set_rx_mode however. Check for both dev->set_multicast_mode and dev->set_rx_mode to determine multicast capabilities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[IRDA]: Store irnet_socket termios properly.David S. Miller
It should be a "struct ktermios" not a "struct termios". Based upon a build warning reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying devicePatrick McHardy
Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity or dev_change_flags. Setting it directly causes two unwanted effects: - the next dev_change_flags call will notice a difference between dev->gflags and the actual flags, enable promisc/allmulti mode and incorrectly update dev->gflags - this keeps the underlying device in promisc/allmulti mode until the VLAN device is deleted Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24[IPSEC]: Fix inter address family IPsec tunnel handling.Kazunori MIYAZAWA
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24[NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3).Pavel Emelyanov
Proxy neighbors do not have any reference counting, so any caller of pneigh_lookup (unless it's a netlink triggered add/del routine) should _not_ perform any actions on the found proxy entry. There's one exception from this rule - the ipv6's ndisc_recv_ns() uses found entry to check the flags for NTF_ROUTER. This creates a race between the ndisc and pneigh_delete - after the pneigh is returned to the caller, the nd_tbl.lock is dropped and the deleting procedure may proceed. One of the fixes would be to add a reference counting, but this problem exists for ndisc only. Besides such a patch would be too big for -rc4. So I propose to introduce a __pneigh_lookup() which is supposed to be called with the lock held and use it in ndisc code to check the flags on alive pneigh entry. Changes from v2: As David noticed, Exported the __pneigh_lookup() to ipv6 module. The checkpatch generates a warning on it, since the EXPORT_SYMBOL does not follow the symbol itself, but in this file all the exports come at the end, so I decided no to break this harmony. Changes from v1: Fixed comments from YOSHIFUJI - indentation of prototype in header and the pndisc_check_router() name - and a compilation fix, pointed by Daniel - the is_routed was (falsely) considered as uninitialized by gcc. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23sch_htb: fix "too many events" situationMartin Devera
HTB is event driven algorithm and part of its work is to apply scheduled events at proper times. It tried to defend itself from livelock by processing only limited number of events per dequeue. Because of faster computers some users already hit this hardcoded limit. This patch limits processing up to 2 jiffies (why not 1 jiffie ? because it might stop prematurely when only fraction of jiffie remains). Signed-off-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23[ATM]: When proc_create() fails, do some error handling work and return -ENOMEM.Wang Chen
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22[9P] net/9p/trans_fd.c: remove unused variableJulia Lawall
The variable cb is initialized but never used otherwise. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ type T; identifier i; constant C; @@ ( extern T i; | - T i; <+... when != i - i = C; ...+> ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22[IPV6] net/ipv6/ndisc.c: remove unused variableJulia Lawall
The variable hlen is initialized but never used otherwise. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ type T; identifier i; constant C; @@ ( extern T i; | - T i; <+... when != i - i = C; ...+> ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22[IPV4] fib_trie: fix warning from rcu_assign_poingerStephen Hemminger
This gets rid of a warning caused by the test in rcu_assign_pointer. I tried to fix rcu_assign_pointer, but that devolved into a long set of discussions about doing it right that came to no real solution. Since the test in rcu_assign_pointer for constant NULL would never succeed in fib_trie, just open code instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22[TCP]: Let skbs grow over a page on fast peersHerbert Xu
While testing the virtio-net driver on KVM with TSO I noticed that TSO performance with a 1500 MTU is significantly worse compared to the performance of non-TSO with a 16436 MTU. The packet dump shows that most of the packets sent are smaller than a page. Looking at the code this actually is quite obvious as it always stop extending the packet if it's the first packet yet to be sent and if it's larger than the MSS. Since each extension is bound by the page size, this means that (given a 1500 MTU) we're very unlikely to construct packets greater than a page, provided that the receiver and the path is fast enough so that packets can always be sent immediately. The fix is also quite obvious. The push calls inside the loop is just an optimisation so that we don't end up doing all the sending at the end of the loop. Therefore there is no specific reason why it has to do so at MSS boundaries. For TSO, the most natural extension of this optimisation is to do the pushing once the skb exceeds the TSO size goal. This is what the patch does and testing with KVM shows that the TSO performance with a 1500 MTU easily surpasses that of a 16436 MTU and indeed the packet sizes sent are generally larger than 16436. I don't see any obvious downsides for slower peers or connections, but it would be prudent to test this extensively to ensure that those cases don't regress. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21[DLCI]: Fix tiny race between module unload and sock_ioctl.Pavel Emelyanov
This is a narrow pedantry :) but the dlci_ioctl_hook check and call should not be parted with the mutex lock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21[IPV4]: Fix null dereference in ip_defragPhil Oester
Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag. Offending line in ip_defrag is here: net = skb->dev->nd_net where dev is NULL. Bisected the problem down to commit ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces). Below patch (idea from Patrick McHardy) fixes the problem for me. Signed-off-by: Phil Oester <kernel@linuxace.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20[IPV6] KCONFIG: Fix description about IPV6_TUNNEL.YOSHIFUJI Hideaki
Based on notice from "Colin" <colins@sjtu.edu.cn>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20[TCP]: Fix shrinking windows with window scalingPatrick McHardy
When selecting a new window, tcp_select_window() tries not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size is always a multiple of the window scaling factor, the remaining window size however might not be since it depends on rcv_wup/rcv_nxt. This means we're effectively shrinking the window when scaling it down. The dump below shows the problem (scaling factor 2^7): - Window size of 557 (71296) is advertised, up to 3111907257: IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...> - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes below the last end: IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...> The number 40 results from downscaling the remaining window: 3111907257 - 3111841425 = 65832 65832 / 2^7 = 514 65832 % 2^7 = 40 If the sender uses up the entire window before it is shrunk, this can have chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq() will notice that the window has been shrunk since tcp_wnd_end() is before tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number. This will fail the receivers checks in tcp_sequence() however since it is before it's tp->rcv_wup, making it respond with a dupack. If both sides are in this condition, this leads to a constant flood of ACKs until the connection times out. Make sure the window is never shrunk by aligning the remaining window to the window scaling factor. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20netpoll: zap_completion_queue: adjust skb->users counterJarek Poplawski
zap_completion_queue() retrieves skbs from completion_queue where they have zero skb->users counter. Before dev_kfree_skb_any() it should be non-zero yet, so it's increased now. Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20bridge: use time_before() in br_fdb_cleanup()Fabio Checconi
In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they should be compared using the time_after() macro. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20[SCTP]: Fix a race between module load and protosw accessVlad Yasevich
There is a race is SCTP between the loading of the module and the access by the socket layer to the protocol functions. In particular, a list of addresss that SCTP maintains is not initialized prior to the registration with the protosw. Thus it is possible for a user application to gain access to SCTP functions before everything has been initialized. The problem shows up as odd crashes during connection initializtion when we try to access the SCTP address list. The solution is to refactor how we do registration and initialize the lists prior to registering with the protosw. Care must be taken since the address list initialization depends on some other pieces of SCTP initialization. Also the clean-up in case of failure now also needs to be refactored. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20[NETFILTER]: ipt_recent: sanity check hit countDaniel Hokka Zakrisson
If a rule using ipt_recent is created with a hit count greater than ip_pkt_list_tot, the rule will never match as it cannot keep track of enough timestamps. This patch makes ipt_recent refuse to create such rules. With ip_pkt_list_tot's default value of 20, the following can be used to reproduce the problem. nc -u -l 0.0.0.0 1234 & for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done This limits it to 20 packets: iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \ --rsource iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \ 60 --hitcount 20 --name test --rsource -j DROP While this is unlimited: iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \ --rsource iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \ 60 --hitcount 21 --name test --rsource -j DROP With the patch the second rule-set will throw an EINVAL. Reported-by: Sean Kennedy <skennedy@vcn.com> Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20[NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()Roel Kluin
logical-bitwise & confusion Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2008-03-17[IPV4]: esp_output() misannotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17[8021Q]: vlan_dev misannotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17[SUNRPC]: net/* NULL noiseAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17[SCTP]: fix misannotated __sctp_rcv_asconf_lookup()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17[PKT_SCHED]: annotate cls_u32Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17[NET] endianness noise: INADDR_ANYAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>