aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-04-10FRV: Don't make smp_{r, w, }mb() interpolate MEMBAR when CONFIG_SMP=n [try #2]David Howells
Don't make smp_{r,w,}mb() interpolate a MEMBAR instruction when CONFIG_SMP=n as SMP memory barries on UP systems should interpolate a compiler barrier only. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10FRV: Make NOMMU-mode work with base addresses other than 0xC0000000 [try #2]David Howells
Make NOMMU-mode work with base addresses other than 0xC0000000 by: (1) Giving the code that sets up the protection registers the right address in __sdram_base. Rather than being hard coded to 0xC0000000, the value of __page_offset is obtained from the linker script. (2) Eliminate the check in __switch_to() that verifies the current thread info is in the 0xCxxxxxxx region. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10FRV: Add support for emulation of userspace atomic ops [try #2]David Howells
Use traps 120-126 to emulate atomic cmpxchg32, xchg32, and XOR-, OR-, AND-, SUB- and ADD-to-memory operations for userspace. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10FRV: Move STACK_TOP_MAX up [try #2]David Howells
Move STACK_TOP_MAX up so that we don't try moving the stack above it as that causes setup_arg_pages() to malfunction. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10FRV: Handle update_mmu_cache() being called when current->mm is NULL [try #2]David Howells
Handle update_mmu_cache() being called when current->mm is NULL. We cache static TLB mappings for the current page table in DAMPR4 and DAMPR5 on the theory that the next data lookup is likely to be in the same general region, and thus is likely to be mapped by the same page table. However, we can't get this information if we can't access the appropriate mm_struct. If current->mm is NULL, we just clear the cache in the knowledge that the TLB miss handlers will load it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6Linus Torvalds
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Ensure "both" features2 slots are consistent [XFS] Fix superblock features2 field alignment problem [XFS] remove shouting-indirection macros from xfs_sb.h
2008-04-10Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: do not leak ioc_data across iosched switches splice: fix infinite loop in generic_file_splice_read()
2008-04-10HFS+: fix unlink of linksRoman Zippel
Some time ago while attempting to handle invalid link counts, I botched the unlink of links itself, so this patch fixes this now correctly, so that only the link count of nodes that don't point to links is ignored. Thanks to Vlado Plaga <rechner@vlado-do.de> to notify me of this problem. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10[IPV4]: Fix byte value boundary check in do_ip_getsockopt().David S. Miller
This fixes kernel bugzilla 10371. As reported by M.Piechaczek@osmosys.tv, if we try to grab a char sized socket option value, as in: unsigned char ttl = 255; socklen_t len = sizeof(ttl); setsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len); getsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len); The ttl returned will be wrong on big-endian, and on both little- endian and big-endian the next three bytes in userspace are written with garbage. It's because of this test in do_ip_getsockopt(): if (len < sizeof(int) && len > 0 && val>=0 && val<255) { It should allow a 'val' of 255 to pass here, but it doesn't so it copies a full 'int' back to userspace. On little-endian that will write the correct value into the location but it spams on the next three bytes in userspace. On big endian it writes the wrong value into the location and spams the next three bytes. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10cfq-iosched: do not leak ioc_data across iosched switchesFabio Checconi
When switching scheduler from cfq, cfq_exit_queue() does not clear ioc->ioc_data, leaving a dangling pointer that can deceive the following lookups when the iosched is switched back to cfq. The pattern that can trigger that is the following: - elevator switch from cfq to something else; - module unloading, with elv_unregister() that calls cfq_free_io_context() on ioc freeing the cic (via the .trim op); - module gets reloaded and the elevator switches back to cfq; - reallocation of a cic at the same address as before (with a valid key). To fix it just assign NULL to ioc_data in __cfq_exit_single_io_context(), that is called from the regular exit path and from the elevator switching code. The only path that frees a cic and is not covered is the error handling one, but cic's freed in this way are never cached in ioc_data. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-10[XFS] Ensure "both" features2 slots are consistentEric Sandeen
Since older kernels may look in the sb_bad_features2 slot for flags, rather than zeroing it out on fixup, we should make it equal to the sb_features2 value. Also, if the ATTR2 flag was not found prior to features2 fixup, it was not set in the mount flags, so re-check after the fixup so that the current session will use the feature. Also fix up the comments to reflect these changes. SGI-PV: 980085 SGI-Modid: xfs-linux-melb:xfs-kern:30778a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10[XFS] Fix superblock features2 field alignment problemDavid Chinner
Due to the xfs_dsb_t structure not being 64 bit aligned, the last field of the on-disk superblock can vary in location This causes problems when the filesystem gets moved to a different platform, or there is a 32 bit userspace and 64 bit kernel. This patch detects the defect at mount time, logs a warning such as: XFS: correcting sb_features alignment problem in dmesg and corrects the problem so that everything is OK. it also blacklists the bad field in the superblock so it does not get used for something else later on. SGI-PV: 977636 SGI-Modid: xfs-linux-melb:xfs-kern:30539a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10[XFS] remove shouting-indirection macros from xfs_sb.hEric Sandeen
Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10splice: fix infinite loop in generic_file_splice_read()Jens Axboe
There's a quirky loop in generic_file_splice_read() that could go on indefinitely, if the file splice returns 0 permanently (and not just as a temporary condition). Get rid of the loop and pass back -EAGAIN correctly from __generic_file_splice_read(), so we handle that condition properly as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-09[SPARC]: Fix several regset and ptrace bugs.David S. Miller
1) ptrace should pass 'current' to task_user_regset_view() 2) When fetching general registers using a 64-bit view, and the target is 32-bit, we have to convert. 3) Skip the whole register window get/set code block if the user isn't asking to access anything in there. Otherwise we have problems if the user doesn't have an address space setup. Fetching ptrace register is still valid at such a time, and ptrace does not try to access the register window area of the regset. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-09pop previous section in alternative.cSteven Rostedt
gcc expects all toplevel assembly to return to the original section type. The code in alteranative.c does not do this. This caused some strange bugs in sched-devel where code would end up in the .rodata section and when the kernel sets the NX bit on all .rodata, the kernel would crash when executing this code. This patch adds a .previous marker to return the code back to the original section. Credit goes to Andrew Pinski for telling me it wasn't a gcc bug but a bug in the toplevel asm code in the kernel. ;-) Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: don't BUG if fs reuses a superblock
2008-04-10SELinux: don't BUG if fs reuses a superblockEric Paris
I (wrongly) assumed that nfs_xdev_get_sb() would not ever share a superblock and so cloning mount options would always be correct. Turns out that isn't the case and we could fall over a BUG_ON() that wasn't a BUG at all. Since there is little we can do to reconcile different mount options this patch just leaves the sb alone and the first set of options wins. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: James Morris <jmorris@namei.org>
2008-04-09BNX2X: Correct bringing chip out of resetEliezer Tamir
Fixed bug: Wrong register was written to when bringing the chip out of reset. [ Bump driver version and release date -DaveM ] Signed-off-by: Eliezer Tamir <eliezert@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-09[NETFILTER]: nf_nat: autoload IPv4 connection trackingJan Engelhardt
Without this patch, the generic L3 tracker would kick in if nf_conntrack_ipv4 was not loaded before nf_nat, which would lead to translation problems with ICMP errors. NAT does not make sense without IPv4 connection tracking anyway, so just add a call to need_ipv4_conntrack(). Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-09[NETFILTER]: xt_hashlimit: fix mask calculationPatrick McHardy
Shifts larger than the data type are undefined, don't try to shift an u32 by 32. Also remove some special-casing of bitmasks divisible by 32. Based on patch by Jan Engelhardt <jengelh@computergmbh.de>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-09Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2008-04-09[XFRM]: xfrm_user: fix selector family initializationPatrick McHardy
Commit df9dcb45 ([IPSEC]: Fix inter address family IPsec tunnel handling) broke openswan by removing the selector initialization for tunnel mode in case it is uninitialized. This patch restores the initialization, fixing openswan, but probably breaking inter-family tunnels again (unknown since the patch author disappeared). The correct thing for inter-family tunnels is probably to simply initialize the selector family explicitly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-09rt61pci: rt61pci_beacon_update do not free skb twiceDaniel Wagner
The layer above will free the skb in an error case. Signed-off-by: Daniel Wagner <wagi@monom.org> Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-09Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ata/sata_fsl: Remove unused variable in sata_fsl_probe pata_sil680: Fix build on arch/ppc
2008-04-09ssb-mipscore: Fix interrupt vectorsMichael Buesch
This fixes assignment of the interrupt vectors on the SSB MIPS core. Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-09ssb-pcicore: Fix IRQ TPS flag handlingLarry Finger
This fixes the TPS flag handling for the SSB pcicore driver. This fixes interrupts on some devices. Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-09mac80211: use short_preamble mode from capability if ERP IE not presentVladimir Koutny
When associating to a b-only AP where there is no ERP IE, short preamble mode is left at previous state (probably also protection mode). In this case, disable protection and use short preamble mode as specified in capability field. The same is done if capability field is changed on-the-fly. Signed-off-by: Vladimir Koutny <vlado@ksp.sk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-09ata/sata_fsl: Remove unused variable in sata_fsl_probeJohann Felix Soden
In sata_fsl_probe memory is allocated but never used or deallocated. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10404 Thanks to Daniel Marjamäki for the bug report. Reported-by: Daniel Marjamäki <danielm77@spray.se> Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-04-09pata_sil680: Fix build on arch/ppcBenjamin Herrenschmidt
Commit 0f436eff54f90419ac1b8accfb3e6e17c4b49a4e breaks build on arch/ppc as it doesn't implement the machine_is() macro. This fixes it by using CONFIG_PPC_MERGE instead which represents arch/powerpc only, while CONFIG_PPC is set for both. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-04-08Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix a memory leak in rpc_create() fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private() NFS: initialize flags field in nfs_open_context SUNRPC: don't call flush_dcache_page() with an invalid pointer
2008-04-08mtd/chips: add missing set_current_state() to cfi_{amdstd,staa}_sync()Dmitry Adamushko
cfi_amdstd_sync() and cfi_staa_sync() call schedule() without changing task's state appropriately. In case of e.g. chip->state == FL_ERASING, cfi_*_sync() will be busy-looping either redundantly for a fixed interval of time (for SCHED_NORMAL tasks) or possibly endlessly (for RT tasks and UP). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08spi: spi_bfin5xx: remove unused labelMichael Hennerich
Remove unused label, and associated compiler warning. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08spi: documentation tweaksDavid Brownell
Update SPI documentation to clarify some areas of recent confusion: clock polarity takes effect when chipselect goes active; and zero length buffers are OK in certain cases. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08spi: spi_bfin5xx: fix probe() sequencingVitja Makarov
Fix bug in SPI probe: first initialize peripheral pins, and just after register spi master device. This fixes problems with SPI drivers built-in kernel. Singed-off-by: Vitja Makarov <vitja.makarov@gmail.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08spi: spi_bfin5xx build fixMike Frysinger
Fix breakage cause by overzealous line wrapping; there should be only one format string. Signed-off-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08acpi: fix "buggy BIOS check" when CPUs are hot removedAlok Kataria
Fixes a BUG in ACPI hotplugging. processor_device_array[pr->id] needs to be set to NULL when removing a CPU. Else the "buggy BIOS check" in acpi_processor_start mistakenly fires when a CPU is removed from the system and then later re-added. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Arai <arai@vmware.com> Cc: Len Brown <lenb@kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08es1968: fix sleep-while-holding-lock bugArjan van de Ven
snd_es1968_ac97_read() calls snd_es1968_ac97_wait() first outside a locked area, and later, while holding a lock. snd_es1968_ac97_wait() has a polling loop with a cond_resched() inside it.. which sleeps, so the second call is invalid. This patch adds a version of the wait function that just pure polls. While this is not very elegant in principle, it's very likely the easiest thing to do here, we already checked if the chip was ready (while yielding) just before, so it is very unlikely to take a long time here. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08memcg: fix node_state handlingKAMEZAWA Hiroyuki
This should be N_NORMAL_MEMORY. N_NORMAL_MEMORY is "true" if a node has memory for the kernel. N_HIGH_MEMORY is "true" if a node has memory for HIGHMEM. (If CONFIG_HIGHMEM=n, always "true") This check is used for testing whether we can use kmalloc_node() on a node. Then, if there is a node which only contains HIGHMEM, the system will call kmalloc_node() which doesn't contain memory for the kernel. If it happens under SLUB, the kernel will panic. I think this only happens on x86_32-numa. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08acpi thermal: fix result checkKrzysztof Helt
thermal_zone_device_register() uses the ERR_PTR macro on its return values. A correct check is to use the IS_ERR() macro. The 2.6.25 kernels panic on Compaq AP550 without this patch as it has more then 10 (THERMAL_MAX_TRIPS) trip points (there are 12). Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08ub: remove BUG() after __blk_end_request and fix the condition causing itPete Zaitcev
When __blk_end_request returns nonzero, it means that the request was not completely processed and some BIOs are still attached. Since we have dequeued it by that time, it means leaking requests and hanging processes, which is why BUG() was in there. In ub this happens if a packet request ends normally, but with residue (e.g. when scsi_id issues INQUIRY). The fix is to make sure that arguments passed to __blk_end_request are correct: the full request length and not just transferred length. The transferred length is indicated to applications by adjusting rq->data_len with old, unchanged code outside of this patch. Signed-off-by: Pete Zaitcev <zaitcev@redhat.com> Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Cc: Greg KH <greg@kroah.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-08SUNRPC: Fix a memory leak in rpc_create()Chuck Lever
Commit 510deb0d was supposed to move the xprt_create_transport() call in rpc_create(), but neglected to remove the old call site. This resulted in a transport leak after every rpc_create() call. This leak is present in 2.6.24 and 2.6.25. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08fix bug - executing FDPIC ELF on NFS mount triggers BUG() at ↵Bryan Wu
mm/nommu.c:862:/do_mmap_private() NFS needs a NOMMU version mmap function to support uClinux on NOMMU machine http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992 Signed-off-by: Bryan Wu <cooloney@kernel.org> Cc: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08NFS: initialize flags field in nfs_open_contextJeff Layton
The nfs_open_context struct had a "flags" field added recently, but the allocator isn't initializing it. It also looks like the allocator isn't initializing the mode or list either, but they seem to be overwritten by the caller, so that's less of an issue. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08SUNRPC: don't call flush_dcache_page() with an invalid pointerTrond Myklebust
Fix a problem in _copy_to_pages(), whereby it may call flush_dcache_page() with an invalid pointer due to the fact that 'pgto' gets incremented beyond the end of the page array. Fix is to exit the loop without this unnecessary increment of pgto. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08[NET]: Undo code bloat in hot paths due to print_mac().David S. Miller
If print_mac() is used inside of a pr_debug() the compiler can't see that the call is redundant so still performs it even of pr_debug() ends up being a nop. So don't use print_mac() in such cases in hot code paths, use MAC_FMT et al. instead. As noted by Joe Perches, pr_debug() could be modified to handle this better, but that is a change to an interface used by the entire kernel and thus needs to be validated carefully. This here is thus the less risky fix for 2.6.25 Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-07[TCP]: Don't allow FRTO to take place while MTU is being probedIlpo Järvinen
MTU probe can cause some remedies for FRTO because the normal packet ordering may be violated allowing FRTO to make a wrong decision (it might not be that serious threat for anything though). Thus it's safer to not run FRTO while MTU probe is underway. It seems that the basic FRTO variant should also look for an skb at probe_seq.start to check if that's retransmitted one but I didn't implement it now (plain seqno in window check isn't robust against wraparounds). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-07[TCP]: tcp_simple_retransmit can cause S+LIlpo Järvinen
This fixes Bugzilla #10384 tcp_simple_retransmit does L increment without any checking whatsoever for overflowing S+L when Reno is in use. The simplest scenario I can currently think of is rather complex in practice (there might be some more straightforward cases though). Ie., if mss is reduced during mtu probing, it may end up marking everything lost and if some duplicate ACKs arrived prior to that sacked_out will be non-zero as well, leading to S+L > packets_out, tcp_clean_rtx_queue on the next cumulative ACK or tcp_fastretrans_alert on the next duplicate ACK will fix the S counter. More straightforward (but questionable) solution would be to just call tcp_reset_reno_sack() in tcp_simple_retransmit but it would negatively impact the probe's retransmission, ie., the retransmissions would not occur if some duplicate ACKs had arrived. So I had to add reno sacked_out reseting to CA_Loss state when the first cumulative ACK arrives (this stale sacked_out might actually be the explanation for the reports of left_out overflows in kernel prior to 2.6.23 and S+L overflow reports of 2.6.24). However, this alone won't be enough to fix kernel before 2.6.24 because it is building on top of the commit 1b6d427bb7e ([TCP]: Reduce sacked_out with reno when purging write_queue) to keep the sacked_out from overflowing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-07[TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skbIlpo Järvinen
Fixes a long-standing bug which makes NewReno recovery crippled. With GSO the whole head skb was marked as LOST which is in violation of NewReno procedure that only wants to mark one packet and ended up breaking our TCP code by causing counter overflow because our code was built on top of assumption about valid NewReno procedure. This manifested as triggering a WARN_ON for the overflow in a number of places. It seems relatively safe alternative to just do nothing if tcp_fragment fails due to oom because another duplicate ACK is likely to be received soon and the fragmentation will be retried. Special thanks goes to Soeren Sonnenburg <kernel@nn7.de> who was lucky enough to be able to reproduce this so that the warning for the overflow was hit. It's not as easy task as it seems even if this bug happens quite often because the amount of outstanding data is pretty significant for the mismarkings to lead to an overflow. Because it's very late in 2.6.25-rc cycle (if this even makes in time), I didn't want to touch anything with SACK enabled here. Fragmenting might be useful for it as well but it's more or less a policy decision rather than mandatory fix. Thus there's no need to rush and we can postpone considering tcp_fragment with SACK for 2.6.26. In 2.6.24 and earlier, this very same bug existed but the effect is slightly different because of a small changes in the if conditions that fit to the patch's context. With them nothing got lost marker and thus no retransmissions happened. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-07[TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fackIlpo Järvinen
The fast retransmission can be forced locally to the rfc3517 branch in tcp_update_scoreboard instead of making such fragile constructs deeper in tcp_mark_head_lost. This is necessary for the next patch which must not have loopholes for cnt > packets check. As one can notice, readability got some improvements too because of this :-). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>