Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2005-07-07	[PATCH] alpha(): pgprot_noncached	Andrew Morton
	The infiniband code expects that the arch implements pgprot_noncached(). We're mapping PCI areas anyway, so this probabyl wasn't needed and we should make infiniband stop doing that.. Cc: Roland Dreier <rolandd@cisco.com> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] mostly_read data section	Christoph Lameter
	Add a new section called ".data.read_mostly" for data items that are read frequently and rarely written to like cpumaps etc. If these maps are placed in the .data section then these frequenly read items may end up in cachelines with data is is frequently updated. In that case all processors in an SMP system must needlessly reload the cachelines again and again containing elements of those frequently used variables. The ability to share these cachelines will allow each cpu in an SMP system to keep local copies of those shared cachelines thereby optimizing performance. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Christoph Lameter <christoph@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] propagate __nocast annotations	Alexey Dobriyan
	Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] page_uptodate locking scalability	Nick Piggin
	Use a bit spin lock in the first buffer of the page to synchronise asynch IO buffer completions, instead of the global page_uptodate_lock, which is showing some scalabilty problems. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] xtensa: remove old syscalls	Arnd Bergmann
	xtensa is now in -rc1, with the obsolete syscalls still in there, so I guess this about the last chance to correct the ABI. Applying the patch obviously breaks all sorts of user space binaries and probably also requires the appropriate changes to be made to libc. On the other hand, if a decision is made to keep the broken interface, it should at least be a conscious one instead of an oversight. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] uml: skas0 - separate kernel address space on stock hosts	Jeff Dike
	UML has had two modes of operation - an insecure, slow mode (tt mode) in which the kernel is mapped into every process address space which requires no host kernel modifications, and a secure, faster mode (skas mode) in which the UML kernel is in a separate host address space, which requires a patch to the host kernel. This patch implements something very close to skas mode for hosts which don't support skas - I'm calling this skas0. It provides the security of the skas host patch, and some of the performance gains. The two main things that are provided by the skas patch, /proc/mm and PTRACE_FAULTINFO, are implemented in a way that require no host patch. For the remote address space changing stuff (mmap, munmap, and mprotect), we set aside two pages in the process above its stack, one of which contains a little bit of code which can call mmap et al. To update the address space, the system call information (system call number and arguments) are written to the stub page above the code. The %esp is set to the beginning of the data, the %eip is set the the start of the stub, and it repeatedly pops the information into its registers and makes the system call until it sees a system call number of zero. This is to amortize the cost of the context switch across multiple address space updates. When the updates are done, it SIGSTOPs itself, and the kernel process continues what it was doing. For a PTRACE_FAULTINFO replacement, we set up a SIGSEGV handler in the child, and let it handle segfaults rather than nullifying them. The handler is in the same page as the mmap stub. The second page is used as the stack. The handler reads cr2 and err from the sigcontext, sticks them at the base of the stack in a faultinfo struct, and SIGSTOPs itself. The kernel then reads the faultinfo and handles the fault. A complication on x86_64 is that this involves resetting the registers to the segfault values when the process is inside the kill system call. This breaks on x86_64 because %rcx will contain %rip because you tell SYSRET where to return to by putting the value in %rcx. So, this corrupts $rcx on return from the segfault. To work around this, I added an arch_finish_segv, which on x86 does nothing, but which on x86_64 ptraces the child back through the sigreturn. This causes %rcx to be restored by sigreturn and avoids the corruption. Ultimately, I think I will replace this with the trick of having it send itself a blocked signal which will be unblocked by the sigreturn. This will allow it to be stopped just after the sigreturn, and PTRACE_SYSCALLed without all the back-and-forth of PTRACE_SYSCALLing it through sigreturn. This runs on a stock host, so theoretically (and hopefully), tt mode isn't needed any more. We need to make sure that this is better in every way than tt mode, though. I'm concerned about the speed of address space updates and page fault handling, since they involve extra round-trips to the child. We can amortize the round-trip cost for large address space updates by writing all of the operations to the data page and having the child execute them all at the same time. This will help fork and exec, but not page faults, since they involve only one page. I can't think of any way to help page faults, except to add something like PTRACE_FAULTINFO to the host. There is PTRACE_SIGINFO, but UML doesn't use siginfo for SIGSEGV (or anything else) because there isn't enough information in the siginfo struct to handle page faults (the faulting operation type is missing). Adding that would make PTRACE_SIGINFO a usable equivalent to PTRACE_FAULTINFO. As for the code itself: - The system call stub is in arch/um/kernel/sys-$(SUBARCH)/stub.S. It is put in its own section of the binary along with stub_segv_handler in arch/um/kernel/skas/process.c. This is manipulated with run_syscall_stub in arch/um/kernel/skas/mem_user.c. syscall_stub will execute any system call at all, but it's only used for mmap, munmap, and mprotect. - The x86_64 stub calls sigreturn by hand rather than allowing the normal sigreturn to happen, because the normal sigreturn is a SA_RESTORER in UML's address space provided by libc. Needless to say, this is not available in the child's address space. Also, it does a couple of odd pops before that which restore the stack to the state it was in at the time the signal handler was called. - There is a new field in the arch mmu_context, which is now a union. This is the pid to be manipulated rather than the /proc/mm file descriptor. Code which deals with this now checks proc_mm to see whether it should use the usual skas code or the new code. - userspace_tramp is now used to create a new host process for every UML process, rather than one per UML processor. It checks proc_mm and ptrace_faultinfo to decide whether to map in the pages above its stack. - start_userspace now makes CLONE_VM conditional on proc_mm since we need separate address spaces now. - switch_mm_skas now just sets userspace_pid[0] to the new pid rather than PTRACE_SWITCH_MM. There is an addition to userspace which updates its idea of the pid being manipulated each time around the loop. This is important on exec, when the pid will change underneath userspace(). - The stub page has a pte, but it can't be mapped in using tlb_flush because it is part of tlb_flush. This is why it's required for it to be mapped in by userspace_tramp. Other random things: - The stub section in uml.lds.S is page aligned. This page is written out to the backing vm file in setup_physmem because it is mapped from there into user processes. - There's some confusion with TASK_SIZE now that there are a couple of extra pages that the process can't use. TASK_SIZE is considered by the elf code to be the usable process memory, which is reasonable, so it is decreased by two pages. This confuses the definition of USER_PGDS_IN_LAST_PML4, making it too small because of the rounding down of the uneven division. So we round it to the nearest PGDIR_SIZE rather than the lower one. - I added a missing PT_SYSCALL_ARG6_OFFSET macro. - um_mmu.h was made into a userspace-usable file. - proc_mm and ptrace_faultinfo are globals which say whether the host supports these features. - There is a bad interaction between the mm.nr_ptes check at the end of exit_mmap, stack randomization, and skas0. exit_mmap will stop freeing pages at the PGDIR_SIZE boundary after the last vma. If the stack isn't on the last page table page, the last pte page won't be freed, as it should be since the stub ptes are there, and exit_mmap will BUG because there is an unfreed page. To get around this, TASK_SIZE is set to the next lowest PGDIR_SIZE boundary and mm->nr_ptes is decremented after the calls to init_stub_pte. This ensures that we know the process stack (and all other process mappings) will be below the top page table page, and thus we know that mm->nr_ptes will be one too many, and can be decremented. Things that need fixing: - We may need better assurrences that the stub code is PIC. - The stub pte is set up in init_new_context_skas. - alloc_pgdir is probably the right place. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] pm: fix u32 vs. pm_message_t confusion in cpufreq	Bernard Blackham
	Fix u32 vs pm_message_t confusion in cpufreq. Signed-off-by: Bernard Blackham <bernard@blackham.com.au> Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] Fix up non-NUMA breakage in mmzone.h	Dave Jones
	If CONFIG_NUMA isn't set, we use the define in <linux/mmzone.h> for early_pfn_to_nid (which defines it to 0). Because of this, the prototype needs to move inside the CONFIG_NUMA too, or anal gcc's get really confused. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] Clean up numa defines in mmzone.h	Dave Jones
	The recent cleanups to asm-i386/mmzone.h were suboptimal nesting an ifdef of the same symbol. This patch removes some of the ifdef'ery to make things more readable again. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] MTRR suspend/resume cleanup	Shaohua Li
	There has been some discuss about solving the SMP MTRR suspend/resume breakage, but I didn't find a patch for it. This is an intent for it. The basic idea is moving mtrr initializing into cpu_identify for all APs (so it works for cpu hotplug). For BP, restore_processor_state is responsible for restoring MTRR. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] ppc64: Make idle_loop a ppc_md function	Michael Ellerman
	This patch adds an idle member to the ppc_md structure and calls it from cpu_idle(). If a platform leaves ppc_md.idle as null it will get the default idle loop default_idle(). Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] hvc_console: Register ops when setting up hvc_console	Milton Miller
	When registering the hvc console port, register a list of ops (read and write) to go with it, instead of calling fixed function names. This allows different ports to encode the data differently. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] hvc_console: Separate hvc_console and vio code 2	Milton Miller
	Remove all the vio device driver code from hvc_console.c This will allow us to separate hvsi, hvc, and allow hvc_console to be used without the ppc64 vio layer. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] hvc_console: Separate hvc_console and vio code	Milton Miller
	Separate the console setup routines of the hvc_console and the vio layer. Remove the call to find_init_vty from hvc_console.c. Fail the setup routine if the console doesn't exist, but register the console again when the specified channel is instantiated. This scheme maintains the print buffer semantics while eliminating callout and call back for the console code. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] ppc64: remove duplicate syscall reservation	Anton Blanchard
	We already have a prototype for sys_remap_file_pages (239) so there is no need to reserve it twice. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] ppc64: add ioprio syscalls	Anton Blanchard
	- Clean up sys32_getpriority comment. - Add ioprio syscalls, and sign extend 32bit versions. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] move ioprio syscalls into syscalls.h	Anton Blanchard
	- Make ioprio syscalls return long, like set/getpriority syscalls. - Move function prototypes into syscalls.h so we can pick them up in the 32/64bit compat code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] ppc64: Turn runlatch on in exception entry	Anton Blanchard
	Enable the runlatch at the start of each exception. Unfortunately we are out of space in the 0x300 handler, so I added it a bit later. The SPR write is fairly expensive, perhaps we should cache the runlatch state in the paca and avoid the write when possible. We don't need to turn the runlatch off, we do that in the idle loop. Better to take the hit in the idle loop than for each exception exit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] ppc64: Fix runlatch code to work on pseries machines	Anton Blanchard
	Not all ppc64 CPUs have the CTRL SPR, so we need a cputable feature for it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] print order information when OOM killing	Marcelo Tosatti
	Dump the current allocation order when OOM killing. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	[PATCH] export generic_drop_inode() to modules	Mark Fasheh
	OCFS2 wants to mark an inode which has been orphaned by another node so that during final iput it takes the correct path through the VFS and can pass through the OCFS2 delete_inode callback. Since i_nlink can get out of date with other nodes, the best way I see to accomplish this is by clearing i_nlink on those inodes at drop_inode time. Other than this small amount of work, nothing different needs to happen, so I think it would be cleanest to be able to just call generic_drop_inode at the end of the OCFS2 drop_inode callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07	Merge master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6	Linus Torvalds

2005-07-06	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6	Linus Torvalds

2005-07-06	[IA64] fix generic/up builds	Tony Luck
	Jesse Barnes provided the original version of this patch months ago, but other changes kept conflicting with it, so it got deferred. Greg Edwards dug it out of obscurity just over a week ago, and almost immediately another conflicting patch appeared (Bob Picco's memory-less nodes). I've resolved the conflicts and got it running again. CONFIG_SGI_TIOCX is set to "y" in defconfig, which causes a Tiger to not boot (oops in tiocx_init). But that can be resolved later ... get this in now before it gets stale again. Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-06	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds

2005-07-06	[SPARC64]: Fix enable_dma() in asm-sparc64/parport.h	Eddie C. Dost
	Call ebus_dma_enable() before calling ebus_dma_request(), otherwise ebus_dma_request() returns -EINVAL and enable_dma() calls BUG()... Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06	Merge master.kernel.org:/home/rmk/linux-2.6-arm	Linus Torvalds

2005-07-06	Auto merge with /home/aegl/GIT/linus	Tony Luck

2005-07-06	[IA64] hotplug/ia64: SN Hotplug Driver - PREEMPT/pcibus_info fix	Prarit Bhargava
	This patch fixes an issue with the PROM and a kernel running with CONFIG_PREEMPT enabled. When CONFIG_PREEMPT is enabled, the size of a spinlock_t changes -- resulting in the PROM writing to an incorrect location. Signed-off-by: Prarit Bhargava <prarit@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-06	[IA64] hotplug/ia64: SN Hotplug Driver - SN Hotplug Driver code	Prarit Bhargava
	This patch is the SGI hotplug driver and additional changes required for the driver. These modifications include changes to the SN io_init.c code for memory management, the inclusion of new SAL calls to enable and disable PCI slots, and a hotplug-style driver. Signed-off-by: Prarit Bhargava <prarit@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-06	[IA64] hotplug/ia64: SN Hotplug Driver - new SN PROM version code	Prarit Bhargava
	This patch is a rewrite of the code to check the PROM version. The current code has some deficiences in the way PROM comparisons were made. The minimum value of PROM that will boot has also been changed to 4.04. Signed-off-by: Prarit Bhargava <prarit@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-06	[IA64] hotplug/ia64: SN Hotplug Driver: moving of header files	Prarit Bhargava
	This patch moves header files out of the arch/ia64/sn directories and into include/asm-ia64/sn. These files were being included by other subsystems and should be under include/asm-ia64/sn. Signed-off-by: Prarit Bhargava <prarit@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-06	[PATCH] ARM: 2792/1: IXP4xx iomap API implementation	Deepak Saxena
	Patch from Deepak Saxena This patch implements the iomap API for Intel IXP4xx NPU systems. We need to implement our own version of the API functions b/c of the PCI hostbridge does not provide the capability to map PCI I/O space into the CPU's physical memory space. In addition, if a system has more than 64M of PCI memory mapped BARs, PCI memory must also be accessed indirectly. This patch changes the assignment of PCI I/O resources to fall into to 0x0000:0xffff range so that we can trap I/O areas in our ioread/iowrite macros. Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-06	[IA64] hotplug/ia64: SN Hotplug Driver: SN IRQ Fixes	Prarit Bhargava
	This patch fixes the SN IRQ code such that cpu affinity and Hotplug can modify IRQ values. The sn_irq_info structures are now locked using a RCU lock mechanism to avoid lock contention in the lost interrupt WAR code. Signed-off-by: Prarit Bhargava <prarit@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-07-06	[CRYPTO] Ensure cit_iv is aligned correctly	Herbert Xu
	This patch ensures that cit_iv is aligned according to cra_alignmask by allocating it as part of the tfm structure. As a side effect the crypto layer will also guarantee that the tfm ctx area has enough space to be aligned by cra_alignmask. This allows us to remove the extra space reservation from the Padlock driver. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06	[CRYPTO] Add alignmask for low-level cipher implementations	Herbert Xu
	The VIA Padlock device requires the input and output buffers to be aligned on 16-byte boundaries. This patch adds the alignmask attribute for low-level cipher implementations to indicate their alignment requirements. The mid-level crypt() function will copy the input/output buffers if they are not aligned correctly before they are passed to the low-level implementation. Strictly speaking, some of the software implementations require the buffers to be aligned on 4-byte boundaries as they do 32-bit loads. However, it is not clear whether it is better to copy the buffers or pay the penalty for unaligned loads/stores. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06	[CRYPTO] Add support for low-level multi-block operations	Herbert Xu
	This patch adds hooks for cipher algorithms to implement multi-block ECB/CBC operations directly. This is expected to provide significant performance boots to the VIA Padlock. It could also be used for improving software implementations such as AES where operating on multiple blocks at a time may enable certain optimisations. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06	[PATCH] openfirmware: generate device table for userspace	Jeff Mahoney
	This converts the usage of struct of_match to struct of_device_id, similar to pci_device_id. This allows a device table to be generated, which can be parsed by depmod(8) to generate a map file for module loading. In order for hotplug to work with macio devices, patches to module-init-tools and hotplug must be applied. Those patches are available at: ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05	[PATCH] kprobes: fix namespace problem and sparc64 build	Rusty Lynch
	The following renames arch_init, a kprobes function for performing any architecture specific initialization, to arch_init_kprobes in order to cleanup the namespace. Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes build from the last return probe patch. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05	[PATCH] ppc32: add Freescale MPC885ADS board support	Andrei Konovalov
	This patch adds the Freescale MPC86xADS board support. The supported devices are SMC UART and 10Mbit ethernet on SCC1. The manual for the board says that it "is compatible with the MPC8xxFADS for software point of view". That's why this patch extends FADS instead of introducing a new platform. FEC is not supported as the "combined FCC/FEC ethernet driver" driver by Pantelis Antoniou should replace the current FEC driver. Signed-off-by: Gennadiy Kurtsman <gkurtsman@ru.mvista.com> Signed-off-by: Andrei Konovalov <akonovalov@ru.mvista.com> Acked-by: Tom Rini <trini@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds

2005-07-05	[TCP]: Move to new TSO segmenting scheme.	David S. Miller
	Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.	David S. Miller
	'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Fix redundant calculations of tcp_current_mss()	David S. Miller
	tcp_write_xmit() uses tcp_current_mss(), but some of it's callers, namely __tcp_push_pending_frames(), already has this value available already. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().	David S. Miller
	The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Move __tcp_data_snd_check into tcp_output.c	David S. Miller
	It reimplements portions of tcp_snd_check(), so it we move it to tcp_output.c we can consolidate it's logic much easier in a later change. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Move send test logic out of net/tcp.h	David S. Miller
	This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Fix quick-ack decrementing with TSO.	David S. Miller
	On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.	David S. Miller
	The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05	[NET]: Remove __ARGS from include/net/slhc_vj.h	Alexey Dobriyan
	I suspect "#define __ARGS(x) ()" was deprecated before I was born. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>