Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2008-01-17	Merge branch 'for-2.6.25' of ↵	Paul Mackerras
	master.kernel.org:/pub/scm/linux/kernel/git/olof/pasemi into for-2.6.25
2008-01-17	[POWERPC] Add of_find_matching_node() helper function	Grant Likely
	Similar to of_find_compatible_node(), of_find_matching_node() and for_each_matching_node() allow you to iterate over the device tree looking for specific nodes, except that they take of_device_id tables instead of strings. This also moves of_match_node() from driver/of/device.c to driver/of/base.c to colocate it with the of_find_matching_node which depends on it. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-15	libata: pata_platform: make probe and remove functions device type neutral	Anton Vorontsov
	Split pata_platform_{probe,remove} into two pieces: 1. pata_platform_{probe,remove} -- platform_device-dependant bits; 2. __ptata_platform_{probe,remove} -- device type neutral bits. This is done to not duplicate code for the OF-platform driver. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2007-12-21	Merge branch 'linux-2.6'	Paul Mackerras

2007-12-21	[POWERPC] Fix for via-pmu based backlight control	Benjamin Herrenschmidt
	This fixes a few issues with via-pmu based backlight control. First, it fixes a sign problem with the setup of the backlight curve since the `range' value there -can- (and will) go negative. Then, it reworks the interaction between this and the via-pmu sleep code to properly restore backlight on wakeup from sleep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20	dm: merge max_hw_sector	Neil Brown
	Make sure dm honours max_hw_sectors of underlying devices We still have no firm testing evidence in support of this patch but believe it may help to resolve some bug reports. - agk Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2007-12-19	[POWERPC] via-pmu: Kill sleep notifiers completely	Johannes Berg
	This kills off the remnants of the old sleep notifiers now that they are no longer used. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!" genirq: revert lazy irq disable for simple irqs x86: also define AT_VECTOR_SIZE_ARCH x86: kprobes bugfix x86: jprobe bugfix timer: kernel/timer.c section fixes genirq: add unlocked version of set_irq_handler() clockevents: fix reprogramming decision in oneshot broadcast oprofile: op_model_athlon.c support for AMD family 10h barcelona performance counters
2007-12-18	genirq: add unlocked version of set_irq_handler()	Kevin Hilman
	Add unlocked version for use by irq_chip.set_type handlers which may wish to change handler to level or edge handler when IRQ type is changed. The normal set_irq_handler() call cannot be used because it tries to take irq_desc.lock which is already held when the irq_chip.set_type hook is called. Signed-off-by: Kevin Hilman <khilman@mvista.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-18	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds
	* 'for-linus' of git://git.kernel.dk/linux-2.6-block: Cleanup umem driver: fix most checkpatch warnings, conform to kernel block: let elv_register() return void as-iosched: fix write batch start point as-iosched: fix incorrect comments block: use jiffies conversion functions in scsi_ioctl.c
2007-12-18	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: mmc: remove unused 'mode' from the mmc_host structure sdhci: support JMicron JMB38x chips sdhci: use PIO when DMA can't satisfy the request sdhci: don't warn about sdhci 2.0 controllers sdhci: describe quirks
2007-12-18	block: let elv_register() return void	Adrian Bunk
	elv_register() always returns 0, and there isn't anything it does where it should return an error (the only error condition is so grave that it's handled with a BUG_ON). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-12-17	Merge branch 'upstream-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: libata: fix ATAPI draining libata: update atapi_eh_request_sense() such that lbam/lbah contains buffer size libata-acpi: implement _GTF command filtering libata-acpi: improve _GTF execution error handling and reporting libata-acpi: improve ACPI disabling libata-acpi: implement dev->gtf_cache and evaluate _GTF right after _STM during resume libata-acpi: implement and use ata_acpi_init_gtm() libata-acpi: add new hooks ata_acpi_dissociate() and ata_acpi_on_disable() libata: ata_dev_disable() should be called from EH context libata: add more opcodes to ata.h libata: update ata_*_printk() macros such that level can be a variable libata-acpi: adjust constness in ata_acpi_gtm/stm() parameters sata_mv: improve warnings about Highpoint RocketRAID 23xx cards libata: add ST3160023AS / 3.42 to NCQ blacklist libata: clear link->eh_info.serror from ata_std_postreset() sata_sil: fix spurious IRQ handling
2007-12-17	Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"	Nishanth Aravamudan
	This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790 ("hugetlb: Add hugetlb_dynamic_pool sysctl") Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool sysctl is not needed, as its semantics can be expressed by 0 in the overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl (pool enabled). (Needed in 2.6.24 since it reverts a post-2.6.23 userspace-visible change) Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17	hugetlb: introduce nr_overcommit_hugepages sysctl	Nishanth Aravamudan
	hugetlb: introduce nr_overcommit_hugepages sysctl While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I became convinced that having a boolean sysctl was insufficient: 1) To support per-node control of hugepages, I have previously submitted patches to add a sysfs attribute related to nr_hugepages. However, with a boolean global value and per-mount quota enforcement constraining the dynamic pool, adding corresponding control of the dynamic pool on a per-node basis seems inconsistent to me. 2) Administration of the hugetlb dynamic pool with multiple hugetlbfs mount points is, arguably, more arduous than it needs to be. Each quota would need to be set separately, and the sum would need to be monitored. To ease the administration, and to help make the way for per-node control of the static & dynamic hugepage pool, I added a separate sysctl, nr_overcommit_hugepages. This value serves as a high watermark for the overall hugepage pool, while nr_hugepages serves as a low watermark. The boolean sysctl can then be removed, as the condition nr_overcommit_hugepages > 0 indicates the same administrative setting as hugetlb_dynamic_pool == 1 Quotas still serve as local enforcement of the size of the pool on a per-mount basis. A few caveats: 1) There is a race whereby the global surplus huge page counter is incremented before a hugepage has allocated. Another process could then try grow the pool, and fail to convert a surplus huge page to a normal huge page and instead allocate a fresh huge page. I believe this is benign, as no memory is leaked (the actual pages are still tracked correctly) and the counters won't go out of sync. 2) Shrinking the static pool while a surplus is in effect will allow the number of surplus huge pages to exceed the overcommit value. As long as this condition holds, however, no more surplus huge pages will be allowed on the system until one of the two sysctls are increased sufficiently, or the surplus huge pages go out of use and are freed. Successfully tested on x86_64 with the current libhugetlbfs snapshot, modified to use the new sysctl. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17	apm_event{,info}_t are userspace types	Adam Jackson
	These types define the size of data read from /dev/apm_bios. They should not be hidden behind #ifdef __KERNEL__. This is killing my xserver compile, apm_event_t is used in the xserver source. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17	fix headers_install	Andrew Morton
	make[3]: *** No rule to make target `/usr/src/devel/include/linux/ticable.h', needed by `/usr/src/devel/usr/include/linux/ticable.h'. Stop. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17	libata: fix ATAPI draining	Tejun Heo
	With ATAPI transfer chunk size properly programmed, libata PIO HSM should be able to handle full spurious data chunks. Also, it's a good idea to suppress trailing data warning for misc ATAPI commands as there can be many of them per command - for example, if the chunk size is 16 and the drive tries to transfer 510 bytes, there can be 31 trailing data messages. This patch makes the following updates to libata ATAPI PIO HSM implementation. * Make it drain full spurious chunks. * Suppress trailing data warning message for misc commands. * Put limit on how many bytes can be drained. * If odd, round up consumed bytes and the number of bytes to be drained. This gets the number of bytes to drain right for drivers which do 16bit PIO. This patch is partial backport of improve-ATAPI-data-xfer patchset pending for #upstream. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17	libata-acpi: implement dev->gtf_cache and evaluate _GTF right after _STM ↵	Tejun Heo
	during resume On certain implementations, _GTF evaluation depends on preceding _STM and both can be pretty picky about the configuration. Using _GTM result cached during controller initialization satisfies the most neurotic _STM implementation. However, libata evaluates _GTF after reset during device configuration and the hardware state can be different from what _GTF expects and can cause evaluation failure. This patch adds dev->gtf_cache and updates ata_dev_get_GTF() such that it uses the cached value if available. Cache is cleared with a call to ata_acpi_clear_gtf(). Because for SATA ACPI nodes _GTF must be evaluated after _SDD which can't be done till IDENTIFY is complete, _GTF caching from ata_acpi_on_resume() is used only for IDE ACPI nodes. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17	libata-acpi: implement and use ata_acpi_init_gtm()	Tejun Heo
	_GTM fetches currently configured transfer mode while _STM configures controller according to _GTM parameter and prepares transfer mode configuration TFs for _GTF. In many cases _GTM and _STM implementations are quite brittle and can't cope with configuration changed by libata. libata does not depend on ATA ACPI to configure devices. The only reason libata performs _GTM and _STM are to make _GTF evaluation succeed and libata also doesn't care about how _GTF TFs configure transfer mode. It overrides that configuration anyway, so from libata's POV, it doesn't matter what value is feeded to _STM as long as evaluation succeeds for _STM and following _GTF. This patch adds dev->__acpi_init_gtm and store initial _GTM values on host initialization before modified by reset and mode configuration. If the field is valid, ata_acpi_init_gtm() returns pointer to the saved _GTM structure; otherwise, NULL. This saved value is used for _STM during resume and peek at BIOS/firmware programmed initial timing for later use. The accessor is there to make building w/o ACPI easy as dev->__acpi_init doesn't exist if ACPI is not enabled. On driver detach, the initial BIOS configuration is restored by executing _STM with the initial _GTM values such that the next driver can also use the initial BIOS configured values. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17	libata: add more opcodes to ata.h	Tejun Heo
	Add constants for DEVICE CONFIGURATION OVERLAY and SET_MAX to include/linux/ata.h. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17	libata: update ata_*_printk() macros such that level can be a variable	Tejun Heo
	Make prink helpers format @lv together rather than prepending to the format string as constant. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17	libata-acpi: adjust constness in ata_acpi_gtm/stm() parameters	Tejun Heo
	* No internal function uses const ata_port. Drop const from @ap. * Make ata_acpi_stm() copy @stm before using it and change @stm to const. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: HOWTO: update misspelling and word incorrected add stable_api_nonsense.txt in korean HOWTO: change addresses of maintainer and lxr url for Korean HOWTO Add Documentation for FAIR_USER_SCHED sysfs files HOWTO: Change man-page maintainer address for Japanese HOWTO tipar: remove obsolete module kobject: fix the documentation of how kobject_set_name works
2007-12-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: USB: revert portions of "UNUSUAL_DEV: Sync up some reported devices from Ubuntu" usb: Remove broken optimisation in OHCI IRQ handler USB: at91_udc: correct hanging while disconnecting usb cable USB: use IRQF_DISABLED for HCD interrupt handlers USB: fix locking loop by avoiding flush_scheduled_work usb.h: fix kernel-doc warning USB: option: Bind to the correct interface of the Huawei E220 USB: cp2101: new device id usb-storage: Fix devices that cannot handle 32k transfers USB: sierra: fix product id
2007-12-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: ide: fix ->io_32bit race in set_io_32bit() ide: remove stale changelog from ide-probe.c ide: remove stale changelog from ide-disk.c ide: remove dead code from __ide_dma_test_irq() hpt366: fix HPT37x PIO mode timings (take 2) pdc202xx_new: fix Promise TX4 support ide-cd: remove dead post_transform_command() ide: DMA reporting and validity checking fixes (take 3) ide: add /sys/bus/ide/devices/*/{model,firmware,serial} sysfs entries ide: coding style fixes for drivers/ide/setup-pci.c ide: fix ide_scan_pcibus() error message ide: deprecate CONFIG_BLK_DEV_OFFBOARD ide: add missing checks for control register existence ide-scsi: add ide_scsi_hex_dump() helper
2007-12-17	usb.h: fix kernel-doc warning	Randy Dunlap
	Fix kernel-doc warning in usb.h: Warning(linux-2.6.24-rc3-git7//include/linux/usb.h:166): No description found for parameter 'sysfs_files_created' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-17	usb-storage: Fix devices that cannot handle 32k transfers	Doug Maxey
	When a device cannot handle the smallest previously limited transfer size (64 blocks) without stalling, limit the device to the amount of packets that fit in a platform native page. The lowest possible limit is PAGE_CACHE_SIZE, so if the device is ever used on a platform that has larger than 8K pages, you lose unless you can convince the device firmware folks to fix the issue. Cc: Mathew Dharm <mdharm-scsi@one-eyed-alien.net> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Pete Zaitcev <zaitcev@redhat.com> Signed-off-by: Doug Maxey <dwm@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-17	tipar: remove obsolete module	Romain Liévin
	tipar: remove obsolete module The tipar character driver was used to implement bit-banging access to Texas Instruments parallel link cable. A user-land method now exists thru PPDEV & PARPORT. Signed-off-by: Romain Liévin <roms@lpg.ticalc.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	[NETFILTER]: bridge: fix missing link layer headers on outgoing routed packets	Patrick McHardy
	As reported by Damien Thebault, the double POSTROUTING hook invocation fix caused outgoing packets routed between two bridges to appear without a link-layer header. The reason for this is that we're skipping the br_nf_post_routing hook for routed packets now and don't save the original link layer header, but nevertheless tries to restore it on output, causing corruption. The root cause for this is that skb->nf_bridge has no clearly defined lifetime and is used to indicate all kind of things, but that is quite complicated to fix. For now simply don't touch these packets and handle them like packets from any other device. Tested-by: Damien Thebault <damien.thebault@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-12	ide: DMA reporting and validity checking fixes (take 3)	Bartlomiej Zolnierkiewicz
	* ide_xfer_verbose() fixups: - beautify returned mode names - fix PIO5 reporting - make it return 'const char ' Change printk() level from KERN_DEBUG to KERN_INFO in ide_find_dma_mode(). * Add ide_id_dma_bug() helper based on ide_dma_verbose() to check for invalid DMA info in identify block. * Use ide_id_dma_bug() in ide_tune_dma() and ide_driveid_update(). As a result DMA won't be tuned or will be disabled after tuning if device reports inconsistent info about enabled DMA mode (ide_dma_verbose() does the same checks while the IDE device is probed by ide-{cd,disk} device driver). * Remove no longer needed ide_dma_verbose(). This patch should fix the following problem with out-of-sync IDE messages reported by Nick Warne: hdd: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache<7>hdd: skipping word 93 validity check , UDMA(66) and later debugged by Mark Lord to be caused by: ide_dma_verbose() printk( ... "2048kB Cache"); eighty_ninty_three() printk(KERN_DEBUG "%s: skipping word 93 validity check\n"); ide_dma_verbose() printk(", UDMA(66)" Please note that as a result ide-{cd,disk} device drivers won't report the DMA speed used but this is intended since now DMA mode being used is always reported by IDE core code. v2: * fixes suggested by Randy: - use KERN_CONT for printk()-s in ide-{cd,disk}.c - don't remove argument name from ide_xfer_verbose() declaration v3: * Remove incorrect check for (id->field_valid & 1) from ide_id_dma_bug() (spotted by Sergei). * "XFER SLOW" -> "PIO SLOW" in ide_xfer_verbose() (suggested by Sergei). * Fix ide_find_dma_mode() to report the correct mode ('mode' after being limited by 'req_mode'). Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Nick Warne <nick@ukfsn.org> Cc: Mark Lord <lkml@rtr.ca> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-12-12	mmc: remove unused 'mode' from the mmc_host structure	Nicolas Pitre
	This field and corresponding defines are simply never used anywhere in the code. But its mere presence is enough to confuse some host driver authors who attempt to rely on it. Let's eliminate the possibility for confusion and remove it entirely. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-12-12	sdhci: support JMicron JMB38x chips	Pierre Ossman
	The JMicron JMB38x chip doesn't support transfers that aren't 32-bit aligned (both size and start address). It also doesn't like switching between PIO and DMA mode, so it needs to be reset after each request. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-12-11	[POWERPC] Add for_each_child_of_node() helper for iterating over child nodes	Michael Ellerman
	Add for_each_child_of_node() to encapsulate the common idiom of iterating over the children of a device_node. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-07	bonding: Add new layer2+3 hash for xor/802.3ad modes	Jay Vosburgh
	Add new hash for balance-xor and 802.3ad modes. Originally submitted by "Glenn Griffin" <ggriffin.kernel@gmail.com>; modified by Jay Vosburgh to move setting of hash policy out of line, tweak the documentation update and add version update to 3.2.2. Glenn's original comment follows: Included is a patch for a new xmit_hash_policy for the bonding driver that selects slaves based on MAC and IP information. This is a middle ground between what currently exists in the layer2 only policy and the layer3+4 policy. This policy strives to be fully 802.3ad compliant by transmitting every packet of any particular flow over the same link. As documented the layer3+4 policy is not fully compliant for extreme cases such as ip fragmentation, so this policy is a nice compromise for environments that require full compliance but desire more than the layer2 only policy. Signed-off-by: "Glenn Griffin" <ggriffin.kernel@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-07	leds: Fix led trigger locking bugs	Richard Purdie
	Convert part of the led trigger core from rw spinlocks to rw semaphores. We're calling functions which can sleep from invalid contexts otherwise. Fixes bug #9264. Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2007-12-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: futex: correctly return -EFAULT not -EINVAL lockdep: in_range() fix lockdep: fix debug_show_all_locks() sched: style cleanups futex: fix for futex_wait signal stack corruption
2007-12-05	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: VM/Security: add security hook to do_brk Security: round mmap hint address above mmap_min_addr security: protect from stack expantion into low vm addresses Security: allow capable check to permit mmap or low vm space SELinux: detect dead booleans SELinux: do not clear f_op when removing entries
2007-12-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: [LRO]: fix lro_gen_skb() alignment [TCP]: NAGLE_PUSH seems to be a wrong way around [TCP]: Move prior_in_flight collect to more robust place [TCP] FRTO: Use of existing funcs make code more obvious & robust [IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS [ROSE]: Trivial compilation CONFIG_INET=n case [IPVS]: Fix sched registration race when checking for name collision. [IPVS]: Don't leak sysctl tables if the scheduler registration fails.
2007-12-05	proc: fix proc_dir_entry refcounting	Alexey Dobriyan
	Creating PDEs with refcount 0 and "deleted" flag has problems (see below). Switch to usual scheme: * PDE is created with refcount 1 * every de_get does +1 * every de_put() and remove_proc_entry() do -1 * once refcount reaches 0, PDE is freed. This elegantly fixes at least two following races (both observed) without introducing new locks, without abusing old locks, without spreading lock_kernel(): 1) PDE leak remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_read(&de->count) == 0) if (atomic_dec_and_test(&de->count)) if (de->deleted) /* also not taken! / free_proc_entry(de); else de->deleted = 1; [refcount=0, deleted=1] 2) use after free remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_dec_and_test(&de->count)) if (atomic_read(&de->count) == 0) free_proc_entry(de); / boom! / if (de->deleted) free_proc_entry(de); BUG: unable to handle kernel paging request at virtual address 6b6b6b6b printing eip: c10acdda pdpt = 00000000338f8001 *pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4) EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1 EIP is at strnlen+0x6/0x18 EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000) Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400 c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400 f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34 Call Trace: [<c10ac4f0>] vsnprintf+0x2ad/0x49b [<c10ac779>] vscnprintf+0x14/0x1f [<c1018e6b>] vprintk+0xc5/0x2f9 [<c10379f1>] handle_fasteoi_irq+0x0/0xab [<c1004f44>] do_IRQ+0x9f/0xb7 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b [<c100264e>] need_resched+0x1f/0x21 [<c10190ba>] printk+0x1b/0x1f [<c107c8ad>] de_put+0x3d/0x50 [<c107c8f8>] proc_delete_inode+0x38/0x41 [<c107c8c0>] proc_delete_inode+0x0/0x41 [<c1066298>] generic_delete_inode+0x5e/0xc6 [<c1065aa9>] iput+0x60/0x62 [<c1063c8e>] d_kill+0x2d/0x46 [<c1063fa9>] dput+0xdc/0xe4 [<c10571a1>] __fput+0xb0/0xcd [<c1054e49>] filp_close+0x48/0x4f [<c1055ee9>] sys_close+0x67/0xa5 [<c10026b6>] sysenter_past_esp+0x5f/0x85 ======================= Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9 EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44 Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds, module is already pinned and remove_proc_entry() can't happen => nobody can mark PDE deleted. Dummy proc root in netns code is not marked with refcount 1. AFAICS, we never get it, it's just for proper /proc/net removal. I double checked CLONE_NETNS continues to work. Patch survives many hours of modprobe/rmmod/cat loops without new bugs which can be attributed to refcounting. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05	jbd: Fix assertion failure in fs/jbd/checkpoint.c	Jan Kara
	Before we start committing a transaction, we call __journal_clean_checkpoint_list() to cleanup transaction's written-back buffers. If this call happens to remove all of them (and there were already some buffers), __journal_remove_checkpoint() will decide to free the transaction because it isn't (yet) a committing transaction and soon we fail some assertion - the transaction really isn't ready to be freed :). We change the check in __journal_remove_checkpoint() to free only a transaction in T_FINISHED state. The locking there is subtle though (as everywhere in JBD ;(). We use j_list_lock to protect the check and a subsequent call to __journal_drop_transaction() and do the same in the end of journal_commit_transaction() which is the only place where a transaction can get to T_FINISHED state. Probably I'm too paranoid here and such locking is not really necessary - checkpoint lists are processed only from log_do_checkpoint() where a transaction must be already committed to be processed or from __journal_clean_checkpoint_list() where kjournald itself calls it and thus transaction cannot change state either. Better be safe if something changes in future... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05	futex: fix for futex_wait signal stack corruption	Steven Rostedt
	David Holmes found a bug in the -rt tree with respect to pthread_cond_timedwait. After trying his test program on the latest git from mainline, I found the bug was there too. The bug he was seeing that his test program showed, was that if one were to do a "Ctrl-Z" on a process that was in the pthread_cond_timedwait, and then did a "bg" on that process, it would return with a "-ETIMEDOUT" but early. That is, the timer would go off early. Looking into this, I found the source of the problem. And it is a rather nasty bug at that. Here's the relevant code from kernel/futex.c: (not in order in the file) [...] smlinkage long sys_futex(u32 __user uaddr, int op, u32 val, struct timespec __user utime, u32 __user uaddr2, u32 val3) { struct timespec ts; ktime_t t, tp = NULL; u32 val2 = 0; int cmd = op & FUTEX_CMD_MASK; if (utime && (cmd == FUTEX_WAIT \|\| cmd == FUTEX_LOCK_PI)) { if (copy_from_user(&ts, utime, sizeof(ts)) != 0) return -EFAULT; if (!timespec_valid(&ts)) return -EINVAL; t = timespec_to_ktime(ts); if (cmd == FUTEX_WAIT) t = ktime_add(ktime_get(), t); tp = &t; } [...] return do_futex(uaddr, op, val, tp, uaddr2, val2, val3); } [...] long do_futex(u32 __user uaddr, int op, u32 val, ktime_t timeout, u32 __user uaddr2, u32 val2, u32 val3) { int ret; int cmd = op & FUTEX_CMD_MASK; struct rw_semaphore fshared = NULL; if (!(op & FUTEX_PRIVATE_FLAG)) fshared = &current->mm->mmap_sem; switch (cmd) { case FUTEX_WAIT: ret = futex_wait(uaddr, fshared, val, timeout); [...] static int futex_wait(u32 __user uaddr, struct rw_semaphore fshared, u32 val, ktime_t abs_time) { [...] struct restart_block restart; restart = &current_thread_info()->restart_block; restart->fn = futex_wait_restart; restart->arg0 = (unsigned long)uaddr; restart->arg1 = (unsigned long)val; restart->arg2 = (unsigned long)abs_time; restart->arg3 = 0; if (fshared) restart->arg3 \|= ARG3_SHARED; return -ERESTART_RESTARTBLOCK; [...] static long futex_wait_restart(struct restart_block restart) { u32 __user uaddr = (u32 __user )restart->arg0; u32 val = (u32)restart->arg1; ktime_t abs_time = (ktime_t )restart->arg2; struct rw_semaphore fshared = NULL; restart->fn = do_no_restart_syscall; if (restart->arg3 & ARG3_SHARED) fshared = &current->mm->mmap_sem; return (long)futex_wait(uaddr, fshared, val, abs_time); } So when the futex_wait is interrupt by a signal we break out of the hrtimer code and set up or return from signal. This code does not return back to userspace, so we set up a RESTARTBLOCK. The bug here is that we save the "abs_time" which is a pointer to the stack variable "ktime_t t" from sys_futex. This returns and unwinds the stack before we get to call our signal. On return from the signal we go to futex_wait_restart, where we update all the parameters for futex_wait and call it. But here we have a problem where abs_time is no longer valid. I verified this with print statements, and sure enough, what abs_time was set to ends up being garbage when we get to futex_wait_restart. The solution I did to solve this (with input from Linus Torvalds) was to add unions to the restart_block to allow system calls to use the restart with specific parameters. This way the futex code now saves the time in a 64bit value in the restart block instead of storing it on the stack. Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 in thread_info.h, when there's a #ifdef __KERNEL__ just below that. Not sure what that is there for. If this turns out to be a problem, I've tested this with using "unsigned int" for u32 and "unsigned long long" for u64 and it worked just the same. I'm using u32 and u64 just to be consistent with what the futex code uses. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05	[LRO]: fix lro_gen_skb() alignment	Andrew Gallatin
	Add a field to the lro_mgr struct so that drivers can specify how much padding is required to align layer 3 headers when a packet is copied into a freshly allocated skb by inet_lro.c:lro_gen_skb(). Without padding, skbs generated by LRO will cause alignment warnings on architectures which require strict alignment (seen on sparc64). Myri10GE is updated to use this field. Signed-off-by: Andrew Gallatin <gallatin@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-06	Security: round mmap hint address above mmap_min_addr	Eric Paris
	If mmap_min_addr is set and a process attempts to mmap (not fixed) with a non-null hint address less than mmap_min_addr the mapping will fail the security checks. Since this is just a hint address this patch will round such a hint address above mmap_min_addr. gcj was found to try to be very frugal with vm usage and give hint addresses in the 8k-32k range. Without this patch all such programs failed and with the patch they happily get a higher address. This patch is wrappad in CONFIG_SECURITY since mmap_min_addr doesn't exist without it and there would be no security check possible no matter what. So we should not bother compiling in this rounding if it is just a waste of time. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2007-12-04	PHY: Add the phy_device_release device method.	Anton Vorontsov
	Lately I've got this nice badness on mdio bus removal: Device 'e0103120:06' does not have a release() function, it is broken and must be fixed. ------------[ cut here ]------------ Badness at drivers/base/core.c:107 NIP: c015c1a8 LR: c015c1a8 CTR: c0157488 REGS: c34bdcf0 TRAP: 0700 Not tainted (2.6.23-rc5-g9ebadfbb-dirty) MSR: 00029032 <EE,ME,IR,DR> CR: 24088422 XER: 00000000 ... [c34bdda0] [c015c1a8] device_release+0x78/0x80 (unreliable) [c34bddb0] [c01354cc] kobject_cleanup+0x80/0xbc [c34bddd0] [c01365f0] kref_put+0x54/0x6c [c34bdde0] [c013543c] kobject_put+0x24/0x34 [c34bddf0] [c015c384] put_device+0x1c/0x2c [c34bde00] [c0180e84] mdiobus_unregister+0x2c/0x58 ... Though actually there is nothing broken, it just device subsystem core expects another "pattern" of resource managment. This patch implement phy device's release function, thus we're getting rid of this badness. Also small hidden bug fixed, hope none other introduced. ;-) Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: cpu accounting controller (V2)
2007-12-03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits) [INET]: Fix inet_diag dead-lock regression [NETNS]: Fix /proc/net breakage [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON [DECNET]: dn_nl_deladdr() almost always returns no error [IPV6]: Restore IPv6 when MTU is big enough [RXRPC]: Add missing select on CRYPTO mac80211: rate limit wep decrypt failed messages rfkill: fix double-mutex-locking mac80211: drop unencrypted frames if encryption is expected mac80211: Fix behavior of ieee80211_open and ieee80211_close ieee80211: fix unaligned access in ieee80211_copy_snap mac80211: free ifsta->extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED SCTP: Fix build issues with SCTP AUTH. SCTP: Fix chunk acceptance when no authenticated chunks were listed. SCTP: Fix the supported extensions paramter SCTP: Fix SCTP-AUTH to correctly add HMACS paramter. SCTP: Fix the number of HB transmissions. [TCP] illinois: Incorrect beta usage ...
2007-12-02	sched: cpu accounting controller (V2)	Srivatsa Vaddagiri
	Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purpose and doesnt really want to exercise any control over their cpu consumption. The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences: - Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions. - Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes - Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c) The patch also modifies the cpu controller not to provide the same accounting information. Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-01	phylib: add PHY interface modes for internal delay for tx and rx only	Kim Phillips
	Allow phylib specification of cases where hardware needs to configure PHYs for Internal Delay only on either RX or TX (not both). Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Tested-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Li Yang <leoli@freescale.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-01	Merge branch 'master' into upstream-fixes	Jeff Garzik