aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2008-12-03block: fix setting of max_segment_size and seg_boundary maskMilan Broz
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm devices. When stacking devices (LVM over MD over SCSI) some of the request queue parameters are not set up correctly in some cases by default, namely max_segment_size and and seg_boundary mask. If you create MD device over SCSI, these attributes are zeroed. Problem become when there is over this mapping next device-mapper mapping - queue attributes are set in DM this way: request_queue max_segment_size seg_boundary_mask SCSI 65536 0xffffffff MD RAID1 0 0 LVM 65536 -1 (64bit) Unfortunately bio_add_page (resp. bio_phys_segments) calculates number of physical segments according to these parameters. During the generic_make_request() is segment cout recalculated and can increase bio->bi_phys_segments count over the allowed limit. (After bio_clone() in stack operation.) Thi is specially problem in CCISS driver, where it produce OOPS here BUG_ON(creq->nr_phys_segments > MAXSGENTRIES); (MAXSEGENTRIES is 31 by default.) Sometimes even this command is enough to cause oops: dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10 This command generates bios with 250 sectors, allocated in 32 4k-pages (last page uses only 1024 bytes). For LVM layer, it allocates bio with 31 segments (still OK for CCISS), unfortunatelly on lower layer it is recalculated to 32 segments and this violates CCISS restriction and triggers BUG_ON(). The patch tries to fix it by: * initializing attributes above in queue request constructor blk_queue_make_request() * make sure that blk_queue_stack_limits() inherits setting (DM uses its own function to set the limits because it blk_queue_stack_limits() was introduced later. It should probably switch to use generic stack limit function too.) * sets the default seg_boundary value in one place (blkdev.h) * use this mask as default in DM (instead of -1, which differs in 64bit) Bugs related to this: https://bugzilla.redhat.com/show_bug.cgi?id=471639 http://bugzilla.kernel.org/show_bug.cgi?id=8672 Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03block: internal dequeue shouldn't start timerTejun Heo
blkdev_dequeue_request() and elv_dequeue_request() are equivalent and both start the timeout timer. Barrier code dequeues the original barrier request but doesn't passes the request itself to lower level driver, only broken down proxy requests; however, as the original barrier code goes through the same dequeue path and timeout timer is started on it. If barrier sequence takes long enough, this timer expires but the low level driver has no idea about this request and oops follows. Timeout timer shouldn't have been started on the original barrier request as it never goes through actual IO. This patch unexports elv_dequeue_request(), which has no external user anyway, and makes it operate on elevator proper w/o adding the timer and make blkdev_dequeue_request() call elv_dequeue_request() and add timer. Internal users which don't pass the request to driver - barrier code and end_that_request_last() - are converted to use elv_dequeue_request(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) MAINTAINERS: add netdev to ATM ATM: horizon, fix hrz_probe fail path pppol2tp: Add missing sock_put() in pppol2tp_release() net: Fix soft lockups/OOM issues w/ unix garbage collector macvlan: don't broadcast PAUSE frames to macvlan devices Phonet: fix oops in phonet_address_del() on non-Phonet device netfilter: ctnetlink: fix GFP_KERNEL allocation under spinlock sungem: Fix PCS_MIICTRL register write in gem_init_phy(). net: make skb_truesize_bug() call WARN() net: hp-plus uses eip_poll net/wireless/reg.c: fix bad WARN_ON in if statement ath5k: disable beacon filter when station is not associated ath5k: fix Security issue in DebugFS part of ath5k ath9k: correct expected max RX buffer size ath9k: Fix SW-IOMMU bounce buffer starvation mac80211 : Fix setting ad-hoc mode and non-ibss channel iwlagn: fix DMA sync phylib: Add Vitesse VSC8221 SGMII PHY rose: zero length frame filtering in af_rose.c bridge: netfilter: fix update_pmtu crash with GRE ...
2008-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: alim15x3: fix sparse warning ide: remove dead code from drive_is_ready() ide: fix build for DEBUG_PM ide: respect current DMA setting during resume ide: add SAMSUNG SP0822N with firmware WA100-10 to ivb_list[] amd74xx: workaround unreliable AltStatus register for nVidia controllers ide: fix the ide_release_lock imbalance
2008-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] stex: switch to block timeout [SCSI] make scsi_eh_try_stu use block timeout [SCSI] megaraid_sas: switch to block timeout [SCSI] ibmvscsi: switch to block timeout [SCSI] aacraid: switch to block timeout [SCSI] zfcp: prevent double decrement on host_busy while being busy [SCSI] zfcp: fix deadlock between wq triggered port scan and ERP [SCSI] zfcp: eliminate race between validation and locking [SCSI] zfcp: verify for correct rport state before scanning for SCSI devs [SCSI] zfcp: returning an ERR_PTR where a NULL value is expected [SCSI] zfcp: Fix opening of wka ports [SCSI] zfcp: fix remote port status check [SCSI] fc_transport: fix old bug on bitflag definitions [SCSI] Fix hang in starved list processing
2008-12-02nfsd: fix vm overcommit crash fix #2Junjiro R. Okajima
The previous patch from Alan Cox ("nfsd: fix vm overcommit crash", commit 731572d39fcd3498702eda4600db4c43d51e0b26) fixed the problem where knfsd crashes on exported shmemfs objects and strict overcommit is set. But the patch forgot supporting the case when CONFIG_SECURITY is disabled. This patch copies a part of his fix which is mainly for detecting a bug earlier. Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-02amd74xx: workaround unreliable AltStatus register for nVidia controllersBartlomiej Zolnierkiewicz
It seems that on some nVidia controllers using AltStatus register can be unreliable so default to Status register if the PCI device is in Compatibility Mode. In order to achieve this: * Add ide_pci_is_in_compatibility_mode() inline helper to <linux/ide.h>. * Add IDE_HFLAG_BROKEN_ALTSTATUS host flag and set it in amd74xx host driver for nVidia controllers in Compatibility Mode. * Teach actual_try_to_identify() and drive_is_ready() about the new flag. This fixes the regression caused by removal of CONFIG_IDEPCI_SHARE_IRQ config option in 2.6.25 and using AltStatus register unconditionally when available (kernel.org bugs #11659 and #10216). [ Moreover for CONFIG_IDEPCI_SHARE_IRQ=y (which is what most people and distributions use) it never worked correctly. ] Thanks to Remy LABENE and Lars Winterfeld for help with debugging the problem. More info at: http://bugzilla.kernel.org/show_bug.cgi?id=11659 http://bugzilla.kernel.org/show_bug.cgi?id=10216 Reported-by: Remy LABENE <remy.labene@free.fr> Tested-by: Remy LABENE <remy.labene@free.fr> Tested-by: Lars Winterfeld <lars.winterfeld@tu-ilmenau.de> Acked-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-12-01lib/idr.c: fix rcu related race with idr_findManfred Spraul
2nd part of the fixes needed for http://bugzilla.kernel.org/show_bug.cgi?id=11796. When the idr tree is either grown or shrunk, then the update to the number of layers and the top pointer were not atomic. This race caused crashes. The attached patch fixes that by replicating the layers counter in each layer, thus idr_find doesn't need idp->layers anymore. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Clement Calmels <cboulte@gmail.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01epoll: introduce resource usage limitsDavide Libenzi
It has been thought that the per-user file descriptors limit would also limit the resources that a normal user can request via the epoll interface. Vegard Nossum reported a very simple program (a modified version attached) that can make a normal user to request a pretty large amount of kernel memory, well within the its maximum number of fds. To solve such problem, default limits are now imposed, and /proc based configuration has been introduced. A new directory has been created, named /proc/sys/fs/epoll/ and inside there, there are two configuration points: max_user_instances = Maximum number of devices - per user max_user_watches = Maximum number of "watched" fds - per user The current default for "max_user_watches" limits the memory used by epoll to store "watches", to 1/32 of the amount of the low RAM. As example, a 256MB 32bit machine, will have "max_user_watches" set to roughly 90000. That should be enough to not break existing heavy epoll users. The default value for "max_user_instances" is set to 128, that should be enough too. This also changes the userspace, because a new error code can now come out from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already listed, so that should be ok. [akpm@linux-foundation.org: use get_current_user()] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <stable@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reported-by: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: libata: blacklist Seagate drives which time out FLUSH_CACHE when used with NCQ [libata] pata_rb532_cf: fix signature of the xfer function [libata] pata_rb532_cf: fix and rename register definitions ata_piix: add borked Tecra M4 to broken suspend list
2008-12-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mlx4: Fix MTT leakage in resize CQ IB/ehca: Fix problem with generated flush work completions IB/ehca: Change misleading error message on memory hotplug mlx4_core: Save/restore default port IB capability mask
2008-12-01libata: blacklist Seagate drives which time out FLUSH_CACHE when used with NCQTejun Heo
Some recent Seagate harddrives have firmware bug which causes FLUSH CACHE to timeout under certain circumstances if NCQ is being used. This can be worked around by disabling NCQ and fixed by updating the firmware. Implement ATA_HORKAGE_FIRMWARE_UPDATE and blacklist these devices. The wiki page has been updated to contain information on this issue. http://ata.wiki.kernel.org/index.php/Known_issues Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-11-30Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* master.kernel.org:/home/rmk/linux-2.6-arm: Allow architectures to override copy_user_highpage() [ARM] pxa/palmtx: misc fixes to use generic GPIO API ARM: OMAP: Fixes for suspend / resume GPIO wake-up handling [ARM] pxa/corgi: update default config to exclude tosa from being built [ARM] pxa/pcm990: use negative number for an invalid GPIO in camera data ARM: OMAP: Typo fix for clock_allow_idle ARM: OMAP: Remove broken LCD driver for SX1 [ARM] 5335/1: pxa25x_udc: Fix is_vbus_present to return 1 or 0 [ARM] pxa/MioA701: bluetooth resume fix [ARM] pxa/MioA701: fix memory corruption.
2008-11-30Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq.h: fix missing/extra kernel-doc genirq: __irq_set_trigger: change pr_warning to pr_debug irq: fix typo x86: apic honour irq affinity which was set in early boot genirq: fix the affinity setting in setup_irq genirq: keep affinities set from userspace across free/request_irq()
2008-11-30Merge branch 'drm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/i915: Save/restore HWS_PGA on suspend/resume drm: move drm vblank initialization/cleanup to driver load/unload drm/i915: execbuffer pins objects, no need to ensure they're still in the GTT drm/i915: Always read pipestat in irq_handler drm/i915: Subtract total pinned bytes from available aperture size drm/i915: Avoid BUG_ONs on VT switch with a wedged chipset. drm/i915: Remove IMR masking during interrupt handler, and restart it if needed. drm/i915: Manage PIPESTAT to control vblank interrupts instead of IMR.
2008-11-30Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: toshiba_acpi: close race in toshiba_acpi driver ACPICA: disable _BIF warning ACPI: delete OSI(Linux) DMI dmesg spam ACPICA: Allow _WAK method to return an Integer ACPI: thinkpad-acpi: fix fan sleep/resume path sony-laptop: printk tweak sony-laptop: brightness regression fix Revert "ACPI: don't enable control method power button as wakeup device when Fixed Power button is used" ACPI suspend: Blacklist boxes that require us to set SCI_EN directly on resume ACPI: scheduling in atomic via acpi_evaluate_integer () ACPI: battery: Convert discharge energy rate to current properly ACPI: EC: count interrupts only if called from interrupt handler.
2008-11-30remove __ARCH_WANT_COMPAT_SYS_PTRACEChristoph Hellwig
All architectures now use the generic compat_sys_ptrace, as should every new architecture that needs 32bit compat (if we'll ever get another). Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also kill a comment about __ARCH_SYS_PTRACE that was added after __ARCH_SYS_PTRACE was already gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30hotplug_memory_notifier section annotationAl Viro
Same as for hotplug_cpu - we want static notifier_block in there in meminitdata, to avoid false positives whenever it's used. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30meminit section warningsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-28mlx4_core: Save/restore default port IB capability maskJack Morgenstein
Commit 7ff93f8b ("mlx4_core: Multiple port type support") introduced support for different port types. As part of that support, SET_PORT is invoked to set the port type during driver startup. However, as a side-effect, for IB ports the invocation of this command also sets the port's capability mask to zero (losing the default value set by FW). To fix this, get the default ib port capabilities (via a MAD_IFC Port Info query) during driver startup, and save them for use in the mlx4_SET_PORT command when setting the port-type to Infiniband. This patch fixes problems with subnet manager (SM) failover such as <https://bugs.openfabrics.org/show_bug.cgi?id=1183>, which occurred because the IsTrapSupported bit in the capability mask was zeroed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-27Allow architectures to override copy_user_highpage()Russell King
With aliasing VIPT cache support, the ARM implementation of clear_user_page() and copy_user_page() sets up a temporary kernel space mapping such that we have the same cache colour as the userspace page. This avoids having to consider any userspace aliases from this operation. However, when highmem is enabled, kmap_atomic() have to setup mappings. The copy_user_highpage() and clear_user_highpage() call these functions before delegating the copies to copy_user_page() and clear_user_page(). The effect of this is that each of the *_user_highpage() functions setup their own kmap mapping, followed by the *_user_page() functions setting up another mapping. This is rather wasteful. Thankfully, copy_user_highpage() can be overriden by architectures by defining __HAVE_ARCH_COPY_USER_HIGHPAGE. However, replacement of clear_user_highpage() is more difficult because its inline definition is not conditional. It seems that you're expected to define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE and provide a replacement __alloc_zeroed_user_highpage() implementation instead. The allocation itself is fine, so we don't want to override that. What we really want to do is to override clear_user_highpage() with our own version which doesn't kmap_atomic() unnecessarily. Other VIPT architectures (PARISC and SH) would also like to override this function as well. Acked-by: Hugh Dickins <hugh@veritas.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-11-27ACPICA: disable _BIF warningLin Ming
A generic work-around from ACPICA is in the queue, but since Linux has a work-around in its battery driver, we can disable this warning now. Allow _BIF method to return an Package with Buffer elements http://bugzilla.kernel.org/show_bug.cgi?id=11822 Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-27ACPICA: Allow _WAK method to return an IntegerBob Moore
This can happen if the _WAK method returns nothing (as per ACPI 1.0) but does return an integer if the implicit return mechanism is enabled. This is the only method that has this problem, since it is also defined to return a package of two integers (ACPI 1.0b+). In all other cases, if a method returns an object when one was not expected, no warning is issued. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-26net: Fix soft lockups/OOM issues w/ unix garbage collectordann frazier
This is an implementation of David Miller's suggested fix in: https://bugzilla.redhat.com/show_bug.cgi?id=470201 It has been updated to use wait_event() instead of wait_event_interruptible(). Paraphrasing the description from the above report, it makes sendmsg() block while UNIX garbage collection is in progress. This avoids a situation where child processes continue to queue new FDs over a AF_UNIX socket to a parent which is in the exit path and running garbage collection on these FDs. This contention can result in soft lockups and oom-killing of unrelated processes. Signed-off-by: dann frazier <dannf@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24netfilter: xtables: add missing const qualifier to xt_tgchk_paramJan Engelhardt
When entryinfo was a standalone parameter to functions, it used to be "const void *". Put the const back in. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25drm: move drm vblank initialization/cleanup to driver load/unloadKeith Packard
drm vblank initialization keeps track of the changes in driver-supplied frame counts across vt switch and mode setting, but only if you let it by not tearing down the drm vblank structure. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-23irq.h: fix missing/extra kernel-docRandy Dunlap
Impact: fix kernel-doc build Fix missing & excess irq.h kernel-doc: Warning(include/linux/irq.h:182): No description found for parameter 'irq' Warning(include/linux/irq.h:182): Excess struct/union/enum/typedef member 'affinity_entry' description in 'irq_desc' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23Merge commit 'v2.6.28-rc6' into irq/urgentIngo Molnar
2008-11-21net: Fix memory leak in the proto_register functionCatalin Marinas
If the slub allocator is used, kmem_cache_create() may merge two or more kmem_cache's into one but the cache name pointer is not updated and kmem_cache_name() is no longer guaranteed to return the pointer passed to the former function. This patch stores the kmalloc'ed pointers in the corresponding request_sock_ops and timewait_sock_ops structures. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21[SCSI] fc_transport: fix old bug on bitflag definitionsJames Smart
When the fastfail flag was added, it did not account for the flags being bit fields. Correct the definition so there is no longer a conflict. Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits) net: fix tiny output corruption of /proc/net/snmp6 atl2: don't request irq on resume if netif running ipv6: use seq_release_private for ip6mr.c /proc entries pkt_sched: fix missing check for packet overrun in qdisc_dump_stab() smc911x: Fix printf format typo in smc911x driver. asix: Fix asix-based cards connecting to 10/100Mbs LAN. mv643xx_eth: fix recycle check bound mv643xx_eth: fix the order of mdiobus_{unregister, free}() calls sh: sh_eth: Update to change of mii_bus TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header() TPROXY: fill struct flowi->flags in udp_sendmsg() net: ipg.c fix bracing on endian swapping phylib: Fix auto-negotiation restart avoidance net: jme.c rxdesc.flags is __le16, other missing endian swaps phylib: fix phy name example in documentation net: Do not fire linkwatch events until the device is registered. phonet: fix compilation with gcc-3.4 ixgbe: fix compilation with gcc-3.4 pktgen: fix multiple queue warning net: fix ip_mr_init() error path ...
2008-11-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2008-11-19Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: more general identifier for Phoenix BIOS AMD IOMMU: check for next_bit also in unmapped area AMD IOMMU: fix fullflush comparison length AMD IOMMU: enable device isolation per default AMD IOMMU: add parameter to disable device isolation x86, PEBS/DS: fix code flow in ds_request() x86: add rdtsc barrier to TSC sync check xen: fix scrub_page() x86: fix es7000 compiling x86, bts: fix unlock problem in ds.c x86, voyager: fix smp generic helper voyager breakage x86: move iomap.h to the new include location
2008-11-19cpuset: update top cpuset's mems after adding a nodeMiao Xie
After adding a node into the machine, top cpuset's mems isn't updated. By reviewing the code, we found that the update function cpuset_track_online_nodes() was invoked after node_states[N_ONLINE] changes. It is wrong because N_ONLINE just means node has pgdat, and if node has/added memory, we use N_HIGH_MEMORY. So, We should invoke the update function after node_states[N_HIGH_MEMORY] changes, just like its commit says. This patch fixes it. And we use notifier of memory hotplug instead of direct calling of cpuset_track_online_nodes(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19reintroduce accept4Ulrich Drepper
Introduce a new accept4() system call. The addition of this system call matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(), inotify_init1(), epoll_create1(), pipe2()) which added new system calls that differed from analogous traditional system calls in adding a flags argument that can be used to access additional functionality. The accept4() system call is exactly the same as accept(), except that it adds a flags bit-mask argument. Two flags are initially implemented. (Most of the new system calls in 2.6.27 also had both of these flags.) SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled for the new file descriptor returned by accept4(). This is a useful security feature to avoid leaking information in a multithreaded program where one thread is doing an accept() at the same time as another thread is doing a fork() plus exec(). More details here: http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling", Ulrich Drepper). The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag to be enabled on the new open file description created by accept4(). (This flag is merely a convenience, saving the use of additional calls fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result. Here's a test program. Works on x86-32. Should work on x86-64, but I (mtk) don't have a system to hand to test with. It tests accept4() with each of the four possible combinations of SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies that the appropriate flags are set on the file descriptor/open file description returned by accept4(). I tested Ulrich's patch in this thread by applying against 2.6.28-rc2, and it passes according to my test program. /* test_accept4.c Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk <mtk.manpages@gmail.com> Licensed under the GNU GPLv2 or later. */ #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdlib.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #define PORT_NUM 33333 #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) /**********************************************************************/ /* The following is what we need until glibc gets a wrapper for accept4() */ /* Flags for socket(), socketpair(), accept4() */ #ifndef SOCK_CLOEXEC #define SOCK_CLOEXEC O_CLOEXEC #endif #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif #ifdef __x86_64__ #define SYS_accept4 288 #elif __i386__ #define USE_SOCKETCALL 1 #define SYS_ACCEPT4 18 #else #error "Sorry -- don't know the syscall # on this architecture" #endif static int accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags) { printf("Calling accept4(): flags = %x", flags); if (flags != 0) { printf(" ("); if (flags & SOCK_CLOEXEC) printf("SOCK_CLOEXEC"); if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK)) printf(" "); if (flags & SOCK_NONBLOCK) printf("SOCK_NONBLOCK"); printf(")"); } printf("\n"); #if USE_SOCKETCALL long args[6]; args[0] = fd; args[1] = (long) sockaddr; args[2] = (long) addrlen; args[3] = flags; return syscall(SYS_socketcall, SYS_ACCEPT4, args); #else return syscall(SYS_accept4, fd, sockaddr, addrlen, flags); #endif } /**********************************************************************/ static int do_test(int lfd, struct sockaddr_in *conn_addr, int closeonexec_flag, int nonblock_flag) { int connfd, acceptfd; int fdf, flf, fdf_pass, flf_pass; struct sockaddr_in claddr; socklen_t addrlen; printf("=======================================\n"); connfd = socket(AF_INET, SOCK_STREAM, 0); if (connfd == -1) die("socket"); if (connect(connfd, (struct sockaddr *) conn_addr, sizeof(struct sockaddr_in)) == -1) die("connect"); addrlen = sizeof(struct sockaddr_in); acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen, closeonexec_flag | nonblock_flag); if (acceptfd == -1) { perror("accept4()"); close(connfd); return 0; } fdf = fcntl(acceptfd, F_GETFD); if (fdf == -1) die("fcntl:F_GETFD"); fdf_pass = ((fdf & FD_CLOEXEC) != 0) == ((closeonexec_flag & SOCK_CLOEXEC) != 0); printf("Close-on-exec flag is %sset (%s); ", (fdf & FD_CLOEXEC) ? "" : "not ", fdf_pass ? "OK" : "failed"); flf = fcntl(acceptfd, F_GETFL); if (flf == -1) die("fcntl:F_GETFD"); flf_pass = ((flf & O_NONBLOCK) != 0) == ((nonblock_flag & SOCK_NONBLOCK) !=0); printf("nonblock flag is %sset (%s)\n", (flf & O_NONBLOCK) ? "" : "not ", flf_pass ? "OK" : "failed"); close(acceptfd); close(connfd); printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL"); return fdf_pass && flf_pass; } static int create_listening_socket(int port_num) { struct sockaddr_in svaddr; int lfd; int optval; memset(&svaddr, 0, sizeof(struct sockaddr_in)); svaddr.sin_family = AF_INET; svaddr.sin_addr.s_addr = htonl(INADDR_ANY); svaddr.sin_port = htons(port_num); lfd = socket(AF_INET, SOCK_STREAM, 0); if (lfd == -1) die("socket"); optval = 1; if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) == -1) die("setsockopt"); if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in)) == -1) die("bind"); if (listen(lfd, 5) == -1) die("listen"); return lfd; } int main(int argc, char *argv[]) { struct sockaddr_in conn_addr; int lfd; int port_num; int passed; passed = 1; port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM; memset(&conn_addr, 0, sizeof(struct sockaddr_in)); conn_addr.sin_family = AF_INET; conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); conn_addr.sin_port = htons(port_num); lfd = create_listening_socket(port_num); if (!do_test(lfd, &conn_addr, 0, 0)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0)) passed = 0; if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK)) passed = 0; close(lfd); exit(passed ? EXIT_SUCCESS : EXIT_FAILURE); } [mtk.manpages@gmail.com: rewrote changelog, updated test program] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-18mac80211: remove ieee80211_notify_macJohannes Berg
Before ieee80211_notify_mac() was added, it was presented with the use case of using it to tell mac80211 that the association may have been lost because the firmware crashed/reset. Since then, it has also been used by iwlwifi to (slightly) speed up re-association after resume, a workaround around the fact that mac80211 has no suspend/resume handling yet. It is also not used by any other drivers, so clearly it cannot be necessary for "good enough" suspend/resume. Unfortunately, the callback suffers from a severe problem: It only works for station mode. If suspend/resume happens while in IBSS or any other mode (but station), then the callback is pointless. Recently, it has created a number of locking issues, first because it required rtnl locking rather than RCU due to calling sleeping functions within the critical section, and now because it's called by iwlwifi from the mac80211 workqueue that may not use the rtnl because it is flushed under rtnl. (cf. http://bugzilla.kernel.org/show_bug.cgi?id=12046) I think, therefore, that we should take a step back, remove it entirely for now and add the small feature it provided properly. For suspend and resume we will need to introduce new hooks, and for the case where the firmware was reset the driver will probably simply just pretend it has done a suspend/resume cycle to get mac80211 to reprogram the hardware completely, not just try to connect to the current AP again in station mode. When doing so, we will need to take into account locking issues and possibly defer to schedule_work from within mac80211 for the resume operation, while the suspend operation must be done directly. Proper suspend/resume should also not necessarily try to reconnect to the current AP, the time spent in suspend may have been short enough to not be disconnected from the AP, mac80211 will detect that the AP went out of range quickly if it did, and if the association is lost then the AP will disassoc as soon as a data frame is sent. We might also take into account WWOL then, and have mac80211 program the hardware into such a mode where it is available and requested. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-18Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: hold extra reference to bio in blk_rq_map_user_iov() relay: fix cpu offline problem Release old elevator on change elevator block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash block/md: fix md autodetection block: make add_partition() return pointer to hd_struct block: fix add_partition() error path
2008-11-18Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kernel/profile.c: fix section mismatch warning function tracing: fix wrong pos computing when read buffer has been fulfilled tracing: fix mmiotrace resizing crash ring-buffer: no preempt for sched_clock() ring-buffer: buffer record on/off switch
2008-11-18Merge branch 'iommu-fixes-2.6.28' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
2008-11-18block: make add_partition() return pointer to hd_structTejun Heo
Make add_partition() return pointer to the new hd_struct on success and ERR_PTR() value on failure. This change will be used to fix md autodetection bug. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) rtnetlink: propagate error from dev_change_flags in do_setlink() isdn: remove extra byteswap in isdn_net_ciscohdlck_slarp_send_reply Phonet: refuse to send bigger than MTU packets e1000e: fix IPMI traffic e1000e: fix warn_on reload after phy_id error phy: fix phy address bug e100: fix dma error in direction for mapping igb: use dev_printk instead of printk qla3xxx: Cleanup: Fix link print statements. igb: Use device_set_wakeup_enable e1000: Use device_set_wakeup_enable e1000e: Use device_set_wakeup_enable via-velocity: enable perfect filtering for multicast packets phy: Add support for Marvell 88E1118 PHY mlx4_en: Pause parameters per port phylib: fix premature freeing of struct mii_bus atl1: Do not enumerate options unsupported by chip atl1e: fix broken multicast by removing unnecessary crc inversion gianfar: Fix DMA unmap invocations net/ucc_geth: Fix oops in uec_get_ethtool_stats() ...
2008-11-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: don't grab devices with no input HID: fix radio-mr800 hidquirks HID: fix kworld fm700 radio hidquirks HID: fix start/stop cycle in usbhid driver HID: use single threaded work queue for hid_compat HID: map macbook keys for "Expose" and "Dashboard" HID: support for new unibody macbooks HID: fix locking in hidraw_open()
2008-11-15Fix inotify watch removal/umount racesAl Viro
Inotify watch removals suck violently. To kick the watch out we need (in this order) inode->inotify_mutex and ih->mutex. That's fine if we have a hold on inode; however, for all other cases we need to make damn sure we don't race with umount. We can *NOT* just grab a reference to a watch - inotify_unmount_inodes() will happily sail past it and we'll end with reference to inode potentially outliving its superblock. Ideally we just want to grab an active reference to superblock if we can; that will make sure we won't go into inotify_umount_inodes() until we are done. Cleanup is just deactivate_super(). However, that leaves a messy case - what if we *are* racing with umount() and active references to superblock can't be acquired anymore? We can bump ->s_count, grab ->s_umount, which will almost certainly wait until the superblock is shut down and the watch in question is pining for fjords. That's fine, but there is a problem - we might have hit the window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e. the moment when superblock is past the point of no return and is heading for shutdown) and the moment when deactivate_super() acquires ->s_umount. We could just do drop_super() yield() and retry, but that's rather antisocial and this stuff is luser-triggerable. OTOH, having grabbed ->s_umount and having found that we'd got there first (i.e. that ->s_root is non-NULL) we know that we won't race with inotify_umount_inodes(). So we could grab a reference to watch and do the rest as above, just with drop_super() instead of deactivate_super(), right? Wrong. We had to drop ih->mutex before we could grab ->s_umount. So the watch could've been gone already. That still can be dealt with - we need to save watch->wd, do idr_find() and compare its result with our pointer. If they match, we either have the damn thing still alive or we'd lost not one but two races at once, the watch had been killed and a new one got created with the same ->wd at the same address. That couldn't have happened in inotify_destroy(), but inotify_rm_wd() could run into that. Still, "new one got created" is not a problem - we have every right to kill it or leave it alone, whatever's more convenient. So we can use idr_find(...) == watch && watch->inode->i_sb == sb as "grab it and kill it" check. If it's been our original watch, we are fine, if it's a newcomer - nevermind, just pretend that we'd won the race and kill the fscker anyway; we are safe since we know that its superblock won't be going away. And yes, this is far beyond mere "not very pretty"; so's the entire concept of inotify to start with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15Merge branch 'sh/for-2.6.28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: serial: sh-sci: Reorder the SCxTDR write after the TDxE clear. sh: __copy_user function can corrupt the stack in case of exception sh: Fixed the TMU0 reload value on resume sh: Don't factor in PAGE_OFFSET for valid_phys_addr_range() check. sh: early printk port type fix i2c: fix i2c-sh_mobile rx underrun sh: Provide a sane valid_phys_addr_range() to prevent TLB reset with PMB. usb: r8a66597-hcd: fix wrong data access in SuperH on-chip USB fix sci type for SH7723 serial: sh-sci: fix cannot work SH7723 SCIFA sh: Handle fixmap TLB eviction more coherently.
2008-11-15Add 'pr_fmt()' format modifier to pr_xyz macros.Martin Schwidefsky
A common reason for device drivers to implement their own printk macros is the lack of a printk prefix with the standard pr_xyz macros. Introduce a pr_fmt() macro that is applied for every pr_xyz macro to the format string. The most common use of the pr_fmt macro would be to add the name of the device driver to all pr_xyz messages in a source file. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-13lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.cIngo Molnar
fix this warning: net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used this is a lockdep macro problem in the !LOCKDEP case. We cannot convert it to an inline because the macro works on multiple types, but we can mark the parameter used. [ also clean up a misaligned tab in sock_lock_init_class_and_name() ] [ also remove #ifdefs from around af_family_clock_key strings - which were certainly added to get rid of the ugly build warnings. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-13USB: don't register endpoints for interfaces that are going awayAlan Stern
This patch (as1155) fixes a bug in usbcore. When interfaces are deleted, either because the device was disconnected or because of a configuration change, the extra attribute files and child endpoint devices may get left behind. This is because the core removes them before calling device_del(). But during device_del(), after the driver is unbound the core will reinstall altsetting 0 and recreate those extra attributes and children. The patch prevents this by adding a flag to record when the interface is in the midst of being unregistered. When the flag is set, the attribute files and child devices will not be created. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@kernel.org> [2.6.27, 2.6.26, 2.6.25] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-11-13slab: document SLAB_DESTROY_BY_RCUPeter Zijlstra
Explain this SLAB_DESTROY_BY_RCU thing... [hugh@veritas.com: add a pointer to comment in mm/slab.c] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-11-13HID: map macbook keys for "Expose" and "Dashboard"Henrik Rydberg
On macbooks there are specific keys for the user-space functions Expose and Dashboard, which currently has no counterpart in input.h. This patch adds KEY_SCALE and KEY_DASHBOARD, and maps the keyboard accordingly. Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-11-12Add c2 port supportRodolfo Giometti
C2port implements a two wire serial communication protocol (bit banging) designed to enable in-system programming, debugging, and boundary-scan testing on low pin-count Silicon Labs devices. Currently this code supports only flash programming through sysfs interface but extensions shoud be easy to add. Signed-off-by: Rodolfo Giometti <giometti@linux.it> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>