aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2009-02-18Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list block: fix booting from partitioned md array block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb cciss: PCI power management reset for kexec paride/pg.c: xs(): &&/|| confusion fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free block: fix bad definition of BIO_RW_SYNC bsg: Fix sense buffer bug in SG_IO
2009-02-18Bernhard has movedBernhard Walle
Since I don't work for SUSE any more and the bwalle@suse.de address is invalid, correct it in the copyright headers and documentation. Signed-off-by: Bernhard Walle <bernhard.walle@gmx.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18jsm: additional device supportAdam Lackorzynski
I have a Digi Neo 8 PCI card (114f:00b1) Serial controller: Digi International Digi Neo 8 (rev 05) that works with the jsm driver after using the following patch. Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Cc: Scott H Kilau <Scott_Kilau@digi.com> Cc: Wendy Xiong <wendyx@us.ibm.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18mm: fix memmap init for handling memory holeKAMEZAWA Hiroyuki
Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole. and memmap initialization was not done. This was a trouble for sparc boot. To fix this, the PFN should be initialized and marked as PG_reserved. This patch changes early_pfn_in_nid() return true if PFN is a hole. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reported-by: David Miller <davem@davemlloft.net> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18mm: clean up for early_pfn_to_nid()KAMEZAWA Hiroyuki
What's happening is that the assertion in mm/page_alloc.c:move_freepages() is triggering: BUG_ON(page_zone(start_page) != page_zone(end_page)); Once I knew this is what was happening, I added some annotations: if (unlikely(page_zone(start_page) != page_zone(end_page))) { printk(KERN_ERR "move_freepages: Bogus zones: " "start_page[%p] end_page[%p] zone[%p]\n", start_page, end_page, zone); printk(KERN_ERR "move_freepages: " "start_zone[%p] end_zone[%p]\n", page_zone(start_page), page_zone(end_page)); printk(KERN_ERR "move_freepages: " "start_pfn[0x%lx] end_pfn[0x%lx]\n", page_to_pfn(start_page), page_to_pfn(end_page)); printk(KERN_ERR "move_freepages: " "start_nid[%d] end_nid[%d]\n", page_to_nid(start_page), page_to_nid(end_page)); ... And here's what I got: move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00] move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00] move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff] move_freepages: start_nid[1] end_nid[0] My memory layout on this box is: [ 0.000000] Zone PFN ranges: [ 0.000000] Normal 0x00000000 -> 0x0081ff5d [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[8] active PFN ranges [ 0.000000] 0: 0x00000000 -> 0x00020000 [ 0.000000] 1: 0x00800000 -> 0x0081f7ff [ 0.000000] 1: 0x0081f800 -> 0x0081fe50 [ 0.000000] 1: 0x0081fed1 -> 0x0081fed8 [ 0.000000] 1: 0x0081feda -> 0x0081fedb [ 0.000000] 1: 0x0081fedd -> 0x0081fee5 [ 0.000000] 1: 0x0081fee7 -> 0x0081ff51 [ 0.000000] 1: 0x0081ff59 -> 0x0081ff5d So it's a block move in that 0x81f600-->0x81f7ff region which triggers the problem. This patch: Declaration of early_pfn_to_nid() is scattered over per-arch include files, and it seems it's complicated to know when the declaration is used. I think it makes fix-for-memmap-init not easy. This patch moves all declaration to include/linux/mm.h After this, if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use static definition in include/linux/mm.h else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use generic definition in mm/page_alloc.c else -> per-arch back end function will be called. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: David Miller <davem@davemlloft.net> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18atmel-mci: fix initialization of dma slave dataDan Williams
The conversion of atmel-mci to dma_request_channel missed the initialization of the channel dma_slave information. The filter_fn passed to dma_request_channel is responsible for initializing the channel's private data. This implementation has the additional benefit of enabling a generic client-channel data passing mechanism. Reviewed-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18mm: task dirty accounting fixNick Piggin
YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty). Additionally, there is some inconsistency about when task_dirty_inc is called. It is used for dirty balancing, however it even gets called for __set_page_dirty_no_writeback. So rather than increment it in a set_page_dirty wrapper, move it down to exactly where the dirty page accounting stats are incremented. Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18timerfd: add flags checkDavide Libenzi
As requested by Michael, add a missing check for valid flags in timerfd_settime(), and make it return EINVAL in case some extra bits are set. Michael said: If this is to be any use to userland apps that want to check flag support (perhaps it is too late already), then the sooner we get it into the kernel the better: 2.6.29 would be good; earlier stables as well would be even better. [akpm@linux-foundation.org: remove unused TFD_FLAGS_SET] Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> [2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18seq_file: properly cope with preadEric Biederman
Currently seq_read assumes that the offset passed to it is always the offset it passed to user space. In the case pread this assumption is broken and we do the wrong thing when presented with pread. To solve this I introduce an offset cache inside of struct seq_file so we know where our logical file position is. Then in seq_read if we try to read from another offset we reset our data structures and attempt to go to the offset user space wanted. [akpm@linux-foundation.org: restore FMODE_PWRITE] [pjt@google.com: seq_open needs its fmode opened up to take advantage of this] Signed-off-by: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Turner <pjt@google.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18vfs: separate FMODE_PREAD/FMODE_PWRITE into separate flagsPaul Turner
Separate FMODE_PREAD and FMODE_PWRITE into separate flags to reflect the reality that the read and write paths may have independent restrictions. A git grep verifies that these flags are always cleared together so this new behavior will only apply to interfaces that change to clear flags individually. This is required for "seq_file: properly cope with pread", a post-2.6.25 regression fix. [akpm@linux-foundation.org: add comment] Signed-off-by: Paul Turner <pjt@google.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18vmalloc: add __get_vm_area_caller()Benjamin Herrenschmidt
We have get_vm_area_caller() and __get_vm_area() but not __get_vm_area_caller() On powerpc, I use __get_vm_area() to separate the ranges of addresses given to vmalloc vs. ioremap (various good reasons for that) so in order to be able to implement the new caller tracking in /proc/vmallocinfo, I need a "_caller" variant of it. (akpm: needed for ongoing powerpc development, so merge it early) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18block: fix bad definition of BIO_RW_SYNCJens Axboe
We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before 213d9417fec62ef4c3675621b9364a667954d4dd. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-17Add support for VT6415 PCIE PATA IDE Host ControllerZlatko Calusic
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-17Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vm86: fix preemption bug x86, olpc: fix model detection without OFW x86, hpet: fix for LS21 + HPET = boot hang x86: CPA avoid repeated lazy mmu flush x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem x86/cpa: make sure cpa is safe to call in lazy mmu mode x86, ptrace, mm: fix double-free on race
2009-02-17FRV: __pte_to_swp_entry doesn't expand correctlyRoel Kluin
The macro doesn't expand correctly when its parameter isn't 'pte'. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-17Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling ext4: Implement range_cyclic in ext4_da_writepages instead of write_cache_pages ext4: Initialize preallocation list_head's properly ext4: Fix lockdep warning ext4: Fix to read empty directory blocks correctly in 64k jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() Revert "ext4: wait on all pending commits in ext4_sync_fs()" jbd2: Fix return value of jbd2_journal_start_commit()
2009-02-15KVM: Add kvm_arch_sync_events to sync with asynchronize eventsSheng Yang
kvm_arch_sync_events is introduced to quiet down all other events may happen contemporary with VM destroy process, like IRQ handler and work struct for assigned device. For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so the state of KVM here is legal and can provide a environment to quiet down other events. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15KVM: Avoid using CONFIG_ in userspace visible headersAvi Kivity
Kconfig symbols are not available in userspace, and are not stripped by headers-install. Avoid their use by adding #defines in <asm/kvm.h> to suit each architecture. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ASoC: Only register AC97 bus if it's not done already ALSA: hda - Add snd_hda_multi_out_dig_cleanup() ALSA: hda - Add missing terminator in slave dig-out array ALSA: hda - Change HP dv7 (103c:30f4) quirk from hp-m4 to hp-dv5 model ALSA: hda - Register (new) devices at reconfig ALSA: mtpav - Fix initial value for input hwport ALSA: hda - add id for Intel IbexPeak integrated HDMI codec ALSA: hda - compute checksum in HDMI audio infoframe ALSA: hda - enable HDMI audio pin out at module loading time ALSA: hda - allow multi-channel HDMI audio playback when ELD is not present ASoC: Update SDP3430 machine driver for snd_soc_card ALSA: hda - Add quirk for Asus z37e (1043:8284) sound: Remove OSSlib stuff from linux/soundcard.h ASoC: WM8990: Fix kcontrol's private value use in put callback ASoC: TLV320AIC3X: Fix kcontrol's private value use in put callback
2009-02-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits) wimax: fix oops in wimax_dev_get_by_genl_info() when looking up non-wimax iface net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2 netxen: fix compile waring "label ‘set_32_bit_mask’ defined but not used" on IA64 platform bnx2: Update version to 1.9.2 and copyright. bnx2: Fix jumbo frames error handling. bnx2: Update 5709 firmware. bnx2: Update 5706/5708 firmware. 3c505: do not set pcb->data.raw beyond its size Documentation/connector/cn_test.c: don't use gfp_any() net: don't use in_atomic() in gfp_any() IRDA: cnt is off by 1 netxen: remove pcie workaround sun3: print when lance_open() fails qlge: bugfix: Add missing rx buf clean index on early exit. qlge: bugfix: Fix RX scaling values. qlge: bugfix: Fix TSO breakage. qlge: bugfix: Add missing dev_kfree_skb_any() call. qlge: bugfix: Add missing put_page() call. qlge: bugfix: Fix fatal error recovery hang. qlge: bugfix: Use netif_receive_skb() and vlan_hwaccel_receive_skb(). ...
2009-02-12net: don't use in_atomic() in gfp_any()Andrew Morton
The problem is that in_atomic() will return false inside spinlocks if CONFIG_PREEMPT=n. This will lead to deadlockable GFP_KERNEL allocations from spinlocked regions. Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking will instead use GFP_ATOMIC from this callsite. Hence we won't get the might_sleep() debugging warnings which would have informed us of the buggy callsites. Solve both these problems by switching to in_interrupt(). Now, if someone runs a gfp_any() allocation from inside spinlock we will get the warning if CONFIG_PREEMPT=y. I reviewed all callsites and most of them were too complex for my little brain and none of them documented their interface requirements. I have no idea what this patch will do. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-11syscall define: fix uml compile bugHeiko Carstens
With the new system call defines we get this on uml: arch/um/sys-i386/built-in.o: In function `sys_call_table': (.rodata+0x308): undefined reference to `sys_sigprocmask' Reason for this is that uml passes the preprocessor option -Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel. This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system call named sys_kernel_sigprocmask. However sys_sigprocmask is missing because of this. To avoid macro expansion for the system call name just concatenate the name at first define instead of carrying it through severel levels. This was pointed out by Al Viro. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-11cgroups: fix lockdep subclasses overflowLi Zefan
I enabled all cgroup subsystems when compiling kernel, and then: # mount -t cgroup -o net_cls xxx /mnt # mkdir /mnt/0 This showed up immediately: BUG: MAX_LOCKDEP_SUBCLASSES too low! turning off the locking correctness validator. It's caused by the cgroup hierarchy lock: for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; if (ss->root == root) mutex_lock_nested(&ss->hierarchy_mutex, i); } Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but MAX_LOCKDEP_SUBCLASSES is 8. This patch uses different lockdep keys for different subsystems. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-11Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: fix TIMER_ABSTIME for process wide cpu timers timers: split process wide cpu clocks/timers, fix x86: clean up hpet timer reinit timers: split process wide cpu clocks/timers, remove spurious warning timers: split process wide cpu clocks/timers signal: re-add dead task accumulation stats. x86: fix hpet timer reinit for x86_64 sched: fix nohz load balancer on cpu offline
2009-02-11x86, ptrace, mm: fix double-free on raceMarkus Metzger
Ptrace_detach() races with __ptrace_unlink() if the traced task is reaped while detaching. This might cause a double-free of the BTS buffer. Change the ptrace_detach() path to only do the memory accounting in ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace() which will be called from __ptrace_unlink(). The fix follows a proposal from Oleg Nesterov. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-11timers: fix TIMER_ABSTIME for process wide cpu timersPeter Zijlstra
The POSIX timer interface allows for absolute time expiry values through the TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock every time we start it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-11timers: split process wide cpu clocks/timers, fixPeter Zijlstra
To decrease the chance of a missed enable, always enable the timer when we sample it, we'll always disable it when we find that there are no active timers in the jiffy tick. This fixes a flood of warnings reported by Mike Galbraith. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-10pkt_sched: type should be __u32 in headerChuck Ebbert
Using u32 in this header breaks the build of iptables. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-10hugetlbfs: fix build failure with !CONFIG_HUGETLBFSStefan Richter
Fix regression due to 5a6fe125950676015f5108fb71b2a67441755003, "Do not account for the address space used by hugetlbfs using VM_ACCOUNT" which added an argument to the function hugetlb_file_setup() but not to the macro hugetlb_file_setup(). Reported-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits) bridge: Fix LRO crash with tun IPv6: fix to set device name when new IPv6 over IPv6 tunnel device is created. gianfar: Fix boot hangs while bringing up gianfar ethernet netfilter: xt_sctp: sctp chunk mapping doesn't work netfilter: ctnetlink: fix echo if not subscribed to any multicast group netfilter: ctnetlink: allow changing NAT sequence adjustment in creation netfilter: nf_conntrack_ipv6: don't track ICMPv6 negotiation message netfilter: fix tuple inversion for Node information request netxen: fix msi-x interrupt handling de2104x: force correct order when writing to rx ring tun: Fix unicast filter overflow drivers/isdn: introduce missing kfree drivers/atm: introduce missing kfree sunhme: Don't match PCI devices in SBUS probe. 9p: fix endian issues [attempt 3] net_dma: call dmaengine_get only if NET_DMA enabled 3c509: Fix resume from hibernation for PnP mode. sungem: Soft lockup in sungem on Netra AC200 when switching interface up RxRPC: Fix a potential NULL dereference r8169: Don't update statistics counters when interface is down ...
2009-02-10Do not account for the address space used by hugetlbfs using VM_ACCOUNTMel Gorman
When overcommit is disabled, the core VM accounts for pages used by anonymous shared, private mappings and special mappings. It keeps track of VMAs that should be accounted for with VM_ACCOUNT and VMAs that never had a reserve with VM_NORESERVE. Overcommit for hugetlbfs is much riskier than overcommit for base pages due to contiguity requirements. It avoids overcommiting on both shared and private mappings using reservation counters that are checked and updated during mmap(). This ensures (within limits) that hugepages exist in the future when faults occurs or it is too easy to applications to be SIGKILLed. As hugetlbfs makes its own reservations of a different unit to the base page size, VM_ACCOUNT should never be set. Even if the units were correct, we would double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may be set because an application can request no reserves be made for hugetlbfs at the risk of getting killed later. With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This breaks the accounting for both the core VM and hugetlbfs, can trigger an OOM storm when hugepage pools are too small lockups and corrupted counters otherwise are used. This patch brings hugetlbfs more in line with how the core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-10jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()Jan Kara
If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com>
2009-02-10sound: Remove OSSlib stuff from linux/soundcard.hArnd Bergmann
Removed OSSlib stuff from linux/soundcard.h to fix the warnings for 'make headers_check'. This patch breaks building against OSSlib with the kernel headers instead of its own headers. It should still work with any version of the library from the 2003 onwards which provide their own headers for the latest interface. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-02-09Merge branch 'drm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/i915: select framebuffer support automatically drm/i915: add get_vblank_counter function for GM45 drm/i915: capture last_vblank count at IRQ uninstall time too drm/i915: Unlock mutex on i915_gem_fault() error path drm/i915: Quiet the message on get/setparam ioctl with an unknown value. drm/i915: skip LVDS initialization on Apple Mac Mini drm/i915: sync SDVO code with stable userland modesetting driver drm/i915: Unref the object after failing to set tiling mode. drm/i915: add fence register management to execbuf drm/i915: Return error from i915_gem_object_get_fence_reg() when failing. drm/i915: Set up an MTRR covering the GTT at driver load. drm/i915: Skip SDVO/HDMI init when the chipset tells us it's not present. drm/i915: Suppress GEM teardown on X Server exit in KMS mode. drm/radeon: fix ioremap conflict with AGP mappings i915: fix unneeded locking in i915 LVDS get modes code.
2009-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: scatterwalk - Avoid flush_dcache_page on slab pages crypto: shash - Fix tfm destruction crypto: api - Fix zeroing on free crypto: shash - Fix module refcount crypto: api - Fix algorithm test race that broke aead initialisation
2009-02-09x86: spinlocks: define dummy __raw_spin_is_contendedKyle McMartin
Architectures other than mips and x86 are not using ticket spinlocks. Therefore, the contention on the lock is meaningless, since there is nobody known to be waiting on it (arguably /fairly/ unfair locks). Dummy it out to return 0 on other architectures. Signed-off-by: Kyle McMartin <kyle@redhat.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-08async: Rename _special -> _domain for clarity.Cornelia Huck
Rename the async_*_special() functions to async_*_domain(), which describes the purpose of these functions much better. [Broke up long lines to silence checkpatch] Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2009-02-08drm/i915: add fence register management to execbufJesse Barnes
Adds code to set up fence registers at execbuf time on pre-965 chips as necessary. Also fixes up a few bugs in the pre-965 tile register support (get_order != ffs). The number of fences available to the kernel defaults to the hw limit minus 3 (for legacy X front/back/depth), but a new parameter allows userspace to override that as needed. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-02-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI PM: make the PM core more careful with drivers using the new PM framework PCI PM: Read power state from device after trying to change it on resume PCI PM: Do not disable and enable bridges during suspend-resume PCI: PCIe portdrv: Simplify suspend and resume PCI PM: Fix saving of device state in pci_legacy_suspend PCI PM: Check if the state has been saved before trying to restore it PCI PM: Fix handling of devices without drivers PCI: return error on failure to read PCI ROMs PCI: properly clean up ASPM link state on device remove
2009-02-07module: remove over-zealous check in __module_get()Rusty Russell
Impact: fix spurious BUG_ON() triggered under load module_refcount() isn't reliable outside stop_machine(), as demonstrated by Karsten Keil <kkeil@suse.de>, networking can trigger it under load (an inc on one cpu and dec on another while module_refcount() is tallying can give false results, for example). Almost noone should be using __module_get, but that's another issue. Cc: Karsten Keil <kkeil@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-07Merge branches 'release', 'asus', 'bugzilla-12450', 'cpuidle', 'debug', ↵Len Brown
'ec', 'misc', 'printk' and 'processor' into release
2009-02-06net_dma: call dmaengine_get only if NET_DMA enabledDavid S. Miller
Based upon a patch from Atsushi Nemoto <anemo@mba.ocn.ne.jp> -------------------- The commit 649274d993212e7c23c0cb734572c2311c200872 ("net_dma: acquire/release dma channels on ifup/ifdown") added unconditional call of dmaengine_get() to net_dma. The API should be called only if NET_DMA was enabled. -------------------- Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dan Williams <dan.j.williams@intel.com>
2009-02-07ACPI: Enable bit 11 in _PDC to advertise hw coordPallipadi, Venkatesh
Bit 11 in intel PDC definitions is meant for OS capability to handle hardware coordination of P-states. In Linux we have always supported hwardware coordination of P-states. Just let the BIOSes know that we support it, by setting this bit. Some BIOSes use this bit to choose between hardware or software coordination and without this change below, BIOSes switch to software coordination, which is not very optimal in terms of power consumption and extra wakeups from idle. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-06timers: split process wide cpu clocks/timers, remove spurious warningIngo Molnar
Mike Galbraith reported that the new warning in thread_group_cputimer() triggers en masse with Amarok running. Oleg Nesterov observed: Can't fastpath_timer_check()->thread_group_cputimer() have the false warning too? Suppose we had the timer, then posix_cpu_timer_del() removes this timer, but task_cputime_zero(&sig->cputime_expires) still not true. Remove the spurious debug warning. Reported-by: Mike Galbraith <efault@gmx.de> Explained-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05atyfb: fix CONFIG_ namespace violationsRandy Dunlap
Fix namespace violations by changing non-kconfig CONFIG_ names to CNFG_*. Fixes breakage in staging/, which adds a real CONFIG_PANEL. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05wait: prevent exclusive waiter starvationJohannes Weiner
With exclusive waiters, every process woken up through the wait queue must ensure that the next waiter down the line is woken when it has finished. Interruptible waiters don't do that when aborting due to a signal. And if an aborting waiter is concurrently woken up through the waitqueue, noone will ever wake up the next waiter. This has been observed with __wait_on_bit_lock() used by lock_page_killable(): the first contender on the queue was aborting when the actual lock holder woke it up concurrently. The aborted contender didn't acquire the lock and therefor never did an unlock followed by waking up the next waiter. Add abort_exclusive_wait() which removes the process' wait descriptor from the waitqueue, iff still queued, or wakes up the next waiter otherwise. It does so under the waitqueue lock. Racing with a wake up means the aborting process is either already woken (removed from the queue) and will wake up the next waiter, or it will remove itself from the queue and the concurrent wake up will apply to the next waiter after it. Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and __wait_on_bit_lock() when they were interrupted by other means than a wake up through the queue. [akpm@linux-foundation.org: coding-style fixes] Reported-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Mentored-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chuck Lever <cel@citi.umich.edu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> ["after some testing"] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05fbmem: don't call copy_from/to_user() with mutex heldAndrea Righi
Avoid calling copy_from/to_user() with fb_info->lock mutex held in fbmem ioctl(). fb_mmap() is called under mm->mmap_sem (A) held, that also acquires fb_info->lock (B); fb_ioctl() takes fb_info->lock (B) and does copy_from/to_user() that might acquire mm->mmap_sem (A), causing a deadlock. NOTE: it doesn't push down the fb_info->lock in each own driver's fb_ioctl(), so there are still potential deadlocks elsewhere. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Johannes Weiner <hannes@saeurebad.de> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05generic swap(): don't return a value from swap()Peter Zijlstra
The swap() macro is accidentally retuning the value of its first argument. Change it into a doesn't-return-anything macro before someone goes and relies upon this behaviour. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05timers: split process wide cpu clocks/timersPeter Zijlstra
Change the process wide cpu timers/clocks so that we: 1) don't mess up the kernel with too many threads, 2) don't have a per-cpu allocation for each process, 3) have no impact when not used. In order to accomplish this we're going to split it into two parts: - clocks; which can take all the time they want since they run from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID) - timers; which need constant time sampling but since they're explicity used, the user can pay the overhead. The clock readout will go back to a full sum of the thread group, while the timers will run of a global 'clock' that only runs when needed, so only programs that make use of the facility pay the price. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05signal: re-add dead task accumulation stats.Peter Zijlstra
We're going to split the process wide cpu accounting into two parts: - clocks; which can take all the time they want since they run from user context. - timers; which need constant time tracing but can affort the overhead because they're default off -- and rare. The clock readout will go back to a full sum of the thread group, for this we need to re-add the exit stats that were removed in the initial itimer rework (f06febc9: timers: fix itimer/many thread hang). Furthermore, since that full sum can be rather slow for large thread groups and we have the complete dead task stats, revert the do_notify_parent time computation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>