Age  Commit message  Author
2009-11-02  rcu: Fix long-grace-period race between forcing and initialization  Paul E. McKenney

Very long RCU read-side critical sections (50 milliseconds or so) can cause a race between force_quiescent_state() and rcu_start_gp() as follows on kernel builds with multi-level rcu_node hierarchies:

1. CPU 0 calls force_quiescent_state(), sees that there is a grace period in progress, and acquires ->fqslock.

2. CPU 1 detects the end of the grace period, and so cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum. This operation is carried out under the root rnp->lock, but CPU 0 has not yet acquired that lock. Note that rsp->signaled is still RCU_SAVE_DYNTICK from the last grace period.

3. CPU 1 calls rcu_start_gp(), but no one wants a new grace period, so it drops the root rnp->lock and returns.

4. CPU 0 acquires the root rnp->lock and picks up rsp->completed and rsp->signaled, then drops rnp->lock. It then enters the RCU_SAVE_DYNTICK leg of the switch statement.

5. CPU 2 invokes call_rcu(), and now needs a new grace period. It calls rcu_start_gp(), which acquires the root rnp->lock, sets rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in the RCU_SAVE_DYNTICK leg of the switch statement!) and starts initializing the rcu_node hierarchy. If there are multiple levels to the hierarchy, it will drop the root rnp->lock and initialize the lower levels of the hierarchy.

6. CPU 0 notes that rsp->completed has not changed, which permits both CPU 2 and CPU 0 to try updating rsp->signaled concurrently. If CPU 0's update prevails, later calls to force_quiescent_state() can count old quiescent states against the new grace period, which can in turn result in premature ending of grace periods. Not good.

This patch adds an RCU_GP_IDLE state for rsp->signaled that is set initially at boot time and any time a grace period ends. This prevents CPU 0 from getting into the workings of force_quiescent_state() in step 4. Additional locking and checks prevent the concurrent update of rsp->signaled in step 6.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1256742889199-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
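The effect of the new idle state can be modelled in a few lines of user-space C. This is only a sketch: the state names come from the commit message, but the struct layout, the helper names (gp_end, gp_start, may_force_qs) and the absence of any locking are assumptions made for illustration, not the kernel's actual code.

```c
/* Simplified, hypothetical model of the rsp->signaled state machine. */
enum gp_state {
    RCU_GP_IDLE,        /* no grace period in progress (state added by the patch) */
    RCU_GP_INIT,        /* a grace period is being initialized */
    RCU_SAVE_DYNTICK,   /* forcing may scan dyntick state */
    RCU_FORCE_QS        /* forcing may force quiescent states */
};

struct rcu_state_model {
    long gpnum;              /* number of the current grace period */
    long completed;          /* number of the last completed grace period */
    enum gp_state signaled;  /* what the forcing code should do next */
};

/* End of a grace period: record completion and park the state machine. */
void gp_end(struct rcu_state_model *rsp)
{
    rsp->completed = rsp->gpnum;
    rsp->signaled = RCU_GP_IDLE;
}

/* Start of a new grace period: initialization is now in progress. */
void gp_start(struct rcu_state_model *rsp)
{
    rsp->gpnum++;
    rsp->signaled = RCU_GP_INIT;
}

/* Returns 1 if the forcing code may proceed, 0 if it must back off. */
int may_force_qs(const struct rcu_state_model *rsp)
{
    return rsp->signaled != RCU_GP_IDLE && rsp->signaled != RCU_GP_INIT;
}
```

In this model a late caller that samples the state after gp_end() sees RCU_GP_IDLE and backs off, rather than acting on a stale RCU_SAVE_DYNTICK left over from the previous grace period.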
2009-11-02  uids: Prevent tear down race  Thomas Gleixner

Ingo triggered the following warning:

  WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
  Hardware name: System Product Name
  ODEBUG: init active object type: timer_list
  Modules linked in:
  Pid: 2619, comm: dmesg Tainted: G W 2.6.32-rc5-tip+ #5298
  Call Trace:
   [<81035443>] warn_slowpath_common+0x6a/0x81
   [<8120e483>] ? debug_print_object+0x42/0x50
   [<81035498>] warn_slowpath_fmt+0x29/0x2c
   [<8120e483>] debug_print_object+0x42/0x50
   [<8120ec2a>] __debug_object_init+0x279/0x2d7
   [<8120ecb3>] debug_object_init+0x13/0x18
   [<810409d2>] init_timer_key+0x17/0x6f
   [<81041526>] free_uid+0x50/0x6c
   [<8104ed2d>] put_cred_rcu+0x61/0x72
   [<81067fac>] rcu_do_batch+0x70/0x121

debugobjects warns about an enqueued timer being initialized. If CONFIG_USER_SCHED=y the user management code uses delayed work to remove the user from the hash table and tear down the sysfs objects.

free_uid is called from RCU and initializes/schedules delayed work if the usage count of the user_struct is 0. The init/schedule happens outside of the uidhash_lock protected region which allows a concurrent caller of find_user() to reference the about to be destroyed user_struct w/o preventing the work from being scheduled. If the next free_uid call happens before the work timer expired then the active timer is initialized and the work scheduled again.

The race was introduced in commit 5cb350ba (sched: group scheduling, sysfs tunables) and made more prominent by commit 3959214f (sched: delayed cleanup of user_struct).

Move the init/schedule_delayed_work inside of the uidhash_lock protected region to prevent the race.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable@kernel.org
2009-10-28  futex: Fix spurious wakeup for requeue_pi really  Thomas Gleixner

The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr == NULL test) nor does it use the wake_list of futex_wake(), which were the reason for commit 41890f2 (futex: Handle spurious wake up).

See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>

The changes in this fix to the wait_requeue_pi path were considered to be a likely unnecessary, but harmless safety net. But it turns out that due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined as EAGAIN we built an endless loop in the code path which correctly returns EWOULDBLOCK.

Spurious wakeups in the wait_requeue_pi code path are unlikely so we do the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let it deal with the spurious wakeup.

Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
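The EWOULDBLOCK/EAGAIN aliasing behind the endless loop is easy to see from user space. The sketch below assumes a Linux <errno.h>, where the two constants share one value; the helper names are hypothetical and only illustrate why a "retry on EAGAIN" test cannot tell the two apart.

```c
#include <errno.h>

/* Hypothetical retry policy: loop again whenever the result is EAGAIN. */
int should_retry(int err)
{
    return err == EAGAIN;
}

/* Returns 1 when EWOULDBLOCK and EAGAIN are the same number on this
 * platform, which is the case on Linux. */
int ewouldblock_aliases_eagain(void)
{
    return EWOULDBLOCK == EAGAIN;
}
```

Where the two values alias, should_retry(EWOULDBLOCK) is true, so a path that legitimately returns EWOULDBLOCK gets retried forever by code that only meant to retry on EAGAIN.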
2009-10-16  futex: Move drop_futex_key_refs out of spinlock'ed region  Darren Hart

When requeuing tasks from one futex to another, the reference held by the requeued task to the original futex location needs to be dropped eventually.

Dropping the reference may ultimately lead to a call to "iput_final" and subsequently call into filesystem-specific code - which may be non-atomic.

It is therefore safer to defer this drop operation until after the futex_hash_bucket spinlock has been dropped.

Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
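The "count now, drop later" pattern described above can be sketched in user-space C. Everything here is a hypothetical model: the bucket "lock" is a plain flag, and the _model names are invented for illustration rather than taken from the futex code.

```c
/* Hypothetical stand-ins for a hash bucket lock and a key reference. */
struct bucket_model { int locked; };
struct key_ref_model { int refs; };

void bucket_lock(struct bucket_model *b)   { b->locked = 1; }
void bucket_unlock(struct bucket_model *b) { b->locked = 0; }

/* Models a drop that may call into non-atomic code: it refuses to run
 * while the bucket lock is held and returns -1 in that case. */
int drop_key_ref_model(struct bucket_model *b, struct key_ref_model *k)
{
    if (b->locked)
        return -1;
    k->refs--;
    return 0;
}

/* Requeue nr tasks: only count the needed drops under the lock, then
 * perform them after the lock has been released. */
int requeue_model(struct bucket_model *b, struct key_ref_model *k, int nr)
{
    int drop_count = 0;
    int err = 0;
    int i;

    bucket_lock(b);
    for (i = 0; i < nr; i++)
        drop_count++;            /* remember the drop, do not do it yet */
    bucket_unlock(b);

    while (drop_count-- > 0)
        if (drop_key_ref_model(b, k) != 0)
            err = -1;
    return err;
}
```

The point of the design is that the locked region only touches a local counter, so nothing that might sleep or call into a filesystem runs with the spinlock held.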
2009-10-15  rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang  Paul E. McKenney

If the following sequence of events occurs, then TREE_PREEMPT_RCU will hang waiting for a grace period to complete, eventually OOMing the system:

o  A TREE_PREEMPT_RCU build of the kernel is booted on a system with more than 64 physical CPUs present (32 on a 32-bit system). Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted with RCU_FANOUT set to a sufficiently small value that the physical CPUs populate two or more leaf rcu_node structures.

o  A task is preempted in an RCU read-side critical section while running on a CPU corresponding to a given leaf rcu_node structure.

o  All CPUs corresponding to this same leaf rcu_node structure record quiescent states for the current grace period.

o  All of these same CPUs go offline (hence the need for enough physical CPUs to populate more than one leaf rcu_node structure). This causes the preempted task to be moved to the root rcu_node structure.

At this point, there is nothing left to cause the quiescent state to be propagated up the rcu_node tree, so the current grace period never completes.

The simplest fix, especially after considering the deadlock possibilities, is to detect this situation when the last CPU is offlined, and to set that CPU's ->qsmask bit in its leaf rcu_node structure. This will cause the next invocation of force_quiescent_state() to end the grace period.

Without this fix, this hang can be triggered in an hour or so on some machines with rcutorture and random CPU onlining/offlining. With this fix, these same machines pass a full 10 hours of this sort of abuse.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15  rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU  Paul E. McKenney

For the short term, map synchronize_rcu_expedited() to synchronize_rcu() for TREE_PREEMPT_RCU and to synchronize_sched_expedited() for TREE_RCU.

Longer term, there needs to be a real expedited grace period for TREE_PREEMPT_RCU, but candidate patches to date are considerably more complex and intrusive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592331-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15  rcu: Prevent RCU IPI storms in presence of high call_rcu() load  Paul E. McKenney

As the number of callbacks on a given CPU rises, invoke force_quiescent_state() only every blimit number of callbacks (defaults to 10,000), and even then only if no other CPU has invoked force_quiescent_state() in the meantime.

This should fix the performance regression reported by Nick.

Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592133-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
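The throttling rule in this message (force only once per threshold's worth of new callbacks, and only if nobody else has forced since the last check) can be modelled as below. This is a sketch under assumptions: the struct and field names are invented for the model, the threshold is passed in as a parameter instead of being tied to any particular kernel tunable, and the global counter is bumped inline where the real code would call force_quiescent_state().

```c
/* Hypothetical per-CPU bookkeeping. */
struct cpu_model {
    long qlen;                    /* callbacks queued on this CPU */
    long qlen_last_check;         /* qlen at the last forcing check */
    unsigned long n_force_snap;   /* global force count seen at that check */
};

/* Hypothetical global state: how often any CPU has forced so far. */
struct global_model {
    unsigned long n_force;
};

/* Enqueue one callback. Returns 1 if this CPU forces quiescent states. */
int enqueue_should_force(struct cpu_model *cpu, struct global_model *g,
                         long threshold)
{
    int force = 0;

    if (++cpu->qlen > cpu->qlen_last_check + threshold) {
        /* Force only if no other CPU has forced since our last check. */
        if (g->n_force == cpu->n_force_snap) {
            force = 1;
            g->n_force++;
        }
        cpu->n_force_snap = g->n_force;
        cpu->qlen_last_check = cpu->qlen;
    }
    return force;
}
```

With a threshold of 3, the fourth enqueue forces once; if another CPU bumps the global count before the next threshold crossing, this CPU skips its turn and just refreshes its snapshot.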
2009-10-14  futex: Check for NULL keys in match_futex  Darren Hart

If userspace tries to perform a requeue_pi on a non-requeue_pi waiter, it will find the futex_q->requeue_pi_key to be NULL and OOPS.

Check for NULL in match_futex() instead of doing explicit NULL pointer checks on all call sites. While match_futex(NULL, NULL) returning false is a little odd, it's still correct as we expect valid key references.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
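A NULL-tolerant comparison of the kind described here might look like the following sketch. The key layout is a simplified assumption (the real futex key is a union with more structure), and the _model names are hypothetical.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for a futex key. */
struct futex_key_model {
    unsigned long word;
    void *ptr;
    int offset;
};

/* Returns 1 only when both keys are non-NULL and all fields match.
 * A NULL key never matches anything, including another NULL key. */
int match_futex_model(const struct futex_key_model *key1,
                      const struct futex_key_model *key2)
{
    return key1 && key2 &&
           key1->word == key2->word &&
           key1->ptr == key2->ptr &&
           key1->offset == key2->offset;
}
```

Putting the check inside the comparison means every call site gets the protection for free, at the cost of the slightly odd result that two NULL keys compare as unequal.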
2009-10-13  futex: Handle spurious wake up  Thomas Gleixner

The futex code does not handle spurious wake up in futex_wait and futex_wait_requeue_pi.

The code assumes that any wake up which was not caused by futex_wake / requeue or by a timeout was caused by a signal wake up and returns one of the syscall restart error codes.

In case of a spurious wake up the signal delivery code which deals with the restart error codes is not invoked and we return that error code to user space. That causes applications which actually check the return codes to fail. Blaise reported that on preempt-rt a python test program ran into an exception trap. -rt exposed that due to a built in spurious wake up accelerator :)

Solve this by checking signal_pending(current) in the wake up path and handle the spurious wake up case w/o returning to user space.

Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
LKML-Reference: <new-submission>
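The decision made on the wake-up path can be summarised as a small classification function. This is a user-space sketch, not the futex code: the three inputs are passed as plain flags, WAIT_RETRY is an invented marker for "go back to waiting", and MODEL_ERESTARTSYS stands in for the kernel-internal restart code, which user-space headers do not export.

```c
#include <errno.h>

#define MODEL_ERESTARTSYS 512   /* assumption: stand-in for the kernel-internal value */
#define WAIT_RETRY 1            /* hypothetical marker: wait again, stay in the kernel */

/* Classify a wake up:
 *   woken by futex_wake/requeue -> 0
 *   timeout expired             -> -ETIMEDOUT
 *   signal pending              -> -MODEL_ERESTARTSYS (handled by signal delivery)
 *   none of the above           -> spurious, so retry the wait */
int futex_wait_outcome(int woken_by_futex, int timed_out, int signal_pending)
{
    if (woken_by_futex)
        return 0;
    if (timed_out)
        return -ETIMEDOUT;
    if (signal_pending)
        return -MODEL_ERESTARTSYS;
    return WAIT_RETRY;
}
```

The last branch is the case the patch adds: without it, a spurious wake up falls through to the restart code even though no signal handler will ever run to consume it.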
2009-10-12  Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into core/urgent  Ingo Molnar
2009-10-09  oprofile: warn on freeing event buffer too early  Robert Richter

A race shouldn't happen since all workqueues or handlers are canceled or flushed before the event buffer is freed. A warning is triggered now if the buffer is freed too early.

Also, this patch adds some comments about event buffer protection, reworks some code and adds code to clear buffer_pos during alloc and free of the event buffer.

Cc: David Rientjes <rientjes@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-10-09  oprofile: fix race condition in event_buffer free  David Rientjes

Looking at the 2.6.31-rc9 code, it appears there is a race condition in the event_buffer cleanup code path (shutdown). This could lead to kernel panic as some CPUs may be operating on the event buffer AFTER it has been freed. The attached patch solves the problem and makes sure CPUs check if the buffer is not NULL before they access it as some may have been spinning on the mutex while the buffer was being freed.

The race may happen if the buffer is freed during pending reads. But it is not clear why there are races in add_event_entry() since all workqueues or handlers are canceled or flushed before the event buffer is freed.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-10-09  lockdep: Use cpu_clock() for lockstat  Peter Zijlstra

Some tracepoint magic (TRACE_EVENT(lock_acquired)) relies on the fact that lock hold times are positive and uses div64 on that. That triggered a build warning on MIPS, and probably causes bad output in certain circumstances as well.

Make it truly positive.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1254818502.21044.112.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08  Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev  Linus Torvalds

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  pata_atp867x: add Power Management support
  pata_atp867x: PIO support fixes
  pata_atp867x: clarifications in timings calculations and cable detection
  pata_atp867x: fix it to not claim MWDMA support
  libata: fix incorrect link online check during probe
  ahci: filter FPDMA non-zero offset enable for Aspire 3810T
  libata: make gtf_filter per-dev
  libata: implement more acpi filtering options
  libata: cosmetic updates
  ahci: display all AHCI 1.3 HBA capability flags (v2)
  pata_ali: trivial fix of a very frequent spelling mistake
  ahci: disable 64bit DMA by default on SB600s
2009-10-08  Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: fix requeue_pi key imbalance
  futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
  rcu: Place root rcu_node structure in separate lockdep class
  rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
  rcu: Move rcu_barrier() to rcutree
  futex: Move exit_pi_state() call to release_mm()
  futex: Nullify robust lists after cleanup
  futex: Fix locking imbalance
  panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
  rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
  rcu: Clean up code based on review feedback from Josh Triplett, part 4
  rcu: Clean up code based on review feedback from Josh Triplett, part 3
  rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
  rcu: Clean up code to address Ingo's checkpatch feedback
  rcu: Clean up code based on review feedback from Josh Triplett, part 2
  rcu: Clean up code based on review feedback from Josh Triplett
2009-10-08  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Set correct normal_prio and prio values in sched_fork()
2009-10-08  Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci: Correct spelling in a comment
  x86: Simplify bound checks in the MTRR code
  x86: EDAC: carve out AMD MCE decoding logic
  initcalls: Add early_initcall() for modules
  x86: EDAC: MCE: Fix MCE decoding callback logic
2009-10-08  Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: user local buffer variable for trace branch tracer
  tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
  ftrace: check for failure for all conversions
  tracing: correct module boundaries for ftrace_release
  tracing: fix transposed numbers of lock_depth and preempt_count
  trace: Fix missing assignment in trace_ctxwake_*
  tracing: Use free_percpu instead of kfree
  tracing: Check total refcount before releasing bufs in profile_enable failure
2009-10-08  Merge branch 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
  perf_event: Provide vmalloc() based mmap() backing
2009-10-08  Merge branch 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_events: Make ABI definitions available to userspace
  perf tools: elf_sym__is_function() should accept "zero" sized functions
  tracing/syscalls: Use long for syscall ret format and field definitions
  perf trace: Update eval_flag() flags array to match interrupt.h
  perf trace: Remove unused code in builtin-trace.c
  perf: Propagate term signal to child
2009-10-08  Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, timers: Check for pending timers after (device) interrupts
  NOHZ: update idle state also when NOHZ is inactive
2009-10-08  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6  Linus Torvalds

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: ice1724: increase SPDIF and independent stereo buffer sizes
  ALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()
  ALSA: ICE1712/24 - Change the Multi Track Peak control (level meters) from MIXER to PCM type
  ALSA: hda - Fix yet another auto-mic bug in ALC268
  ASoC: WM8350 capture PGA mutes are inverted
  ASoC: Remove absent SYNC and TDM DAI format options from i.MX SSI
  sound: via82xx: move DXS volume controls to PCM interface
  ALSA: hda - Don't pick up invalid HP pins in alc_subsystem_id()
  ALSA: hda - Add a workaround for ASUS A7K
  ALSA: hda - Fix invalid initializations for ALC861 auto mode
  ASoC: wm8940: Fix check on error code form snd_soc_codec_set_cache_io
  ASoC: Fix SND_SOC_DAPM_LINE handling
2009-10-08  Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6  Linus Torvalds

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (24 commits)
  drm/radeon/kms: fix vline register for second head.
  drm/r600: avoid assigning vb twice in blit code
  drm/radeon: use list_for_each_entry instead of list_for_each
  drm/radeon/kms: Fix AGP support for R600/RV770 family (v2)
  drm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)
  drm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ
  drm/radeon: Fix setting of bits
  drm/ttm: fix refcounting in ttm global code.
  drm/fb: add more correct 8/16/24/32 bpp fb support.
  drm/fb: add setcmap and fix 8-bit support.
  drm/radeon/kms: respect single crtc cards, only create one crtc. (v2)
  drm: Delete the DRM_DEBUG_KMS in drm_mode_cursor_ioctl
  drm/radeon/kms: add support for "Surround View"
  drm/radeon/kms: Fix irq handling on AVIVO hw
  drm/radeon/kms: R600/RV770 remove dead code and print message for wrong BIOS
  drm/radeon/kms: Fix R600/RV770 disable acceleration path
  drm/radeon/kms: Fix R600/RV770 startup path & reset
  drm/radeon/kms: Fix R600 write back buffer
  drm/radeon/kms: Remove old init path as no hw use it anymore
  drm/radeon/kms: Convert RS600 to new init path
  ...
2009-10-08  Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6  Linus Torvalds

* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omapfb: Blizzard: constify register address tables
  omapfb: Blizzard: fix pointer to be const
  omapfb: Condition mutex acquisition
  omap: iovmm: Add missing mutex_unlock
  omap: iovmm: Fix incorrect spelling
  omap: SRAM: flush the right address after memcpy in omap_sram_push
  omap: Lock DPLL5 at boot
  omap: Fix incorrect 730 vs 850 detection
  OMAP3: PM: introduce a new powerdomain walk helper
  OMAP3: PM: Enable GPIO module-level wakeups
  OMAP3: PM: USBHOST: clear wakeup events on both hosts
  OMAP3: PM: PRCM interrupt: only handle selected PRCM interrupts
  OMAP3: PM: PRCM interrupt: check MPUGRPSEL register
  OMAP3: PM: Prevent hang in prcm_interrupt_handler
2009-10-08  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp  Linus Torvalds

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: beef up DRAM error injection
  amd64_edac: fix DRAM base and limit extraction
  amd64_edac: fix chip select handling
  amd64_edac: simple fix to allow reporting of CECC errors
  amd64_edac: fix K8 intlv_sel check
  amd64_edac: fix interleave enable tests
  amd64_edac: fix DRAM base and limit address extraction
  amd64_edac: fix driver instance lookup table allocation
2009-10-08  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6  Linus Torvalds

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
  ethoc: limit the number of buffers to 128
  ethoc: use system memory as buffer
  ethoc: align received packet to make IP header at word boundary
  ethoc: fix buffer address mapping
  ethoc: fix typo to compute number of tx descriptors
  au1000_eth: Duplicate test of RX_OVERLEN bit in update_rx_stats()
  netxen: Fix Unlikely(x) > y
  pasemi_mac: ethtool get settings fix
  add maintainer for network drop monitor kernel service
  tg3: Fix phylib locking strategy
  rndis_host: support ETHTOOL_GPERMADDR
  ipv4: arp_notify address list bug
  gigaset: add kerneldoc comments
  gigaset: correct debugging output selection
  gigaset: improve error recovery
  gigaset: fix device ERROR response handling
  gigaset: announce if built with debugging
  gigaset: handle isoc frame errors more gracefully
  gigaset: linearize skb
  gigaset: fix reject/hangup handling
  ...
2009-10-08  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6  Linus Torvalds

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
  Revert "Revert "ide: try to use PIO Mode 0 during probe if possible""
  sis5513: fix PIO setup for ATAPI devices
2009-10-08  x86, timers: Check for pending timers after (device) interrupts  Arjan van de Ven

Now that range timers and deferred timers are common, I found a problem with these using the "perf timechart" tool. Frans Pop also reported high scheduler latencies via LatencyTop, when using iwlagn.

It turns out that on x86, these two 'opportunistic' timers only get checked when another "real" timer happens. These opportunistic timers have the objective to save power by hitchhiking on other wakeups, so as to avoid CPU wakeups by themselves as much as possible.

The change in this patch runs this check not only at timer interrupts, but at all (device) interrupts. The effect is that:

1) the deferred timers/range timers get delayed less

2) the range timers cause fewer wakeups by themselves because the percentage of hitchhiking on existing wakeup events goes up.

I've verified the working of the patch using "perf timechart"; the originally exposed bug is gone with this patch. Frans also reported success - the latencies are now down in the expected ~10 msec range.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Frans Pop <elendil@planet.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091008064041.67219b13@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08  mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA  David Miller

When a vmalloc'd area is mmap'd into userspace, some kind of co-ordination is necessary for this to work on platforms with cpu D-caches which can have aliases.

Otherwise kernel side writes won't be seen properly in userspace and vice versa.

If the kernel side mapping and the user side one have the same alignment, modulo SHMLBA, this can work as long as VM_SHARED is set on the VMA, and for all current users this is true. VM_SHARED will force SHMLBA alignment of the user side mmap on platforms where D-cache aliasing matters.

The bulk of this patch is just making it so that a specific alignment can be passed down into __get_vm_area_node(). All existing callers pass in '1' which preserves existing behavior. vmalloc_user() gives SHMLBA for the alignment.

As a side effect this should get the video media drivers and other vmalloc_user() users into more working shape on such systems.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200909211922.n8LJMYjw029425@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
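"Same alignment, modulo SHMLBA" boils down to two addresses agreeing in their low bits. The helpers below are a sketch with hypothetical names; they take the alignment as a parameter (assumed to be a power of two) instead of reading the platform's SHMLBA, so the example stays portable.

```c
#include <stdint.h>

/* Returns 1 if two virtual addresses have the same offset modulo shmlba,
 * meaning they land on the same cache colour. shmlba must be a power of two. */
int same_cache_colour(uintptr_t kaddr, uintptr_t uaddr, uintptr_t shmlba)
{
    return ((kaddr ^ uaddr) & (shmlba - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

If both the kernel mapping and the user mapping start on an address produced by align_up(addr, shmlba), then same_cache_colour() holds for their base addresses and for every matching offset within the two mappings.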
2009-10-08  Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6  Linus Torvalds

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
  agp: parisc-agp.c - use correct page_mask function
  parisc: Fix linker script breakage.
  parisc: convert to asm-generic/hardirq.h
  parisc: Make THREAD_SIZE available to assembly files and linker scripts.
  parisc: correct use of SHF_ALLOC
  parisc: rename parisc's vmalloc_start to parisc_vmalloc_start
  parisc: add me to Maintainers
  parisc: includecheck fix: signal.c
  parisc: HAVE_ARCH_TRACEHOOK
  parisc: add skeleton syscall.h
  parisc: stop using task->ptrace for {single,block}step flags
  parisc: split syscall_trace into two halves
  parisc: add missing TI_TASK macro in syscall.S
  parisc: tracehook_signal_handler
  parisc: tracehook_report_syscall
2009-10-08  lis3lv02d_spi: module unload didn't remove sysfs entry  Samu Onkalo

In module unload, lis3lv02d core driver sysfs clean up was not called.

Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Daniel Mack <daniel@caiaq.de>
Cc: Éric Piel <eric.piel@tremplin-utc.net>
Cc: "Trisal, Kalhan" <kalhan.trisal@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  mmc: sdio: don't require CISTPL_VERS_1 to contain 4 strings  David Vrabel

The PC Card 8.0 specification (vol. 4, section 3.2.10) says the TPLLV1_INFO field of the CISTPL_VERS_1 tuple must contain 4 strings. Some cards don't have all 4 so just parse as many as we can.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Vrabel <david.vrabel@csr.com>
Tested-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Bing Zhao <bzhao@marvell.com>
Cc: Roel Kluin <roel.kluin@gmail.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
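"Parse as many as we can" can be illustrated with a tolerant string scanner. This is a sketch, not the SDIO driver's parser: the function name is hypothetical, the input is assumed to be the string area of the tuple only, and treating a 0xff byte as the end-of-list marker is an assumption of this example.

```c
#include <stddef.h>

/* Collect up to max_strings NUL-terminated strings from buf[0..len).
 * Stops at a 0xff end marker or when the data runs out; an unterminated
 * trailing fragment is ignored. Returns the number of strings found. */
size_t parse_vers1_strings(const unsigned char *buf, size_t len,
                           const char **out, size_t max_strings)
{
    size_t n = 0;
    size_t i = 0;

    while (n < max_strings && i < len) {
        size_t start;

        if (buf[i] == 0xff)
            break;                      /* end-of-list marker */
        start = i;
        while (i < len && buf[i] != 0)
            i++;
        if (i == len)
            break;                      /* no terminator: drop the fragment */
        out[n++] = (const char *)&buf[start];
        i++;                            /* skip the NUL */
    }
    return n;
}
```

A caller that wants four strings simply passes max_strings = 4 and uses the returned count, instead of rejecting cards that supply fewer.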
2009-10-08  page-types: add hwpoison/unpoison feature  Wu Fengguang

For hwpoison stress testing. The debugfs mount point is assumed to be /debug/.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  page-types: introduce kpageflags_flags()  Wu Fengguang

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  page-types: make voffset local variables  Wu Fengguang

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  page-types: make standalone pagemap/kpageflags read routines  Wu Fengguang

Refactor the code to be more modular and easier to reuse.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  page-types: introduce checked_open()  Wu Fengguang

This helps merge duplicate code (now and in the future) and makes the main logic stand out.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  page-types: add GPL note  Wu Fengguang

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  pagemap: document KPF_KSM and show it in page-types  Wu Fengguang

It indicates to the system admin that processes mapping such pages may be eating less physical memory than the reported numbers by legacy tools.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  pagemap: export KPF_HWPOISON  Wu Fengguang

This flag indicates a hardware detected memory corruption on the page. Any future access of the page data may bring down the machine.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08  cgroups: update documentation of cgroups tasks and procs files  Paul Menage

Update documentation of cgroups tasks and procs files

Document the cgroup.procs file. Clarify the semantics of the cgroup.procs and tasks files.

Although the current cgroup.procs interface returns a sorted and uniqified list of pids, potential future performance enhancements could result in those properties being removed - explicitly document this aspect of the API. There are no existing users of cgroup.procs, so compatibility isn't an issue.

There are users of the "tasks" file, but none that would appear to break in the event of the sorted property being broken. The standard "libcpuset" explicitly sorts the results of reading from the tasks file, and "libcg" and other users don't appear to care about ordering.

Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
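A reader of cgroup.procs or tasks that needs sorted, duplicate-free pids can enforce that itself instead of relying on the kernel's output order. The helper below is a sketch with a hypothetical name; it works on an in-memory array of pids and leaves the file reading to the caller.

```c
#include <stdlib.h>

/* qsort comparator for long values, written to avoid overflow. */
static int cmp_pid(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sort pids[0..n) in place and squeeze out duplicates.
 * Returns the number of unique pids left at the front of the array. */
size_t sort_unique_pids(long *pids, size_t n)
{
    size_t i;
    size_t out = 0;

    if (n == 0)
        return 0;
    qsort(pids, n, sizeof(pids[0]), cmp_pid);
    for (i = 1; i < n; i++)
        if (pids[i] != pids[out])
            pids[++out] = pids[i];
    return out + 1;
}
```

This is the same defensive approach the message attributes to libcpuset: sort on the consumer side so that a future change in the kernel's ordering cannot break the tool.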
2009-10-08video: includecheck fix: da8xx-fb.cJaswinder Singh Rajput
Fix the following 'make includecheck' warning: drivers/video/da8xx-fb.c: linux/device.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08video: includecheck fix: msm, mddi.cJaswinder Singh Rajput
Fix the following 'make includecheck' warning: drivers/video/msm/mddi.c: linux/delay.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08fs: includecheck fix: proc, kcore.cJaswinder Singh Rajput
Fix the following 'make includecheck' warning: fs/proc/kcore.c: linux/mm.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-08mm: includecheck fix: vmalloc.cJaswinder Singh Rajput
Fix the following 'make includecheck' warning: mm/vmalloc.c: linux/highmem.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
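The includecheck warnings in the entries above all report a header that is included more than once in a single file. The sketch below shows one simple way such duplicates could be detected. It is not the kernel's own checker: the helper name `duplicate_includes` is hypothetical, and the sketch deliberately ignores conditional compilation, so a header included once in each branch of an #ifdef would also be reported.

```python
import re
from collections import Counter

# Matches lines such as: #include <linux/device.h>  or  #include "foo.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"][^>"]+[>"])', re.MULTILINE)


def duplicate_includes(source):
    """Return the headers that appear in more than one #include line."""
    counts = Counter(INCLUDE_RE.findall(source))
    return sorted(header for header, n in counts.items() if n > 1)
```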
2009-10-08ksm: more on default valuesHugh Dickins
Adjust the max_kernel_pages default to a quarter of totalram_pages, instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be pinning 32MB of lowmem with their rmap_items, so no need for the more obscure calculation (nor for its own special init function). There is no way for the user to switch KSM on if CONFIG_SYSFS is not enabled, so in that case default run to KSM_RUN_MERGE. Update KSM Documentation and Kconfig to reflect the new defaults. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
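The figures quoted in the KSM message above can be checked with a little arithmetic. The sketch below assumes 4 KiB pages and treats the whole 16GB as totalram_pages (a real system reports somewhat less); the function names are hypothetical. Under those assumptions a quarter of RAM is 4GB of KSM pages, and 32MB of lowmem spread over that many pages works out to 32 bytes of rmap_item overhead per page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
GIB = 1 << 30
MIB = 1 << 20


def default_max_kernel_pages(totalram_pages):
    """New default described in the message: a quarter of totalram_pages."""
    return totalram_pages // 4


def lowmem_bytes_per_ksm_page(lowmem_pinned_bytes, ksm_bytes):
    """Lowmem overhead per KSM page implied by the quoted totals."""
    return lowmem_pinned_bytes // (ksm_bytes // PAGE_SIZE)


totalram_pages = 16 * GIB // PAGE_SIZE                     # idealized 16GB machine
max_pages = default_max_kernel_pages(totalram_pages)       # pages allowed for KSM
per_page = lowmem_bytes_per_ksm_page(32 * MIB, 4 * GIB)    # bytes of lowmem per page
```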
2009-10-08Merge branch 'fix/misc' into for-linusTakashi Iwai
2009-10-08Merge branch 'fix/hda' into for-linusTakashi Iwai
2009-10-08ALSA: ice1724: increase SPDIF and independent stereo buffer sizesRobert Hancock
Increase the default and maximum PCM buffer preallocation size for ice1724's SPDIF and independent stereo pair outputs to 256K, which is the hardware's maximum supported size. This allows a reduction in interrupt rate and potentially power usage when an application is not latency-critical. Signed-off-by: Robert Hancock <hancockrwd@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-10-08ALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()Krzysztof Helt
Fix the following circular locking in the opl3 driver.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc3 #87
-------------------------------------------------------
swapper/0 is trying to acquire lock:
 (&opl3->voice_lock){..-...}, at: [<cca748fe>] snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]

but task is already holding lock:
 (&opl3->sys_timer_lock){..-...}, at: [<cca75169>] snd_opl3_timer_func+0x19/0xc0 [snd_opl3_synth]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&opl3->sys_timer_lock){..-...}:
       [<c02461d5>] validate_chain+0xa25/0x1040
       [<c0246aca>] __lock_acquire+0x2da/0xab0
       [<c024731a>] lock_acquire+0x7a/0xa0
       [<c044c300>] _spin_lock_irqsave+0x40/0x60
       [<cca75046>] snd_opl3_note_on+0x686/0x790 [snd_opl3_synth]
       [<cca68912>] snd_midi_process_event+0x322/0x590 [snd_seq_midi_emul]
       [<cca74245>] snd_opl3_synth_event_input+0x15/0x20 [snd_opl3_synth]
       [<cca4dcc0>] snd_seq_deliver_single_event+0x100/0x200 [snd_seq]
       [<cca4de07>] snd_seq_deliver_event+0x47/0x1f0 [snd_seq]
       [<cca4e50b>] snd_seq_dispatch_event+0x3b/0x140 [snd_seq]
       [<cca5008c>] snd_seq_check_queue+0x10c/0x120 [snd_seq]
       [<cca5037b>] snd_seq_enqueue_event+0x6b/0xe0 [snd_seq]
       [<cca4e0fd>] snd_seq_client_enqueue_event+0xdd/0x100 [snd_seq]
       [<cca4eb7a>] snd_seq_write+0xea/0x190 [snd_seq]
       [<c02827b6>] vfs_write+0x96/0x160
       [<c0282c9d>] sys_write+0x3d/0x70
       [<c0202c45>] syscall_call+0x7/0xb

-> #0 (&opl3->voice_lock){..-...}:
       [<c02467e6>] validate_chain+0x1036/0x1040
       [<c0246aca>] __lock_acquire+0x2da/0xab0
       [<c024731a>] lock_acquire+0x7a/0xa0
       [<c044c300>] _spin_lock_irqsave+0x40/0x60
       [<cca748fe>] snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
       [<cca751f0>] snd_opl3_timer_func+0xa0/0xc0 [snd_opl3_synth]
       [<c022ac46>] run_timer_softirq+0x166/0x1e0
       [<c02269e8>] __do_softirq+0x78/0x110
       [<c0226ac6>] do_softirq+0x46/0x50
       [<c0226e26>] irq_exit+0x36/0x40
       [<c0204bd2>] do_IRQ+0x42/0xb0
       [<c020328e>] common_interrupt+0x2e/0x40
       [<c021092f>] apm_cpu_idle+0x10f/0x290
       [<c0201b11>] cpu_idle+0x21/0x40
       [<c04443cd>] rest_init+0x4d/0x60
       [<c055c835>] start_kernel+0x235/0x280
       [<c055c066>] i386_start_kernel+0x66/0x70

other info that might help us debug this:

2 locks held by swapper/0:
 #0: (&opl3->tlist){+.-...}, at: [<c022abd0>] run_timer_softirq+0xf0/0x1e0
 #1: (&opl3->sys_timer_lock){..-...}, at: [<cca75169>] snd_opl3_timer_func+0x19/0xc0 [snd_opl3_synth]

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.32-rc3 #87
Call Trace:
 [<c0245188>] print_circular_bug+0xc8/0xd0
 [<c02467e6>] validate_chain+0x1036/0x1040
 [<c0247f14>] ? check_usage_forwards+0x54/0xd0
 [<c0246aca>] __lock_acquire+0x2da/0xab0
 [<c024731a>] lock_acquire+0x7a/0xa0
 [<cca748fe>] ? snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
 [<c044c300>] _spin_lock_irqsave+0x40/0x60
 [<cca748fe>] ? snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
 [<cca748fe>] snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
 [<c044c307>] ? _spin_lock_irqsave+0x47/0x60
 [<cca751f0>] snd_opl3_timer_func+0xa0/0xc0 [snd_opl3_synth]
 [<c022ac46>] run_timer_softirq+0x166/0x1e0
 [<c022abd0>] ? run_timer_softirq+0xf0/0x1e0
 [<cca75150>] ? snd_opl3_timer_func+0x0/0xc0 [snd_opl3_synth]
 [<c02269e8>] __do_softirq+0x78/0x110
 [<c044c0fd>] ? _spin_unlock+0x1d/0x20
 [<c025915f>] ? handle_level_irq+0xaf/0xe0
 [<c0226ac6>] do_softirq+0x46/0x50
 [<c0226e26>] irq_exit+0x36/0x40
 [<c0204bd2>] do_IRQ+0x42/0xb0
 [<c024463c>] ? trace_hardirqs_on_caller+0x12c/0x180
 [<c020328e>] common_interrupt+0x2e/0x40
 [<c0208d88>] ? default_idle+0x38/0x50
 [<c021092f>] apm_cpu_idle+0x10f/0x290
 [<c0201b11>] cpu_idle+0x21/0x40
 [<c04443cd>] rest_init+0x4d/0x60
 [<c055c835>] start_kernel+0x235/0x280
 [<c055c210>] ? unknown_bootoption+0x0/0x210
 [<c055c066>] i386_start_kernel+0x66/0x70

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
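The lockdep report above boils down to two lock-ordering edges that point in opposite directions. The sketch below is a toy cycle check over such edges, not lockdep's algorithm. The edge list is an interpretation of the report: the timer path takes voice_lock while holding sys_timer_lock, and the snd_opl3_note_on path is assumed to take sys_timer_lock while holding voice_lock, which is what the "already depends on the new lock" line implies. The function name `find_lock_cycle` is hypothetical.

```python
def find_lock_cycle(order_edges):
    """Return a list of locks forming a cycle, or None if the order is consistent.

    Each edge (held, wanted) means 'wanted' was acquired while 'held' was held.
    """
    graph = {}
    for held, wanted in order_edges:
        graph.setdefault(held, set()).add(wanted)
        graph.setdefault(wanted, set())

    visiting, done = [], set()

    def visit(lock):
        if lock in done:
            return None
        if lock in visiting:
            # Back edge: the cycle is the path from the first visit to here.
            return visiting[visiting.index(lock):] + [lock]
        visiting.append(lock)
        for nxt in sorted(graph[lock]):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(lock)
        return None

    for lock in sorted(graph):
        cycle = visit(lock)
        if cycle:
            return cycle
    return None


# Edges read off the report (the first one is inferred, see the note above).
OPL3_EDGES = [
    ("voice_lock", "sys_timer_lock"),  # snd_opl3_note_on path
    ("sys_timer_lock", "voice_lock"),  # snd_opl3_timer_func -> snd_opl3_note_off
]
```

Removing either edge, that is, making both paths take the two locks in the same order or never nest them, leaves the graph acyclic.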