Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2008-06-20	Merge branch 'sched-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched, delay accounting: fix incorrect delay time when constantly waiting on runqueue sched: CPU hotplug events must not destroy scheduler domains created by the cpusets sched: rt-group: fix RR buglet sched: rt-group: heirarchy aware throttle sched: rt-group: fix hierarchy sched: NULL pointer dereference while setting sched_rt_period_us sched: fix defined-but-unused warning
2008-06-20	Merge branch 'x86-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, geode: add a VSA2 ID for General Software x86: use BOOTMEM_EXCLUSIVE on 32-bit x86, 32-bit: fix boot failure on TSC-less processors x86: fix NULL pointer deref in __switch_to x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits.
2008-06-20	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: Blackfin Serial Driver: Use timer to poll CTS PIN instead of workqueue. Blackfin arch: fix typo error in bf548 serial header file
2008-06-20	Merge branch 'upstream-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ahci: sis can't do PMP ata_piix: add TECRA M4 to broken suspend list LIBATA: Add HAVE_PATA_PLATFORM to select PATA_PLATFORM driver sata_mv: warn on PIO with multiple DRQs sata_mv: enable async_notify for 60x1 Rev.C0 and higher libata: don't check whether to use DMA or not for no data commands ahci: jmb361 has only one port
2008-06-20	[watchdog] hpwdt: fix use of inline assembly	Linus Torvalds
	The inline assembly in drivers/watchdog/hpwdt.c was incredibly broken, and included all the function prologue and epilogue stuff, even though it was itself then inside a C function where the compiler would add its own prologue and epilogue on top of it all. This then just _happened_ to work if you had exactly the right compiler version and exactly the right compiler flags, so that gcc just happened to not create any prologue at all (the gcc-generated epilogue wouldn't matter, since it would never be reached). But the more proper way to fix it is to simply not do this. Move the inline asm to the top level, with no surrounding function at all (the better alternative would be to remove the prologue and make it actually use proper description of the arguments to the inline asm, but that's a bigger change than the one I'm willing to make right now). Tested-by: S.Çağlar Onur <caglar@pardus.org.tr> Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-20	[IA64] SN2: security hole in sn2_ptc_proc_write	Cliff Wickman
	Security hole in sn2_ptc_proc_write It is possible to overrun a buffer with a write to this /proc file. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-20	BAST: Remove old IDE driver	Ben Dooks
	Remove the old BAST IDE driver, as we are now using the platform-pata support. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-06-20	pcmcia ide kingston compactflash's have a new manufacturer id	Christophe Niclaes
	Up to now, Kingston compactflash cards (ab)used the Toshiba Manufacturer's ID, In their new CF cards, they use a new one. Let's the ide subsystem recognize CF cards with the new id. Signed-off-by: Christophe Niclaes <cniclaes@develtech.com> Acked-by: Philippe De Muyter <phdm@macqel.be> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-06-20	pcmcia: add another pata/ide ID	Kristoffer Ericson
	Addition of Transcend 1GB 45x id so that it is properly detected. [bart: fix typo in ide-cs's ID spotted by Alan Cox] Signed-off-by: William Peters <w1ll14@gmail.com> Signed-off-by: Kristoffer Ericson <Kristoffer_e1@hotmail.com> CC: Alan Cox <alan@lxorguk.ukuu.org.uk> CC: linux-ide@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-06-20	pcmcia: add an pata/ide ID	Matt Reimer
	Add an id for: product info: "M-Systems", "CF300", "" manfid: 0x000a, 0x0000 function: 4 (fixed disk) Signed-off-by: Matt Reimer <mreimer@vpop.net> CC: Alan Cox <alan@lxorguk.ukuu.org.uk> CC: linux-ide@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-06-20	ide: increase timeout in wait_drive_not_busy()	Bartlomiej Zolnierkiewicz
	Some ATAPI devices take longer than the current max timeout value to become ready (i.e. TEAC DV-W28ECW takes 6 ms) so increase the timeout value to 10 ms. This fixes kernel.org bugzilla bug #10887: http://bugzilla.kernel.org/show_bug.cgi?id=10887 Reported-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-06-20	palm_bk3710: fix resource management	Sergei Shtylyov
	The driver expected a virtual address in the IDE platform device's memory resource and didn't request the memory region for the register block. Fix this taking into account the fact that DaVinci SoC devices are fixed-mapped to the virtual memory early and we can get their virtual addresses using IO_ADDRESS() macro, not having to call ioremap()... While at it, also do some cosmetic changes... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-06-20	Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIP	Linus Torvalds
	KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit 557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") removed the ZERO_PAGE from the VM mappings, any users of get_user_pages() will generally now populate the VM with real empty pages needlessly. We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but since fault handling no longer uses ZERO_PAGE for new anonymous pages, we now need to handle that special case in follow_page() instead. In particular, the removal of ZERO_PAGE effectively removed the core file writing optimization where we would skip writing pages that had not been populated at all, and increased memory pressure a lot by allocating all those useless newly zeroed pages. This reinstates the optimization by making the unmapped PTE case the same as for a non-existent page table, which already did this correctly. While at it, this also fixes the XIP case for follow_page(), where the caller could not differentiate between the case of a page that simply could not be used (because it had no "struct page" associated with it) and a page that just wasn't mapped. We do that by simply returning an error pointer for pages that could not be turned into a "struct page *". The error is arbitrarily picked to be EFAULT, since that was what get_user_pages() already used for the equivalent IO-mapped page case. [ Also removed an impossible test for pte_offset_map_lock() failing: that's not how that function works ] Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-20	Ext4: Fix online resize block group descriptor corruption	Frederic Bohe
	This is the patch for the group descriptor table corruption during online resize pointed out by Theodore Tso. The problem was caused by the fact that the ext4 group descriptor can be either 32 or 64 bytes long. Only the 64 bytes structure was taken into account. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-06-20	sched: refactor wait_for_completion_timeout()	Oleg Nesterov
	Simplify the code and fix the boundary condition of wait_for_completion_timeout(,0). We can kill the first __remove_wait_queue() as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2008-06-20	xen: don't drop NX bit	Jeremy Fitzhardinge
	Because NX is now enforced properly, we must put the hypercall page into the .text segment so that it is executable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20	xen: mask unwanted pte bits in __supported_pte_mask	Jeremy Fitzhardinge
	[ Stable: this isn't a bugfix in itself, but it's a pre-requiste for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20	xen: Use wmb instead of rmb in xen_evtchn_do_upcall().	Isaku Yamahata
	This patch is ported one from 534:77db69c38249 of linux-2.6.18-xen.hg. Use wmb instead of rmb to enforce ordering between evtchn_upcall_pending and evtchn_pending_sel stores in xen_evtchn_do_upcall(). Cc: Samuel Thibault <samuel.thibault@eu.citrix.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20	x86: fix NULL pointer deref in __switch_to	Suresh Siddha
	I am able to reproduce the oops reported by Simon in __switch_to() with lguest. My debug showed that there is at least one lguest specific issue (which should be present in 2.6.25 and before aswell) and it got exposed with a kernel oops with the recent fpu dynamic allocation patches. In addition to the previous possible scenario (with fpu_counter), in the presence of lguest, it is possible that the cpu's TS bit it still set and the lguest launcher task's thread_info has TS_USEDFPU still set. This is because of the way the lguest launcher handling the guest's TS bit. (look at lguest_set_ts() in lguest_arch_run_guest()). This can result in a DNA fault while doing unlazy_fpu() in __switch_to(). This will end up causing a DNA fault in the context of new process thats getting context switched in (as opossed to handling DNA fault in the context of lguest launcher/helper process). This is wrong in both pre and post 2.6.25 kernels. In the recent 2.6.26-rc series, this is showing up as NULL pointer dereferences or sleeping function called from atomic context(__switch_to()), as we free and dynamically allocate the FPU context for the newly created threads. Older kernels might show some FPU corruption for processes running inside of lguest. With the appended patch, my test system is running for more than 50 mins now. So atleast some of your oops (hopefully all!) should get fixed. Please give it a try. I will spend more time with this fix tomorrow. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20	sched: fix wait_for_completion_timeout() spurious failure under heavy load	Roland Dreier
	It seems that the current implementaton of wait_for_completion_timeout() has a small problem under very high load for the common pattern: if (!wait_for_completion_timeout(&done, timeout)) /* handle failure / because the implementation very roughly does (lots of code deleted to show the basic flow): static inline long __sched do_wait_for_common(struct completion x, long timeout, int state) { if (x->done) return timeout; do { timeout = schedule_timeout(timeout); if (!timeout) return timeout; } while (!x->done); return timeout; } so if the system is very busy and x->done is not set when do_wait_for_common() is entered, it is possible that the first call to schedule_timeout() returns 0 because the task doing wait_for_completion doesn't get rescheduled for a long time, even if it is woken up early enough. In this case, wait_for_completion_timeout() returns 0 without even checking x->done again, and the code above falls into its failure case purely for scheduler reasons, even if the hardware event or whatever was being waited for happened early enough. It would make sense to add an extra test to do_wait_for() in the timeout case and return 1 if x->done is actually set. A quick audit (not exhaustive) of wait_for_completion_timeout() callers seems to indicate that no one actually cares about the return value in the success case -- they just test for 0 (timed out) versus non-zero (wait succeeded). Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20	sched: rt: dont stop the period timer when there are tasks wanting to run	Peter Zijlstra
	So if the group ever gets throttled, it will never wake up again. Reported-by: "Daniel K." <dk@uw.no> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Daniel K. <dk@uw.no> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20	Merge branch 'bugzilla-9761' into release	Len Brown

2008-06-20	Merge branch 'bugzilla-10695' into release	Len Brown

2008-06-20	drm: only trust core drm ioctls - driver ioctls are a mess.	Dave Airlie
	So driver ioctls need a full auditing before we can make this change. Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-20	drm/i915: add support for Intel series 4 chipsets.	Zhenyu Wang
	Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-20	[agp]: fixup chipset flush for new Intel G4x.	Zhenyu Wang
	Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19	ipv6: Drop packets for loopback address from outside of the box.	YOSHIFUJI Hideaki
	[ Based upon original report and patch by Karsten Keil. Karsten has verified that this fixes the TAHI test case "ICMPv6 test v6LC.5.1.2 Part F". -DaveM ] Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19	ipv6: Remove options header when setsockopt's optlen is 0	Shan Wei
	Remove the sticky Hop-by-Hop options header by calling setsockopt() for IPV6_HOPOPTS with a zero option length, per RFC3542. Routing header and Destination options header does the same as Hop-by-Hop options header. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19	x86, geode: add a VSA2 ID for General Software	Jordan Crouse
	General Software writes their own VSA2 module for their version of the Geode BIOS, which returns a different ID then the standard VSA2. This was causing the framebuffer driver to break for most GSW boards. Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Cc: tglx@linutronix.de Cc: linux-geode@lists.infradead.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	sched, delay accounting: fix incorrect delay time when constantly waiting on ↵	Bharath Ravi
	runqueue This patch corrects the incorrect value of per process run-queue wait time reported by delay statistics. The anomaly was due to the following reason. When a process leaves the CPU and immediately starts waiting for CPU on the runqueue (which means it remains in the TASK_RUNNABLE state), the time of re-entry into the run-queue is never recorded. Due to this, the waiting time on the runqueue from this point of re-entry upto the next time it hits the CPU is not accounted for. This is solved by recording the time of re-entry of a process leaving the CPU in the sched_info_depart() function IF the process will go back to waiting on the run-queue. This IF condition is verified by checking whether the process is still in the TASK_RUNNABLE state. The patch was tested on 2.6.26-rc6 using two simple CPU hog programs. The values noted prior to the fix did not account for the time spent on the runqueue waiting. After the fix, the correct values were reported back to user space. Signed-off-by: Bharath Ravi <bharathravi1@gmail.com> Signed-off-by: Madhava K R <madhavakr@gmail.com> Cc: dhaval@linux.vnet.ibm.com Cc: vatsa@in.ibm.com Cc: balbir@in.ibm.com Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	hwmon: (lm75) sensor reading bugfix	David Brownell
	LM75 sensor reading bugfix: never save error status as valid sensor output. This could be improved, but at least this prevents certain rude failure modes. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	hwmon: (abituguru3) update driver detection	Hans de Goede
	It has been reported that the abituguru3 driver fails to load after a BIOS update. This patch fixes this by loosening the detection routine so that it will work after the BIOS update too. To compensate for the now very loose detection an additional check is added on the DMI Base Board vendor string to make sure we only load on Abit motherboards, this is the same as the check in the abituguru (1 / 2) driver. Signed-of-by: Hans de Goede <j.w.r.degoede@hhs.nl> Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	hwmon: (w83791d) new maintainer	Marc Hulsman
	Signed-off-by: Charles Spirakis <bezaur@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	hwmon: (abituguru3) Identify Abit AW8D board as such	Hans de Goede
	This patch identifies the Abit AW8D board as such, and adds support for its aux5 fan connector Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	hwmon: Update the sysfs interface documentation	Jean Delvare
	* Document the characteristics of libsensors 3.0.0 and 3.0.1. * The sysfs interface is no longer subject to changes. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Juerg Haefliger <juergh at gmail.com> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	hwmon: (adt7473) Initialize max_duty_at_overheat before use	Jean Delvare
	data->max_duty_at_overheat is not updated in adt7473_update_device, so it might be used before it is initialized (if the user reads from sysfs file max_duty_at_crit before writing to it.) Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	hwmon: (lm85) Fix function RANGE_TO_REG()	Jean Delvare
	Function RANGE_TO_REG() is broken. For a requested range of 2000 (2 degrees C), it will return an index value of 15, i.e. 80.0 degrees C, instead of the expected index value of 0. All other values are handled properly, just 2000 isn't. The bug was introduced back in November 2004 by this patch: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=1c28d80f1992240373099d863e4996cdd5d646d0 While this can be fixed easily with the current code, I'd rather rewrite the whole function in a way which is more obviously correct. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Justin Thiessen <jthiessen@penguincomputing.com> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
2008-06-19	Blackfin Serial Driver: Use timer to poll CTS PIN instead of workqueue.	Sonic Zhang
	This allows other threads to run when the serial driver polls the CTS PIN in a loop. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
2008-06-19	Blackfin arch: fix typo error in bf548 serial header file	Sonic Zhang
	Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
2008-06-19	x86: use BOOTMEM_EXCLUSIVE on 32-bit	Bernhard Walle
	This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for i386 and prints a error message on failure. The patch is still for 2.6.26 since it is only bug fixing. The unification of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
2008-06-19	x86, 32-bit: fix boot failure on TSC-less processors	Mikael Pettersson
	Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6" (invalid opcode) and a kernel halt immediately after the kernel has been uncompressed. The BUG shows EIP pointing to an rdtsc instruction in native_read_tsc(), invoked from native_sched_clock(). (This error occurs so early that not even the serial console can capture it.) A bisection showed that this bug first occurs in 2.6.26-rc3-git7, via commit 9ccc906c97e34fd91dc6aaf5b69b52d824386910: >x86: distangle user disabled TSC from unstable > >tsc_enabled is set to 0 from the command line switch "notsc" and from >the mark_tsc_unstable code. Seperate those functionalities and replace >tsc_enable with tsc_disable. This makes also the native_sched_clock() >decision when to use TSC understandable. > >Preparatory patch to solve the sched_clock() issue on 32 bit. > >Signed-off-by: Thomas Gleixner <tglx@linutronix.de> The core reason for this bug is that native_sched_clock() gets called before tsc_init(). Before the commit above, tsc_32.c used a "tsc_enabled" variable which defaulted to 0 == disabled, and which only got enabled late in tsc_init(). Thus early calls to native_sched_clock() would skip the TSC and use jiffies instead. After the commit above, tsc_32.c uses a "tsc_disabled" variable which defaults to 0, meaning that the TSC is Ok to use. Early calls to native_sched_clock() now erroneously try to use the TSC on !cpu_has_tsc processors, leading to invalid opcode exceptions. My proposed fix is to initialise tsc_disabled to a "soft disabled" state distinct from the hard disabled state set up by the "notsc" kernel option. This fixes the native_sched_clock() problem. It also allows tsc_init() to be simplified: instead of setting tsc_disabled = 1 on every error return, we just set tsc_disabled = 0 once when all checks have succeeded. I've verified that this lets my 486 boot again. I've also verified that a Core2 machine still uses the TSC as clocksource after the patch. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	x86: fix NULL pointer deref in __switch_to	Suresh Siddha
	Patrick McHardy reported a crash: > > I get this oops once a day, its apparently triggered by something > > run by cron, but the process is a different one each time. > > > > Kernel is -git from yesterday shortly before the -rc6 release > > (last commit is the usb-2.6 merge, the x86 patches are missing), > > .config is attached. > > > > I'll retry with current -git, but the patches that have gone in > > since I last updated don't look related. > > > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at > > 000001ff > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118 > > [62060.043009] pde = 00000000 > > [62060.043009] Oops: 0002 [#1] PREEMPT Vegard Nossum analyzed it: > This decodes to > > 0: 0f ae 00 fxsave (%eax) > > so it's related to the floating-point context. This is the exact > location of the crash: > > $ addr2line -e arch/x86/kernel/process_32.o -i ab0 > include/asm/i387.h:232 > include/asm/i387.h:262 > arch/x86/kernel/process_32.c:595 > > ...so it looks like prev_task->thread.xstate->fxsave has become NULL. > Or maybe it never had any other value. Somehow (as described below) TS_USEDFPU is set but the fpu is not allocated or freed. Another possible FPU pre-emption issue with the sleazy FPU optimization which was benign before but not so anymore, with the dynamic FPU allocation patch. New task is getting exec'd and it is prempted at the below point. flush_thread() { ... / * Forget coprocessor state.. */ clear_fpu(tsk); <----- Preemption point clear_used_math(); ... } Now when it context switches in again, as the used_math() is still set and fpu_counter can be > 5, we will do a math_state_restore() which sets the task's TS_USEDFPU. After it continues from the above preemption point it does clear_used_math() and much later free_thread_xstate(). Now, at the next context switch, it is quite possible that xstate is null, used_math() is not set and TS_USEDFPU is still set. This will trigger unlazy_fpu() causing kernel oops. Fix this by clearing tsk's fpu_counter before clearing task's fpu. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits.	Jeremy Fitzhardinge
	When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can potentially have the same number of physical address bits as the 64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory, we could have up to 52 bits of physical address in a pte. The 32-bit kernel uses a 32-bit unsigned long to represent a pfn. This means that it can only represent physical addresses up to 32+12=44 bits wide. Rather than widening pfns everywhere, just set 2^44 as the Linux x86_32-PAE architectural limit for physical address size. This is a bugfix for two cases: 1. running a 32-bit PAE kernel on a machine with more than 64GB RAM. 2. running a 32-bit PAE Xen guest on a host machine with more than 64GB RAM In both cases, a pte could need to have more than 36 bits of physical, and masking it to 36-bits will cause fairly severe havoc. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression	Jason Wessel
	The touch_nmi_watchdog() routine on x86 ultimately calls touch_softlockup_watchdog(). The problem is that to touch the softlockup watchdog, the cpu_clock code has to be called which could involve multiple cpu locks and can lead to a hard hang if one of the locks is held by a processor that is not going to return anytime soon (such as could be the case with kgdb or perhaps even with some other kind of exception). This patch causes the public version of the touch_softlockup_watchdog() to defer the cpu clock access to a later point. The test case for this problem is to use the following kernel config options: CONFIG_KGDB_TESTS=y CONFIG_KGDB_TESTS_ON_BOOT=y CONFIG_KGDB_TESTS_BOOT_STRING="V1F100I100000" It should be noted that kgdb test suite and these options were not available until 2.6.26-rc2, so it was necessary to patch the kgdb test suite during the bisection. I would consider this patch a regression fix because the problem first appeared in commit 27ec4407790d075c325e1f4da0a19c56953cce23 when some logic was added to try to periodically sync the clocks. It was possible to work around this particular problem by simply not performing the sync anytime the system was in a critical context. This was ok until commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd, which added config option CONFIG_HAVE_UNSTABLE_SCHED_CLOCK and some multi-cpu locks to sync the clocks. It became clear that accessing this code from an nmi was the source of the lockups. Avoiding the access to the low level clock code from an code inside the NMI processing also fixed the problem with the 27ec44... commit. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	rcupreempt: remove export of rcu_batches_completed_bh	Steven Rostedt
	In rcupreempt, rcu_batches_completed_bh is defined as a static inline in the header file. This does not need to be exported, and not only that, this breaks my PPC build. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: paulus@samba.org Cc: linuxppc-dev@ozlabs.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-19	cpuset: limit the input of cpuset.sched_relax_domain_level	Li Zefan
	We allow the inputs to be [-1 ... SD_LV_MAX), and return -EINVAL for inputs outside this range. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-19	Merge branch 'linus' into sched/urgent	Ingo Molnar

2008-06-19	sched: CPU hotplug events must not destroy scheduler domains created by the ↵	Max Krasnyansky
	cpusets First issue is not related to the cpusets. We're simply leaking doms_cur. It's allocated in arch_init_sched_domains() which is called for every hotplug event. So we just keep reallocation doms_cur without freeing it. I introduced free_sched_domains() function that cleans things up. Second issue is that sched domains created by the cpusets are completely destroyed by the CPU hotplug events. For all CPU hotplug events scheduler attaches all CPUs to the NULL domain and then puts them all into the single domain thereby destroying domains created by the cpusets (partition_sched_domains). The solution is simple, when cpusets are enabled scheduler should not create default domain and instead let cpusets do that. Which is exactly what the patch does. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Cc: pj@sgi.com Cc: menage@google.com Cc: rostedt@goodmis.org Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-19	sched: rt-group: fix RR buglet	Peter Zijlstra
	In tick_task_rt() we first call update_curr_rt() which can dequeue a runqueue due to it running out of runtime, and then we try to requeue it, of it also having exhausted its RR quota. Obviously requeueing something that is no longer on the runqueue will not have the expected result. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Daniel K. <dk@uw.no> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19	sched: rt-group: heirarchy aware throttle	Peter Zijlstra
	The bandwidth throttle code dequeues a group when it runs out of quota, and re-queues it once the period rolls over and the quota gets refreshed. Sadly it failed to take the hierarchy into consideration. Share more of the enqueue/dequeue code with regular task opterations. Also, some operations like sched_setscheduler() can dequeue/enqueue tasks that are in throttled runqueues, we should not inadvertly re-enqueue empty runqueues so check for that. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Daniel K. <dk@uw.no> Signed-off-by: Ingo Molnar <mingo@elte.hu>