Age | Commit message | Author
2007-07-19 | PM: Integrate beeping flag with existing acpi_sleep flags | Pavel Machek
Move "debug during resume from s2ram" into the variable we already use for real-mode flags to simplify code. It also closes a nasty trap for the user in acpi_sleep_setup; the order of parameters actually mattered there, with acpi_sleep=s3_bios,s3_mode doing something different from acpi_sleep=s3_mode,s3_bios. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
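The order-independence described above can be illustrated with a minimal user-space sketch. It assumes that each option simply ORs one bit into a single flags word; the function name parse_sleep_flags() and the bit values are hypothetical and are not taken from the kernel source.

```c
#include <string.h>

/* Hypothetical bit values, chosen only for illustration. */
#define SLEEP_FLAG_S3_BIOS 0x1u
#define SLEEP_FLAG_S3_MODE 0x2u

/*
 * Parse a comma-separated option string and OR one bit into the result
 * for every recognised option. Because each option only sets its own
 * bit, the order of the options cannot change the outcome.
 */
unsigned int parse_sleep_flags(const char *str)
{
    unsigned int flags = 0;

    while (str && *str) {
        size_t len = strcspn(str, ",");   /* length of the current token */

        if (len == 7 && strncmp(str, "s3_bios", 7) == 0)
            flags |= SLEEP_FLAG_S3_BIOS;
        else if (len == 7 && strncmp(str, "s3_mode", 7) == 0)
            flags |= SLEEP_FLAG_S3_MODE;
        /* unrecognised tokens are ignored in this sketch */

        str += len;
        if (*str == ',')
            str++;                        /* skip the separator */
    }
    return flags;
}
```

With this shape, parse_sleep_flags("s3_bios,s3_mode") and parse_sleep_flags("s3_mode,s3_bios") yield the same flags word.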
2007-07-19 | PM: Optional beeping during resume from suspend to RAM | Nigel Cunningham
Add a feature allowing the user to make the system beep during a resume from suspend to RAM, on x86_64 and i386. This is useful for the users with broken resume from RAM, so that they can verify if the control reaches the kernel after a wake-up event. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | PM: Introduce pm_power_off_prepare | Rafael J. Wysocki
Introduce the pm_power_off_prepare() callback that can be registered by the interested platforms in analogy with pm_idle() and pm_power_off(), used for preparing the system to power off (needed by ACPI). This allows us to drop acpi_sysclass and device_acpi that are only defined in order to register the ACPI power off preparation callback, which is needed by pm_power_off() registered in a much different way. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
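As a rough illustration of the callback pattern described above, the following user-space sketch models an optional preparation hook that runs before the power-off hook. All names here (power_off_prepare_hook, power_off_sketch() and the example callbacks) are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <stddef.h>

/* Optional hooks a platform may register; NULL means "not provided". */
void (*power_off_prepare_hook)(void) = NULL;
void (*power_off_hook)(void) = NULL;

/* Small trace buffer so the call order can be inspected afterwards. */
char call_trace[8];
size_t call_count = 0;

static void record(char tag)
{
    if (call_count < sizeof(call_trace) - 1) {
        call_trace[call_count++] = tag;
        call_trace[call_count] = '\0';
    }
}

/* Example platform callbacks, standing in for what ACPI would register. */
void example_prepare(void)   { record('P'); }
void example_power_off(void) { record('O'); }

/* Generic power-off path: run the preparation hook first, if any. */
void power_off_sketch(void)
{
    if (power_off_prepare_hook)
        power_off_prepare_hook();
    if (power_off_hook)
        power_off_hook();
}
```

A platform that needs no preparation leaves power_off_prepare_hook as NULL; one that does assigns it once at initialisation, so no extra device or class is needed just to carry the callback.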
2007-07-19 | ACPI: Do not prepare for hibernation in acpi_shutdown | Rafael J. Wysocki
Since we are now explicitly calling hibernation_ops->prepare() before hibernation_ops->enter() in hibernation_platform_enter() (defined in kernel/power/disk.c), ACPI should not call acpi_sleep_prepare(ACPI_STATE_S4) from acpi_shutdown(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | PM: Reduce code duplication between main.c and user.c | Rafael J. Wysocki
The SNAPSHOT_S2RAM ioctl code is outdated and it should not duplicate the suspend code in kernel/power/main.c. Fix that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | PM: prevent frozen user mode helpers from failing the freezing of tasks | Rafael J. Wysocki
At present, if a user mode helper is running while usermodehelper_pm_callback() is executed, the helper may be frozen and the completion in call_usermodehelper_exec() won't be completed until user space processes are thawed. As a result, the freezing of kernel threads may fail, which is not desirable. Prevent this from happening by introducing a counter of running user mode helpers and allowing usermodehelper_pm_callback() to succeed for action = PM_HIBERNATION_PREPARE or action = PM_SUSPEND_PREPARE only if there are no helpers running. [Namely, usermodehelper_pm_callback() waits for at most RUNNING_HELPERS_TIMEOUT for the number of running helpers to become zero and fails if that doesn't happen.] Special thanks to Uli Luckas <u.luckas@road.de>, Pavel Machek <pavel@ucw.cz> and Oleg Nesterov <oleg@tv-sign.ru> for reviewing the previous versions of this patch and for very useful comments. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Uli Luckas <u.luckas@road.de> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | PM: disable usermode helper before hibernation and suspend | Rafael J. Wysocki
Use a hibernation and suspend notifier to disable the user mode helper before a hibernation/suspend and enable it after the operation. [akpm@linux-foundation.org: build fix] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | PM: introduce hibernation and suspend notifiers | Rafael J. Wysocki
Make it possible to register hibernation and suspend notifiers, so that subsystems can perform hibernation-related or suspend-related operations that should not be carried out by device drivers' .suspend() and .resume() routines. [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
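The notifier idea can be sketched in user space as a small chain of callbacks that is walked for each event. This is only a model under stated assumptions: the event codes, the fixed-size array and every function name below are hypothetical, and the kernel's real notifier infrastructure differs in detail.

```c
#include <stddef.h>

/* Hypothetical event codes; the numeric values are made up. */
enum pm_event_sketch {
    EV_HIBERNATION_PREPARE = 1,
    EV_POST_HIBERNATION,
    EV_SUSPEND_PREPARE,
    EV_POST_SUSPEND
};

#define MAX_NOTIFIERS 8

typedef int (*pm_notifier_fn)(int event);

static pm_notifier_fn chain[MAX_NOTIFIERS];
static size_t chain_len;

/* Append a callback to the chain; returns -1 when the chain is full. */
int register_notifier_sketch(pm_notifier_fn fn)
{
    if (chain_len == MAX_NOTIFIERS)
        return -1;
    chain[chain_len++] = fn;
    return 0;
}

/* Call every notifier in registration order; stop at the first failure. */
int call_chain_sketch(int event)
{
    for (size_t i = 0; i < chain_len; i++) {
        int err = chain[i](event);
        if (err)
            return err;
    }
    return 0;
}

/* Example subscriber: vetoes a transition while it still has work pending. */
int helper_busy = 0;

int helper_notifier(int event)
{
    if ((event == EV_SUSPEND_PREPARE || event == EV_HIBERNATION_PREPARE) &&
        helper_busy)
        return -1;
    return 0;
}
```

A subsystem registers helper_notifier() once; the suspend or hibernation core then calls call_chain_sketch() with a "prepare" event before the transition and aborts if any subscriber reports an error.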
2007-07-19 | Freezer: remove redundant check in try_to_freeze_tasks | Rafael J. Wysocki
We don't need to check if todo is positive before calling time_after() in try_to_freeze_tasks(), because if todo is zero at this point, the loop will be broken anyway due to the while () condition being false. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | Freezer: return int from freeze_processes | Rafael J. Wysocki
Make try_to_freeze_tasks() and freeze_processes() return -EBUSY on failure instead of the number of unfrozen tasks (none of the callers actually uses this number). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
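The two freezer changes above (dropping the redundant todo check and returning -EBUSY rather than a count) can be modelled with a simplified loop. This is a sketch under assumptions: freeze_tasks_sketch() is a hypothetical name, "passes" stands in for the jiffies-based timeout, and "per_pass" is an invented parameter saying how many tasks freeze on each iteration.

```c
#include <errno.h>

/*
 * Model of a freezing loop. Each pass freezes up to per_pass of the
 * remaining tasks; the loop gives up after max_passes iterations.
 * Returns 0 on success and -EBUSY if any task is still unfrozen.
 */
int freeze_tasks_sketch(int tasks, int per_pass, int max_passes)
{
    int remaining = tasks;
    int passes = 0;
    int todo;

    do {
        int frozen = per_pass < remaining ? per_pass : remaining;

        remaining -= frozen;
        todo = remaining;          /* tasks that have not frozen yet */
        passes++;

        /*
         * No "todo > 0 &&" guard is needed here: when todo is zero the
         * while () condition below ends the loop anyway.
         */
        if (passes >= max_passes)
            break;
    } while (todo);

    /* Callers only need success or failure, not the leftover count. */
    return todo ? -EBUSY : 0;
}
```

Callers can then write the usual "if (error) goto out;" check without having to interpret a positive task count.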
2007-07-19 | Freezer: use __set_current_state in refrigerator | Rafael J. Wysocki
Use __set_current_state() as appropriate in refrigerator() instead of accessing current->state directly. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | Freezer: avoid freezing kernel threads prematurely | Rafael J. Wysocki
Kernel threads should not have TIF_FREEZE set when user space processes are being frozen, since otherwise some of them might be frozen prematurely. To prevent this from happening we can (1) make exit_mm() unset TIF_FREEZE unconditionally just after clearing tsk->mm and (2) make try_to_freeze_tasks() check if p->mm is different from zero and PF_BORROWED_MM is unset in p->flags when user space processes are to be frozen.

Namely, when user space processes are being frozen, we only should set TIF_FREEZE for tasks that have p->mm different from NULL and don't have PF_BORROWED_MM set in p->flags. For this reason task_lock() must be used to prevent try_to_freeze_tasks() from racing with use_mm()/unuse_mm(), in which p->mm and p->flags.PF_BORROWED_MM are changed under task_lock(p). Also, we need to prevent the following scenario from happening:

* daemonize() is called by a task spawned from a user space code path
* freezer checks if the task has p->mm set and the result is positive
* task enters exit_mm() and clears its TIF_FREEZE
* freezer sets TIF_FREEZE for the task
* task calls try_to_freeze() and goes to the refrigerator, which is wrong at that point

This requires us to acquire task_lock(p) before p->flags.PF_BORROWED_MM and p->mm are examined and release it after TIF_FREEZE is set for p (or it turns out that TIF_FREEZE should not be set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | Hibernation: prepare to enter the low power state | Rafael J. Wysocki
During hibernation we call hibernation_ops->prepare() before creating the image, but then, before saving it, we cancel the power transition by calling hibernation_ops->finish(). Thus prior to calling hibernation_ops->enter() we should let the platform firmware know that we're going to enter the low power state after all. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | swsusp: fix hibernation code ordering | Rafael J. Wysocki
Change the code ordering so that hibernation_ops->prepare() is called after device_suspend(). This is needed so that we don't violate the ACPI specification, which states that the _PTS and _GTS system-control methods, executed from acpi_sleep_prepare(), ought to be called after devices have been put in low power states. The "Finish" label in hibernation_restore() is moved, because device_suspend() resumes devices if the suspending of them fails and the restore code ordering should reflect the hibernation code ordering. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | swsusp: introduce restore platform operations | Rafael J. Wysocki
At least on some machines it is necessary to prepare the ACPI firmware for the restoration of the system memory state from the hibernation image if the "platform" mode of hibernation has been used. Namely, in that case we need to disable the GPEs before replacing the "boot" kernel with the "frozen" kernel (cf. http://bugzilla.kernel.org/show_bug.cgi?id=7887). After the restore they will be re-enabled by hibernation_ops->finish(), but if the restore fails, they have to be re-enabled by the restore code explicitly.

For this purpose we can introduce two additional hibernation operations, called pre_restore() and restore_cleanup() and call them from the restore code path. Still, they should be called if the "platform" mode of hibernation has been used, so we need to pass the information about the hibernation mode from the "frozen" kernel to the "boot" kernel in the image header.

Apparently, we can't drop the disabling of GPEs before the restore because of Bug #7887. We also can't do it unconditionally, because the GPEs wouldn't have been enabled after a successful restore if the suspend had been done in the 'shutdown' or 'reboot' mode.

In principle we could (and probably should) unconditionally disable the GPEs before each snapshot creation *and* before the restore, but then we'd have to unconditionally enable them after the snapshot creation as well as after the restore (or restore failure). Still, for this purpose we'd need to modify acpi_enter_sleep_state_prep() and acpi_leave_sleep_state() and we'd have to introduce some mechanism synchronizing the disabling/enabling of the GPEs with the device drivers' .suspend()/.resume() routines and with disable_/enable_nonboot_cpus(). However, this would have affected the suspend (ie. s2ram) code as well as the hibernation, which I'd like to avoid in this patch series. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | swsusp: remove code duplication between disk.c and user.c | Rafael J. Wysocki
Currently, much of the code in kernel/power/disk.c is duplicated in kernel/power/user.c, mainly for historical reasons. By eliminating this code duplication we can reduce the size of user.c quite substantially and remove the maintenance difficulty resulting from it. [bunk@stusta.de: kernel/power/disk.c: make code static] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | swsusp: remove incorrect code from user.c | Rafael J. Wysocki
In the face of the recent change of suspend code ordering (cf. http://marc.info/?l=linux-acpi&m=117938245931603&w=2) we should also modify the code ordering in swsusp so that hibernation_ops->prepare() is executed after device_suspend(). However, for this purpose it seems reasonable to eliminate the code duplication between kernel/power/disk.c and kernel/power/user.c first. By eliminating it we can reduce the size of user.c quite substantially and remove the maintenance difficulty with making essentially the same changes in two different places.

Moreover, we should also remove the calls to "platform" functions from the restore code path, since it doesn't carry out any power transition of the system, but we generally need to disable the GPEs before the restore if the 'platform' hibernation mode has been used. To do this, we can introduce two new hibernation_ops to be used in the restore code.

This patch: Make the hibernation code in kernel/power/user.c be functionally equivalent to the corresponding code in kernel/power/disk.c, as it should be. The calls to the platform functions removed by this patch are incorrect. They should be replaced with some other "platform" invocations that will be introduced in one of the subsequent patches. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | PM: Do not require dev spew to get PM_DEBUG | Ben Collins
In order to enable things like PM_TRACE, you're required to enable PM_DEBUG, which sends a large spew of messages on boot, and often times can overflow dmesg buffer. Create new PM_VERBOSE and shift that to be the option that enables drivers/base/power's messages. Signed-off-by: Ben Collins <bcollins@ubuntu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | freezer: run show_state() when freezing times out | Andrew Morton
To see which tasks are stuck where. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | only allow nonlinear vmas for ram backed filesystems | Miklos Szeredi
page_mkclean() doesn't re-protect ptes for non-linear mappings, so a later re-dirty through such a mapping will not generate a fault, PG_dirty will not reflect the dirty state and the dirty count will be skewed. This implies that msync() is also currently broken for nonlinear mappings. The easiest solution is to emulate remap_file_pages on non-linear mappings with simple mmap() for non ram-backed filesystems. Applications continue to work (albeit slower), as long as the number of remappings remains below the maximum vma count. However all currently known real uses of non-linear mappings are for ram backed filesystems, which this patch doesn't affect. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | Remove alloc_zeroed_user_highpage() | Mel Gorman
alloc_zeroed_user_highpage() has no in-tree users and it is not exported. As it is not exported, it can simply be removed. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | mm: fix clear_page_dirty_for_io vs fault race | Nick Piggin
Fix msync data loss and (less importantly) dirty page accounting inaccuracies due to the race remaining in clear_page_dirty_for_io(). The deleted comment explains what the race was, and the added comments explain how it is fixed. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | mm: fault feedback #2 | Nick Piggin
This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | mm: fault feedback #1 | Nick Piggin
Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way Linus wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
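The two-byte return layout described above can be sketched as plain bit packing. The helper names fault_pack(), fault_vm_code() and fault_ret_bits() are hypothetical, and the sketch makes no claim about the actual numeric values of the VM_FAULT_xxx or FAULT_RET_xxx codes.

```c
/*
 * Pack a VM_FAULT-style code into the low byte and FAULT_RET-style
 * bits into the next byte of a single int return value.
 */
int fault_pack(unsigned int vm_code, unsigned int ret_bits)
{
    return (int)((vm_code & 0xffu) | ((ret_bits & 0xffu) << 8));
}

/* Extract the low byte: the VM_FAULT-style code. */
unsigned int fault_vm_code(int ret)
{
    return (unsigned int)ret & 0xffu;
}

/* Extract the second byte: the FAULT_RET-style bits. */
unsigned int fault_ret_bits(int ret)
{
    return ((unsigned int)ret >> 8) & 0xffu;
}
```

A handler in this model would return fault_pack(code, bits), and the caller would split the value again with the two accessors.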
2007-07-19 | Document ->page_mkwrite() locking | Mark Fasheh
There seems to be very little documentation about this callback in general. The locking in particular is a bit tricky, so it's worth having this in writing. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | ocfs2: release page lock before calling ->page_mkwrite | Mark Fasheh
__do_fault() was calling ->page_mkwrite() with the page lock held, which violates the locking rules for that callback. Release and retake the page lock around the callback to avoid deadlocking file systems which manually take it. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | mm: merge populate and nopage into fault (fixes nonlinear) | Nick Piggin
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should not need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 | mm: fix fault vs invalidate race for linear mappings | Nick Piggin
Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. 
However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18 | Merge branch 'isdn-fix' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 | Linus Torvalds
* 'isdn-fix' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
  ISDN HiSax: uninitialized return in hisax_cs_setup
2007-07-18 | Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 | Linus Torvalds
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  eHEA: Fix bonding support
  Blackfin ethernet driver: on chip ethernet MAC controller driver
  fix wrong argument of tc35815_read_plat_dev_addr()
  ARM/ETHER3: Handle multicast frames.
  SAA9730: Handle multicast frames.
  NI5010: Handle multicast frames.
  NS83820: Handle multicast frames.
  Fix RGMII-ID handling in gianfar
  Fix Vitesse RGMII-ID support
  Add phy-connection-type to gianfar nodes
  Fix Vitesse 824x PHY interrupt acking
  [PATCH] zd1211rw: Add ID for Siemens Gigaset USB Stick 54
  [PATCH] zd1211rw: Add ID for Planex GW-US54GXS
  [PATCH] Update version ipw2200 stamp to 1.2.2
  [PATCH] ipw2200: Fix ipw_isr() comments error on shared IRQ
  [PATCH] Fix ipw2200 set wrong power parameter causing firmware error
  [PATCH] ipw2100: Fix `iwpriv set_power` error
  [PATCH] softmac: Channel is listed twice in scan output
2007-07-18 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (24 commits)
  [CIFS] merge conflict in fs/cifs/export.c
  [CIFS] Allow disabling CIFS Unix Extensions as mount option
  [CIFS] More whitespace/formatting fixes (noticed by checkpatch)
  [CIFS] Typo in previous patch
  [CIFS] zero_user_page() conversions
  [CIFS] use simple_prepare_write to zero page data
  [CIFS] Fix build break - inet.h not included when experimental ifdef off
  [CIFS] Add support for new POSIX unlink
  [CIFS] whitespace/formatting fixes
  [CIFS] Fix oops in cifs_create when nfsd server exports cifs mount
  [CIFS] whitespace cleanup
  [CIFS] Fix packet signatures for NTLMv2 case
  [CIFS] more whitespace fixes
  [CIFS] more whitespace cleanup
  [CIFS] whitespace cleanup
  [CIFS] whitespace cleanup
  [CIFS] ipv6 support no longer experimental
  [CIFS] Mount should fail if server signing off but client mount option requires it
  [CIFS] whitespace fixes
  [CIFS] Fix sign mount option and sign proc config setting
  ...
2007-07-18 | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/docs-2.6 | Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/docs-2.6:
  zh_CN/HOWTO: update URLs of git trees
  Chinese translation of Documentation/stable_api_nonsense.txt
  HOWTO: add Chinese translation of Documentation/HOWTO
  Documentation: add Japanese translated stable_api_nonsense.txt
  HOWTO: add Japanese translation of Documentation/HOWTO
2007-07-18 | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 | Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  sysfs: cosmetic clean up on node creation failure paths
  sysfs: kill an extra put in sysfs_create_link() failure path
  Driver core: check return code of sysfs_create_link()
  HOWTO: Add the knwon_regression URI to the documentation
  dev_vdbg() documentation
  dev_vdbg(), available with -DVERBOSE_DEBUG
  sysfs: make sysfs_init_inode() static
  sysfs: fix sysfs root inode nlink accounting
  Documentation fix devres.txt: lib/iomap.c -> lib/devres.c
  sysfs: avoid kmem_cache_free(NULL)
  PM: remove deprecated dpm_runtime_* routines
  PM: Remove deprecated sysfs files
  Driver core: accept all valid action-strings in uevent-trigger
  debugfs: remove rmdir() non-empty complaint
2007-07-18 | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6 | Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6:
  UIO: Hilscher CIF card driver
  UIO: Documentation
  UIO: Add the User IO core code
2007-07-18 | Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
  locks: fix vfs_test_lock() comment
  locks: make posix_test_lock() interface more consistent
  nfs: disable leases over NFS
  gfs2: stop giving out non-cluster-coherent leases
  locks: export setlease to filesystems
  locks: provide a file lease method enabling cluster-coherent leases
  locks: rename lease functions to reflect locks.c conventions
  locks: share more common lease code
  locks: clean up lease_alloc()
  locks: convert an -EINVAL return to a BUG
  leases: minor break_lease() comment clarification
2007-07-18 | Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband | Linus Torvalds
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (29 commits)
  IB/mthca: Simplify use of size0 in work request posting
  IB/mthca: Factor out setting WQE UD segment entries
  IB/mthca: Factor out setting WQE remote address and atomic segment entries
  IB/mlx4: Factor out setting other WQE segments
  IB/mlx4: Factor out setting WQE data segment entries
  IB/mthca: Factor out setting WQE data segment entries
  IB/mlx4: Return receive queue sizes for userspace QPs from query QP
  IB/mlx4: Increase max outstanding RDMA reads as target
  RDMA/cma: Remove local write permission from QP access flags
  IB/mthca: Use uninitialized_var() for f0
  IB/cm: Make internal function cm_get_ack_delay() static
  IB/ipath: Remove ipath_get_user_pages_nocopy()
  IB/ipath: Make a few functions static
  mlx4_core: Reset device when internal error is detected
  IB/iser: Make a couple of functions static
  IB/mthca: Fix printk format used for firmware version in warning
  IB/mthca: Schedule MSI support for removal
  IB/ehca: Fix warnings issued by checkpatch.pl
  IB/ehca: Restructure ehca_set_pagebuf()
  IB/ehca: MR/MW structure refactoring
  ...
2007-07-19Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6Steve French
Conflicts: fs/cifs/export.c
2007-07-19[CIFS] merge conflict in fs/cifs/export.cSteve French
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-07-18[CIFS] Allow disabling CIFS Unix Extensions as mount optionSteve French
Previously the only way to do this was to umount all mounts to that server and turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled). Fixes Samba bugzilla bug number 4582 (and also 2008). Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-07-18locks: fix vfs_test_lock() commentJ. Bruce Fields
Thanks to Doug Chapman for pointing out that the comment here is inconsistent with the function prototype. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18locks: make posix_test_lock() interface more consistentJ. Bruce Fields
Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflicting lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
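The same convention can be seen from user space through fcntl(F_GETLK), which reports the result in the lock structure rather than in its return value. The following is a minimal sketch, assuming a POSIX system; the helper name has_conflicting_lock() is hypothetical and is not part of this patch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): ask whether a write lock on the
 * whole file would conflict with a lock held by another process.
 * Returns 1 if a conflicting lock exists, 0 if none, -1 if fcntl() fails.
 */
int has_conflicting_lock(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;     /* the lock we would like to take */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;

    /* Absence of a conflict is signalled by l_type being set to F_UNLCK. */
    return fl.l_type != F_UNLCK;
}
```

A process never conflicts with its own locks, so calling the helper on a file that only the caller has locked is expected to return 0.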
2007-07-18nfs: disable leases over NFSJ. Bruce Fields
As Peter Staubach says elsewhere (http://marc.info/?l=linux-kernel&m=118113649526444&w=2): > The problem is that some file system such as NFSv2 and NFSv3 do > not have sufficient support to be able to support leases correctly. > In particular for these two file systems, there is no over the wire > protocol support. > > Currently, these two file systems fail the fcntl(F_SETLEASE) call > accidentally, due to a reference counting difference. These file > systems should fail more consciously, with a proper error to > indicate that the call is invalid for them. Define an nfs setlease method that just returns -EINVAL. If someone can demonstrate a real need, perhaps we could reenable them in the presence of the "nolock" mount option. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Peter Staubach <staubach@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-18gfs2: stop giving out non-cluster-coherent leasesMarc Eshel
Since gfs2 can't prevent conflicting opens or leases on other nodes, we probably shouldn't allow it to give out leases at all. Put the newly defined lease operation into use in gfs2 by turning off leases, unless we're using the "nolock" locking module (in which case all locking is local anyway). Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Steven Whitehouse <swhiteho@redhat.com>
2007-07-18locks: export setlease to filesystemsJ. Bruce Fields
Export setlease so it can be used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18locks: provide a file lease method enabling cluster-coherent leasesJ. Bruce Fields
Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-07-18locks: rename lease functions to reflect locks.c conventionsJ. Bruce Fields
We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
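The vfs_foo/foo convention described above can be modelled outside the kernel. The sketch below is an illustration only: struct file, struct file_ops, and refuse_setlease() are simplified, hypothetical stand-ins and do not match the real kernel definitions or signatures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel types. */
struct file;

struct file_ops {
    /* Optional filesystem-specific lease method; NULL if not provided. */
    int (*setlease)(struct file *filp, long arg);
};

struct file {
    const struct file_ops *ops;
    long lease_type;          /* lease state tracked by the generic code */
};

/* Generic local implementation: what a filesystem calls from its own method. */
int setlease(struct file *filp, long arg)
{
    filp->lease_type = arg;
    return 0;
}

/* Entry point for other callers: use the filesystem method or fall back. */
int vfs_setlease(struct file *filp, long arg)
{
    if (filp->ops && filp->ops->setlease)
        return filp->ops->setlease(filp, arg);
    return setlease(filp, arg);
}

/* Example filesystem method that refuses leases outright. */
int refuse_setlease(struct file *filp, long arg)
{
    (void)filp;
    (void)arg;
    return -EINVAL;
}
```

With this split, a filesystem that cannot support leases installs a method like refuse_setlease(), while one that only needs local behaviour leaves the method unset and gets the generic setlease().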
2007-07-18locks: share more common lease codeJ. Bruce Fields
Share more code between setlease (used by nfsd) and fcntl. Also some minor cleanup. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Christoph Hellwig <hch@infradead.org>
2007-07-18locks: clean up lease_alloc()J. Bruce Fields
Return the newly allocated structure as the return value instead of using a struct ** parameter. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
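The difference between the two calling styles can be sketched in plain C. The names struct lease, lease_alloc_out(), and lease_alloc_ret() are hypothetical, and this user-space version signals failure with NULL, which is an assumption of the sketch and not necessarily the kernel's error convention.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's lock structure. */
struct lease {
    int type;
};

/* Old style: status in the return value, result through a struct ** parameter. */
int lease_alloc_out(int type, struct lease **out)
{
    struct lease *l = malloc(sizeof(*l));

    if (!l)
        return -1;
    l->type = type;
    *out = l;
    return 0;
}

/* New style: the allocated structure is the return value (NULL on failure here). */
struct lease *lease_alloc_ret(int type)
{
    struct lease *l = malloc(sizeof(*l));

    if (l)
        l->type = type;
    return l;
}
```

The second form lets the caller write a single assignment and check instead of declaring a pointer first and passing its address.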
2007-07-18locks: convert an -EINVAL return to a BUGJ. Bruce Fields
There's no point trying to return an error in these cases, which all represent bugs in the callers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-07-18leases: minor break_lease() comment clarificationdavid m. richter
Clarify that break_lease() checks for presence of any lock, not just leases. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>