aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2005-11-23[PATCH] kprobes: Fix return probes on sys_execveJim Keniston
Fix a bug in kprobes that can cause an Oops or even a crash when a return probe is installed on one of the following functions: sys_execve, do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to remove the call to kprobe_flush_task() in flush_thread(). This fix has been tested on all architectures for which the return-probes feature has been implemented (i386, x86_64, ppc64, ia64). Please apply. BACKGROUND Up to now, we have called kprobe_flush_task() under two situations: when a task exits, and when it execs. Flushing kretprobe_instances on exit is correct because (a) do_exit() doesn't return, and (b) one or more return-probed functions may be active when a task calls do_exit(). Neither is the case for sys_execve() and its callees. Initially, the mistaken call to kprobe_flush_task() on exec was harmless because we put the "real" return address of each active probed function back in the stack, just to be safe, when we recycled its kretprobe_instance. When support for ppc64 and ia64 was added, this safety measure couldn't be employed, and was eventually dropped even for i386 and x86_64. sys_execve() and its callees were informally blacklisted for return probes until this fix was developed. Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] mm: fill arch atomic64 gapsHugh Dickins
alpha, sparc64, x86_64 are each missing some primitives from their atomic64 support: fill in the gaps I've noticed by extrapolating asm, follow the groupings in each file. But powerpc and parisc still lack atomic64. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@muc.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] mm: powerpc init_mm without ptlockHugh Dickins
Restore an earlier mod which went missing in the powerpc reshuffle: the 4xx mmu_mapin_ram does not need to take init_mm.page_table_lock. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] mm: powerpc ptlock commentsHugh Dickins
Update comments (only) on page_table_lock and mmap_sem in arch/powerpc. Removed the comment on page_table_lock from hash_huge_page: since it's no longer taking page_table_lock itself, it's irrelevant whether others are; but how it is safe (even against huge file truncation?) I can't say. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] mm: unbloat get_futex_keyHugh Dickins
The follow_page changes in get_futex_key have left it with two almost identical blocks, when handling the rare case of a futex in a nonlinear vma. get_user_pages will itself do that follow_page, and its additional find_extend_vma is hardly any overhead since the vma is already cached. Let's just delete the follow_page block and let get_user_pages do it. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] mm: update split ptlock KconfigHugh Dickins
Closer attention to the arithmetic shows that neither ppc64 nor sparc really uses one page for multiple page tables: how on earth could they, while pte_alloc_one returns just a struct page pointer, with no offset? Well, arm26 manages it by returning a pte_t pointer cast to a struct page pointer, harumph, then compensating in its pmd_populate. But arm26 is never SMP, so it's not a problem for split ptlock either. And the PA-RISC situation has been recently improved: CONFIG_PA20 works without the 16-byte alignment which inflated its spinlock_t. But the current union of spinlock_t with private does make the 7xxx struct page significantly larger, even without debug, so disable its split ptlock. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] revert floppy-fix-read-only-handlingAndrew Morton
This fix causes problems on the very first floppy access - we haven't yet talked to the FDC so we don't know which state the write-protect tab is in. Revert for now. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] Check the irq number is within boundsMatthew Wilcox
Most of the functions already check. Do the ones that didn't. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23Revert "[NET]: Shut up warnings in net/core/flow.c"Linus Torvalds
This reverts commit af2b4079ab154bd12e8c12b02db5f31b31babe63 Changing the #define to an inline function breaks on non-SMP builds, since wuite a few places in the kernel do not implement the ipi handler when compiling for UP. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22compat-ioctl.c: fix compile with no CONFIG_JBDLinus Torvalds
The ext3 compat-ioctl translation wants to translate data structures that <linux/jbd.h> only declared when CONFIG_JBD was enabled. So make <linux/jbd.h> play nicely even when we don't actually end up using it. Acked-by: Andrew Morton <akpm@osdl.org> Acked-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Acked-by: Zan Lynx <zlynx@acm.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22Fix up GFP_ZONEMASK for GFP_DMA32 usageLinus Torvalds
There was some confusion about the different zone usage, this should fix up the resulting mess in the GFP zonemask handling. The different zone usage is still confusing (it's very easy to mix up the individual zone numbers with the GFP zone _list_ numbers), so we might want to clean up some of this in the future, but in the meantime this should fix the actual problems. Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
2005-11-22[SPARC]: drivers/sbus/char/aurora.c: "extern inline" -> "static inline"Adrian Bunk
"extern inline" doesn't make much sense. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22[NET]: Fix ifenslave to not fail on lack of IP informationNeil Horman
Patch to ifenslave so that under older ABI versions, a failure to propogate ip information from master to slave does not result in a filure to enslave the slave device. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22[NETFILTER] ctnetlink: Fix refcount leak ip_conntrack/nat_protoPablo Neira Ayuso
Remove proto == NULL checking since ip_conntrack_[nat_]proto_find_get always returns a valid pointer. Fix missing ip_conntrack_proto_put in some paths. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22[IPV4]: Fix secondary IP addresses after promotionJamal Hadi Salim
This patch fixes the problem with promoting aliases when: a) a single primary and > 1 secondary addresses b) multiple primary addresses each with at least one secondary address Based on earlier efforts from Brian Pomerantz <bapper@piratehaven.org>, Patrick McHardy <kaber@trash.net> and Thomas Graf <tgraf@suug.ch> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22[NETLINK]: Use tgid instead of pid for nlmsg_pidHerbert Xu
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22[NET]: Shut up warnings in net/core/flow.cRussell King
Not really a network problem, more a !SMP issue. net/core/flow.c:295: warning: statement with no effect flow.c:295: smp_call_function(flow_cache_flush_per_cpu, &info, 1, 0); Fix this by converting the macro to an inline function, which also increases the typechecking for !SMP builds. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22[PATCH] prefer pkg-config for the QT checkRoman Zippel
This makes pkg-config now the prefered way to configure QT and properly fixes the recent Fedora breakage and leaves the old QT detection as fallback mechanism. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper raid1: drop mark_region spinlock fixJonathan E Brassow
The spinlock region_lock is held while calling mark_region which can sleep. Drop the spinlock before calling that function. A region's state and inclusion in the clean list are altered by rh_inc and rh_dec. The state variable is set to RH_CLEAN in rh_dec, but only if 'pending' is zero. It is set to RH_DIRTY in rh_inc, but not if it is already so. The changes to 'pending', the state, and the region's inclusion in the clean list need to be atomicly. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper snapshot: bio_list fixjblunck@suse.de
bio_list_merge() should do nothing if the second list is empty - not oops. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper dm-mpath: endio spinlock fixStefan Bader
do_end_io() can be called without interrupts blocked. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper: mirror log bitset fixAlasdair G Kergon
The linux bitset operators (test_bit, set_bit etc) work on arrays of "unsigned long". dm-log uses such bitsets but treats them as arrays of uint32_t, only allocating and zeroing a multiple of 4 bytes (as 'clean_bits' is a uint32_t). The patch below fixes this problem. The problem is specific to 64-bit big endian machines such as s390x or ppc-64 and can prevent pvmove terminating. In the simplest case, if "region_count" were (say) 30, then bitset_size (below) would be 4 and bitset_uint32_count would be 1. Thus the memory for this butset, after allocation and zeroing would be 0 0 0 0 X X X X On a bigendian 64bit machine, bit 0 for this bitset is in the 8th byte! (and every bit that dm-log would use would be in the X area). 0 0 0 0 X X X X ^ here which hasn't been cleared properly. As the dm-raid1 code only syncs and counts regions which have a 0 in the 'sync_bits' bitset, and only finishes when it has counted high enough, a large number of 1's among those 'X's will cause the sync to not complete. It is worth noting that the code uses the same bitsets for in-memory and on-disk logs. As these bitsets are host-endian and host-sized, this means that they cannot safely be moved between computers with Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper: list_versions fixAlasdair G Kergon
In some circumstances the LIST_VERSIONS output is truncated because the size calculation forgets about a 'uint32_t' in each structure - but the inclusion of the whole of ALIGN_MASK frequently compensates for the omission. This is a quick workaround to use an upper bound. (The code ought to be fixed to supply the actual size.) Running 'dmsetup targets' may demonstrate the problem: when I run it, the last line comes out as 'erro' instead of 'error'. Consequently, 'lvcreate --type error' doesn't work. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] device-mapper dm-ioctl: missing put in table load error caseKiyoshi Ueda
An error path in table_load() forgets to release a table that won't now be referenced. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] kernel Doc/ URL correctionsRandy Dunlap
Correct lots of URLs in Documentation/ Also a few minor whitespace cleanups and typo/spello fixes. Sadly there are still a lot of bad URLs remaining. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] dell_rbu driver depends on x86[64]Dave Jones
This driver only appears on IA32 & EM64T boxes. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] Fix a bug in scsi_get_commandMatthew Dobson
scsi_get_command() attempts to write into a structure that may not have been successfully allocated. Move this write inside the if statement that ensures we won't panic the kernel with a NULL pointer dereference. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] cpufreq: silence cpufreq for UPGrant Coady
drivers/cpufreq/cpufreq.c: In function `cpufreq_remove_dev': drivers/cpufreq/cpufreq.c:696: warning: unused variable `cpu_sys_dev' Signed-off-by: Grant Coady <gcoady@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] hugetlb: fix race in set_max_huge_pages for multiple updaters of ↵Eric Paris
nr_huge_pages If there are multiple updaters to /proc/sys/vm/nr_hugepages simultaneously it is possible for the nr_huge_pages variable to become incorrect. There is no locking in the set_max_huge_pages function around alloc_fresh_huge_page which is able to update nr_huge_pages. Two callers to alloc_fresh_huge_page could race against each other as could a call to alloc_fresh_huge_page and a call to update_and_free_page. This patch just expands the area covered by the hugetlb_lock to cover the call into alloc_fresh_huge_page. I'm not sure how we could say that a sysctl section is performance critical where more specific locking would be needed. My reproducer was to run a couple copies of the following script simultaneously while [ true ]; do echo 1000 > /proc/sys/vm/nr_hugepages echo 500 > /proc/sys/vm/nr_hugepages echo 750 > /proc/sys/vm/nr_hugepages echo 100 > /proc/sys/vm/nr_hugepages echo 0 > /proc/sys/vm/nr_hugepages done and then watch /proc/meminfo and eventually you will see things like HugePages_Total: 100 HugePages_Free: 109 After applying the patch all seemed well. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] vgacon: Fix usage of stale height value on vc initializationAntonino A. Daplas
Reported by: Wayne E. Harlan "[1.] One line summary of the problem: When the kernel option "vga=1" is used, additional tty's (alt+control+Fx with x=2,3,4,5, etc) do not provide the full 50 lines of output. The first one does have 50 lines, however. [2.] Full description of the problem/report: These addtitional tty's show only 39 lines plus the top pixel of the 40-th line. The remaining lines are black and not shown. Kernel version 2.6.13.4 does not show this problem." This bug is caused by using a stale font height value on vgacon_init. Booting with vga=1 gives an 80x50 screen with an 8x8 font. Somewhere during the initialization, the font was changed to 8x9 and the first vc was correctly resized to 80x44. However, the rest of the vc's were not allocated yet, and when they were subsequently initialized, they still used a font height of 8 (instead of 9) causing the mentioned bug. Fix by saving the new font height to vga_video_font_height. Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] fbcon: Console Rotation - Fix wrong shift calculationAntonino A. Daplas
The shift value (amount to shift the bitmap so first pixel starts at origin(0,0)) is incorrect. This causes corrupted characters or a kernel crash if fontwidth is not divisible by 8 at 270 degrees, or fontheight not divisible by 8 at 180 degrees. Report and part of the fix contributed by Knut Petersen. Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] Fix hugetlbfs_statfs() reporting of block limitsDavid Gibson
Currently, if a hugetlbfs is mounted without limits (the default), statfs() will return -1 for max/free/used blocks. This does not appear to be in line with normal convention: simple_statfs() and shmem_statfs() both return 0 in similar cases. Worse, it confuses the translation logic in put_compat_statfs(), causing it to return -EOVERFLOW on such a mount. This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks on a mount without limits. Note that we need the test in the patch below, rather than just using 0 in the sbinfo structure, because the -1 marked in the free blocks field is used internally to tell the Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] Fix error handling with put_compat_statfs()David Gibson
In fs/compat.c, whenever put_compat_statfs() returns an error, the containing syscall returns -EFAULT. This is presumably by analogy with the non-compat case, where any non-zero code from copy_to_user() should be translated into an EFAULT. However, put_compat_statfs() is also return -EOVERFLOW. The same applies for put_compat_statfs64(). This bug can be observed with a statfs() on a hugetlbfs directory. hugetlbfs, when mounted without limits reports available, free and total blocks as -1 (itself a bug, another patch coming). statfs() will mysteriously return EFAULT although it's parameters are perfectly valid addresses. This patch causes the compat versions of statfs() and statfs64() to correctly propogate the return values from put_compat_statfs() and put_compat_statfs64(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: fix sound Bad page statesHugh Dickins
Earlier I unifdefed PageCompound, so that snd_pcm_mmap_control_nopage and others can give out a 0-order component of a higher-order page, which won't be mistakenly freed when zap_pte_range unmaps it. But many Bad page states reported a PG_reserved was freed after all: I had missed that we need to say __GFP_COMP to get compound page behaviour. Some of these higher-order pages are allocated by snd_malloc_pages, some by snd_malloc_dev_pages; or if SBUS, by sbus_alloc_consistent - but that has no gfp arg, so add __GFP_COMP into its sparc32/64 implementations. I'm still rather puzzled that DRM seems not to need a similar change. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: copy_page_range vmaHugh Dickins
For copy_one_pte's print_bad_pte to show the task correctly (instead of "???"), dup_mmap must pass down parent vma rather than child vma. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: PG_reserved bad_pageHugh Dickins
It used to be the case that PG_reserved pages were silently never freed, but in 2.6.15-rc1 they may be freed with a "Bad page state" message. We should work through such cases as they appear, fixing the code; but for now it's safer to issue the message without freeing the page, leaving PG_reserved set. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: ZERO_PAGE in VM_UNPAGEDHugh Dickins
It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas, let's not insert the ZERO_PAGE there - though whether it would matter will depend on what we decide about ZERO_PAGE refcounting. But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED area, do_no_page should never be: just BUG_ON. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: anon in VM_UNPAGEDHugh Dickins
copy_one_pte needs to copy the anonymous COWed pages in a VM_UNPAGED area, zap_pte_range needs to free them, do_wp_page needs to COW them: just like ordinary pages, not like the unpaged. But recognizing them is a little subtle: because PageReserved is no longer a condition for remap_pfn_range, we can now mmap all of /dev/mem (whether the distro permits, and whether it's advisable on this or that architecture, is another matter). So if we can see a PageAnon, it may not be ours to mess with (or may be ours from elsewhere in the address space). I suspect there's an entertaining insoluble self-referential problem here, but the page_is_anon function does a good practical job, and MAP_PRIVATE PROT_WRITE VM_UNPAGED will always be an odd choice. In updating the comment on page_address_in_vma, noticed a potential NULL dereference, in a path we don't actually take, but fixed it. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: COW on VM_UNPAGEDHugh Dickins
Remove the BUG_ON(vma->vm_flags & VM_UNPAGED) from do_wp_page, and let it do Copy-On-Write without touching the VM_UNPAGED's page counts - but this is incomplete, because the anonymous page it inserts will itself need to be handled, here and in other functions - next patch. We still don't copy the page if the pfn is invalid, because the copy_user_highpage interface does not allow it. But that's not been a problem in the past: can be added in later if the need arises. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: VM_NONLINEAR VM_RESERVEDHugh Dickins
There's one peculiar use of VM_RESERVED which the previous patch left behind: because VM_NONLINEAR's try_to_unmap_cluster uses vm_private_data as a swapout cursor, but should never meet VM_RESERVED vmas, it was a way of extending VM_NONLINEAR to VM_RESERVED vmas using vm_private_data for some other purpose. But that's an empty set - they don't have the populate function required. So just throw away those VM_RESERVED tests. But one more interesting in rmap.c has to go too: try_to_unmap_one will want to swap out an anonymous page from VM_RESERVED or VM_UNPAGED area. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: VM_UNPAGEDHugh Dickins
Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few drivers set VM_RESERVED on areas which are then populated by nopage. The PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in zap_pte_range, without changing those drivers not to set it: so their pages just leak away. Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core, to flag the special areas where the ptes may have no struct page, or if they have then it's not to be touched. Replace most instances of VM_RESERVED in core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and sparc64 io_remap_pfn_range. Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it needed anywhere? It still governs the mm->reserved_vm statistic, and special vmas not to be merged, and areas not to be core dumped; but could probably be eliminated later (the drivers are probably specifying it because in 2.4 it kept swapout off the vma, but in 2.6 we work from the LRU, which these pages don't get on). Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no purpose whatsoever, and should be removed from drivers when we clean up. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: unifdefed PageCompoundHugh Dickins
It looks like snd_xxx is not the only nopage to be using PageReserved as a way of holding a high-order page together: which no longer works, but is masked by our failure to free from VM_RESERVED areas. We cannot fix that bug without first substituting another way to hold the high-order page together, while farming out the 0-order pages from within it. That's just what PageCompound is designed for, but it's been kept under CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out- of-line put_page), doesn't slow down what most needs to be fast (already using hugetlb), and unifies the way we handle high-order pages. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: sound nopage get_pageHugh Dickins
Something noticed when studying use of VM_RESERVED in different drivers: snd_usX2Y_hwdep_pcm_vm_nopage omitted to get_page: fixed. And how did this work before? Aargh! That nopage is returning a page from within a buffer allocated by snd_malloc_pages, which allocates a high-order page, then does SetPageReserved on each 0-order page within. That would have worked in 2.6.14, because when the area was unmapped, PageReserved inhibited put_page. 2.6.15-rc1 removed that inhibition (while leaving ineffective PageReserveds around for now), but it hasn't caused trouble because.. we've not been freeing from VM_RESERVED at all. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: private write VM_RESERVEDHugh Dickins
The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you tried to mmap or mprotect MAP_PRIVATE PROT_WRITE a VM_RESERVED, and failed with -EACCES: because do_wp_page lacks the refinement to COW pages in those areas, nor do we expect to find anonymous pages in them; and it seemed just bloat to add code for handling such a peculiar case. But immediately it caused vbetool and ddcprobe (using lrmi) to fail. So revert the "deprecated" messages, letting mmap and mprotect succeed. But leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've added the code to do it right: so this particular patch is only good if the app doesn't really need to write to that private area. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] unpaged: get_user_pages VM_RESERVEDHugh Dickins
The PageReserved removal in 2.6.15-rc1 prohibited get_user_pages on the areas flagged VM_RESERVED in place of PageReserved. That is correct in theory - we ought not to interfere with struct pages in such a reserved area; but in practice it broke BTTV for one. So revert to prohibiting only on VM_IO: if someone gets into trouble with get_user_pages on VM_RESERVED, it'll just be a "don't do that". You can argue that videobuf_mmap_mapper shouldn't set VM_RESERVED in the first place, but now's not the time for breaking drivers without notice. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] uml: eliminate use of libc PAGE_SIZEJeff Dike
On some systems, libc PAGE_SIZE calls getpagesize, which can't happen from a stub. So, I use UM_KERN_PAGE_SIZE, which is less variable in its definition, instead. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] uml: properly invoke x86_64 system callsJeff Dike
This patch makes stub_segv use the stub_syscall macros. This was needed anyway, but the bug that prompted this was the discovery that gcc was storing stuff in RCX, which is trashed across a system call. This is exactly the sort of problem that the new macros fix. There is a stub_syscall0 for getpid. stub_segv was changed to be a libc file, and that caused some include changes. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] uml: eliminate anonymous union and clean up symlink lossageJeff Dike
This gives a name to the anonymous union introduced in skas-hold-own-ldt, allowing to build on a wider range of gccs. It also removes ldt.h, which somehow became real, and replaces it with a symlink, and creates ldt-x86_64.h as a copy of ldt-i386.h for now. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22[PATCH] uml: eliminate use of local in clone stubJeff Dike
We have a bug in the i386 stub_syscall6 which pushes ebp before the system call and pops it afterwards. Because we use syscall6 to remap the stack, the old contents of the stack (and the former value of ebp) are no longer available. Some versions of gcc make from a real local, accessed through ebp, despite my efforts to make it obvious that references to from are really constants. This patch attempts to make it even more obvious by eliminating from and using a macro to access the stub's data explicitly with constants. My original thinking on this was to replace syscall6 with a remap_stack interface which saved ebp someplace and restored it afterwards. The problem is that there are no registers to put it in, except for esp. That could work, since we can store a constant in esp after the mmap because we just replaced the stack. However, this approach seems a tad cleaner. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>