Age | Commit message (Collapse) | Author |
|
We can't just call xfs_log_unmount_dealloc on any failure because the
ail thread which is torn down by xfs_log_unmount_dealloc might not
be initialized yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Reported-by: Lachlan McIlroy <lachlan@sgi.com>
|
|
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
|
|
Currently we call from the nicely abstracted linux quotaops into a ugly
multiplexer just to split the calls out at the same boundary again.
Rewrite the quota ops handling to remove that obfucation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
Get rid of various obsfucating wrappers for accessing the quota hash lock,
we only keep the accessors for accessing the mplist and freelist locks as
they encode a multi-level datastructure walk. But make sure all of them
are defined in the same way as simple macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
Now that we have a helper to test if a mutex is held use it instead of our
own little hacks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
Remove these macros which only obsfucated the code in rather nast ways.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_create and xfs_mkdir only have minor differences, so merge both of them
into a sigle function. While we're at it also make the error handling code
more straight-forward.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner <david@fromorbit.com>
|
|
Just another set of types obsfucating the code, remove them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_ialloc_btree.h has a a cuple of macros that only obsfucate the code
but don't provide any abstraction benefits. This patches removes those
and cleans up the reamaining defintions up a little.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
Our default has been to always use 8 32KB log buffers for a while now, so
remove the special casing for larger block size filesystem to use the same
or even lower number of buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
The XFS_QMOPT_DQLOCK flag introduces major complexity in the quota subsystem
but isn't actually used anywhere. So remove it and all the hazzles it
introduces.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
|
|
Remove the superflous igrab by keeping a reference on the path/file all the
time and clean up various bits of surrounding code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
|
|
Rename the async_*_special() functions to async_*_domain(), which
describes the purpose of these functions much better.
[Broke up long lines to silence checkpatch]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
CRED: Fix SUID exec regression
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (37 commits)
Btrfs: Make sure dir is non-null before doing S_ISGID checks
Btrfs: Fix memory leak in cache_drop_leaf_ref
Btrfs: don't return congestion in write_cache_pages as often
Btrfs: Only prep for btree deletion balances when nodes are mostly empty
Btrfs: fix btrfs_unlock_up_safe to walk the entire path
Btrfs: change btrfs_del_leaf to drop locks earlier
Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode
Btrfs: Don't try to compress pages past i_size
Btrfs: join the transaction in __btrfs_setxattr
Btrfs: Handle SGID bit when creating inodes
Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks
Btrfs: Change btree locking to use explicit blocking points
Btrfs: hash_lock is no longer needed
Btrfs: disable leak debugging checks in extent_io.c
Btrfs: sort references by byte number during btrfs_inc_ref
Btrfs: async threads should try harder to find work
Btrfs: selinux support
Btrfs: make btrfs acls selectable
Btrfs: Catch missed bios in the async bio submission thread
Btrfs: fix readdir on 32 bit machines
...
|
|
The addition of filename encryption caused a regression in unencrypted
filename symlink support. ecryptfs_copy_filename() is used when dealing
with unencrypted filenames and it reported that the new, copied filename
was a character longer than it should have been.
This caused the return value of readlink() to count the NULL byte of the
symlink target. Most applications don't care about the extra NULL byte,
but a version control system (bzr) helped in discovering the bug.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
elf core dump: fix get_user use
|
|
The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force,
so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely
use get_user() for a normal user-space address.
Checking for VM_READ optimizes out the case where get_user() would fail
anyway. The vm_file check here was already superfluous given the control
flow earlier in the function, so that is a cleanup/optimization unrelated
to other changes but an obvious and trivial one.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
|
|
The patch:
commit a6f76f23d297f70e2a6b3ec607f7aeeea9e37e8d
CRED: Make execve() take advantage of copy-on-write credentials
moved the place in which the 'safeness' of a SUID/SGID exec was performed to
before de_thread() was called. This means that LSM_UNSAFE_SHARE is now
calculated incorrectly. This flag is set if any of the usage counts for
fs_struct, files_struct and sighand_struct are greater than 1 at the time the
determination is made. All of which are true for threads created by the
pthread library.
However, since we wish to make the security calculation before irrevocably
damaging the process so that we can return it an error code in the case where
we decide we want to reject the exec request on this basis, we have to make the
determination before calling de_thread().
So, instead, we count up the number of threads (CLONE_THREAD) that are sharing
our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs
(CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and
so can be discounted by check_unsafe_exec().
We do have to be careful because CLONE_THREAD does not imply FS or FILES.
We _assume_ that there will be no extra references to these structs held by the
threads we're going to kill.
This can be tested with the attached pair of programs. Build the two programs
using the Makefile supplied, and run ./test1 as a non-root user. If
successful, you should see something like:
[dhowells@andromeda tmp]$ ./test1
--TEST1--
uid=4043, euid=4043 suid=4043
exec ./test2
--TEST2--
uid=4043, euid=0 suid=0
SUCCESS - Correct effective user ID
and if unsuccessful, something like:
[dhowells@andromeda tmp]$ ./test1
--TEST1--
uid=4043, euid=4043 suid=4043
exec ./test2
--TEST2--
uid=4043, euid=4043 suid=4043
ERROR - Incorrect effective user ID!
The non-root user ID you see will depend on the user you run as.
[test1.c]
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static void *thread_func(void *arg)
{
while (1) {}
}
int main(int argc, char **argv)
{
pthread_t tid;
uid_t uid, euid, suid;
printf("--TEST1--\n");
getresuid(&uid, &euid, &suid);
printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);
if (pthread_create(&tid, NULL, thread_func, NULL) < 0) {
perror("pthread_create");
exit(1);
}
printf("exec ./test2\n");
execlp("./test2", "test2", NULL);
perror("./test2");
_exit(1);
}
[test2.c]
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv)
{
uid_t uid, euid, suid;
getresuid(&uid, &euid, &suid);
printf("--TEST2--\n");
printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);
if (euid != 0) {
fprintf(stderr, "ERROR - Incorrect effective user ID!\n");
exit(1);
}
printf("SUCCESS - Correct effective user ID\n");
exit(0);
}
[Makefile]
CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused
all: test1 test2
test1: test1.c
gcc $(CFLAGS) -o test1 test1.c -lpthread
test2: test2.c
gcc $(CFLAGS) -o test2 test2.c
sudo chown root.root test2
sudo chmod +s test2
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David Smith <dsmith@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
This is a modification of a patch by Bill Pemberton <wfp5p@virginia.edu>
nobh_write_end() could call attach_nobh_buffers() with head == NULL.
This would result in a trap when attach_nobh_buffers() attempted to
access bh->b_this_page.
This can be illustrated by running the writev01 testcase from LTP on jfs.
This error was introduced by commit 5b41e74a "vfs: fix data leak in
nobh_write_end()". That patch did not take into account that if
PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
buffers will be allocated for the page. In that case, we won't have to
worry about a failed write leaving unitialized data in the page.
Of course, head != NULL implies !page_has_buffers(page), so no need to
test both.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Dmitri Monakhov <dmonakhov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The S_ISGID check in btrfs_new_inode caused an oops during subvol creation
because sometimes the dir is null.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
... and yes, gcc is insane enough to eat that without complaint.
We probably want sparse to scream on those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()"
|
|
lseek() further than length of the file will leave stale ->index
(second-to-last during iteration). Next seq_read() will not notice
that ->f_pos is big enough to return 0, but will print last item
as if ->f_pos is pointing to it.
Introduced in commit cb510b8172602a66467f3551b4be1911f5a7c8c2
aka "seq_file: more atomicity in traverse()".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In 2.6.25 some /proc files were converted to use the seq_file
infrastructure. But seq_files do not correctly support pread(), which
broke some usersapce applications.
To handle pread correctly we can't assume that f_pos is where we left it
in seq_read. So move traverse() so that we can eventually use it in
seq_read and do thus some day support pread().
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Cc: Paul Turner <pjt@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The code wasn't doing a kfree on the sorted array
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
configfs_depend_item()"
This reverts commit 0e0333429a6280e6eb3c98845e4eed90d5f8078a.
I committed this by accident - Joel and Louis are working with the lockdep
maintainer to provide a better solution than just turning lockdep off.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: <Joel Becker <joel.becker@oracle.com>
|
|
On fast devices that go from congested to uncongested very quickly, pdflush
is waiting too often in congestion_wait, and the FS is backing off to
easily in write_cache_pages.
For now, fix this on the btrfs side by only checking congestion after
some bios have already gone down. Longer term a real fix is needed
for pdflush, but that is a larger project.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Whenever an item deletion is done, we need to balance all the nodes
in the tree to make sure we don't end up with an empty node if a pointer
is deleted. This balance prep happens from the root of the tree down
so we can drop our locks as we go.
reada_for_balance was triggering read-ahead on neighboring nodes even
when no balancing was required. This adds an extra check to avoid
calling balance_level() and avoid reada_for_balance() when a balance
won't be required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
btrfs_unlock_up_safe would break out at the first NULL node entry or
unlocked node it found in the path.
Some of the callers have missing nodes at the lower levels of the path, so this
commit fixes things to check all the nodes in the path before returning.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
btrfs_del_leaf does two things. First it removes the pointer in the
parent, and then it frees the block that has the leaf. It has the
parent node locked for both operations.
But, it only needs the parent locked while it is deleting the pointer.
After that it can safely free the block without the parent locked.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
btrfs_truncate_inode_items is setup to stop doing btree searches when
it has finished removing the items for the inode. It used to detect the
end of the inode by looking for an objectid that didn't match the
one we were searching for.
But, this would result in an extra search through the btree, which
adds extra balancing and cow costs to the operation.
This commit adds a check to see if we found the inode item, which means
we can stop searching early.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
The compression code had some checks to make sure we were only
compressing bytes inside of i_size, but it wasn't catching every
case. To make things worse, some incorrect math about the number
of bytes remaining would make it try to compress more pages than the
file really had.
The fix used here is to fall back to the non-compression code in this
case, which does all the proper cleanup of delalloc and other accounting.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
With selinux on we end up calling __btrfs_setxattr when we create an inode,
which calls btrfs_start_transaction(). The problem is we've already called
that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a
wait_current_trans(). If btrfs-transaction has started committing it will wait
for all handles to finish, while the other process is waiting for the
transaction to commit. This is fixed by using btrfs_join_transaction, which
won't wait for the transaction to commit. Thanks,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
|
|
Before this patch, new files/dirs would ignore the SGID bit on their
parent directory and always be owned by the creating user's uid/gid.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Every transaction in btrfs creates a new snapshot, and then schedules the
snapshot from the last transaction for deletion. Snapshot deletion
works by walking down the btree and dropping the reference counts
on each btree block during the walk.
If if a given leaf or node has a reference count greater than one,
the reference count is decremented and the subtree pointed to by that
node is ignored.
If the reference count is one, walking continues down into that node
or leaf, and the references of everything it points to are decremented.
The old code would try to work in small pieces, walking down the tree
until it found the lowest leaf or node to free and then returning. This
was very friendly to the rest of the FS because it didn't have a huge
impact on other operations.
But it wouldn't always keep up with the rate that new commits added new
snapshots for deletion, and it wasn't very optimal for the extent
allocation tree because it wasn't finding leaves that were close together
on disk and processing them at the same time.
This changes things to walk down to a level 1 node and then process it
in bulk. All the leaf pointers are sorted and the leaves are dropped
in order based on their extent number.
The extent allocation tree and commit code are now fast enough for
this kind of bulk processing to work without slowing the rest of the FS
down. Overall it does less IO and is better able to keep up with
snapshot deletions under high load.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Most of the btrfs metadata operations can be protected by a spinlock,
but some operations still need to schedule.
So far, btrfs has been using a mutex along with a trylock loop,
most of the time it is able to avoid going for the full mutex, so
the trylock loop is a big performance gain.
This commit is step one for getting rid of the blocking locks entirely.
btrfs_tree_lock takes a spinlock, and the code explicitly switches
to a blocking lock when it starts an operation that can schedule.
We'll be able get rid of the blocking locks in smaller pieces over time.
Tracing allows us to find the most common cause of blocking, so we
can start with the hot spots first.
The basic idea is:
btrfs_tree_lock() returns with the spin lock held
btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
the extent buffer flags, and then drops the spin lock. The buffer is
still considered locked by all of the btrfs code.
If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
the spin lock and waits on a wait queue for the blocking bit to go away.
Much of the code that needs to set the blocking bit finishes without actually
blocking a good percentage of the time. So, an adaptive spin is still
used against the blocking bit to avoid very high context switch rates.
btrfs_clear_lock_blocking() clears the blocking bit and returns
with the spinlock held again.
btrfs_tree_unlock() can be called on either blocking or spinning locks,
it does the right thing based on the blocking bit.
ctree.c has a helper function to set/clear all the locked buffers in a
path as blocking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Before metadata is written to disk, it is updated to reflect that writeout
has begun. Once this update is done, the block must be cow'd before it
can be modified again.
This update was originally synchronized by using a per-fs spinlock. Today
the buffers for the metadata blocks are locked before writeout begins,
and everyone that tests the flag has the buffer locked as well.
So, the per-fs spinlock (called hash_lock for no good reason) is no
longer required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
extent_io.c has debugging code to report and free leaked extent_state
and extent_buffer objects at rmmod time. This helps track down
leaks and it saves you from rebooting just to properly remove the
kmem_cache object.
But, the code runs under a fairly expensive spinlock and the checks to
see if it is currently enabled are not entirely consistent. Some use
#ifdef and some #if.
This changes everything to #if and disables the leak checking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
When a block goes through cow, we update the reference counts of
everything that block points to. The internal pointers of the block
can be in just about any order, and it is likely to have clusters of
things that are close together and clusters of things that are not.
To help reduce the seeks that come with updating all of these reference
counts, sort them by byte number before actual updates are done.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Tracing shows the delay between when an async thread goes to sleep
and when more work is added is often very short. This commit adds
a little bit of delay and extra checking to the code right before
we schedule out.
It allows more work to be added to the worker
without requiring notifications from other procs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Add call to LSM security initialization and save
resulting security xattr for new inodes.
Add xattr support to symlink inode ops.
Set inode->i_op for existing special files.
Signed-off-by: jim owens <jowens@hp.com>
|
|
This patch adds a menu entry to kconfig to enable acls for btrfs.
This allows you to enable FS_POSIX_ACL at kernel compile time.
(updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead)
Signed-off-by: Christian Hesse <mail@earthworm.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
|
|
The async bio submission thread was missing some bios that were
added after it had decided there was no work left to do.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Use multiple lables for proper error unwinding and get rid of some now
superflous variables.
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
|
|
Splitting the task for a VFS-induced inode flush into two functions doesn't
make any sense, so merge the two functions dealing with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
|
|
We currently duplicate code to reset the attribute fork after the last
attribute has been deleted. Factor this out into a small helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
|
|
These aren't only unused but also reference a lock that doesn't exist anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
|