Age | Commit message (Collapse) | Author |
|
Currently, the port and mount port will both display as 65535 if you do not
specify a port number. That would be wrong...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
With the recent spate of changes, the nfs protocol version will now default
to 2 instead of 3, while the mount protocol version defaults to 3.
The following patch should ensure the defaults are consistent with the
previous defaults of vers=3,proto=tcp,mountvers=3,mountproto=tcp.
This fixes the bug
http://bugzilla.kernel.org/show_bug.cgi?id=14259
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This fixes a trivial bug in fs/cifs/connect.c.
The bug is triggered when extract_hostname() fails
while mounting a cifs file system.
This is how I noticed the bug:
% sudo mount -t cifs //192.168.10.208 mountpoint -o options...
Then my kernel says,
[ 1461.807776] ------------[ cut here ]------------
[ 1461.807781] kernel BUG at mm/slab.c:521!
[ 1461.807784] invalid opcode: 0000 [#2] PREEMPT SMP
[ 1461.807790] last sysfs file:
/sys/devices/pci0000:00/0000:00:1e.0/0000:09:02.0/resource
[ 1461.807793] CPU 0
[ 1461.807796] Modules linked in: nls_iso8859_1 usbhid sbp2 uhci_hcd
ehci_hcd i2c_i801 ohci1394 ieee1394 psmouse serio_raw pcspkr sky2 usbcore
evdev
[ 1461.807816] Pid: 3446, comm: mount Tainted: G D 2.6.32-rc2-vanilla
[ 1461.807820] RIP: 0010:[<ffffffff810b888e>] [<ffffffff810b888e>]
kfree+0x63/0x156
[ 1461.807829] RSP: 0018:ffff8800b4f7fbb8 EFLAGS: 00010046
[ 1461.807832] RAX: ffffea00033fff98 RBX: ffff8800afbae7e2 RCX:
0000000000000000
[ 1461.807836] RDX: ffffea0000000000 RSI: 000000000000005c RDI:
ffffffffffffffea
[ 1461.807839] RBP: ffff8800b4f7fbf8 R08: 0000000000000001 R09:
0000000000000000
[ 1461.807842] R10: 0000000000000000 R11: ffff8800b4f7fbf8 R12:
00000000ffffffea
[ 1461.807845] R13: ffff8800afb23000 R14: ffff8800b4f87bc0 R15:
ffffffffffffffea
[ 1461.807849] FS: 00007f52b6f187c0(0000) GS:ffff880007600000(0000)
knlGS:0000000000000000
[ 1461.807852] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1461.807855] CR2: 0000000000613000 CR3: 00000000af8f9000 CR4:
00000000000006f0
[ 1461.807858] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1461.807861] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 1461.807865] Process mount (pid: 3446, threadinfo ffff8800b4f7e000, task
ffff8800950e4380)
[ 1461.807867] Stack:
[ 1461.807869] 0000000000000202 0000000000000282 ffff8800b4f7fbf8
ffff8800afbae7e2
[ 1461.807876] <0> 00000000ffffffea ffff8800afb23000 ffff8800b4f87bc0
ffff8800b4f7fc28
[ 1461.807884] <0> ffff8800b4f7fcd8 ffffffff81159f6d ffffffff81147bc2
ffffffff816bfb48
[ 1461.807892] Call Trace:
[ 1461.807899] [<ffffffff81159f6d>] cifs_get_tcp_session+0x440/0x44b
[ 1461.807904] [<ffffffff81147bc2>] ? find_nls+0x1c/0xe9
[ 1461.807909] [<ffffffff8115b889>] cifs_mount+0x16bc/0x2167
[ 1461.807917] [<ffffffff814455bd>] ? _spin_unlock+0x30/0x4b
[ 1461.807923] [<ffffffff81150da9>] cifs_get_sb+0xa5/0x1a8
[ 1461.807928] [<ffffffff810c1b94>] vfs_kern_mount+0x56/0xc9
[ 1461.807933] [<ffffffff810c1c64>] do_kern_mount+0x47/0xe7
[ 1461.807938] [<ffffffff810d8632>] do_mount+0x712/0x775
[ 1461.807943] [<ffffffff810d671f>] ? copy_mount_options+0xcf/0x132
[ 1461.807948] [<ffffffff810d8714>] sys_mount+0x7f/0xbf
[ 1461.807953] [<ffffffff8144509a>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1461.807960] [<ffffffff81011cc2>] system_call_fastpath+0x16/0x1b
[ 1461.807963] Code: 00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 68 48 01 d0
66 83 38 00 79 04 48 8b 40 10 66 83 38 00 79 04 48 8b 40 10 80 38 00 78 04
<0f> 0b eb fe 4c 8b 70 58 4c 89 ff 41 8b 76 4c e8 b8 49 fb ff e8
[ 1461.808022] RIP [<ffffffff810b888e>] kfree+0x63/0x156
[ 1461.808027] RSP <ffff8800b4f7fbb8>
[ 1461.808031] ---[ end trace ffe26fcdc72c0ce4 ]---
The cause of this bug is that the error handling code of
cifs_get_tcp_session()
calls kfree() even when the corresponding kmalloc() has failed.
(The kmalloc() is called by extract_hostname().)
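The register dump above is consistent with kfree() being handed an encoded error value rather than a heap pointer: RDI holds ffffffffffffffea and R12 holds 00000000ffffffea, which is -22 (-EINVAL) as a 32-bit value. The following is a minimal userspace sketch of the guard, not the actual patch. The helpers lookup_host(), release_host() and open_session() are hypothetical, and err_ptr()/is_err()/ptr_err() are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/*
 * Hypothetical helper: returns a heap copy of the host part of
 * "//host/share", or an encoded negative errno on failure.
 */
static char *lookup_host(const char *unc)
{
    const char *start, *end;
    size_t len;
    char *host;

    assert(unc != NULL);
    if (strncmp(unc, "//", 2) != 0)
        return err_ptr(-EINVAL);
    start = unc + 2;
    end = strchr(start, '/');
    if (end == NULL)
        return err_ptr(-EINVAL);    /* no share part after the host */
    len = (size_t)(end - start);
    host = malloc(len + 1);
    if (host == NULL)
        return err_ptr(-ENOMEM);
    memcpy(host, start, len);
    host[len] = '\0';
    return host;
}

/* Only release values that are real allocations, never encoded errors. */
static void release_host(char *host)
{
    if (!is_err(host))
        free(host);
}

/* Returns 0 on success or a negative errno. */
static int open_session(const char *unc)
{
    char *host = lookup_host(unc);
    int rc = is_err(host) ? (int)ptr_err(host) : 0;

    /* a real caller would use host here when rc == 0 */
    release_host(host);
    return rc;
}
```

In this sketch, a path without a share part, such as the one in the mount command above, makes lookup_host() fail, and the error path returns the error code without passing the encoded value to free().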
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
CC: Stable <stable@kernel.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
Commit a9327cac440be4d8333bba975cbbf76045096275 added separate read
and write statistics of in_flight requests, and exported the number
of read and write requests in progress separately through sysfs.
But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time was higher than normal.
So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b.
The problem was in part_round_stats_single(); I missed the following:
    if (now == part->stamp)
        return;
-   if (part->in_flight) {
+   if (part_in_flight(part)) {
        __part_stat_add(cpu, part, time_in_queue,
                part_in_flight(part) * (now - part->stamp));
        __part_stat_add(cpu, part, io_ticks, (now - part->stamp));
With this chunk included, the reported regression gets fixed.
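To illustrate why the check matters, here is a userspace sketch of the accounting. It assumes in_flight is kept as two counters (reads and writes) and that part_in_flight() returns their sum. struct part_stats is a simplified stand-in, not the kernel structure.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's per-partition statistics. */
struct part_stats {
    unsigned long stamp;          /* time of the last update */
    unsigned int in_flight[2];    /* [0] = reads, [1] = writes in progress */
    unsigned long time_in_queue;
    unsigned long io_ticks;
};

/* Total number of requests in progress, reads plus writes. */
static unsigned int part_in_flight(const struct part_stats *part)
{
    return part->in_flight[0] + part->in_flight[1];
}

/* Account the time since the last update, but only if the device was busy. */
static void part_round_stats_single(struct part_stats *part, unsigned long now)
{
    assert(now >= part->stamp);
    if (now == part->stamp)
        return;
    if (part_in_flight(part)) {
        part->time_in_queue += part_in_flight(part) * (now - part->stamp);
        part->io_ticks += now - part->stamp;
    }
    part->stamp = now;
}
```

With in_flight being an array, a bare `part->in_flight` used as a condition would test the array's address, which is never zero, so an idle device would still accumulate io_ticks. That is presumably why the interval utilization was pinned at 100%.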
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
--
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Like the cluster allocating stuff, we can lock up the box with the normal
allocation path. This happens when we
1) Start to cache a block group that is severely fragmented, but has a decent
amount of free space.
2) Start to commit a transaction
3) Have the commit try and empty out some of the delalloc inodes with extents
that are relatively large.
The inodes will not be able to make the allocations because they will ask for
allocations larger than a contiguous area in the free space cache. So we will
wait for more progress to be made on the block group, but since we're in a
commit the caching kthread won't make any more progress and it already has
enough free space that wait_block_group_cache_progress will just return. So,
if we wait and fail to make the allocation the next time around, just loop and
go to the next block group. This keeps us from getting stuck in a softlockup.
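The "wait once, then move on" rule can be sketched in userspace as follows. All names here (struct block_group, wait_for_progress(), find_group()) are hypothetical and the model is much simpler than the real allocator. The wait helper deliberately makes no progress, mirroring the commit-time situation described above.

```c
#include <assert.h>
#include <stddef.h>

struct block_group {
    unsigned long largest_free;   /* largest contiguous free area cached so far */
    int caching;                  /* nonzero while caching is still in progress */
};

/* Hypothetical stand-in for waiting on caching progress; it makes none here. */
static void wait_for_progress(struct block_group *bg)
{
    (void)bg;
}

/* Returns the index of the first group that can satisfy `want`, or -1. */
static int find_group(struct block_group *groups, size_t n, unsigned long want)
{
    assert(groups != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int waited = 0;

        for (;;) {
            if (groups[i].largest_free >= want)
                return (int)i;
            if (!groups[i].caching || waited)
                break;            /* give up on this group, try the next one */
            wait_for_progress(&groups[i]);
            waited = 1;           /* never wait twice on the same group */
        }
    }
    return -1;
}
```

Because each group is waited on at most once, the loop always terminates even when the caching thread cannot make progress.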
Thanks,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
The btrfs async worker threads are used for a wide variety of things,
including processing bio end_io functions. This means that when
the endio threads aren't running, the rest of the FS isn't
able to do the final processing required to clear PageWriteback.
The endio threads also try to exit as they become idle and
start more as the work piles up. The problem is that starting more
threads means kthreadd may need to allocate RAM, and that allocation
may wait until the global number of writeback pages on the system is
below a certain limit.
The result of that throttling is that end IO threads wait on
kthreadd, which in turn waits for IO to end, and that never happens.
This commit fixes the deadlock by handing off thread startup to a
dedicated thread. It also fixes a bug where the on-demand thread
creation was creating far too many threads because it didn't take into
account threads being started by other procs.
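The second fix, counting threads that are still being started, can be sketched like this. struct worker_pool and both functions are hypothetical, and the threshold rule is an assumption made for illustration, not the btrfs heuristic.

```c
#include <assert.h>

struct worker_pool {
    int running;      /* threads already running */
    int starting;     /* creations handed off but not yet running */
    int max_workers;
    int pending;      /* queued work items */
    int idle_thresh;  /* items one thread is expected to absorb (assumed) */
};

/* Decide whether another thread is needed, counting in-progress starts too. */
static int should_start_worker(const struct worker_pool *p)
{
    int threads = p->running + p->starting;

    assert(p->idle_thresh > 0);
    if (threads >= p->max_workers)
        return 0;
    return p->pending > threads * p->idle_thresh;
}

/* Record the request; the creation itself would run in a dedicated thread. */
static int request_worker(struct worker_pool *p)
{
    if (!should_start_worker(p))
        return 0;
    p->starting++;
    return 1;
}
```

Without the `starting` term, every caller that observed the same backlog would request its own thread before any of the new threads had a chance to run.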
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
Revert "Seperate read and write statistics of in_flight requests"
cfq-iosched: don't delay async queue if it hasn't dispatched at all
block: Topology ioctls
cfq-iosched: use assigned slice sync value, not default
cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
cfq-iosched: implement slower async initiate and queue ramp up
cfq-iosched: delay async IO dispatch, if sync IO was just done
cfq-iosched: add a knob for desktop interactiveness
Add a tracepoint for block request remapping
block: allow large discard requests
block: use normal I/O path for discard requests
swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
fs/bio.c: move EXPORT* macros to line after function
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
cciss: fix build when !PROC_FS
block: Do not clamp max_hw_sectors for stacking devices
block: Set max_sectors correctly for stacking devices
cciss: cciss_host_attr_groups should be const
cciss: Dynamically allocate the drive_info_struct for each logical drive.
cciss: Add usage_count attribute to each logical drive in /sys
...
|
|
This reverts commit a9327cac440be4d8333bba975cbbf76045096275.
Corrado Zoccolo <czoccolo@gmail.com> reports:
"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
10,70 0,00 3,16 15,75 0,00 70,38
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 18,22 0,00 0,67 0,01 14,77 0,02 43,94 0,01 10,53 39043915,03 2629219,87
sdb 60,89 9,68 50,79 3,04 1724,43 50,52 65,95 0,70 13,06 488437,47 2629219,87
avg-cpu: %user %nice %system %iowait %steal %idle
2,72 0,00 0,74 0,00 0,00 96,53
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
avg-cpu: %user %nice %system %iowait %steal %idle
6,68 0,00 0,99 0,00 0,00 92,33
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
avg-cpu: %user %nice %system %iowait %steal %idle
4,40 0,00 0,73 1,47 0,00 93,40
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
sdb 0,00 4,00 0,00 3,00 0,00 28,00 18,67 0,06 19,50 333,33 100,00
Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.
I bisected it down to:
[a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."
So until this is debugged, revert the bad commit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
[PATCH] ext4: retry failed direct IO allocations
ext4: Fix build warning in ext4_dirty_inode()
ext4: drop ext4dev compat
ext4: fix a BUG_ON crash by checking that page has buffers attached to it
|
|
On a 256M filesystem, doing this in a loop:
xfs_io -F -f -d -c 'pwrite 0 64m' test
rm -f test
eventually leads to ENOSPC. (the xfs_io command does a
64m direct IO write to the file "test")
As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.
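A bounded retry of this kind might look like the sketch below. It assumes the initial ENOSPC comes from blocks that were freed (by the rm) but not yet committed. struct fs_state, try_alloc(), commit_pending(), alloc_with_retry() and the MAX_RETRIES bound are all hypothetical names for illustration.

```c
#include <assert.h>
#include <errno.h>

#define MAX_RETRIES 3   /* assumed bound for this sketch */

/* Hypothetical allocator state. */
struct fs_state {
    int free_blocks;
    int pending_free_blocks;   /* freed but not yet committed */
};

static int try_alloc(struct fs_state *fs, int blocks)
{
    if (fs->free_blocks < blocks)
        return -ENOSPC;
    fs->free_blocks -= blocks;
    return 0;
}

/* Stand-in for forcing a commit; returns 1 if any blocks became usable. */
static int commit_pending(struct fs_state *fs)
{
    if (fs->pending_free_blocks == 0)
        return 0;
    fs->free_blocks += fs->pending_free_blocks;
    fs->pending_free_blocks = 0;
    return 1;
}

/* Retry on ENOSPC only while a commit can still release more space. */
static int alloc_with_retry(struct fs_state *fs, int blocks)
{
    int retries = 0;
    int rc;

    assert(blocks > 0);
    for (;;) {
        rc = try_alloc(fs, blocks);
        if (rc != -ENOSPC)
            return rc;
        if (retries++ >= MAX_RETRIES || !commit_pending(fs))
            return rc;
    }
}
```

The retry stops as soon as committing can no longer free anything, so a filesystem that is really full still reports ENOSPC.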
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This fixes the following warning:
fs/ext4/inode.c: In function 'ext4_dirty_inode':
fs/ext4/inode.c:5615: warning: unused variable 'current_handle'
We remove the jbd_debug() statement which does use current_handle, as
it's not terribly important in the grand scheme of things.
Thanks to Stephen Rothwell for pointing this out.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix data space leak fix
Btrfs: remove duplicates of filemap_ helpers
Btrfs: take i_mutex before generic_write_checks
Btrfs: fix arguments to btrfs_wait_on_page_writeback_range
Btrfs: fix deadlock with free space handling and user transactions
Btrfs: fix error cases for ioctl transactions
Btrfs: Use CONFIG_BTRFS_POSIX_ACL to enable ACL code
Btrfs: introduce missing kfree
Btrfs: Fix setting umask when POSIX ACLs are not enabled
Btrfs: proper -ENOSPC handling
|
|
It's just a wrapper for <linux/fscache.h>, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable into for-linus
|
|
There is a problem where page_mkwrite can be called on a dirtied page that
already has a delalloc range associated with it. The fix is to clear any
delalloc bits for the range we are dirtying so the space accounting gets
handled properly. This is the same thing we do in the normal write case, so we
are consistent across the board. With this patch we no longer leak reserved
space.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
As mentioned in Documentation/CodingStyle, move EXPORT* macros
to the line immediately after the closing function brace line.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Use filemap_fdatawrite_range and filemap_fdatawait_range instead of
local copies of the functions. For filemap_fdatawait_range that
also means replacing the awkward old wait_on_page_writeback_range
calling convention with the regular filemap byte offsets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable into for-linus
|
|
btrfs_file_write was incorrectly calling generic_write_checks without
taking i_mutex. This led to problems with racing around i_size when
doing O_APPEND writes.
The fix here is to move i_mutex higher.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
wait_on_page_writeback_range/btrfs_wait_on_page_writeback_range takes
a pagecache offset, not a byte offset into the file. Shift the arguments
around to wait for the correct range.
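The conversion from a byte range to page cache indices is a right shift by the page shift. The sketch below assumes 4 KiB pages. PAGE_SHIFT_ASSUMED, struct page_range and bytes_to_pages() are illustrative names, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumes 4 KiB pages */

struct page_range {
    uint64_t first;   /* index of the first page covering the range */
    uint64_t last;    /* index of the last page covering the range */
};

/* Convert an inclusive byte range [start, end] into inclusive page indices. */
static struct page_range bytes_to_pages(uint64_t start, uint64_t end)
{
    struct page_range r;

    assert(start <= end);
    r.first = start >> PAGE_SHIFT_ASSUMED;
    r.last = end >> PAGE_SHIFT_ASSUMED;
    return r;
}
```

Passing byte offsets where page indices are expected would make the wait cover a range 4096 times further into the file than intended under this assumption.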
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Kconfig & super.c promised it'd be gone by 2.6.31, so it's
about time to drop it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
In ext4_num_dirty_pages() we were calling page_buffers() before
checking to see if the page actually had buffers attached to it; this
would cause a BUG check crash in the inline function page_buffers().
Thanks to Markus Trippelsdorf for reporting this bug.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The code to set up sctp sockets was not using the sockfd_lookup()
and sockfd_put() routines to translate an fd to a socket. The
direct fget and fput calls were resulting in error messages from
alloc_fd().
Also clean up two log messages and remove a third, related to
setting up sctp associations.
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The recently added dlm_lowcomms_connect_node() from
391fbdc5d527149578490db2f1619951d91f3561 does not work
when using SCTP instead of TCP. The sctp connection code
has nothing to do without data to send. Check for no data
in the sctp connection code and do nothing instead of
triggering a BUG. Also have connect_node() do nothing
when the protocol is sctp.
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix missing initialization of i_dir_start_lookup member
nilfs2: fix missing zero-fill initialization of btree node cache
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix time encoding with extra epoch bits
ext4: Add a stub for mpage_da_data in the trace header
jbd2: Use tracepoints for history file
ext4: Use tracepoints for mb_history trace file
ext4, jbd2: Drop unneeded printks at mount and unmount time
ext4: Handle nested ext4_journal_start/stop calls without a journal
ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
ext4: Avoid updating the inode table bh twice in no journal mode
ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
ext4: async direct IO for holes and fallocate support
ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
ext4: Split uninitialized extents for direct I/O
ext4: release reserved quota when block reservation for delalloc retry
ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
ext4: Fix hueristic which avoids group preallocation for closed files
ext4: Use ext4_msg() for ext4_da_writepage() errors
ext4: Update documentation about quota mount options
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
fat: Check s_dirt in fat_sync_fs()
vfat: change the default from shortname=lower to shortname=mixed
fat/nls: Fix handling of utf8 invalid char
|
|
"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."
Thanks to Damien Guibouret for pointing out this problem.
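A sketch of the masking described in the quote follows, assuming the extra field packs the epoch into the low 2 bits and the nanoseconds into the bits above them. The names EPOCH_BITS, EPOCH_MASK, encode_extra(), decode_nsec() and decode_epoch() are illustrative, not the ext4.h identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Pack epoch (low bits) and nanoseconds (high bits) into one 32-bit field. */
static uint32_t encode_extra(uint32_t epoch, uint32_t nsec)
{
    assert(nsec < 1000000000u);   /* fits in 30 bits, so the shift is safe */
    /* Without the mask, high bits of epoch would overwrite the nsec part. */
    return (epoch & EPOCH_MASK) | (nsec << EPOCH_BITS);
}

static uint32_t decode_nsec(uint32_t extra)
{
    return extra >> EPOCH_BITS;
}

static uint32_t decode_epoch(uint32_t extra)
{
    return extra & EPOCH_MASK;
}
```

For example, an epoch value with all bits set and nsec = 5 encodes to 23 (3 | 5 << 2), and the nanoseconds decode back to 5.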
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure. We
save memory as a bonus.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.
By ripping out the mb_history code and replacing it with ftrace
tracepoints, we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If an ioctl-initiated transaction is open, we can't force a commit during
the free space checks in order to free up pinned extents or else we
deadlock. Just return ENOSPC instead.
A more satisfying solution that reserves space for the entire user
transaction up front is forthcoming...
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Fix leak of vfsmount write reference and open_ioctl_trans reference on
ENOMEM. Clean up the error paths while we're at it.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted. Disable them to economize space
in the system logs. In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
We've already defined CONFIG_BTRFS_POSIX_ACL in Kconfig, but we're
currently not using it and are testing CONFIG_FS_POSIX_ACL instead.
CONFIG_FS_POSIX_ACL states "Never use this symbol for ifdefs".
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Error handling code following a kzalloc should free the allocated data.
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,f1,l;
position p1,p2;
expression *ptr != NULL;
@@
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
when != if (...) { <+...x...+> }
(
x->f1 = E
|
(x->f1 == NULL || ...)
|
f(...,x->f1,...)
)
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
We currently set sb->s_flags |= MS_POSIXACL unconditionally, which is
incorrect -- it tells the VFS that it shouldn't set umask because we
will, yet we don't set it ourselves if we aren't using POSIX ACLs, so
the umask ends up ignored.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This patch fixes a problem where ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which
is the case in no journal mode.
It also removes a test for non-matching transaction which can never
happen.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This is a cleanup of commit 91ac6f4. Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode. Indeed, it would be
duplicated work.
Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The i_dir_start_lookup field in nilfs_inode_info objects should be
cleared when the objects are allocated, but the initialization was
missing in case of reading from disk. This adds the initialization.
Since the variable just gives a start page on directory lookups, the
bug was nonfatal until now.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This will fix file system corruption which infrequently happens after
mount. The problem was reported from users with the title "[NILFS
users] Fail to mount NILFS." (Message-ID:
<200908211918.34720.yuri@itinteg.net>), and so forth. I've also
experienced the corruption multiple times on kernel 2.6.30 and 2.6.31.
The problem turned out to be caused by a mismatch between
mapping->nrpages of a btree node cache and the actual number of pages
hung on the cache; if mapping->nrpages becomes zero even though the cache
has pages, truncate_inode_pages() returns without doing anything. Usually
this is harmless except that it may cause a page leak, but garbage collection
occasionally sees a stale page remaining in the btree node cache
of the DAT (i.e. the disk address translation file of nilfs), and this
induces the corruption.
I identified a missing initialization in btree node caches as the
root cause. This corrects the bug.
I've tested this for kernel 2.6.30 and 2.6.31.
Reported-by: Yuri Chislov <yuri@itinteg.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>
|
|
At the start of a transaction we do a btrfs_reserve_metadata_space() and
specify how many items we plan on modifying. Then once we've done our
modifications and such, just call btrfs_unreserve_metadata_space() for
the same number of items we reserved.
For keeping track of metadata needed for data I've had to add an extent_io op
for when we merge extents. This lets us track space properly when we are doing
sequential writes, so we don't end up reserving way more metadata space than
what we need.
The only place where the metadata space accounting is not done is in the
relocation code. This is because Yan is going to be reworking that code in the
near future, so running btrfs-vol -b could still possibly result in an ENOSPC
related panic. This patch also turns off the metadata_ratio stuff in order to
allow users to more efficiently use their disk space.
This patch makes it so we track how much metadata we need for an inode's
delayed allocation extents by tracking how many extents are currently
waiting for allocation. It introduces two new callbacks for the
extent_io trees, merge_extent_hook and split_extent_hook. These help
us keep track of when we merge delalloc extents together and split them
up. Reservations are handled before any actual dirty'ing occurs,
and then we unreserve after we dirty.
btrfs_unreserve_metadata_for_delalloc() will make the appropriate
unreservations as needed based on the number of reservations we
currently have and the number of extents we currently have. Doing the
reservation outside of doing any of the actual dirty'ing lets us do
things like filemap_flush() the inode to try and force delalloc to
happen, or as a last resort actually start allocation on all delalloc
inodes in the fs. This has survived dbench, fs_mark and an fsx torture
test.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Move the check to make sure the original and donor inodes are
different earlier, to avoid a potential deadlock by trying to lock the
same inode twice.
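The ordering can be sketched as follows: the identity check happens before any lock is taken. struct inode_stub and lock_pair() are hypothetical, and returning -EINVAL for identical inodes is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct inode_stub {
    unsigned long ino;
    int locked;
};

/* Lock both inodes, refusing up front if they are the same inode. */
static int lock_pair(struct inode_stub *orig, struct inode_stub *donor)
{
    assert(orig != NULL && donor != NULL);
    /* Check first: locking the same inode twice would self-deadlock. */
    if (orig->ino == donor->ino)
        return -EINVAL;
    orig->locked = 1;
    donor->locked = 1;
    return 0;
}
```

On the error path no lock has been taken yet, so there is nothing to undo.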
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
For async direct IO that covers holes or fallocated space, the end_io
callback function now queues the conversion work on a workqueue but
doesn't flush the work right away, as that might take too long.
But when fsync is called after all the data I/O has completed, the user
expects the metadata to be updated as well before fsync returns.
Thus we need to flush the conversion work when fsync() is called.
This patch keeps track of a list of completed async direct IO requests
that have work queued on the workqueue. When fsync() is called, it goes
through the list and does the conversion.
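A userspace sketch of such a list follows. struct io_end, struct file_state, note_completed() and flush_completed() are hypothetical names, and setting a flag stands in for the real extent conversion.

```c
#include <assert.h>
#include <stddef.h>

struct io_end {
    struct io_end *next;
    int converted;            /* set once the conversion has run */
};

struct file_state {
    struct io_end *completed; /* completed async DIO awaiting conversion */
};

/* Completion path: remember that this I/O still needs its conversion. */
static void note_completed(struct file_state *f, struct io_end *io)
{
    assert(f != NULL && io != NULL);
    io->converted = 0;
    io->next = f->completed;
    f->completed = io;
}

/* fsync path: run every pending conversion and return how many ran. */
static int flush_completed(struct file_state *f)
{
    int n = 0;

    while (f->completed != NULL) {
        struct io_end *io = f->completed;

        f->completed = io->next;
        io->next = NULL;
        io->converted = 1;    /* stand-in for the real conversion work */
        n++;
    }
    return n;
}
```

A real implementation would also need locking around the list, which this sketch leaves out.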
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
|
Currently the DIO VFS code passes create = 0 when writing to the
middle of a file. It does this to avoid block allocation for holes, so
as not to expose stale data when there is a parallel buffered read
(which does not hold the i_mutex lock). Direct I/O writes into holes
fall back to buffered IO for this reason.
Since preallocated extents are treated as holes when doing a
get_block() lookup (the buffer is not mapped), direct IO over fallocated
space also falls back to buffered IO. Thus ext4 silently falls
back to buffered IO in the above two cases, which is undesirable.
To fix this, this patch creates uninitialized extents when a direct I/O
write goes into holes in sparse files, and registers an end_io callback which
converts the uninitialized extent to an initialized extent after the
I/O is completed.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
When writing into an uninitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the uninitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.
When the IO is complete, the written extent will be marked as initialized.
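The range arithmetic of such a split can be sketched as below. struct extent and split_extent() are illustrative. In this sketch every piece is left uninitialized and the piece under I/O would be flipped to initialized on completion, which is an assumption about the ordering rather than a description of the ext4 code.

```c
#include <assert.h>
#include <stdint.h>

struct extent {
    uint64_t start;
    uint64_t len;
    int initialized;
};

/*
 * Split `ex` around the write [wstart, wstart + wlen). Writes up to three
 * pieces to out[] in ascending order and returns how many were produced.
 */
static int split_extent(struct extent ex, uint64_t wstart, uint64_t wlen,
                        struct extent out[3])
{
    uint64_t ex_end = ex.start + ex.len;
    uint64_t wend = wstart + wlen;
    int n = 0;

    assert(wlen > 0 && wstart >= ex.start && wend <= ex_end);
    if (wstart > ex.start)
        out[n++] = (struct extent){ ex.start, wstart - ex.start, 0 };
    out[n++] = (struct extent){ wstart, wlen, 0 };   /* range under I/O */
    if (wend < ex_end)
        out[n++] = (struct extent){ wend, ex_end - wend, 0 };
    return n;
}
```

Doing the split up front means the completion path only has to flip a flag on an existing piece, so it has no allocation that could fail with ENOSPC.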
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fails and we retry the allocation. We should
release the quota reservation before restarting.
Bug found by Jan Kara.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small. This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time. So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode. We add a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.
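To put the cap in page terms, here is a small sketch that assumes 4 KiB pages. mb_to_pages() and bumped_nr_to_write() are hypothetical helpers, and only the name max_writeback_mb_bump comes from the text above.

```c
#include <assert.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumes 4 KiB pages */

/* Convert megabytes to pages: 1 MB is 2^20 bytes. */
static long mb_to_pages(unsigned int mb)
{
    return (long)mb << (20 - PAGE_SHIFT_ASSUMED);
}

/* Write out as many dirty pages as the inode has, up to the tunable cap. */
static long bumped_nr_to_write(long dirty_pages, unsigned int max_writeback_mb_bump)
{
    long max_pages = mb_to_pages(max_writeback_mb_bump);

    assert(dirty_pages >= 0);
    return dirty_pages < max_pages ? dirty_pages : max_pages;
}
```

Under this assumption the 128mb default corresponds to 32768 pages, compared with 1024 pages for a 4mb chunk.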
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|