path: root/fs/dlm
Age | Commit message | Author
2007-02-05 | [DLM] fix softlockup in dlm_recv | Patrick Caulfield
This patch stops the dlm_recv workqueue from busy-waiting when a node disconnects. This can cause soft lockup errors on debug systems and bad performance generally. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] zero new user lvbs | David Teigland
A new lvb for a userland lock wasn't being initialized to zero. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM/GFS2] indent help text | Randy Dunlap
Indent help text as expected. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [GFS2/DLM] fix GFS2 circular dependency | Adrian Bunk
On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
> Andrew Morton wrote:
> >Temporarily at
> >
> > http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
>
> Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
> When it's not it appears in the menu, but after state change it gets away.
> The same behaviour in xconfig, gconfig.
>
> $ mkdir ../a/tst
> $ make O=../a/tst menuconfig
> HOSTCC scripts/basic/fixdep
> [...]
> HOSTLD scripts/kconfig/mconf
> scripts/kconfig/mconf arch/i386/Kconfig
> Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
> OCFS2_FS INET
>
> Maybe this is the problem?

Yes, patch below.

> regards,

cu
Adrian

<-- snip -->

This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM and DLM depend on instead of select SYSFS. Since SYSFS depends on EMBEDDED this change shouldn't cause any problems for users.

Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [GFS2/DLM] use sysfs | Randy Dunlap
With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, the kernel build fails with:

WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use kernel_subsys, they should either DEPEND on it or SELECT it. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] can miss clearing resend flag | David Teigland
A long, complicated sequence of events, beginning with the RESEND flag not being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so it optimizes the lookup into a request and sends a request reply back
- the request reply is saved on the requestqueue to be processed after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag so the lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply which removes the lkb from the waiters list, _without_ clearing the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND flag set
- the lkb is on the waiters list again, now for unlock; when recovery occurs, dlm_recover_waiters_pre() sees the lkb for unlock with RESEND set, and doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off the waiters list because it has the RESEND flag set, then reports an error because unlocks are never supposed to be handled in recover_waiters_post().
- later, the unlock reply is received, and doesn't find the lkb on the waiters list because recover_waiters_post() has wrongly removed it.
- the unlock operation has been lost, and we're left with a stray granted lock
- unmount spins waiting for the unlock to complete

The visible evidence of this problem will be a node where gfs umount is spinning, the dlm waiters list will be empty, and the dlm locks list will show a granted lock. The fix is simply to clear the RESEND flag when taking an lkb off the waiters list. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
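The fix can be modelled in a few lines of userspace C. This is a minimal sketch, not the dlm's code: `struct lkb_sketch`, the `LKB_*` bits and the `*_sketch()` helpers are hypothetical stand-ins for the real lkb flags and waiters-list handling. It shows the one invariant the fix establishes: any path that takes an lkb off the waiters list also clears RESEND, so a later unlock cannot inherit a stale flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real dlm uses its own internal flag values. */
#define LKB_RESEND     0x1u
#define LKB_ON_WAITERS 0x2u

struct lkb_sketch {
	unsigned int flags;
	int wait_type;          /* 0 = not waiting, otherwise the pending op */
};

/* Put an lkb on the (conceptual) waiters list for operation 'op'. */
static void add_to_waiters_sketch(struct lkb_sketch *lkb, int op)
{
	assert(!(lkb->flags & LKB_ON_WAITERS));
	lkb->flags |= LKB_ON_WAITERS;
	lkb->wait_type = op;
}

/*
 * Take an lkb off the waiters list.  The fix: clear RESEND here, so a
 * reply processed from the requestqueue (which bypasses the
 * recover_waiters_post step) cannot leave the flag set.
 */
static void remove_from_waiters_sketch(struct lkb_sketch *lkb)
{
	lkb->flags &= ~(LKB_ON_WAITERS | LKB_RESEND);
	lkb->wait_type = 0;
}

/* Recovery pre-pass: mark a waiting operation to be resent. */
static void recover_waiters_pre_sketch(struct lkb_sketch *lkb)
{
	if (lkb->flags & LKB_ON_WAITERS)
		lkb->flags |= LKB_RESEND;
}

static bool needs_resend(const struct lkb_sketch *lkb)
{
	return (lkb->flags & LKB_RESEND) != 0;
}
```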
2007-02-05 | [DLM] saved dlm message can be dropped | David Teigland
dlm_receive_message() returns 0 instead of returning 'error'. What would happen is that process_requestqueue would take a saved message off the requestqueue and call receive_message on it. receive_message would then see that recovery had been aborted, set error to EINTR, and 'goto out', expecting that the error would be returned. Instead, 0 was always returned, so process_requestqueue would think that the message had been processed and delete it instead of saving it to process next time. This means the message (usually an unlock in my tests) would be lost. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
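A minimal sketch of the bug pattern, assuming a single global abort flag in place of the dlm's real recovery state; `receive_message_sketch()` and `process_saved_message_sketch()` are hypothetical. The error set before `goto out` has to reach the caller, otherwise the requestqueue drain cannot tell a deferred message from a processed one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for "recovery has been aborted". */
static bool recovery_aborted;

static int receive_message_sketch(int msg_type)
{
	int error = 0;

	if (recovery_aborted) {
		error = -EINTR;
		goto out;
	}

	/* ... process msg_type here ... */
	(void)msg_type;
 out:
	return error;   /* the bug was an unconditional "return 0;" */
}

/* Requestqueue drain: keep the saved message unless it was processed. */
static int process_saved_message_sketch(int msg_type, bool *keep)
{
	int error = receive_message_sketch(msg_type);

	*keep = (error == -EINTR);
	return error;
}
```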
2007-02-05 | [DLM] Make sock_sem into a mutex | Patrick Caulfield
Now that there can be multiple dlm_recv threads running we need to prevent two recvs running for the same connection - it's unlikely but it can happen and it causes message corruption. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] fix lowcomms receiving | Patrick Caulfield
This patch fixes a bug whereby data on a newly accepted connection would be ignored if it arrived soon after the accept. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] lowcomms tidy | Patrick Caulfield
This patch removes some redundant fields from the connection structure and adds some lockdep annotation to remove spurious warnings. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] fix master recovery | David Teigland
If master recovery happens on an rsb in one recovery sequence, then that sequence is aborted before lock recovery happens, then in the next sequence, we rely on the previous master recovery (which may now be invalid due to another node ignoring a lookup result) and go on to do the lock recovery, where we get stuck due to an invalid master value.

- recovery cycle begins: master of rsb X has left
- nodes A and B send node C an rcom lookup for X to find the new master
- C gets lookup from B first, sets B as new master, and sends reply back to B
- C gets lookup from A next, and sends reply back to A saying B is master
- A gets lookup reply from C and sets B as the new master in the rsb
- recovery cycle on A, B and C is aborted to start a new recovery
- B gets lookup reply from C and ignores it since there's a new recovery
- recovery cycle begins: some other node has joined
- B doesn't think it's the master of X so it doesn't rebuild it in the directory
- C looks up the master of X, no one is master, so it becomes new master
- B looks up the master of X, finds it's C
- A believes that B is the master of X, so it sends its lock to B
- B sends an error back to A
- A resends
- this repeats forever; the incorrect master value on A is never corrected

The fix is to do master recovery on an rsb that still has the NEW_MASTER flag set from an earlier recovery sequence, and therefore didn't complete lock recovery. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
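The fix can be modelled as an extra condition in the "does this rsb need master recovery?" test. The flag name `RSB_NEW_MASTER`, the struct and the helper functions below are hypothetical simplifications of the dlm's rsb state; the sketch only shows why a leftover flag forces the master value to be looked up again instead of being trusted.

```c
#include <assert.h>
#include <stdbool.h>

#define RSB_NEW_MASTER 0x1u   /* hypothetical flag bit */

struct rsb_sketch {
	unsigned int flags;
	int master_nodeid;
};

/* Should this recovery sequence (re)do master recovery for r? */
static bool master_needs_recovery(const struct rsb_sketch *r, bool old_master_gone)
{
	/*
	 * Before the fix only old_master_gone was checked.  A NEW_MASTER
	 * flag left over from an aborted sequence means the master value
	 * was never confirmed by lock recovery, so it must be redone.
	 */
	return old_master_gone || (r->flags & RSB_NEW_MASTER);
}

static void recover_master(struct rsb_sketch *r, int looked_up_master)
{
	r->master_nodeid = looked_up_master;
	r->flags |= RSB_NEW_MASTER;
}

/* Lock recovery finished for r: the new master is now trusted. */
static void lock_recovery_done(struct rsb_sketch *r)
{
	r->flags &= ~RSB_NEW_MASTER;
}
```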
2007-02-05 | [DLM] fix user unlocking | David Teigland
When a user process exits, we clear all the locks it holds. There is a problem, though, with locks that the process had begun unlocking before it exited. We couldn't find the lkb's that were in the process of being unlocked remotely, to flag that they are DEAD. To solve this, we move lkb's being unlocked onto a new list in the per-process structure that tracks what locks the process is holding. We can then go through this list to flag the necessary lkb's when clearing locks for a process when it exits. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] Use workqueues for dlm lowcomms | Patrick Caulfield
This patch converts the DLM TCP lowcomms to use workqueues rather than its own daemon functions, simultaneously removing a lot of code and making it more scalable on multi-processor machines. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] expose dlm_config_info fields in configfs | David Teigland
Make the dlm_config_info values readable and writeable via configfs entries. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] add config entry to enable log_debug | David Teigland
Add a new dlm_config_info field to enable log_debug output and change log_debug() to use it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] rename dlm_config_info fields | David Teigland
Add a "ci_" prefix to the fields in the dlm_config_info struct so that we can use macros to add configfs functions to access them (in a later patch). No functional changes in this patch, just naming changes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] change some log_error to log_debug | David Teigland
Some common, non-error messages should use log_debug instead of log_error so they can be turned off. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] Fix spin lock already unlocked bug | Patrick Caulfield
I just noticed this message when testing some other changes I'd made to lowcomms (to use workqueues) but the problem seems to be in the current git trees too. I'm amazed no-one has seen it.

BUG: spinlock already unlocked on CPU#1, dlm_recoverd/16868

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] Fix schedule() calls | Patrick Caulfield
I was a little over-enthusiastic turning schedule() calls into cond_resched() when fixing the DLM for Andrew Morton. These four should really be calls to schedule() or the dlm can busy-wait. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] fs/dlm/lowcomms-tcp.c: remove 2 functions | Adrian Bunk
Remove the following unused functions:
- lowcomms_send_message()
- lowcomms_max_buffer_size()

Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] fix lost flags in stub replies | David Teigland
When the dlm fakes an unlock/cancel reply from a failed node using a stub message struct, it wasn't setting the flags in the stub message. So, in the process of receiving the fake message the lkb flags would be updated and cleared from the zero flags in the message. The problem observed in tests was the loss of the USER flag which caused the dlm to think a user lock was a kernel lock and subsequently fail an assertion checking the validity of the ast/callback field. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
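A minimal sketch of the fix, assuming (as the commit describes) that receiving the reply overwrites the lkb flags from the message; the struct, flag and function names are hypothetical. Copying the lkb's own flags into the stub before processing it keeps bits such as the USER flag intact.

```c
#include <assert.h>
#include <string.h>

#define LKB_USER_SKETCH 0x1u  /* hypothetical "this is a userland lock" bit */

struct stub_msg_sketch {
	int type;
	unsigned int flags;
};

struct lkb_flags_sketch {
	unsigned int flags;
};

/* Receive path: the lkb's flags are refreshed from the message. */
static void receive_flags_reply_sketch(struct lkb_flags_sketch *lkb,
				       const struct stub_msg_sketch *ms)
{
	lkb->flags = ms->flags;
}

/* Fake an unlock reply from a failed node using a stub message. */
static void fake_unlock_reply_sketch(struct lkb_flags_sketch *lkb,
				     struct stub_msg_sketch *stub)
{
	memset(stub, 0, sizeof(*stub));
	stub->type = 1;            /* hypothetical "unlock reply" type */
	stub->flags = lkb->flags;  /* the fix: without this, flags become 0 */
	receive_flags_reply_sketch(lkb, stub);
}
```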
2007-02-05 | [DLM] fix receive_request() lvb copying | David Teigland
LVBs are not sent as part of new requests, but the code receiving the request was copying data into the lvb anyway. The space in the message where it mistakenly thought the lvb lived actually contained the resource name, so it wound up incorrectly copying this name data into the lvb. The fix is to just create the lvb, not copy junk into it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 | [DLM] fix send_args() lvb copying | David Teigland
The send_args() function is used to copy parameters into a message for a number of different message types. Only some of those types are set up beforehand (in create_message) to include space for sending lvb data. send_args was wrongly copying the lvb for all message types as long as the lock had an lvb. This means that the lvb data was being written past the end of the message into unknown space. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
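A sketch of the corrected check, assuming a hypothetical message layout with a fixed-size `extra` area and a hypothetical `type_carries_lvb()` policy standing in for whatever create_message() actually reserves space for. The lvb is copied only for message types sized to hold it; for other types (such as a request, whose extra space holds the resource name) it is left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum msg_type_sketch { MSG_REQUEST, MSG_CONVERT, MSG_UNLOCK, MSG_LOOKUP };

#define LVB_LEN 32

struct message_sketch {
	enum msg_type_sketch type;
	size_t extra_len;       /* bytes reserved after the header */
	char extra[LVB_LEN];
};

/* Hypothetical policy: the types that were sized to carry an lvb. */
static bool type_carries_lvb(enum msg_type_sketch t)
{
	return t == MSG_CONVERT || t == MSG_UNLOCK;
}

static void send_args_sketch(struct message_sketch *ms, const char *lvb)
{
	/* Before the fix: copied whenever lvb != NULL, regardless of type. */
	if (lvb && type_carries_lvb(ms->type)) {
		assert(ms->extra_len >= LVB_LEN);
		memcpy(ms->extra, lvb, LVB_LEN);
	}
}
```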
2007-02-05 | [DLM] add version check | David Teigland
Check if we receive a message from another lockspace member running a version of the dlm with an incompatible inter-node message protocol. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
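One way to express such a check, assuming a hypothetical 32-bit version word with the major number in the high 16 bits; the constants and `version_compatible()` are illustrative, not the dlm's actual header definitions. Only a major-number mismatch is treated as an incompatible protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: major in the high 16 bits, minor in the low 16. */
#define PROTO_MAJOR   0x0003u
#define PROTO_MINOR   0x0000u
#define PROTO_VERSION (((uint32_t)PROTO_MAJOR << 16) | PROTO_MINOR)

/* Accept a peer's message only if its major protocol number matches. */
static bool version_compatible(uint32_t remote_version)
{
	return (remote_version >> 16) == PROTO_MAJOR;
}
```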
2007-02-05 | [DLM] fix old rcom messages | David Teigland
A reply to a recovery message will often be received after the relevant recovery sequence has aborted and the next recovery sequence has begun. We need to ignore replies to these old messages from the previous recovery. There's already a way to do this for synchronous recovery requests using the rc_id number, but not for asynchronous ones. Each recovery sequence already has a locally unique sequence number associated with it. This patch adds a field, rc_seq, to the rcom (recovery message) structure where this recovery sequence number can be placed. When a node sends a reply to a recovery request, it copies the rc_seq number it received into rc_seq_reply. When the first node receives the reply to its recovery message, it will check whether rc_seq_reply matches the current recovery sequence number, ls_recover_seq, and if not then it ignores the old reply. An old, inadequate approach to filtering out old replies (checking if the current stage of recovery has moved back to the start) has been removed from two spots. The protocol version number is changed to reflect the different rcom structures. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
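The rc_seq/rc_seq_reply handshake described above can be sketched as follows. The field names rc_seq, rc_seq_reply and ls_recover_seq come from the commit message; the surrounding structs and helper functions are hypothetical simplifications, not the dlm's real rcom layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rcom_sketch {
	uint64_t rc_seq;        /* sender's recovery sequence number */
	uint64_t rc_seq_reply;  /* echoed back by the replier */
};

struct lockspace_sketch {
	uint64_t ls_recover_seq;
};

/* Requester stamps its current recovery sequence into the message. */
static void make_request(const struct lockspace_sketch *ls, struct rcom_sketch *rc)
{
	rc->rc_seq = ls->ls_recover_seq;
	rc->rc_seq_reply = 0;
}

/* Replier copies the rc_seq it received into rc_seq_reply. */
static void make_reply(const struct rcom_sketch *req, struct rcom_sketch *reply,
		       uint64_t replier_seq)
{
	reply->rc_seq = replier_seq;
	reply->rc_seq_reply = req->rc_seq;
}

/* Requester ignores replies that belong to an earlier recovery sequence. */
static bool accept_reply(const struct lockspace_sketch *ls,
			 const struct rcom_sketch *reply)
{
	return reply->rc_seq_reply == ls->ls_recover_seq;
}
```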
2007-02-05 | [DLM] fix resend rcom lock | David Teigland
There's a chance the new master of a resource hasn't learned it's the new master before another node sends it a lock during recovery. The node sending the lock needs to resend if this happens.

- A sends a master lookup for resource R to C
- B sends a master lookup for resource R to C
- C receives A's lookup, assigns A to be master of R and sends a reply back to A
- C receives B's lookup and sends a reply back to B saying that A is the master
- B receives lookup reply from C and sends its lock for R to A
- A receives lock from B, doesn't think it's the master of R and sends an error back to B
- A receives lookup reply from C and becomes master of R
- B gets error back from A and resends its lock back to A (this resending is what this patch does)
- A receives lock from B, it now sees it's the master of R and takes the lock

Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-12-15 | [DLM] fix compile warning | Patrick Caulfield
This patch fixes a compile warning in lowcomms-tcp.c indicating that kmem_cache_t is deprecated. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-12-07 | Merge master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw | Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits)
  [DLM] Clean up lowcomms
  [GFS2] Change gfs2_fsync() to use write_inode_now()
  [GFS2] Fix indent in recovery.c
  [GFS2] Don't flush everything on fdatasync
  [GFS2] Add a comment about reading the super block
  [GFS2] Mount problem with the GFS2 code
  [GFS2] Remove gfs2_check_acl()
  [DLM] fix format warnings in rcom.c and recoverd.c
  [GFS2] lock function parameter
  [DLM] don't accept replies to old recovery messages
  [DLM] fix size of STATUS_REPLY message
  [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
  [DLM] fix add_requestqueue checking nodes list
  [GFS2] Fix recursive locking in gfs2_getattr
  [GFS2] Fix recursive locking in gfs2_permission
  [GFS2] Reduce number of arguments to meta_io.c:getbuf()
  [GFS2] Move gfs2_meta_syncfs() into log.c
  [GFS2] Fix journal flush problem
  [GFS2] mark_inode_dirty after write to stuffed file
  [GFS2] Fix glock ordering on inode creation
  ...
2006-12-07 | [PATCH] slab: remove kmem_cache_t | Christoph Lameter
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

The script was run like this:

    sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 | [DLM] Clean up lowcomms | Patrick Caulfield
This fixes up most of the things pointed out by akpm and Pavel Machek, with comments below indicating why some things have been left:

Andrew Morton wrote:
>
>> +static struct nodeinfo *nodeid2nodeinfo(int nodeid, gfp_t alloc)
>> +{
>> +    struct nodeinfo *ni;
>> +    int r;
>> +    int n;
>> +
>> +    down_read(&nodeinfo_lock);
>
> Given that this function can sleep, I wonder if `alloc' is useful.
>
> I see lots of callers passing in a literal "0" for `alloc'. That's in fact
> a secret (GFP_ATOMIC & ~__GFP_HIGH). I doubt if that's what you really
> meant. Particularly as the code could at least have used __GFP_WAIT (aka
> GFP_NOIO) which is much, much more reliable than "0". In fact "0" is the
> least reliable mode possible.
>
> IOW, this is all bollixed up.

When 0 is passed into nodeid2nodeinfo the function does not try to allocate a new structure at all. It's an indication that the caller only wants the nodeinfo struct for that nodeid if there actually is one in existence. I've tidied the function itself so it's more obvious (and tidier!).

>> +/* Data received from remote end */
>> +static int receive_from_sock(void)
>> +{
>> +    int ret = 0;
>> +    struct msghdr msg;
>> +    struct kvec iov[2];
>> +    unsigned len;
>> +    int r;
>> +    struct sctp_sndrcvinfo *sinfo;
>> +    struct cmsghdr *cmsg;
>> +    struct nodeinfo *ni;
>> +
>> +    /* These two are marginally too big for stack allocation, but this
>> +     * function is (currently) only called by dlm_recvd so static should be
>> +     * OK.
>> +     */
>> +    static struct sockaddr_storage msgname;
>> +    static char incmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
>
> whoa. This is globally singly-threaded code??

Yes. It is only ever run in the context of dlm_recvd.

>>
>> +static void initiate_association(int nodeid)
>> +{
>> +    struct sockaddr_storage rem_addr;
>> +    static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
>
> Another static buffer to worry about. Globally singly-threaded code?

Yes. Only ever called by dlm_sendd.

>> +
>> +/* Send a message */
>> +static int send_to_sock(struct nodeinfo *ni)
>> +{
>> +    int ret = 0;
>> +    struct writequeue_entry *e;
>> +    int len, offset;
>> +    struct msghdr outmsg;
>> +    static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
>
> Singly-threaded?

Yep.

>>
>> +static void dealloc_nodeinfo(void)
>> +{
>> +    int i;
>> +
>> +    for (i=1; i<=max_nodeid; i++) {
>> +        struct nodeinfo *ni = nodeid2nodeinfo(i, 0);
>> +        if (ni) {
>> +            idr_remove(&nodeinfo_idr, i);
>
> Didn't that need locking?

No. It's only ever called at DLM shutdown after all the other threads have been stopped.

>>
>> +static int write_list_empty(void)
>> +{
>> +    int status;
>> +
>> +    spin_lock_bh(&write_nodes_lock);
>> +    status = list_empty(&write_nodes);
>> +    spin_unlock_bh(&write_nodes_lock);
>> +
>> +    return status;
>> +}
>
> This function's return value is meaningless. As soon as the lock gets
> dropped, the return value can get out of sync with reality.
>
> Looking at the caller, this _might_ happen to be OK, but it's a nasty and
> dangerous thing. Really the locking should be moved into the caller.

It's just an optimisation to allow the caller to schedule if there is no work to do. If something arrives immediately afterwards then it will get picked up when the process re-awakes (and it will be woken by that arrival).

The 'accepting' atomic has gone completely. As Andrew pointed out, it didn't really achieve much anyway. I suspect it was a plaster over some other startup or shutdown bug, to be honest.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Pavel Machek <pavel@ucw.cz>
2006-11-30 | [DLM] fix format warnings in rcom.c and recoverd.c | Ryusuke Konishi
This fixes the following gcc warnings generated on the architectures where uint64_t != unsigned long long (e.g. ppc64).

fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'

Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] don't accept replies to old recovery messages | David Teigland
We often abort a recovery after sending a status request to a remote node. We want to ignore any potential status reply we get from the remote node. If we get one of these unwanted replies, we've often moved on to the next recovery message and incremented the message sequence counter, so the reply will be ignored due to the seq number. In some cases, we've not moved on to the next message so the seq number of the reply we want to ignore is still correct, causing the reply to be accepted. The next recovery message will then mistake this old reply as a new one. To fix this, we add the flag RCOM_WAIT to indicate when we can accept a new reply. We clear this flag if we abort recovery while waiting for a reply. Before the flag is set again (to allow new replies) we know that any old replies will be rejected due to their sequence number. We also initialize the recovery-message sequence number to a random value when a lockspace is first created. This makes it clear when messages are being rejected from an old instance of a lockspace that has since been recreated. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
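A minimal model of the RCOM_WAIT scheme, assuming one outstanding status request per lockspace. `RCOM_WAIT_SKETCH`, the struct and the helpers are hypothetical, and `rand()` merely stands in for the kernel's random number source. A reply is accepted only while the wait flag is set and its sequence number matches; bumping the sequence before setting the flag again is what rejects stale replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define RCOM_WAIT_SKETCH 0x1u   /* hypothetical stand-in for RCOM_WAIT */

struct ls_rcom_sketch {
	unsigned int flags;
	uint64_t rcom_seq;      /* seq of the outstanding status request */
};

/* Lockspace creation: start the sequence at a random value. */
static void init_lockspace_sketch(struct ls_rcom_sketch *ls)
{
	ls->flags = 0;
	ls->rcom_seq = (uint64_t)rand();
}

/* Send a status request: new seq first, then allow a reply. */
static uint64_t send_status_sketch(struct ls_rcom_sketch *ls)
{
	ls->rcom_seq++;
	ls->flags |= RCOM_WAIT_SKETCH;
	return ls->rcom_seq;
}

/* Recovery aborted while waiting: stop accepting replies. */
static void abort_recovery_sketch(struct ls_rcom_sketch *ls)
{
	ls->flags &= ~RCOM_WAIT_SKETCH;
}

static bool accept_status_reply(struct ls_rcom_sketch *ls, uint64_t reply_seq)
{
	if (!(ls->flags & RCOM_WAIT_SKETCH) || reply_seq != ls->rcom_seq)
		return false;
	ls->flags &= ~RCOM_WAIT_SKETCH;  /* at most one reply per request */
	return true;
}
```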
2006-11-30 | [DLM] fix size of STATUS_REPLY message | David Teigland
When the not_ready routine sends a "fake" status reply with blank status flags, it needs to use the correct size for a normal STATUS_REPLY by including the size of the would-be config parameters. We also fill in the non-existent config parameters with an invalid lvblen value so it's easier to notice if these invalid parameters are ever being used. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] fix add_requestqueue checking nodes list | David Teigland
Requests that arrive after recovery has started are saved in the requestqueue and processed after recovery is done. Some of these requests are purged during recovery if they are from nodes that have been removed. We move the purging of the requests (dlm_purge_requestqueue) to later in the recovery sequence which allows the routine saving requests (dlm_add_requestqueue) to avoid filtering out requests by nodeid since the same will be done by the purge. The current code has add_requestqueue filtering by nodeid but doesn't hold any locks when accessing the list of current nodes. This also means that we need to call the purge routine when the lockspace is being shut down since the add routine will not be rejecting requests itself any more. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] Fix DLM config | Patrick Caulfield
The attached patch fixes the DLM config so that it selects the chosen network transport. It should fix the bug where DLM can be left selected when NET gets unselected. This incorporates all the comments received about this patch. Cc: Adrian Bunk <bunk@stusta.de> Cc: Andrew Morton <akpm@osdl.org> Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] clear sbflags on lock master | David Teigland
RH BZ 211622 The ALTMODE flag can be set in the lock master's copy of the lock but never cleared, so ALTMODE will also be returned in a subsequent conversion of the lock when it shouldn't be. This results in lock_dlm incorrectly switching to the alternate lock mode when returning the result to gfs which then asserts when it sees the wrong lock state. The fix is to propagate the cleared sbflags value to the master node when the lock is requested. QA's d_rwrandirectlarge test triggers this bug very quickly. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] do full recover_locks barrier | David Teigland
Red Hat BZ 211914 The previous patch "[DLM] fix aborted recovery during node removal" was incomplete as discovered with further testing. It set the bit for the RS_LOCKS barrier but did not then wait for the barrier. This is often ok, but sometimes it will cause yet another recovery hang. If it's a new node that also has the lowest nodeid that skips the barrier wait, then it misses the important step of collecting and reporting the barrier status from the other nodes (which is the job of the low nodeid in the barrier wait routine). Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] fix stopping unstarted recovery | David Teigland
Red Hat BZ 211914 When many nodes are joining a lockspace simultaneously, the dlm gets a quick sequence of stop/start events, a pair for adding each node. dlm_controld in user space sends dlm_recoverd in the kernel each stop and start event. dlm_controld will sometimes send the stop before dlm_recoverd has had a chance to take up the previously queued start. The stop aborts the processing of the previous start by setting the RECOVERY_STOP flag. dlm_recoverd is erroneously clearing this flag and ignoring the stop/abort if it happens to take up the start after the stop meant to abort it. The fix is to check the sequence number that's incremented for each stop/start before clearing the flag. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
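A sketch of the sequence-number check, assuming every stop and start event bumps one counter and that a queued start remembers the counter value it was queued with; all names here are hypothetical. `take_start()` refuses to clear the stop flag if any event has arrived since its start was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct recover_ctl_sketch {
	uint64_t seq;           /* bumped on every stop and start event */
	bool stop_flag;         /* stands in for RECOVERY_STOP */
	uint64_t start_seq;     /* seq value when the pending start was queued */
};

static void event_stop(struct recover_ctl_sketch *c)
{
	c->seq++;
	c->stop_flag = true;
}

static void event_start(struct recover_ctl_sketch *c)
{
	c->seq++;
	c->start_seq = c->seq;
}

/*
 * Recovery thread picks up the queued start.  Only clear the stop flag
 * if no stop has arrived since the start was queued.
 * Returns true if recovery should proceed.
 */
static bool take_start(struct recover_ctl_sketch *c)
{
	if (c->start_seq != c->seq)
		return false;   /* a later event superseded this start */
	c->stop_flag = false;
	return true;
}
```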
2006-11-30 | [DLM] fix aborted recovery during node removal | David Teigland
Red Hat BZ 211914 With the new cluster infrastructure, dlm recovery for a node removal can be aborted and restarted for a node addition. When this happens, the restarted recovery isn't aware that it's doing recovery for the earlier removal as well as the addition. So, it then skips the recovery steps only required when nodes are removed. This can result in locks not being purged for failed/removed nodes. The fix is to check for removed nodes for which recovery has not been completed at the start of a new recovery sequence. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] fix requestqueue race | David Teigland
Red Hat BZ 211914 There's a race between dlm_recoverd (1) enabling locking and (2) clearing out the requestqueue, and dlm_recvd (1) checking if locking is enabled and (2) adding a message to the requestqueue. An order of recoverd(1), recvd(1), recvd(2), recoverd(2) will result in a message being left on the requestqueue. The fix is to have dlm_recvd check whether dlm_recoverd has enabled locking after taking the mutex for the requestqueue and, if it has, process the message instead of queueing it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
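A userspace sketch of the fixed ordering using a POSIX mutex; the struct, the fixed-size queue and the function names are hypothetical, and `process_message()` runs under the mutex only to keep the sketch free of data races. The essential point is that the receive side tests `locking_enabled` after taking the mutex, so it can never queue a message after the recovery side has drained the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define RQ_MAX 16

struct requestqueue_sketch {
	pthread_mutex_t lock;
	bool locking_enabled;   /* set by the recovery side */
	int msgs[RQ_MAX];
	int count;
	int processed;
};

static void rq_init(struct requestqueue_sketch *rq)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->locking_enabled = false;
	rq->count = 0;
	rq->processed = 0;
}

/* Called with rq->lock held in this sketch. */
static void process_message(struct requestqueue_sketch *rq, int msg)
{
	(void)msg;
	rq->processed++;
}

/* Receive side: decide under the mutex whether to queue or process. */
static void recv_message(struct requestqueue_sketch *rq, int msg)
{
	pthread_mutex_lock(&rq->lock);
	if (rq->locking_enabled) {
		/* Recovery already drained the queue; queueing would strand msg. */
		process_message(rq, msg);
	} else {
		assert(rq->count < RQ_MAX);
		rq->msgs[rq->count++] = msg;
	}
	pthread_mutex_unlock(&rq->lock);
}

/* Recovery side: enable locking and drain the queue under the same mutex. */
static void finish_recovery(struct requestqueue_sketch *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->locking_enabled = true;
	for (int i = 0; i < rq->count; i++)
		process_message(rq, rq->msgs[i]);
	rq->count = 0;
	pthread_mutex_unlock(&rq->lock);
}
```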
2006-11-30 | [DLM] status messages ping-pong between unmounted nodes | David Teigland
Red Hat BZ 213682 If two nodes leave the lockspace (while unmounting the fs in the case of gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY messages will then ping-pong between the nodes when neither of them can find the lockspace in question any longer. We kill this by not sending another STATUS message when we get a STATUS_REPLY for an unknown lockspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
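The fix amounts to a response table for messages that name an unknown lockspace. This hypothetical sketch assumes a STATUS for an unknown lockspace is still answered with a STATUS_REPLY, while a STATUS_REPLY now gets no answer, so the exchange ends after two messages instead of looping; the enum and function names are illustrative only.

```c
#include <assert.h>

enum rc_sketch { RC_NONE = 0, RC_STATUS, RC_STATUS_REPLY };

/* What a node sends back when 'got' names a lockspace it doesn't have. */
static enum rc_sketch respond_unknown_ls(enum rc_sketch got)
{
	switch (got) {
	case RC_STATUS:
		return RC_STATUS_REPLY;   /* still answer a status request */
	case RC_STATUS_REPLY:
		return RC_NONE;           /* the fix: no new STATUS in response */
	default:
		return RC_NONE;
	}
}

/* Messages exchanged between two such nodes, capped at 'limit'. */
static int exchange_count(enum rc_sketch first, int limit)
{
	int n = 0;

	for (enum rc_sketch m = first; m != RC_NONE && n < limit;
	     m = respond_unknown_ls(m))
		n++;
	return n;
}
```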
2006-11-30 | [DLM] res_recover_locks_count not reset when recover_locks is aborted | David Teigland
Red Hat BZ 213684 If a node sends an lkb to the new master (RCOM_LOCK message) during recovery and recovery is then aborted on both nodes before it gets a reply, the res_recover_locks_count needs to be reset to 0, so that when the subsequent recovery comes along and sends the lkb to the new master again, the assertion that checks that the counter is zero doesn't trigger. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 | [DLM] Add support for tcp communications | Patrick Caulfield
The following patch adds a TCP based communications layer to the DLM which is compile time selectable. The existing SCTP layer gives the advantage of allowing multihoming, whereas the TCP layer has been heavily tested in previous versions of the DLM and is known to be robust and therefore can be used as a baseline for performance testing. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-06 | [DLM] fix oops in kref_put when removing a lockspace | Patrick Caulfield
Now that the lockspace struct is freed when the last sysfs object is released this patch prevents use of that lockspace by sysfs. We attempt to re-get the lockspace from the lockspace list and fail the request if it has been removed. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-06 | [DLM] Fix kref_put oops | Patrick Caulfield
This patch fixes the refcounting on the lockspace kobject. Previously the lockspace was freed while userspace could have had a reference to one of its sysfs files, causing an oops in kref_put. Now the lockspace kfree is moved into the kobject release() function. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
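A minimal, single-threaded model of the pattern: the object is freed only from the release callback that runs when the last reference is dropped, so a holder of a sysfs-file reference keeps it alive. `struct ref_sketch` and the helpers are hypothetical, non-atomic stand-ins for the kernel's struct kref and kobject; real kernel code would use container_of() rather than a first-member cast.

```c
#include <assert.h>
#include <stdlib.h>

struct ref_sketch {
	int count;
	void (*release)(struct ref_sketch *ref);
};

struct lockspace_obj {
	struct ref_sketch ref;  /* first member, so the cast below is valid */
	int id;
};

static int released;    /* counts release callbacks, for illustration */

static void lockspace_release(struct ref_sketch *ref)
{
	struct lockspace_obj *ls = (struct lockspace_obj *)ref;

	released++;
	free(ls);   /* the free lives here, not in the lockspace removal path */
}

static void ref_get(struct ref_sketch *ref)
{
	ref->count++;
}

static void ref_put(struct ref_sketch *ref)
{
	assert(ref->count > 0);
	if (--ref->count == 0)
		ref->release(ref);
}
```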
2006-10-20 | [DLM] fix iovec length in recvmsg | Patrick Caulfield
I didn't spot that the msg_iovlen was set to 2 if there were two elements in the iovec but left at zero if not :( I think this might be why Bob was still seeing trouble. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-12 | [DLM] fix iovec length in recvmsg | Patrick Caulfield
The DLM always passes the iovec length as 1; this is wrong when the circular buffer wraps around. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
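A sketch of the iovec setup for reading into a circular buffer whose free space may wrap past the end; `cbuf_free_iov()` and its parameters are hypothetical. It returns the number of iovec entries in use, which is the value to store in `msg_iovlen` in every case, including the single-entry case (the later 2006-10-20 fix above addressed a path that left it at zero).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Describe the free region of a circular buffer of 'size' bytes that
 * starts at offset 'head' (head < size) and is 'free_len' bytes long.
 * Returns the iovec count: 1 normally, 2 when the region wraps.
 */
static size_t cbuf_free_iov(char *base, size_t size, size_t head,
			    size_t free_len, struct iovec iov[2])
{
	size_t first = size - head;   /* bytes until the end of the buffer */

	iov[0].iov_base = base + head;
	if (free_len <= first) {
		iov[0].iov_len = free_len;
		return 1;
	}
	iov[0].iov_len = first;
	iov[1].iov_base = base;       /* wrapped part restarts at the beginning */
	iov[1].iov_len = free_len - first;
	return 2;
}
```

A caller would then set `msg.msg_iov = iov` and `msg.msg_iovlen` to the returned count before calling recvmsg().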
2006-10-12 | [DLM] Kconfig: don't show an empty DLM menu | Adrian Bunk
Don't show an empty "Distributed Lock Manager" menu if IP_SCTP=n. Reported by Dmytro Bagrii in kernel Bugzilla #7268. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-09 | [PATCH] dlm gfp_t annotations | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-28 | [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs) | Theodore Ts'o
The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something where the union will actually be used. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org>