aboutsummaryrefslogtreecommitdiff
path: root/fs/ocfs2/cluster/tcp_internal.h
AgeCommit message (Collapse)Author
2008-08-22[PATCH] ocfs2/cluster/tcp.c: make some functions staticAdrian Bunk
Commit 0f475b2abed6cbccee1da20a0bef2895eb2a0edd (ocfs2/net: Silence build warnings) made sense as far as it fixed compile warnings, but it was not required that it made the functions global. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-05-30[PATCH 3/3] ocfs2/net: Silence build warningsSunil Mushran
This patch silences the build warnings concerning o2net_init_nst() and friends when building without CONFIG_DEBUG_FS enabled. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2/net: Add debug interface to o2netSunil Mushran
This patch exposes o2net information via debugfs. The information includes the list of sockets (sock_containers) as well as the list of outstanding messages (send_tracking). Useful for o2dlm debugging. (This patch is derived from an earlier one written by Zach Brown that exposed the same information via /proc.) [Mark: checkpatch fixes] Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Reconnect after idle time out.Tao Ma
Currently, o2net connects to a node on hb_up and disconnects on hb_down and net timeout. It disconnects on net timeout is ok, but it should attempt to reconnect back. This is because sometimes nodes get overloaded enough that the network connection breaks but the disk hb does not. And if we get into that situation, we either fence (unnecessarily) or wait for its disk hb to die (and sometimes hang in the process). So in this updated scheme, when the network disconnects, we keep attempting to reconnect till we succeed or we get a disk hb down event. If the other node is really dead, then we will eventually get a node down event. If not, we should be able to connect again and continue. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-02-06ocfs2: Negotiate locking protocol versions.Joel Becker
Currently, when ocfs2 nodes connect via TCP, they advertise their compatibility level. If the versions do not match, two nodes cannot speak to each other and they disconnect. As a result, this provides no forward or backwards compatibility. This patch implements a simple protocol negotiation at the dlm level by introducing a major/minor version number scheme for entities that communicate. Specifically, o2dlm has a major/minor version for interaction with o2dlm on other nodes, and ocfs2 itself has a major/minor version for interacting with the filesystem on other nodes. This will allow rolling upgrades of ocfs2 clusters when changes to the locking or network protocols can be done in a backwards compatible manner. In those cases, only the minor number is changed and the negotatied protocol minor is returned from dlm join. In the far less likely event that a required protocol change makes backwards compatibility impossible, we simply bump the major number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Remove data locksMark Fasheh
The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Remove mount/unmount votesMark Fasheh
The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Remove delete inode voteTiger Yang
Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2: introduce sc->sc_send_lock to protect outbound outbound messagesZhen Wei
When there is a lot of multithreaded I/O usage, two threads can collide while sending out a message to the other nodes. This is due to the lack of locking between threads while sending out the messages. When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects itself by acquiring a lock at invocation by calling lock_sock(). tcp_sendmsg() then loops over the buffers in the iovec, allocating associated sk_buff's and cache pages for use in the actual send. As it does so, it pushes the data out to tcp for actual transmission. However, if one of those allocation fails (because a large number of large sends is being processed, for example), it must wait for memory to become available. It does so by jumping to wait_for_sndbuf or wait_for_memory, both of which eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory() contains a code path that calls sk_wait_event(). Finally, sk_wait_event() contains the call to release_sock(). The following patch adds a lock to the socket container in order to properly serialize outbound requests. From: Zhen Wei <zwei@novell.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: disallow a domain join if node maps mismatchSrinivas Eeda
There is a small window where a joining node may not see the node(s) that just died but are still part of the domain. To fix this, we must disallow join requests if the joining node has a different node map. A new field node_map is added to dlm_query_join_request to send the current nodes nodemap along with join request. On the receiving end the nodes that are part of the cluster verifies if this new node sees all the nodes that are still part of the cluster. They disallow the join if the maps mismatch. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2: Added post handler callable function in o2net message handlerKurt Hackel
Currently o2net allows one handler function per message type. This patch adds the ability to call another function to be called after the handler has returned the message to the other node. Handlers are now given the option of returning a context (in the form of a void **) which will be passed back into the post message handler function. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: fix cluster-wide refcounting of lock resourcesKurt Hackel
This was previously broken and migration of some locks had to be temporarily disabled. We use a new (and backward-incompatible) set of network messages to account for all references to a lock resources held across the cluster. once these are all freed, the master node may then free the lock resource memory once its local references are dropped. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-11[patch 3/3] OCFS2 Configurable timeouts - Protocol changesAndrew Beekhof
Modify the OCFS2 handshake to ensure essential timeouts are configured identically on all nodes. Only allow changes when there are no connected peers Improves the logic in o2net_advance_rx() which broke now that sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg) Included is the field for userspace-heartbeat timeout to avoid the need for further protocol changes. Uses a global spinlock to ensure the decisions to update configfs entries are made on the correct value. The region covered by the spinlock when incrementing the counter is much larger as this is the more critical case. Small cleanup contributed by Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Beekhof <abeekhof@suse.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07[patch 2/3] OCFS2 Configurable timeoutsJeff Mahoney
Allow configuration of OCFS2 timeouts from userspace via configfs Signed-off-by: Andrew Beekhof <abeekhof@suse.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-11-22WorkStruct: make allyesconfigDavid Howells
Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-09-24ocfs2: Remove i_generation from inode lock namesMark Fasheh
OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode on and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Hook rest of the file system into dentry locking APIMark Fasheh
Actually replace the vote calls with the new dentry operations. Make any necessary adjustments to get the scheme to work. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemZach Brown
Node messaging via tcp. Used by the dlm and the file system for point to point communication between nodes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>