aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2006-09-25Driver core: add ability for classes to handle devices properlyGreg Kroah-Hartman
This adds two new callbacks to the class structure: int (*dev_uevent)(struct device *dev, char **envp, int num_envp, char *buffer, int buffer_size); void (*dev_release)(struct device *dev); And one pointer: struct device_attribute * dev_attrs; which all corrispond with the same thing as the "normal" class devices do, yet this is for when a struct device is bound to a class. Someday soon, struct class_device will go away, and then the other fields in this structure can be removed too. But this is necessary in order to get the transition to work properly. Tested out on a network core patch that converted it to use struct device instead of struct class_device. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25Driver core: add groups support to struct deviceGreg Kroah-Hartman
This is needed for the network class devices in order to be able to convert over to use struct device. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25PM: platform_bus and late_suspend/early_resumeDavid Brownell
Teach platform_bus about the new suspend_late/resume_early PM calls, issued with IRQs off. Do we really need sysdev and friends any more, or can janitors start switching its users over to platform_device so we can do a minor code-ectomy? Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25PM: no suspend_prepare() phaseDavid Brownell
Remove the new suspend_prepare() phase. It doesn't seem very usable, has never been tested, doesn't address fault cleanup, and would need a sibling resume_complete(); plus there are no real use cases. It could be restored later if those issues get resolved. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25PM: define PM_EVENT_PRETHAWDavid Brownell
This adds a new pm_message_t event type to use when preparing to restore a swsusp snapshot. Devices that have been initialized by Linux after resume (rather than left in power-up-reset state) may need to be reset; this new event type give drivers the chance to do that. The drivers that will care about this are those which understand more hardware states than just "on" and "reset", relying on hardware state during resume() methods to be either the state left by the preceding suspend(), or a power-lost reset. The best current example of this class of drivers are USB host controller drivers, which currently do not work through swsusp when they're statically linked. When the swsusp freeze/thaw mechanism kicks in, a troublesome third state could exist: one state set up by a different kernel instance, before a snapshot image is resumed. This mechanism lets drivers prevent that state. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25Suspend changes for PCI coreLinus Torvalds
Changes the PCI core to use the new suspend infrastructure changes. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25Suspend infrastructure cleanup and extensionLinus Torvalds
Allow devices to participate in the suspend process more intimately, in particular, allow the final phase (with interrupts disabled) to also be open to normal devices, not just system devices. Also, allow classes to participate in device suspend. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25Driver core: add const to class_createMiguel Ojeda Sandonis
Adds const to class_create second parameter, because: struct class { const char * name; /*...*/ } Signed-off-by: Miguel Ojeda Sandonis <maxextreme@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25device_create(): make fmt argument 'const char *'Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25class_device_create(): make fmt argument 'const char *'Dmitry Torokhov
Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NetLabel]: update docs with website information [NetLabel]: rework the Netlink attribute handling (part 2) [NetLabel]: rework the Netlink attribute handling (part 1) [Netlink]: add nla_validate_nested() [NETLINK]: add nla_for_each_nested() to the interface list [NetLabel]: change the SELinux permissions [NetLabel]: make the CIPSOv4 cache spinlocks bottom half safe [NetLabel]: correct improper handling of non-NetLabel peer contexts [TCP]: make cubic the default [TCP]: default congestion control menu [ATM] he: Fix __init/__devinit conflict [NETFILTER]: Add dscp,DSCP headers to header-y [DCCP]: Introduce dccp_probe [DCCP]: Use constants for CCIDs [DCCP]: Introduce constants for CCID numbers [DCCP]: Allow default/fallback service code.
2006-09-25[libata] No need for all those arch libata-portmap.h headersJeff Garzik
They all contain the same thing. Instead, have a single generic one in include/asm-generic, and permit an arch to override as needed. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-09-24Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6David S. Miller
2006-09-24[NETFILTER]: Add dscp,DSCP headers to header-yYasuyuki Kozakai
This patch adds xt_dscp.h and xt_DSCP.h to the kernel headers which are exported via 'make headers_install'. These are necessary for userspace to add rules using dscp match and DSCP target. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-24[PATCH] fix iptables __user misannotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-24Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (28 commits) ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback ocfs2: Remove ->unblock lockres operation ocfs2: move downconvert worker to lockres ops ocfs2: Remove unused dlmglue functions ocfs2: Have the metadata lock use generic dlmglue functions ocfs2: Add ->set_lvb callback in dlmglue ocfs2: Add ->check_downconvert callback in dlmglue ocfs2: Check for refreshing locks in generic unblock function ocfs2: don't unconditionally pass LVB flags ocfs2: combine inode and generic blocking AST functions ocfs2: Add ->get_osb() dlmglue locking operation ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops ocfs2: combine inode and generic AST functions ocfs2: Clean up lock resource refresh flags ocfs2: Remove i_generation from inode lock names ocfs2: Encode i_generation in the meta data lvb ocfs2: Free up some space in the lvb ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock() ocfs2: manually d_move() during ocfs2_rename() [PATCH] Allow file systems to manually d_move() inside of ->rename() ...
2006-09-24Remove dead netfilter_logging.h from include/linux/KbuildDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-09-24Merge branch 'master' of ↵David Woodhouse
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2006-09-24[DCCP]: Introduce constants for CCID numbersIan McDonald
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-09-24[PATCH] Allow file systems to manually d_move() inside of ->rename()Mark Fasheh
Some file systems want to manually d_move() the dentries involved in a rename. We can do this by making use of the FS_ODD_RENAME flag if we just have nfs_rename() unconditionally do the d_move(). While there, we rename the flag to be more descriptive. OCFS2 uses this to protect that part of the rename operation with a cluster lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-24[DCCP]: Allow default/fallback service code.Gerrit Renker
This has been discussed on dccp@vger and removes the necessity for applications to supply service codes in each and every case. If an application does not want to provide a service code, that's fine, it will be given 0. Otherwise, service codes can be set via socket options as before. This patch has been tested using various client/server configurations (including listening on multiple service codes). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-09-24Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (50 commits) [libata] Delete pata_it8172 driver [PATCH] libata: improve handling of diagostic fail (and hardware that misreports it) [PATCH] libata: fix non-uniform ports handling Fix libata resource conflict for legacy mode [libata] ata_piix: build fix [PATCH] pata_amd: Check enable bits on Nvidia [PATCH] Update SiS PATA [libata] Add pata_jmicron driver to Kconfig, Makefile [libata #pata-drivers] Trim trailing whitespace. [libata] Trim trailing whitespace. [libata] Add a bunch of PATA drivers. Rename libata-bmdma.c to libata-sff.c. libata: Grand renaming. Clean up drivers/ata/Kconfig a bit. [PATCH] CONFIG_PM=n slim: drivers/scsi/sata_sil* [PATCH] sata_via: Add SATA support for vt8237a [PATCH] libata: change path to libata in libata.tmpl [PATCH] libata: s/CONFIG_SCSI_SATA/CONFIG_[S]ATA/g in pci/quirks.c libata: Make sure drivers/ata is a separate Kconfig menu [libata] ata_piix: add missing kfree() ...
2006-09-24Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits) net/ieee80211: fix more crypto-related build breakage [PATCH] Spidernet: add ethtool -S (show statistics) [NET] GT96100: Delete bitrotting ethernet driver [PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM [PATCH] Cirrus Logic ep93xx ethernet driver r8169: the MMIO region of the 8167 stands behin BAR#1 e1000, ixgb: Remove pointless wrappers [PATCH] Remove powerpc specific parts of 3c509 driver [PATCH] s2io: Switch to pci_get_device [PATCH] gt96100: move to pci_get_device API [PATCH] ehea: bugfix for register access functions [PATCH] e1000 disable device on PCI error drivers/net/phy/fixed: #if 0 some incomplete code drivers/net: const-ify ethtool_ops declarations [PATCH] ethtool: allow const ethtool_ops [PATCH] sky2: big endian [PATCH] sky2: fiber support [PATCH] sky2: tx pause bug fix drivers/net: Trim trailing whitespace [PATCH] ehea: IBM eHEA Ethernet Device Driver ... Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by commit 84fa7933a33f806bbbaae6775e87459b1ec584c0 that just happened to be next to unrelated changes in this update.
2006-09-24Move several *_SUPER_MAGIC symbols to include/linux/magic.h.Jeff Garzik
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-09-24Merge branch 'master' into upstreamJeff Garzik
2006-09-23Merge mulgrave-w:git/linux-2.6James Bottomley
Conflicts: include/linux/blkdev.h Trivial merge to incorporate tag prototypes.
2006-09-23Merge git://git.linux-nfs.org/pub/linux/nfs-2.6Linus Torvalds
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits) NFS: unmark NFS direct I/O as experimental NFS: add comments clarifying the use of nfs_post_op_update() NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers NFS: Use SEEK_END instead of hardcoded value NFSv4: When mounting with a port=0 argument, substitute port=2049 NFSv4: Poll more aggressively when handling NFS4ERR_DELAY NFSv4: Handle the condition NFS4ERR_FILE_OPEN NFSv4: Retry lease recovery if it failed during a synchronous operation. NFS: Don't invalidate the symlink we just stuffed into the cache NFS: Make read() return an ESTALE if the file has been deleted NFSv4: It's perfectly legal for clp to be NULL here.... NFS: nfs_lookup - don't hash dentry when optimising away the lookup SUNRPC: Fix Oops in pmap_getport_done SUNRPC: Add refcounting to the struct rpc_xprt SUNRPC: Clean up soft task error handling SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status Fix a referral error Oops NFS: NFS_ROOT should use the new rpc_create API NFS: Fix up compiler warnings on 64-bit platforms in client.c ... Manually resolved conflict in net/sunrpc/xprtsock.c
2006-09-23Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (353 commits) [IPV6] ADDRCONF: Mobile IPv6 Home Address support. [IPV6] ADDRCONF: Allow non-DAD'able addresses. [IPV6] NDISC: Fix is_router flag setting. [IPV6] ADDRCONF: Convert addrconf_lock to RCU. [IPV6] NDISC: Add proxy_ndp sysctl. [IPV6] NDISC: Set per-entry is_router flag in Proxy NA. [IPV6] NDISC: Avoid updating neighbor cache for proxied address in receiving NA. [IPV6]: Don't forward packets to proxied link-local address. [IPV6] NDISC: Handle NDP messages to proxied addresses. [NETFILTER]: PPTP conntrack: fix another GRE keymap leak [NETFILTER]: PPTP conntrack: fix GRE keymap leak [NETFILTER]: PPTP conntrack: fix PPTP_IN_CALL message types [NETFILTER]: PPTP conntrack: check call ID before changing state [NETFILTER]: PPTP conntrack: clean up debugging cruft [NETFILTER]: PPTP conntrack: consolidate header parsing [NETFILTER]: PPTP conntrack: consolidate header size checks [NETFILTER]: PPTP conntrack: simplify expectation handling [NETFILTER]: PPTP conntrack: remove unnecessary cid/pcid header pointers [NETFILTER]: PPTP conntrack: fix header definitions [NETFILTER]: PPTP conntrack: remove more dead code ...
2006-09-23Merge mulgrave-w:git/scsi-misc-2.6James Bottomley
Conflicts: drivers/scsi/iscsi_tcp.c drivers/scsi/iscsi_tcp.h Pretty horrible merge between crypto hash consolidation and crypto_digest_...->crypto_hash_... conversion Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-23[KERNEL] Do not truncate to 'int' in ALIGN() macro.David Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-22SUNRPC: Add refcounting to the struct rpc_xprtTrond Myklebust
In a subsequent patch, this will allow the portmapper to take a reference to the rpc_xprt for which it is updating the port number, fixing an Oops. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Make rpc_mkpipe() take the parent dentry as an argumentTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22NFSv4: Fix a use-after-free issue with the nfs server.Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22Add a real API for dealing with blk_congestion_wait()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22NFS: Use cached page as buffer for NFS symlink requestsChuck Lever
Now that we have a copy of the symlink path in the page cache, we can pass a struct page down to the XDR routines instead of a string buffer. Test plan: Connectathon, all NFS versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22NFS: Fix double d_drop in nfs_instantiate() error pathChuck Lever
If the LOOKUP or GETATTR in nfs_instantiate fail, nfs_instantiate will do a d_drop before returning. But some callers already do a d_drop in the case of an error return. Make certain we do only one d_drop in all error paths. This issue was introduced because over time, the symlink proc API diverged slightly from the create/mkdir/mknod proc API. To prevent other coding mistakes of this type, change the symlink proc API to be more like create/mkdir/mknod and move the nfs_instantiate call into the symlink proc routines so it is used in exactly the same way for create, mkdir, mknod, and symlink. Test plan: Connectathon, all versions of NFS. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Eliminate xprt_create_proto and rpc_create_clientChuck Lever
The two function call API for creating a new RPC client is now obsolete. Remove it. Also, remove an unnecessary check to see whether the caller is capable of using privileged network services. The kernel RPC client always uses a privileged ephemeral port by default; callers are responsible for checking the authority of users to make use of any RPC service, or for specifying that a nonprivileged port is acceptable. Test plan: Repeated runs of Connectathon locking suite. Check network trace to ensure correctness of NLM requests and replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: use sockaddr + size when creating remote transport endpointsChuck Lever
Prepare for more generic transport endpoint handling needed by transports that might use different forms of addressing, such as IPv6. Introduce a single function call to replace the two-call xprt_create_proto/rpc_create_client API. Define a new rpc_create_args structure that allows callers to pass in remote endpoint addresses of varying length. Test-plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Clean-up after previous patches.Chuck Lever
Remove some unused macros related to accessing an RPC peer address Test plan: Compile kernel with CONFIG_NFS option enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Use "sockaddr_storage" for storing RPC client's remote peer addressChuck Lever
IPv6 addresses are big (128 bytes). Now that no RPC client consumers treat the addr field in rpc_xprt structs as an opaque, and access it only via the API calls, we can safely widen the field in the rpc_xprt struct to accomodate larger addresses. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Create API for displaying remote peer addressChuck Lever
Provide an API for formatting the remote peer address for printing without exposing its internal structure. The address could be dynamic, so we support a function call to get the address rather than reading it straight out of a structure. Test-plan: Destructive testing (unplugging the network temporarily). Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: add xprt switch API for printing formatted remote peer addressesChuck Lever
Add a new method to the transport switch API to provide a way to convert the opaque contents of xprt->addr to a human-readable string. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: remove extraneous header inclusionsChuck Lever
include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h. We can remove xprt.h from source files that already include clnt.h. Likewise include/linux/sunrpc/timer.h. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: create API for getting remote peer addressChuck Lever
Provide an API for retrieving the remote peer address without allowing direct access to the rpc_xprt struct. Test-plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Introduce transport switch callout for pluggable rpcbindChuck Lever
Introduce a clean transport switch API for plugging in different types of rpcbind mechanisms. For instance, rpcbind can cleanly replace the existing portmapper client, or a transport can choose to implement RPC binding any way it likes. Test plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Support for RPC child tasks no longer neededChuck Lever
The previous patches removed the last user of RPC child tasks, so we can remove support for child tasks from net/sunrpc/sched.c now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Make RPC portmapper use per-transport storageChuck Lever
Move connection and bind state that was maintained in the rpc_clnt structure to the rpc_xprt structure. This will allow the creation of a clean API for plugging in different types of bind mechanisms. This brings improvements such as the elimination of a single spin lock to control serialization for all in-kernel RPC binding. A set of per-xprt bitops is used to serialize tasks during RPC binding, just like it now works for making RPC transport connections. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22SUNRPC: Create a helper to tell whether a transport is boundChuck Lever
Hide the contents and format of xprt->addr by eliminating direct uses of the xprt->addr.sin_port field. This change is required to support alternate RPC host address formats (eg IPv6). Test-plan: Destructive testing (unplugging the network temporarily). Repeated runs of Connectathon locking suite with UDP and TCP. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22NFS: Share NFS superblocks per-protocol per-server per-FSIDDavid Howells
The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22NFS: Eliminate client_sys in favour of cl_rpcclientDavid Howells
Eliminate nfs_server::client_sys in favour of nfs_client::cl_rpcclient as we only really need one per server that we're talking to since it doesn't have any security on it. The retransmission management variables are also moved to the common struct as they're required to set up the cl_rpcclient connection. The NFS2/3 client and client_acl connections are thenceforth derived by cloning the cl_rpcclient connection and post-applying the authorisation flavour. The code for setting up the initial common connection has been moved to client.c as nfs_create_rpc_client(). All the NFS program definition tables are also moved there as that's where they're now required rather than super.c. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>