aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2008-11-20Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ixgbe/ixgbe_main.c include/net/mac80211.h net/phonet/af_phonet.c
2008-11-20SUNRPC: Fix a performance regression in the RPC authentication codeTrond Myklebust
Fix a regression reported by Max Kellermann whereby kernel profiling showed that his clients were spending 45% of their time in rpcauth_lookup_credcache. It turns out that although his processes had identical uid/gid/groups, generic_match() was failing to detect this, because the task->group_info pointers were not shared. This again lead to the creation of a huge number of identical credentials at the RPC layer. The regression is fixed by comparing the contents of task->group_info if the actual pointers are not identical. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits) net: fix tiny output corruption of /proc/net/snmp6 atl2: don't request irq on resume if netif running ipv6: use seq_release_private for ip6mr.c /proc entries pkt_sched: fix missing check for packet overrun in qdisc_dump_stab() smc911x: Fix printf format typo in smc911x driver. asix: Fix asix-based cards connecting to 10/100Mbs LAN. mv643xx_eth: fix recycle check bound mv643xx_eth: fix the order of mdiobus_{unregister, free}() calls sh: sh_eth: Update to change of mii_bus TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header() TPROXY: fill struct flowi->flags in udp_sendmsg() net: ipg.c fix bracing on endian swapping phylib: Fix auto-negotiation restart avoidance net: jme.c rxdesc.flags is __le16, other missing endian swaps phylib: fix phy name example in documentation net: Do not fire linkwatch events until the device is registered. phonet: fix compilation with gcc-3.4 ixgbe: fix compilation with gcc-3.4 pktgen: fix multiple queue warning net: fix ip_mr_init() error path ...
2008-11-20netdevice wanrouter: Convert directly reference of netdev->privWang Chen
1. Make device driver to allocate memory for netdev. 2. Convert all directly reference of netdev->priv to netdev_priv(). Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20net: fix tiny output corruption of /proc/net/snmp6Alexey Dobriyan
Because "name" is static, it can be occasionally be filled with somewhat garbage if two processes read /proc/net/snmp6. Also, remove useless casts and "-1" -- snprintf() correctly terminates it's output. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20ipv6: use seq_release_private for ip6mr.c /proc entriesBenjamin Thery
In ip6mr.c, /proc entries /proc/net/ip6_mr_cache and /proc/net/ip6_mr_vif are opened with seq_open_private(), thus seq_release_private() should be used to release them. Should fix a small memory leak. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20pkt_sched: remove unnecessary xchg() in packet classifiersPatrick McHardy
The use of xchg() hasn't been necessary since 2.2.something when proper locking was added to packet schedulers. In the case of classifiers they mostly weren't even necessary before that since they're mainly used to assign a NULL pointer to the filter root in the ->destroy path; the root is destroyed immediately after that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20pkt_sched: remove unnecessary xchg() in packet schedulersPatrick McHardy
The use of xchg() hasn't been necessary since 2.2.something when proper locking was added to packet schedulers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20pkt_sched: add DRR schedulerPatrick McHardy
Add classful DRR scheduler as a more flexible replacement for SFQ. The main difference to the algorithm described in "Efficient Fair Queueing using Deficit Round Robin" is that this implementation doesn't drop packets from the longest queue on overrun because its classful and limits are handled by each individual child qdisc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20pkt_sched: fix missing check for packet overrun in qdisc_dump_stab()Patrick McHardy
nla_nest_start() might return NULL, causing a NULL pointer dereference. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2008-11-20net: ip_sockglue.c add static, annotate ports' endiannessHarvey Harrison
Fixes sparse warnings: net/ipv4/ip_sockglue.c:146:15: warning: incorrect type in assignment (different base types) net/ipv4/ip_sockglue.c:146:15: expected restricted __be16 [assigned] [usertype] sin_port net/ipv4/ip_sockglue.c:146:15: got unsigned short [unsigned] [short] [usertype] <noident> net/ipv4/ip_sockglue.c:130:6: warning: symbol 'ip_cmsg_recv_dstaddr' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header()Balazs Scheidler
inet_sk_rebuild_header() does a new route lookup if the dst_entry associated with a socket becomes stale. However inet_sk_rebuild_header() didn't use struct flowi->flags, causing the route lookup to fail for foreign-bound IP_TRANSPARENT sockets, causing an error state to be set for the sockets in question. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20TPROXY: fill struct flowi->flags in udp_sendmsg()Balazs Scheidler
udp_sendmsg() didn't fill struct flowi->flags, which means that the route lookup would fail for non-local IPs even if the IP_TRANSPARENT sockopt was set. This prevents sendto() to work properly for UDP sockets, whereas bind(foreign-ip) + connect() + send() worked fine. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20dccp: Fix bracing in dccp_feat_list_lookup.Gerrit Renker
From: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20filter: add SKF_AD_NLATTR_NEST to look for nested attributesPablo Neira Ayuso
SKF_AD_NLATTR allows us to find the first matching attribute in a stream of netlink attributes from one offset to the end of the netlink message. This is not suitable to look for a specific matching inside a set of nested attributes. For example, in ctnetlink messages, if we look for the CTA_V6_SRC attribute in a message that talks about an IPv4 connection, SKF_AD_NLATTR returns the offset of CTA_STATUS which has the same value of CTA_V6_SRC but outside the nest. To differenciate CTA_STATUS and CTA_V6_SRC, we would have to make assumptions on the size of the attribute and the usual offset, resulting in horrible BSF code. This patch adds SKF_AD_NLATTR_NEST, which is a variant of SKF_AD_NLATTR, that looks for an attribute inside the limits of a nested attributes, but not further. This patch validates that we have enough room to look for the nested attributes - based on a suggestion from Patrick McHardy. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20net: listening_hash get a spinlock per bucketEric Dumazet
This patch prepares RCU migration of listening_hash table for TCP/DCCP protocols. listening_hash table being small (32 slots per protocol), we add a spinlock for each slot, instead of a single rwlock for whole table. This should reduce hold time of readers, and writers concurrency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19vlan: convert to net_device_opsStephen Hemminger
Convert vlan devices and function pointers to net_device_ops. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19ip: convert to net_device_ops for ioctlStephen Hemminger
Convert to net_device_ops function table pointer for ioctl. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19bridge: convert to net_device_opsStephen Hemminger
Convert to net_device_ops function table. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19netdev: expose ethernet address primitivesStephen Hemminger
When ethernet devices are converted, the function pointer setup by eth_setup() need to be done during intialization. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19netdev: introduce dev_get_stats()Stephen Hemminger
In order for the network device ops get_stats call to be immutable, the handling of the default internal network device stats block has to be changed. Add a new helper function which replaces the old use of internal_get_stats. Note: change return code to make it clear that the caller should not go changing the returned statistics. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19netdev: network device operations infrastructureStephen Hemminger
This patch changes the network device internal API to move adminstrative operations out of the network device structure and into a separate structure. This patch involves some hackery to maintain compatablity between the new and old model, so all 300+ drivers don't have to be changed at once. For drivers that aren't converted yet, the netdevice_ops virt function list still resides in the net_device structure. For old protocols, the new net_device_ops are copied out to the old net_device pointers. After the transistion is completed the nag message can be changed to an WARN_ON, and the compatiablity code can be made configurable. Some function pointers aren't moved: * destructor can't be in net_device_ops because it may need to be referenced after the module is unloaded. * neighbor setup is manipulated in a couple of places that need special consideration * hard_start_xmit is in the fast path for transmit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19reintroduce accept4Ulrich Drepper
Introduce a new accept4() system call. The addition of this system call matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(), inotify_init1(), epoll_create1(), pipe2()) which added new system calls that differed from analogous traditional system calls in adding a flags argument that can be used to access additional functionality. The accept4() system call is exactly the same as accept(), except that it adds a flags bit-mask argument. Two flags are initially implemented. (Most of the new system calls in 2.6.27 also had both of these flags.) SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled for the new file descriptor returned by accept4(). This is a useful security feature to avoid leaking information in a multithreaded program where one thread is doing an accept() at the same time as another thread is doing a fork() plus exec(). More details here: http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling", Ulrich Drepper). The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag to be enabled on the new open file description created by accept4(). (This flag is merely a convenience, saving the use of additional calls fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result. Here's a test program. Works on x86-32. Should work on x86-64, but I (mtk) don't have a system to hand to test with. It tests accept4() with each of the four possible combinations of SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies that the appropriate flags are set on the file descriptor/open file description returned by accept4(). I tested Ulrich's patch in this thread by applying against 2.6.28-rc2, and it passes according to my test program. /* test_accept4.c Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk <mtk.manpages@gmail.com> Licensed under the GNU GPLv2 or later. */ #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdlib.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #define PORT_NUM 33333 #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) /**********************************************************************/ /* The following is what we need until glibc gets a wrapper for accept4() */ /* Flags for socket(), socketpair(), accept4() */ #ifndef SOCK_CLOEXEC #define SOCK_CLOEXEC O_CLOEXEC #endif #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif #ifdef __x86_64__ #define SYS_accept4 288 #elif __i386__ #define USE_SOCKETCALL 1 #define SYS_ACCEPT4 18 #else #error "Sorry -- don't know the syscall # on this architecture" #endif static int accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags) { printf("Calling accept4(): flags = %x", flags); if (flags != 0) { printf(" ("); if (flags & SOCK_CLOEXEC) printf("SOCK_CLOEXEC"); if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK)) printf(" "); if (flags & SOCK_NONBLOCK) printf("SOCK_NONBLOCK"); printf(")"); } printf("\n"); #if USE_SOCKETCALL long args[6]; args[0] = fd; args[1] = (long) sockaddr; args[2] = (long) addrlen; args[3] = flags; return syscall(SYS_socketcall, SYS_ACCEPT4, args); #else return syscall(SYS_accept4, fd, sockaddr, addrlen, flags); #endif } /**********************************************************************/ static int do_test(int lfd, struct sockaddr_in *conn_addr, int closeonexec_flag, int nonblock_flag) { int connfd, acceptfd; int fdf, flf, fdf_pass, flf_pass; struct sockaddr_in claddr; socklen_t addrlen; printf("=======================================\n"); connfd = socket(AF_INET, SOCK_STREAM, 0); if (connfd == -1) die("socket"); if (connect(connfd, (struct sockaddr *) conn_addr, sizeof(struct sockaddr_in)) == -1) die("connect"); addrlen = sizeof(struct sockaddr_in); acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen, closeonexec_flag | nonblock_flag); if (acceptfd == -1) { perror("accept4()"); close(connfd); return 0; } fdf = fcntl(acceptfd, F_GETFD); if (fdf == -1) die("fcntl:F_GETFD"); fdf_pass = ((fdf & FD_CLOEXEC) != 0) == ((closeonexec_flag & SOCK_CLOEXEC) != 0); printf("Close-on-exec flag is %sset (%s); ", (fdf & FD_CLOEXEC) ? "" : "not ", fdf_pass ? "OK" : "failed"); flf = fcntl(acceptfd, F_GETFL); if (flf == -1) die("fcntl:F_GETFD"); flf_pass = ((flf & O_NONBLOCK) != 0) == ((nonblock_flag & SOCK_NONBLOCK) !=0); printf("nonblock flag is %sset (%s)\n", (flf & O_NONBLOCK) ? "" : "not ", flf_pass ? "OK" : "failed"); close(acceptfd); close(connfd); printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL"); return fdf_pass && flf_pass; } static int create_listening_socket(int port_num) { struct sockaddr_in svaddr; int lfd; int optval; memset(&svaddr, 0, sizeof(struct sockaddr_in)); svaddr.sin_family = AF_INET; svaddr.sin_addr.s_addr = htonl(INADDR_ANY); svaddr.sin_port = htons(port_num); lfd = socket(AF_INET, SOCK_STREAM, 0); if (lfd == -1) die("socket"); optval = 1; if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) == -1) die("setsockopt"); if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in)) == -1) die("bind"); if (listen(lfd, 5) == -1) die("listen"); return lfd; } int main(int argc, char *argv[]) { struct sockaddr_in conn_addr; int lfd; int port_num; int passed; passed = 1; port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM; memset(&conn_addr, 0, sizeof(struct sockaddr_in)); conn_addr.sin_family = AF_INET; conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); conn_addr.sin_port = htons(port_num); lfd = create_listening_socket(port_num); if (!do_test(lfd, &conn_addr, 0, 0)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0)) passed = 0; if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK)) passed = 0; close(lfd); exit(passed ? EXIT_SUCCESS : EXIT_FAILURE); } [mtk.manpages@gmail.com: rewrote changelog, updated test program] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19net: af_unix should use KERN_INFO instead of KERN_DEBUGEric Dumazet
As spotted by Joe Perches, we should use KERN_INFO in unix_sock_destructor() Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19include/net net/ - csum_partial - remove unnecessary castsJoe Perches
The first argument to csum_partial is const void * casts to char/u8 * are not necessary Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19net: inet_diag_handler structs can be constEric Dumazet
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19net: Do not fire linkwatch events until the device is registered.David S. Miller
Several device drivers try to do things like netif_carrier_off() before register_netdev() is invoked. This is bogus, but too many drivers do this to fix them all up in one go. Reported-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19net: make /proc/net/protocols namespace awareEric Dumazet
Converting /proc/net/protocols to be namespace aware is quite easy and permits us to use sock_prot_inuse_get(). This provides seperate counters for each protocol. For example we can really count TCPv6 sockets and TCPv4 sockets, while previously, we had the same value, and this value was not namespace aware. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19net: af_packet should update its inuse counterEric Dumazet
This patch is a preparation to namespace conversion of /proc/net/protocols In order to have relevant information for PACKET protocols, we should use sock_prot_inuse_add() to update a (percpu and pernamespace) counter of inuse sockets. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19phonet: fix compilation with gcc-3.4Alexey Dobriyan
CC [M] net/phonet/af_phonet.o net/phonet/af_phonet.c: In function `pn_socket_create': net/phonet/af_phonet.c:38: sorry, unimplemented: inlining failed in call to 'phonet_proto_put': function body not available net/phonet/af_phonet.c:99: sorry, unimplemented: called from here make[3]: *** [net/phonet/af_phonet.o] Error 1 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19pktgen: fix multiple queue warningRobert Olsson
As number of TX queues in unrelated to number of CPU's we remove this test and just make sure nxtq never gets exceeded. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19net: fix ip_mr_init() error pathBenjamin Thery
Similarly to IPv6 ip6_mr_init() (fixed last week), the order of cleanup operations in the error/exit section of ip_mr_init() is completely inversed. It should be the other way around. Also a del_timer() is missing in the error path. I should have guessed last week that this same error existed in ipmr.c too, as ip6mr.c is largely inspired by ipmr.c. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-18Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/isdn/i4l/isdn_net.c fs/cifs/connect.c
2008-11-18mac80211: remove ieee80211_notify_macJohannes Berg
Before ieee80211_notify_mac() was added, it was presented with the use case of using it to tell mac80211 that the association may have been lost because the firmware crashed/reset. Since then, it has also been used by iwlwifi to (slightly) speed up re-association after resume, a workaround around the fact that mac80211 has no suspend/resume handling yet. It is also not used by any other drivers, so clearly it cannot be necessary for "good enough" suspend/resume. Unfortunately, the callback suffers from a severe problem: It only works for station mode. If suspend/resume happens while in IBSS or any other mode (but station), then the callback is pointless. Recently, it has created a number of locking issues, first because it required rtnl locking rather than RCU due to calling sleeping functions within the critical section, and now because it's called by iwlwifi from the mac80211 workqueue that may not use the rtnl because it is flushed under rtnl. (cf. http://bugzilla.kernel.org/show_bug.cgi?id=12046) I think, therefore, that we should take a step back, remove it entirely for now and add the small feature it provided properly. For suspend and resume we will need to introduce new hooks, and for the case where the firmware was reset the driver will probably simply just pretend it has done a suspend/resume cycle to get mac80211 to reprogram the hardware completely, not just try to connect to the current AP again in station mode. When doing so, we will need to take into account locking issues and possibly defer to schedule_work from within mac80211 for the resume operation, while the suspend operation must be done directly. Proper suspend/resume should also not necessarily try to reconnect to the current AP, the time spent in suspend may have been short enough to not be disconnected from the AP, mac80211 will detect that the AP went out of range quickly if it did, and if the association is lost then the AP will disassoc as soon as a data frame is sent. We might also take into account WWOL then, and have mac80211 program the hardware into such a mode where it is available and requested. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-17net: sctp should update its inuse counterEric Dumazet
This patch is a preparation to namespace conversion of /proc/net/protocols In order to have relevant information for SCTP protocols, we should use sock_prot_inuse_add() to update a (percpu and pernamespace) counter of inuse sockets. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-17net: af_unix should update its inuse counterEric Dumazet
This patch is a preparation to namespace conversion of /proc/net/protocols In order to have relevant information for UNIX protocol, we should use sock_prot_inuse_add() to update a (percpu and pernamespace) counter of inuse sockets. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-17net: af_unix can make unix_nr_socks visbile in /procEric Dumazet
Currently, /proc/net/protocols displays socket counts only for TCP/TCPv6 protocols We can provide unix_nr_socks for free here, this counter being already maintained in af_unix Before patch : # grep UNIX /proc/net/protocols UNIX 428 -1 -1 NI 0 yes kernel After patch : # grep UNIX /proc/net/protocols UNIX 428 98 -1 NI 0 yes kernel Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16rtnetlink: propagate error from dev_change_flags in do_setlink()Johannes Berg
Unlike ifconfig, iproute doesn't report an error when setting an interface up fails: (example: put wireless network mac80211 interface into repeater mode with iwconfig but do not set a peer MAC address, it should fail with -ENOLINK) without patch: # ip link set wlan0 up ; echo $? 0 # with patch: # ip link set wlan0 up ; echo $? RTNETLINK answers: Link has been severed 2 # Propagate the return value from dev_change_flags() to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16ematch: simpler tcf_em_unregister()Alexey Dobriyan
Simply delete ops from list and let list debugging do the job. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16net: Cleanup of af_unixEric Dumazet
This is a pure cleanup of net/unix/af_unix.c to meet current code style standards Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16dccp: Tidy up setsockopt callsGerrit Renker
This splits the setsockopt calls into two groups, depending on whether an integer argument (val) is required and whether routines being called do their own locking. Some options (such as setting the CCID) use u8 rather than int, so that for these the test with regard to integer-sizeof can not be used. The second switch-case statement now only has those statements which need locking and which make use of `val'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16dccp: Deprecate Ack Ratio sysctlGerrit Renker
This patch deprecates the Ack Ratio sysctl, since * Ack Ratio is entirely ignored by CCID-3 and CCID-4, * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1); * even if it would work in CCID-2, there is no point for a user to change it: - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2), - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts (since waiting for Acks which will never arrive in this window), - cwnd is not a user-configurable value. The only reasonable place for Ack Ratio is to print it for debugging. It is planned to do this later on, as part of e.g. dccp_probe. With this patch Ack Ratio is now under full control of feature negotiation: * Ack Ratio is resolved as a dependency of the selected CCID; * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to the default of 2, following RFC 4340, 11.3 - "New connections start with Ack Ratio 2 for both endpoints"; * what happens then is part of another patch set, since it concerns the dynamic update of Ack Ratio while the connection is in full flight. Thanks to Tomasz Grobelny for discussion leading up to this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16dccp: Feature negotiation for minimum-checksum-coverageGerrit Renker
This provides feature negotiation for server minimum checksum coverage which so far has been missing. Since sender/receiver coverage values range only from 0...15, their type has also been reduced in size from u16 to u4. Feature-negotiation options are now generated for both sender and receiver coverage, i.e. when the peer has `forgotten' to enable partial coverage then feature negotiation will automatically enable (negotiate) the partial coverage value for this connection. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16dccp: Deprecate old setsockopt frameworkGerrit Renker
The previous setsockopt interface, which passed socket options via struct dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to ugly code since the old approach did not distinguish between NN and SP values. This patch removes the old setsockopt interface and replaces it with two new functions to register NN/SP values for feature negotiation. These are essentially wrappers around the internal __feat_register functions, with checking added to avoid * wrong usage (type); * changing values while the connection is in progress. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16dccp: Mechanism to resolve CCID dependenciesGerrit Renker
This adds a hook to resolve features whose value depends on the choice of CCID. It is done at the server since it can only be done after the CCID values have been negotiated; i.e. the client will add its CCID preference list on the Change options sent in the Request, which will be reconciled with the local preference list of the server. The concept is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html#ccid_dependencies Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16net: use %pF for /proc/net/ptypeAlexey Dobriyan
Technically, patch changes format for modules, but I think nobody cares. -86dd :ipv6:ipv6_rcv+0x0 +86dd ipv6_rcv+0x0/0x400 [ipv6] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16Phonet: refuse to send bigger than MTU packetsRémi Denis-Courmont
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16net: Convert TCP & DCCP hash tables to use RCU / hlist_nullsEric Dumazet
RCU was added to UDP lookups, using a fast infrastructure : - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the price of call_rcu() at freeing time. - hlist_nulls permits to use few memory barriers. This patch uses same infrastructure for TCP/DCCP established and timewait sockets. Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications using short lived TCP connections. A followup patch, converting rwlocks to spinlocks will even speedup this case. __inet_lookup_established() is pretty fast now we dont have to dirty a contended cache line (read_lock/read_unlock) Only established and timewait hashtable are converted to RCU (bind table and listen table are still using traditional locking) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16udp: Use hlist_nulls in UDP RCU codeEric Dumazet
This is a straightforward patch, using hlist_nulls infrastructure. RCUification already done on UDP two weeks ago. Using hlist_nulls permits us to avoid some memory barriers, both at lookup time and delete time. Patch is large because it adds new macros to include/net/sock.h. These macros will be used by TCP & DCCP in next patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>