aboutsummaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2006-04-11[PATCH] for_each_possible_cpu: network codesKAMEZAWA Hiroyuki
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu under /net Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09[NET]: Fix hotplug race during device registration.Sergey Vlasov
From: Thomas de Grenier de Latour <degrenier@easyconnect.fr> On Sun, 9 Apr 2006 21:56:59 +0400, Sergey Vlasov <vsu@altlinux.ru> wrote: > However, show_address() does not output anything unless > dev->reg_state == NETREG_REGISTERED - and this state is set by > netdev_run_todo() only after netdev_register_sysfs() returns, so in > the meantime (while netdev_register_sysfs() is busy adding the > "statistics" attribute group) some process may see an empty "address" > attribute. I've tried the attached patch, suggested by Sergey Vlasov on hotplug-devel@, and as far as i can test it works just fine. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09[NET]: More kzalloc conversions.Andrew Morton
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09[NET] kzalloc: use in alloc_netdevPaolo 'Blaisorblade' Giarrusso
Noticed this use, fixed it. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09[NET]: Fix an off-by-21-or-49 error.Adrian Bunk
This patch fixes an off-by-21-or-49 error ;-) spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-31[NET]: add SO_RCVBUF commentAndrew Morton
Put a comment in there explaining why we double the setsockopt() caller's SO_RCVBUF. People keep wondering. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-29[NET]: Deinline some larger functions from netdevice.hDenis Vlasenko
On a allyesconfig'ured kernel: Size Uses Wasted Name and definition ===== ==== ====== ================================================ 95 162 12075 netif_wake_queue include/linux/netdevice.h 129 86 9265 dev_kfree_skb_any include/linux/netdevice.h 127 56 5885 netif_device_attach include/linux/netdevice.h 73 86 4505 dev_kfree_skb_irq include/linux/netdevice.h 46 60 1534 netif_device_detach include/linux/netdevice.h 119 16 1485 __netif_rx_schedule include/linux/netdevice.h 143 5 492 netif_rx_schedule include/linux/netdevice.h 81 7 366 netif_schedule include/linux/netdevice.h netif_wake_queue is big because __netif_schedule is a big inline: static inline void __netif_schedule(struct net_device *dev) { if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) { unsigned long flags; struct softnet_data *sd; local_irq_save(flags); sd = &__get_cpu_var(softnet_data); dev->next_sched = sd->output_queue; sd->output_queue = dev; raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_restore(flags); } } static inline void netif_wake_queue(struct net_device *dev) { #ifdef CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) __netif_schedule(dev); } By de-inlining __netif_schedule we are saving a lot of text at each callsite of netif_wake_queue and netif_schedule. __netif_rx_schedule is also big, and it makes more sense to keep both of them out of line. Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq instead... oh well. netif_device_attach/detach are not hot paths, we can deinline them too. Signed-off-by: Denis Vlasenko <vda@ilport.com.ua> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28[NET]: deinline 200+ byte inlines in sock.hDenis Vlasenko
Sizes in bytes (allyesconfig, i386) and files where those inlines are used: 238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o 238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o 238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o 238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o 238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o 238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o 238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o 238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o 238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o 238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o 238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o 238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o 238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o 238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o 238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o 238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o 238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o 238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o 238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o 238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o 238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o 238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o 276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o 276 sk_receive_skb 2.6.16/net/dccp/ipv6.o 276 sk_receive_skb 2.6.16/net/dccp/ipv4.o 276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o 276 sk_receive_skb 2.6.16/drivers/net/pppoe.o 209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o 209 sk_dst_check 2.6.16/net/ipv4/udp.o 209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o Large inlines with multiple callers: Size Uses Wasted Name and definition ===== ==== ====== ================================================ 238 21 4360 sock_queue_rcv_skb include/net/sock.h 109 10 801 sock_recv_timestamp include/net/sock.h 276 4 768 sk_receive_skb include/net/sock.h 94 8 518 __sk_dst_check include/net/sock.h 209 3 378 sk_dst_check include/net/sock.h 131 4 333 sk_setup_caps include/net/sock.h 152 2 132 sk_stream_alloc_pskb include/net/sock.h 125 2 105 sk_stream_writequeue_purge include/net/sock.h Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-27Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: drop duplicate assignment in request_sock [IPSEC]: Fix tunnel error handling in ipcomp6
2006-03-27[PATCH] Notifier chain update: API changesAlan Stern
The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[NET]: drop duplicate assignment in request_sockNorbert Kiesel
Just noticed that request_sock.[ch] contain a useless assignment of rskq_accept_head to itself. I assume this is a typo and the 2nd one was supposed to be _tail. However, setting _tail to NULL is not needed, so the patch below just drops the 2nd assignment. Signed-off-By: Norbert Kiesel <nkiesel@tbdnetworks.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25Merge branch 'audit.b3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits) [PATCH] fix audit_init failure path [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format [PATCH] sem2mutex: audit_netlink_sem [PATCH] simplify audit_free() locking [PATCH] Fix audit operators [PATCH] promiscuous mode [PATCH] Add tty to syscall audit records [PATCH] add/remove rule update [PATCH] audit string fields interface + consumer [PATCH] SE Linux audit events [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL [PATCH] Fix IA64 success/failure indication in syscall auditing. [PATCH] Miscellaneous bug and warning fixes [PATCH] Capture selinux subject/object context information. [PATCH] Exclude messages by message type [PATCH] Collect more inode information during syscall processing. [PATCH] Pass dentry, not just name, in fsnotify creation hooks. [PATCH] Define new range of userspace messages. [PATCH] Filter rule comparators ... Fixed trivial conflict in security/selinux/hooks.c
2006-03-25Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETFILTER] x_table.c: sem2mutex [IPV4]: Aggregate route entries with different TOS values [TCP]: Mark tcp_*mem[] __read_mostly. [TCP]: Set default max buffers from memory pool size [SCTP]: Fix up sctp_rcv return value [NET]: Take RTNL when unregistering notifier [WIRELESS]: Fix config dependencies. [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms. [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated. [MODULES]: Don't allow statically declared exports [BRIDGE]: Unaligned accesses in the ethernet bridge
2006-03-25[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notificationsDavide Libenzi
Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] slab: implement /proc/slab_allocatorsAl Viro
Implement /proc/slab_allocators. It produces output like: idr_layer_cache: 80 idr_pre_get+0x33/0x4e buffer_head: 2555 alloc_buffer_head+0x20/0x75 mm_struct: 9 mm_alloc+0x1e/0x42 mm_struct: 20 dup_mm+0x36/0x370 vm_area_struct: 384 dup_mm+0x18f/0x370 vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3 vm_area_struct: 1 split_vma+0x5a/0x10e vm_area_struct: 11 do_brk+0x206/0x2e2 vm_area_struct: 2 copy_vma+0xda/0x142 vm_area_struct: 9 setup_arg_pages+0x99/0x214 fs_cache: 8 copy_fs_struct+0x21/0x133 fs_cache: 29 copy_process+0xf38/0x10e3 files_cache: 30 alloc_files+0x1b/0xcf signal_cache: 81 copy_process+0xbaa/0x10e3 sighand_cache: 77 copy_process+0xe65/0x10e3 sighand_cache: 1 de_thread+0x4d/0x5f8 anon_vma: 241 anon_vma_prepare+0xd9/0xf3 size-2048: 1 add_sect_attrs+0x5f/0x145 size-2048: 2 journal_init_revoke+0x99/0x302 size-2048: 2 journal_init_revoke+0x137/0x302 size-2048: 2 journal_init_inode+0xf9/0x1c4 Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> DESC slab-leaks3-locking-fix EDESC From: Andrew Morton <akpm@osdl.org> Update for slab-remove-cachep-spinlock.patch Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[NET]: Take RTNL when unregistering notifierHerbert Xu
The netdev notifier call chain is currently unregistered without taking any locks outside the notifier system. Because the notifier system itself does not synchronise unregistration with respect to the calling of the chain, we as its user need to do our own locking. We are supposed to take the RTNL for all calls to netdev notifiers, so taking the RTNL should be sufficient to protect it. The registration path in dev.c already takes the RTNL so it's OK. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24[NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.David S. Miller
Found by Solar Designer. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (45 commits) [PATCH] Restore channel setting after scan. [PATCH] hostap: Fix memory leak on PCI probe error path [PATCH] hostap: Remove dead code (duplicated idx != 0) [PATCH] hostap: Fix unlikely read overrun in CIS parsing [PATCH] hostap: Fix double free in prism2_config() error path [PATCH] hostap: Fix ap_add_sta() return value verification [PATCH] hostap: Fix hw reset after CMDCODE_ACCESS_WRITE timeout [PATCH] wireless/airo: cache wireless scans [PATCH] wireless/airo: define default MTU [PATCH] wireless/airo: clean up printk usage to print device name [PATCH] WE-20 for kernel 2.6.16 [PATCH] softmac: remove function_enter() [PATCH] skge: version 1.5 [PATCH] skge: compute available ring buffers [PATCH] skge: dont free skb until multi-part transmit complete [PATCH] skge: multicast statistics fix [PATCH] skge: rx_reuse called twice [PATCH] skge: dont use dev_alloc_skb for rx buffs [PATCH] skge: align receive buffers [PATCH] sky2: dont need to use dev_kfree_skb_any ...
2006-03-23[PATCH] WE-20 for kernel 2.6.16Jean Tourrilhes
This is version 20 of the Wireless Extensions. This is the completion of the RtNetlink work I started early 2004, it enables the full Wireless Extension API over RtNetlink. Few comments on the patch : o totally driver transparent, no change in drivers needed. o iwevent were already RtNetlink based since they were created (around 2.5.7). This adds all the regular SET and GET requests over RtNetlink, using the exact same mechanism and data format as iwevents. o This is a Kconfig option, as currently most people have no need for it. Surprisingly, patch is actually small and well encapsulated. o Tested on SMP, attention as been paid to make it 64 bits clean. o Code do probably too many checks and could be further optimised, but better safe than sorry. o RtNetlink based version of the Wireless Tools available on my web page for people inclined to try out this stuff. I would also like to thank Alexey Kuznetsov for his helpful suggestions to make this patch better. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-23[PKTGEN]: Add MPLS extension.Steven Whitehouse
Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: Identation & other cleanups related to compat_[gs]etsockopt csetArnaldo Carvalho de Melo
No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[SK_BUFF]: export skb_pull_rcsumArnaldo Carvalho de Melo
*** Warning: "skb_pull_rcsum" [net/bridge/bridge.ko] undefined! *** Warning: "skb_pull_rcsum" [net/8021q/8021q.ko] undefined! *** Warning: "skb_pull_rcsum" [drivers/net/pppoe.ko] undefined! *** Warning: "skb_pull_rcsum" [drivers/net/ppp_generic.ko] undefined! Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: {get|set}sockopt compatibility layerDmitry Mishin
This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsumHerbert Xu
We're now starting to have quite a number of places that do skb_pull followed immediately by an skb_postpull_rcsum. We can merge these two operations into one function with skb_pull_rcsum. This makes sense since most pull operations on receive skb's need to update the checksum. I've decided to make this out-of-line since it is fairly big and the fast path where hardware checksums are enabled need to call csum_partial anyway. Since this is a brand new function we get to add an extra check on the len argument. As it is most callers of skb_pull ignore its return value which essentially means that there is no check on the len argument. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[SECURITY]: TCP/UDP getpeersecCatherine Zhang
This patch implements an application of the LSM-IPSec networking controls whereby an application can determine the label of the security association its TCP or UDP sockets are currently connected to via getsockopt and the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of an IPSec security association a particular TCP or UDP socket is using. The application can then use this security context to determine the security context for processing on behalf of the peer at the other end of this connection. In the case of UDP, the security context is for each individual packet. An example application is the inetd daemon, which could be modified to start daemons running at security contexts dependent on the remote client. Patch design approach: - Design for TCP The patch enables the SELinux LSM to set the peer security context for a socket based on the security context of the IPSec security association. The application may retrieve this context using getsockopt. When called, the kernel determines if the socket is a connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry cache on the socket to retrieve the security associations. If a security association has a security context, the context string is returned, as for UNIX domain sockets. - Design for UDP Unlike TCP, UDP is connectionless. This requires a somewhat different API to retrieve the peer security context. With TCP, the peer security context stays the same throughout the connection, thus it can be retrieved at any time between when the connection is established and when it is torn down. With UDP, each read/write can have different peer and thus the security context might change every time. As a result the security context retrieval must be done TOGETHER with the packet retrieval. The solution is to build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). Patch implementation details: - Implementation for TCP The security context can be retrieved by applications using getsockopt with the existing SO_PEERSEC flag. As an example (ignoring error checking): getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen); printf("Socket peer context is: %s\n", optbuf); The SELinux function, selinux_socket_getpeersec, is extended to check for labeled security associations for connected (TCP_ESTABLISHED == sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of struct dst_entry values that may refer to security associations. If these have security associations with security contexts, the security context is returned. getsockopt returns a buffer that contains a security context string or the buffer is unmodified. - Implementation for UDP To retrieve the security context, the application first indicates to the kernel such desire by setting the IP_PASSSEC option via getsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for UDP should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_IP && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow a server socket to receive security context of the peer. A new ancillary message type SCM_SECURITY. When the packet is received we get the security context from the sec_path pointer which is contained in the sk_buff, and copy it to the ancillary message space. An additional LSM hook, selinux_socket_getpeersec_udp, is defined to retrieve the security context from the SELinux space. The existing function, selinux_socket_getpeersec does not suit our purpose, because the security context is copied directly to user space, rather than to kernel space. Testing: We have tested the patch by setting up TCP and UDP connections between applications on two machines using the IPSec policies that result in labeled security associations being built. For TCP, we can then extract the peer security context using getsockopt on either end. For UDP, the receiving end can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET] sem2mutex: net/Arjan van de Ven
Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: minor net_rx_action optimizationStephen Hemminger
The functions list_del followed by list_add_tail is equivalent to the existing inline list_move_tail. list_move_tail avoids unnecessary _LIST_POISON. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: Move destructor from neigh->ops to neigh_paramsMichael S. Tsirkin
struct neigh_ops currently has a destructor field, which no in-kernel drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree driver stashes some info in the neighbour structure (the results of the second-stage lookup from ARP results to real link-level path), and it uses neigh->ops->destructor to get a callback so it can clean up this extra info when a neighbour is freed. We've run into problems with this: since the destructor is in an ops field that is shared between neighbours that may belong to different net devices, there's no way to set/clear it safely. The following patch moves this field to neigh_parms where it can be safely set, together with its twin neigh_setup. Two additional patches in the patch series update ipoib to use this new interface. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Updates version.Luiz Capitulino
Due to the thread's lock changes, we're at a new version now. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Removes thread_{un,}lock() macros.Luiz Capitulino
As suggested by Arnaldo, this patch replaces the thread_lock()/thread_unlock() by directly calls to mutex_lock()/mutex_unlock(). This change makes the code a bit more readable, and the direct calls are used everywhere in the kernel. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Convert thread lock to mutexes.Luiz Capitulino
pktgen's thread semaphores are strict mutexes, convert them to the mutex implementation. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: Convert RTNL to mutex.Stephen Hemminger
This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and gets rid of some of the leftover legacy. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Updates version.Luiz Capitulino
With all the previous changes, we're at a new version now. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Ports if_list to the in-kernel implementation.Luiz Capitulino
This patch ports the per-thread interface list list to the in-kernel linked list implementation. In the general, the resulting code is a bit simpler. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Fix Initialization fail leak.Luiz Capitulino
Even if pktgen's thread initialization fails for all CPUs, the module will be successfully loaded. This patch changes that behaivor, by returning an error on module load time, and also freeing all the resources allocated. It also prints a warning if a thread initialization has failed. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Fix kernel_thread() fail leak.Luiz Capitulino
Free all the alocated resources if kernel_thread() call fails. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Ports thread list to Kernel list implementation.Luiz Capitulino
The final result is a simpler and smaller code. Note that I'm adding a new member in the struct pktgen_thread called 'removed'. The reason is that I didn't find a better wait condition to be used in the place of the replaced one. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PKTGEN]: Lindent run.Luiz Capitulino
Lindet run, with some fixes made by hand. Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: Uninline kfree_skb and allow NULL argumentJörn Engel
o Uninline kfree_skb, which saves some 15k of object code on my notebook. o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditional from drivers and further reduce source and object size. Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET] pktgen: Fix races between control/worker threads.Arthur Kepner
There's a race in pktgen which can lead to a double free of a pktgen_dev's skb. If a worker thread is in the midst of doing fill_packet(), and the controlling thread gets a "stop" message, the already freed skb can be freed once again in pktgen_stop_device(). This patch gives all responsibility for cleaning up a pktgen_dev's skb to the associated worker thread. Signed-off-by: Arthur Kepner <akepner@sgi.com> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[IPSEC]: Sync series - core changesJamal Hadi Salim
This patch provides the core functionality needed for sync events for ipsec. Derived work of Krisztian KOVACS <hidden@balabit.hu> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET] core: add RFC2863 operstateStefan Rompf
this patch adds a dormant flag to network devices, RFC2863 operstate derived from these flags and possibility for userspace interaction. It allows drivers to signal that a device is unusable for user traffic without disabling queueing (and therefore the possibility for protocol establishment traffic to flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable without changes to the driver. It is the result of our long discussion. However I must admit that it represents what Jamal and I agreed on with compromises towards Krzysztof, but Thomas and Krzysztof still disagree with some parts. Anyway I think it should be applied. Signed-off-by: Stefan Rompf <stefan@loplof.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: NEIGHBOUR: Ensure to record time to neigh->updated when neighbour's ↵YOSHIFUJI Hideaki
state changed. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[PATCH] promiscuous modeSteve Grubb
Hi, When a network interface goes into promiscuous mode, its an important security issue. The attached patch is intended to capture that action and send an event to the audit system. The patch carves out a new block of numbers for kernel detected anomalies. These are events that may indicate suspicious activity. Other examples of potential kernel anomalies would be: exceeding disk quota, rlimit violations, changes to syscall entry table. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-03[PATCH] bonding: suppress duplicate packetsJay Vosburgh
Originally submitted by Kenzo Iwami; his original description is: The current bonding driver receives duplicate packets when broadcast/ multicast packets are sent by other devices or packets are flooded by the switch. In this patch, new flags are added in priv_flags of net_device structure to let the bonding driver discard duplicate packets in dev.c:skb_bond(). Modified by Jay Vosburgh to change a define name, update some comments, rearrange the new skb_bond() for clarity, clear all bonding priv_flags on slave release, and update the driver version. Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-03-01Merge branch 'upstream-fixes'Jeff Garzik
2006-02-27[REQSK]: Don't reset rskq_defer_accept in reqsk_queue_allocArnaldo Carvalho de Melo
In 295f7324ff8d9ea58b4d3ec93b1aaa1d80e048a9 I moved defer_accept from tcp_sock to request_queue and mistakingly reset it at reqsl_queue_alloc, causing calls to setsockopt(TCP_DEFER_ACCEPT ) to be lost after bind, the fix is to remove the zeroing of rskq_defer_accept from reqsl_queue_alloc. Thanks to Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> for reporting and testing the suggested fix. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23Merge branch 'upstream-fixes'Jeff Garzik
2006-02-19[NET]: NETFILTER: remove duplicated lines and fix order in skb_clone().YOSHIFUJI Hideaki
Some of netfilter-related members are initalized / copied twice in skb_clone(). Remove one. Pointed out by Olivier MATZ <olivier.matz@6wind.com>. And this patch also fixes order of copying / clearing members. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-17Merge branch 'upstream-fixes'Jeff Garzik