path: root/net/core
Age | Commit message | Author
2009-03-18 | net: kfree(napi->skb) => kfree_skb | Roel Kluin
struct sk_buff pointers should be freed with kfree_skb. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-17 | gro: Fix legacy path napi_complete crash | Herbert Xu
On the legacy netif_rx path, I incorrectly tried to optimise the napi_complete call by using __napi_complete before we reenable IRQs. This simply doesn't work since we need to flush the held GRO packets first. This patch fixes it by doing the obvious thing of reenabling IRQs first and then calling napi_complete. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 | vlan: Fix vlan-in-vlan crashes. | David S. Miller
As analyzed by Patrick McHardy, vlan needs to reset its netdev_ops pointer in its ->init() function but this leaves the compat method pointers stale. Add a netdev_resync_ops() and call it from the vlan code. Any other driver which changes ->netdev_ops after register_netdevice() will need to call this new function after doing so too. With help from Patrick McHardy. Tested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 | net: Fix missing dev->neigh_setup in register_netdevice(). | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 | netns: Remove net_alive | Eric W. Biederman
It turns out that net_alive is unnecessary, and the original problem that led to it being added was simply that the icmp code thought it was a network device and wound up being unable to handle packets while there were still packets in the network namespace. Now that icmp and tcp have been fixed to properly register themselves this problem is no longer present and we have a stronger guarantee that packets will not arrive in a network namespace than that provided by net_alive in netif_receive_skb. So remove net_alive, allowing packet reception to run a little faster. Additionally document the strong reason why network namespace cleanup is safe so that if something happens again someone else will have a chance of figuring it out. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 | net: Avoid race between network down and sysfs | Stephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-01 | netpoll: Add drop checks to all entry points | Herbert Xu
The netpoll entry checks are required to ensure that we don't receive normal packets when invoked via netpoll. Unfortunately it only ever worked for the netif_receive_skb/netif_rx entry points. The VLAN (and subsequently GRO) entry point didn't have the check and therefore can trigger all sorts of weird problems. This patch adds the netpoll check to all entry points. I'm still uneasy with receiving at all under netpoll (which apparently is only used by the out-of-tree kdump code). The reason is it is perfectly legal to receive all data including headers into highmem if netpoll is off, but if you try to do that with netpoll on and someone gets a printk in an IRQ handler you're going to get a nice BUG_ON. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-23 | net: amend the fix for SO_BSDCOMPAT gsopt infoleak | Eugene Teo
The fix for CVE-2009-0676 (upstream commit df0bca04) is incomplete. Note that the same problem of leaking kernel memory will reappear if someone on some architecture uses struct timeval with some internal padding (for example tv_sec 64-bit and tv_usec 32-bit) --- then, you are going to leak the padded bytes to userspace. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
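The leak described above comes from copying out a union or struct whose bytes were never all written. A minimal userspace sketch of the defensive pattern follows: zero the whole object before any option-specific store, then copy out. The names `optval_sketch`, `getsockopt_sketch`, and the `OPT_*_SKETCH` constants are hypothetical and only illustrate the idea; this is not the kernel's sock_getsockopt().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option numbers, for illustration only. */
enum { OPT_BSDCOMPAT_SKETCH = 1, OPT_TYPE_SKETCH = 2 };

/* Stand-in for the kernel's local union of possible option values.
 * The timeval-like member shows where internal padding could appear. */
union optval_sketch {
    int val;
    struct {
        long long tv_sec;
        int tv_usec;
    } tm;
};

/* Copies at most sizeof(int) bytes of the option value into out and
 * returns the number of bytes copied (0 for an unknown option). */
size_t getsockopt_sketch(int optname, unsigned char *out, size_t len)
{
    union optval_sketch v;
    size_t lv = sizeof(int);

    /* Clear every byte, including padding, before any store. */
    memset(&v, 0, sizeof(v));

    switch (optname) {
    case OPT_BSDCOMPAT_SKETCH:
        /* No-op option: nothing is stored, so only the memset
         * above keeps stale stack bytes from being copied out. */
        break;
    case OPT_TYPE_SKETCH:
        v.val = 2;
        break;
    default:
        return 0;
    }

    if (len > lv)
        len = lv;
    memcpy(out, &v, len);
    return len;
}
```

Clearing the whole union rather than a single member is the design choice here: it stays correct even when a later option uses a wider member with padding.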
2009-02-23 | netns: build fix for net_alloc_generic | Clemens Noss
net_alloc_generic was defined in #ifdef CONFIG_NET_NS, but used unconditionally. Move net_alloc_generic out of #ifdef. Signed-off-by: Clemens Noss <cnoss@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 | netns: fix double free at netns creation | Daniel Lezcano
This patch fixes a double free when network namespace setup fails. The previous code did a kfree of the net_generic structure when one of the subsystem initializations failed: the 'setup_net' function did kfree(ng) and returned an error. The caller, 'copy_net_ns', calls net_free on error, which in turn calls kfree(net->gen), so this pointer was freed twice. This patch makes the code symmetric: net_alloc does the net_generic allocation and net_free frees the net_generic. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
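The ownership rule in this fix (one function allocates, its counterpart frees, and the setup step never frees on error) can be sketched in userspace C. All names ending in `_sketch` are hypothetical stand-ins, and `gen_frees` is a counter added only so the behaviour can be checked; the real functions live in net/core/net_namespace.c and differ in detail.

```c
#include <stdlib.h>

struct net_generic_sketch { int len; };
struct net_sketch { struct net_generic_sketch *gen; };

/* Counts how often the generic area is released (illustration only). */
int gen_frees;

/* Allocation side: owns creation of both the net and its gen area. */
struct net_sketch *net_alloc_sketch(void)
{
    struct net_sketch *net = calloc(1, sizeof(*net));

    if (!net)
        return NULL;
    net->gen = calloc(1, sizeof(*net->gen));
    if (!net->gen) {
        free(net);
        return NULL;
    }
    return net;
}

/* Free side: the only place that releases net->gen. */
void net_free_sketch(struct net_sketch *net)
{
    if (!net)
        return;
    free(net->gen);
    gen_frees++;
    free(net);
}

/* Setup step: reports failure but leaves all freeing to the caller. */
int setup_net_sketch(struct net_sketch *net, int fail)
{
    (void)net;
    return fail ? -1 : 0;
}

/* Caller: on error, a single net_free_sketch() releases everything. */
struct net_sketch *copy_net_ns_sketch(int fail)
{
    struct net_sketch *net = net_alloc_sketch();

    if (!net)
        return NULL;
    if (setup_net_sketch(net, fail) < 0) {
        net_free_sketch(net);
        return NULL;
    }
    return net;
}
```

With this split, the error path releases the generic area exactly once, however many setup steps can fail.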
2009-02-17 | net: Kill skb_truesize_check(), it only catches false-positives. | David S. Miller
A long time ago we had bugs, primarily in TCP, where we would modify skb->truesize (for TSO queue collapsing) in ways which would corrupt the socket memory accounting. skb_truesize_check() was added in order to try and catch this error more systematically. However this debugging check has morphed into a Frankenstein of sorts and these days it does nothing other than catch false-positives. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-12 | net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2 | Clément Lecigne
In function sock_getsockopt() located in net/core/sock.c, optval v.val is not correctly initialized and directly returned in userland in case we have SO_BSDCOMPAT option set.

This dummy code should trigger the bug:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>

    int main(void)
    {
        unsigned char buf[4] = { 0, 0, 0, 0 };
        socklen_t len = sizeof(buf);   /* must hold the buffer size on entry */
        int sock;

        sock = socket(33, 2, 2);       /* 33 = AF_RXRPC, 2 = SOCK_DGRAM on Linux */
        getsockopt(sock, 1, SO_BSDCOMPAT, buf, &len);   /* 1 = SOL_SOCKET */
        printf("%x%x%x%x\n", buf[0], buf[1], buf[2], buf[3]);
        close(sock);
        return 0;
    }

Here is a patch that fixes this bug by initializing v.val just after its declaration.

Signed-off-by: Clément Lecigne <clement.lecigne@netasq.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 | net_dma: call dmaengine_get only if NET_DMA enabled | David S. Miller
Based upon a patch from Atsushi Nemoto <anemo@mba.ocn.ne.jp> -------------------- The commit 649274d993212e7c23c0cb734572c2311c200872 ("net_dma: acquire/release dma channels on ifup/ifdown") added unconditional call of dmaengine_get() to net_dma. The API should be called only if NET_DMA was enabled. -------------------- Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dan Williams <dan.j.williams@intel.com>
2009-02-06 | neigh: some entries can be skipped during dumping | Gautam Kachroo
neightbl_dump_info and neigh_dump_table can skip entries if the *fill*info functions return an error. This results in an incomplete dump (invoked by netlink requests for RTM_GETNEIGHTBL or RTM_GETNEIGH). nidx and idx should not be incremented if the current entry was not placed in the output buffer. Signed-off-by: Gautam Kachroo <gk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
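The rule stated above, that the resume index must not move past an entry that did not fit in the output buffer, can be illustrated with a small resumable dump loop. This is a userspace sketch under simplifying assumptions: `dump_sketch` and its parameters are hypothetical, entries are plain ints, and "fill failed" is modelled as the output array being full.

```c
/* Copies entries starting at *resume_idx into out (capacity cap).
 * Returns the number of entries placed. On return, *resume_idx is the
 * index of the first entry that was NOT placed, so the next call
 * continues there without skipping anything. */
int dump_sketch(const int *entries, int n, int *resume_idx,
                int *out, int cap)
{
    int idx;
    int used = 0;

    for (idx = *resume_idx; idx < n; idx++) {
        if (used == cap)
            break;      /* entry did not fit: do not advance past it */
        out[used++] = entries[idx];
    }
    *resume_idx = idx;
    return used;
}
```

Calling it repeatedly with the same `resume_idx` variable walks the whole table in buffer-sized pieces; incrementing the index before checking for space would drop one entry per call.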
2009-01-29 | net: Fix OOPS in skb_seq_read(). | Shyam Iyer
It oopsd for me in skb_seq_read. addr2line said it was linux-2.6/net/core/skbuff.c:2228, which is this line:

    while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {

I added some printks in there and it looks like we hit this:

    } else if (st->root_skb == st->cur_skb &&
               skb_shinfo(st->root_skb)->frag_list) {
            st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
            st->frag_idx = 0;
            goto next_skb;
    }

Actually I did some testing and added a few printks and found that the st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null. This caused the kernel panic.

     if (abs_offset < block_limit) {
    -        *data = st->cur_skb->data + abs_offset;
    +        *data = st->cur_skb->data + (abs_offset - st->stepped_offset);

I enabled the debug_tcp and with a few printks found that the code did not go to the next_skb label and could find that the sequence being followed was this -

It hit this if condition -

    if (st->cur_skb->next) {
            st->cur_skb = st->cur_skb->next;
            st->frag_idx = 0;
            goto next_skb;

And so, now the st pointer is shifted to the next skb whereas actually it should have hit the second else if first since the data is in the frag_list.

    else if (st->root_skb == st->cur_skb &&
             skb_shinfo(st->root_skb)->frag_list) {
            st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
            goto next_skb;
    }

Reversing the two conditions the attached patch fixes the issue for me on top of Herbert's patches.

Signed-off-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-29 | net: Fix frag_list handling in skb_seq_read | Herbert Xu
The frag_list handling was broken in skb_seq_read: 1) We didn't add the stepped offset when looking at the head area of fragments other than the first. 2) We didn't take the stepped offset away when setting the data pointer in the head area. 3) The frag index wasn't reset. This patch fixes all three issues. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
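The role of the stepped offset can be shown with a simplified userspace model: a sequence of byte chunks read by absolute offset, where the pointer into the current chunk is the chunk start plus the absolute offset minus the bytes already stepped over. This is only a sketch; `chunk_sketch`, `seq_state_sketch`, and `seq_read_sketch` are hypothetical names, the model has no page frags, and it assumes offsets are requested in non-decreasing order.

```c
#include <stddef.h>

struct chunk_sketch {
    const char *data;
    size_t len;
};

struct seq_state_sketch {
    const struct chunk_sketch *chunks;
    size_t n;               /* number of chunks */
    size_t idx;             /* current chunk */
    size_t stepped_offset;  /* bytes in the chunks already passed */
};

/* Returns how many bytes are readable at absolute offset `consumed`
 * and points *data at them; returns 0 at the end of the sequence. */
size_t seq_read_sketch(size_t consumed, const char **data,
                       struct seq_state_sketch *st)
{
    while (st->idx < st->n) {
        const struct chunk_sketch *c = &st->chunks[st->idx];
        /* The limit is absolute, so the stepped offset is added. */
        size_t block_limit = st->stepped_offset + c->len;

        if (consumed < block_limit) {
            /* The pointer is chunk-relative, so it is taken away. */
            *data = c->data + (consumed - st->stepped_offset);
            return block_limit - consumed;
        }
        /* Step over this chunk and move to the next one. */
        st->stepped_offset += c->len;
        st->idx++;
    }
    return 0;
}
```

Using the absolute offset directly as the pointer offset in a later chunk would point past that chunk's data, which corresponds to issue 2) in the commit message above.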
2009-01-20 | gro: Fix merging of paged packets | Herbert Xu
The previous fix to paged packets broke the merging because it reset the skb->len before we added it to the merged packet. This wasn't detected because it simply resulted in the truncation of the packet while the missing bit is subsequently retransmitted. The fix is to store skb->len before we clobber it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-20 | gro: Fix error handling on extremely short frags | Herbert Xu
When a frag is shorter than an Ethernet header, we'd return a zeroed packet instead of aborting. This patch fixes that. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-20 | NET: net_namespace, fix lock imbalance | Jiri Slaby
register_pernet_gen_subsys omits mutex_unlock in one fail path. Fix it. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-19 | net: Fix data corruption when splicing from sockets. | Jarek Poplawski
The trick in socket splicing where we try to convert the skb->data into a page based reference using virt_to_page() does not work so well. The idea is to pass the virt_to_page() reference via the pipe buffer, and refcount the buffer using a SKB reference. But if we are splicing from a socket to a socket (via sendpage) this doesn't work. The from side processing will grab the page (and SKB) references. The sendpage() calls will grab page references only, return, and then the from side processing completes and drops the SKB ref. The page based reference to skb->data is not enough to keep the kmalloc() buffer backing it from being reused. Yet, that is all that the socket send side has at this point. This leads to data corruption if the skb->data buffer is reused by SLAB before the send side socket actually gets the TX packet out to the device. The fix employed here is to simply allocate a page and copy the skb->data bytes into that page. This will hurt performance, but there is no clear way to fix this properly without a copy at the present time, and it is important to get rid of the data corruption. With fixes from Herbert Xu. Tested-by: Willy Tarreau <w@1wt.eu> Foreseen-by: Changli Gao <xiaosuo@gmail.com> Diagnosed-by: Willy Tarreau <w@1wt.eu> Reported-by: Willy Tarreau <w@1wt.eu> Fixed-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-19 | net: Add debug info to track down GSO checksum bug | Herbert Xu
I'm trying to track down why people're hitting the checksum warning in skb_gso_segment. As the problem seems to be hitting lots of people and I can't reproduce it or locate the bug, here is a patch to print out more details which hopefully should help us to track this down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-14 | net: Add init_dummy_netdev() and fix EMAC driver using it | Benjamin Herrenschmidt
This adds an init_dummy_netdev() function that takes a network device structure (allocation and lifetime entirely under the caller's control) and initializes the minimum set of fields so it can be used to schedule NAPI polls without registering a full-blown interface. This is to be used by drivers that need to tie several hardware interfaces to a single NAPI poll scheduler due to HW limitations. It also updates the ibm_newemac driver to use that, thus fixing the oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add(). The symbol is exported GPL-only as I don't think we want binary drivers doing that sort of acrobatics (if we want them at all). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-14 | gro: Fix page ref count for skbs freed normally | Herbert Xu
When an skb with page frags is merged into an existing one, we cannibalise its reference count. This is OK when the skb is reused because we set nr_frags to zero in that case. However, for the case where the skb is freed through kfree_skb, we didn't clear nr_frags which causes the page to be freed prematurely. This is fixed by moving the skb resetting into skb_gro_receive. Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-14 | gro: Check for GSO packets and packets with frag_list | Herbert Xu
As GRO cannot be applied to packets with frag_list we need to make sure that we reject such packets if they are fed to us, e.g., through a tunnel device. Also there is no point in applying GRO on GSO packets so they too should be rejected. This allows GRO to be used in virtio-net which may produce GSO packets directly but may still benefit from GRO if the other end of it doesn't support GSO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-11 | net_dma: acquire/release dma channels on ifup/ifdown | Dan Williams
The recent dmaengine rework removed the capability to remove dma device driver modules while net_dma is active. Rather than notify dmaengine-clients that channels are trying to be removed, we now rely on clients to notify dmaengine when they no longer have a need for channels. Teach net_dma to release channels by taking dmaengine references at netdevice open and dropping references at netdevice close. Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-09 | Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx | Linus Torvalds
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
  ioat: fix self test for multi-channel case
  dmaengine: bump initcall level to arch_initcall
  dmaengine: advertise all channels on a device to dma_filter_fn
  dmaengine: use idr for registering dma device numbers
  dmaengine: add a release for dma class devices and dependent infrastructure
  ioat: do not perform removal actions at shutdown
  iop-adma: enable module removal
  iop-adma: kill debug BUG_ON
  iop-adma: let devm do its job, don't duplicate free
  dmaengine: kill enum dma_state_client
  dmaengine: remove 'bigref' infrastructure
  dmaengine: kill struct dma_client and supporting infrastructure
  dmaengine: replace dma_async_client_register with dmaengine_get
  atmel-mci: convert to dma_request_channel and down-level dma_slave
  dmatest: convert to dma_request_channel
  dmaengine: introduce dma_request_channel and private channels
  net_dma: convert to dma_find_channel
  dmaengine: provide a common 'issue_pending_all' implementation
  dmaengine: centralize channel allocation, introduce dma_find_channel
  dmaengine: up-level reference counting to the module level
  ...
2009-01-06 | gro: Add internal interfaces for VLAN | Herbert Xu
Previously GRO's only entry point from the outside is through napi_gro_receive and napi_gro_frags. These interfaces are for device drivers. This patch rearranges things to provide a new set of interfaces for VLANs. These interfaces are for internal use only. The VLAN code itself can then provide a set of entry points for device drivers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 | dmaengine: kill struct dma_client and supporting infrastructure | Dan Williams
All users have been converted to either the general-purpose allocator, dma_find_channel, or dma_request_channel. Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 | dmaengine: replace dma_async_client_register with dmaengine_get | Dan Williams
Now that clients no longer need to be notified of channel arrival dma_async_client_register can simply increment the dmaengine_ref_count. Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 | net_dma: convert to dma_find_channel | Dan Williams
Use the general-purpose channel allocation provided by dmaengine. Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 | dmaengine: provide a common 'issue_pending_all' implementation | Dan Williams
async_tx and net_dma each have open-coded versions of issue_pending_all, so provide a common routine in dmaengine. The implementation needs to walk the global device list, so implement rcu to allow dma_issue_pending_all to run lockless. Clients protect themselves from channel removal events by holding a dmaengine reference. Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-05 | Revert "net: Fix for initial link state in 2.6.28" | David S. Miller
This reverts commit 22604c866889c4b2e12b73cbf1683bda1b72a313. We can't fix this issue in this way, because we now can try to take the dev_base_lock rwlock as a writer in software interrupt context and that is not allowed without major surgery elsewhere. This initial link state problem needs to be solved in some other way. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 | net: Fix for initial link state in 2.6.28 | Michael Marineau
From: Michael Marineau <mike@marineau.org> Commit b47300168e770b60ab96c8924854c3b0eb4260eb "Do not fire linkwatch events until the device is registered." was made as a workaround for drivers that call netif_carrier_off before registering the device. Unfortunately this causes these drivers to incorrectly report their link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING flag when the interface is first brought up. This issue was previously pointed out[1] but was dismissed saying that IFF_RUNNING is not related to the link status. From my digging IFF_RUNNING, as reported to userspace, is based on the link state. It is set based on __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3], and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is not reported to user space so it may well be independent of the link, I don't know if and when it may get set.) The end result varies slightly depending on the driver. The two I tested were e1000e and b44. With e1000e if the system is booted without a network cable attached the interface will falsely report RUNNING when it is brought up causing NetworkManager to attempt to start it and eventually time out. With b44 when the system is booted with a network cable attached and brought up with dhcpcd it will time out the first time. The attached patch will still set the operstate variable correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called but then return rather than skipping the linkwatch_fire_event call entirely as the previous fix did. (sorry it isn't inline, I don't have a patch friendly email client at the moment) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 | gro: Add page frag support | Herbert Xu
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags) in one skb, rather than using the less efficient frag_list. It also adds a new interface, napi_gro_frags to allow drivers to inject page frags directly into the stack without allocating an skb. This is intended to be the GRO equivalent for LRO's lro_receive_frags interface. The existing GSO interface can already handle page frags with or without an appended frag_list so nothing needs to be changed there. The merging itself is rather simple. We store any new frag entries after the last existing entry, without checking whether the first new entry can be merged with the last existing entry. Making this check would actually be easy but since no existing driver can produce contiguous frags anyway it would just be mental masturbation. If the total number of entries would exceed the capacity of a single skb, we simply resort to using frag_list as we do now. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 | gro: Use gso_size to store MSS | Herbert Xu
In order to allow GRO packets without frag_list at all, we need to store the MSS in the packet itself. The obvious place is gso_size. The only thing to watch out for is if the packet ends up not being GRO then we need to clear gso_size before pushing the packet into the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 | cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: net | Rusty Russell
In future all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 | netns: foreach_netdev_safe is insufficient in default_device_exit | Eric W. Biederman
During network namespace teardown we either move or delete all of the network devices associated with a network namespace. In the case of veth devices deleting one will also delete its pair device. If both devices are in the same network namespace then for_each_netdev_safe is insufficient as next may point to the second veth device we have deleted. To avoid problems I do what we do in __rtnl_kill_links and restart the scan of the device list after we have deleted a device. Currently dev_change_netnamespace does not appear to suffer from this problem, but wireless devices are also paired and likely should be moved between network namespaces together. So I have erred on the side of caution and restart the scan of the network devices in that case as well. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
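The restart-the-scan approach described above can be sketched with a plain singly linked list in userspace. Deleting a device may also unlink its peer, which could be the node a "safe" iterator had saved as next, so the loop goes back to the list head after every deletion. The type `dev_sketch`, its `keep` and `dead` fields, and `teardown_sketch` are hypothetical and stand in for the kernel's net_device handling.

```c
#include <stddef.h>

struct dev_sketch {
    struct dev_sketch *next;
    struct dev_sketch *peer;  /* veth-style pair device, or NULL */
    int keep;                 /* nonzero: leave this device in the list */
    int dead;                 /* set once the device has been unlinked */
};

/* Unlinks d from the list if it is present and marks it dead. */
static void unlink_dev_sketch(struct dev_sketch **head, struct dev_sketch *d)
{
    struct dev_sketch **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            d->next = NULL;
            d->dead = 1;
            return;
        }
    }
}

/* Deleting a device also deletes its peer, wherever it sits in the list. */
static void delete_dev_sketch(struct dev_sketch **head, struct dev_sketch *d)
{
    struct dev_sketch *peer = d->peer;

    unlink_dev_sketch(head, d);
    if (peer) {
        peer->peer = NULL;
        d->peer = NULL;
        unlink_dev_sketch(head, peer);
    }
}

/* Removes every device not marked keep; returns the number of delete calls. */
int teardown_sketch(struct dev_sketch **head)
{
    struct dev_sketch *d;
    int calls = 0;

restart:
    for (d = *head; d; d = d->next) {
        if (d->keep)
            continue;
        delete_dev_sketch(head, d);
        calls++;
        /* Any saved next pointer may now be stale: rescan from the head. */
        goto restart;
    }
    return calls;
}
```

The rescan costs extra list walks, but teardown is rare and the list is short, so correctness is cheap here.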
2008-12-28 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due to the conversion to %pI (in this networking merge) and the addition of doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-26 | gro: Fix potential use after free | Herbert Xu
The initial skb may have been freed after napi_gro_complete in napi_gro_receive if it was merged into an existing packet. Thus we cannot check same_flow (which indicates whether it was merged) after calling napi_gro_complete. This patch fixes this by saving the same_flow status before the call to napi_gro_complete. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-26 | net: Init NAPI dev_list on napi_del | Peter P Waskiewicz Jr
The recent GRO patches introduced the NAPI removal of devices in free_netdev. For drivers that can change the number of queues during driver operation, the NAPI infrastructure doesn't allow the freeing and re-addition of NAPI entities without reloading the driver. This change reinitializes the dev_list in each NAPI struct on delete, instead of just deleting it (and assigning the list pointers to POISON). Drivers that wish to remove/re-add NAPI will need to re-initialize the netdev napi_list after removing all NAPI instances, before re-adding NAPI devices again. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 | Merge branch 'next' into for-linus | James Morris
2008-12-22 | net: Fix oops in dev_ifsioc() | Jarek Poplawski
A command like this: "brctl addif br1 eth1" issued as a user gave me an oops when bridge module wasn't loaded. It's caused by using a dev pointer before checking for NULL. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 | Revert "net: release skb->dst in sock_queue_rcv_skb()" | David S. Miller
This reverts commit 70355602879229c6f8bd694ec9c0814222bc4936. As pointed out by Mark McLoughlin IP_PKTINFO cmsg data is one post-queueing user, so this optimization is not valid right now. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 | Phonet: allocate separate ARP type for GPRS over a Phonet pipe | Rémi Denis-Courmont
A separate xmit lock class supports GPRS over a Phonet pipe over a TUN device (type ARPHRD_NONE). Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 | Phonet: allocate a non-Ethernet ARP type | Rémi Denis-Courmont
Also leave some room for more 802.11 types. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 | ethtool: Add GGRO and SGRO ops | Herbert Xu
This patch adds the ethtool ops to enable and disable GRO. It also makes GRO depend on RX checksum offload much the same as how TSO depends on SG support. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 | net: Add skb_gro_receive | Herbert Xu
This patch adds the helper skb_gro_receive to merge packets for GRO. The current method is to allocate a new header skb and then chain the original packets to its frag_list. This is done to make it easier to integrate into the existing GSO framework. In future as GSO is moved into the drivers, we can undo this and simply chain the original packets together. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 | net: Add Generic Receive Offload infrastructure | Herbert Xu
This patch adds the top-level GRO (Generic Receive Offload) infrastructure. This is pretty similar to LRO except that this is protocol-independent. Instead of holding packets in an lro_mgr structure, they're now held in napi_struct.

For drivers that intend to use this, they can set the NETIF_F_GRO bit and call napi_gro_receive instead of netif_receive_skb or just call netif_rx. The latter will call napi_receive_skb automatically. When napi_gro_receive is used, the driver must either call napi_complete/napi_rx_complete, or call napi_gro_flush in softirq context if the driver uses the primitives __napi_complete/__napi_rx_complete.

Protocols will set the gro_receive and gro_complete function pointers in order to participate in this scheme.

In addition to the packet, gro_receive will get a list of currently held packets. Each packet in the list has a same_flow field which is non-zero if it is a potential match for the new packet. For each packet that may match, they also have a flush field which is non-zero if the held packet must not be merged with the new packet.

Once gro_receive has determined that the new skb matches a held packet, the held packet may be processed immediately if the new skb cannot be merged with it. In this case gro_receive should return the pointer to the existing skb in gro_list. Otherwise the new skb should be merged into the existing packet and NULL should be returned, unless the new skb makes it impossible for any further merges to be made (e.g., FIN packet) where the merged skb should be returned.

Whenever the skb is merged into an existing entry, the gro_receive function should set NAPI_GRO_CB(skb)->same_flow. Note that if an skb merely matches an existing entry but can't be merged with it, then this shouldn't be set.

If gro_receive finds it pointless to hold the new skb for future merging, it should set NAPI_GRO_CB(skb)->flush.

Held packets will be flushed by napi_gro_flush which is called by napi_complete and napi_rx_complete.

Currently held packets are stored in a singly linked list just like LRO. The list is limited to a maximum of 8 entries. In future, this may be expanded to use a hash table to allow more flows to be held for merging.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
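The hold, merge, and flush cycle described above can be modelled very roughly in userspace. The sketch below keeps at most 8 held entries, merges a new packet into a held entry with the same flow key, passes packets straight through when the list is full, and delivers everything on flush. It is an assumption-laden toy: `gro_sketch`, `gro_receive_sketch`, `gro_flush_sketch`, and the integer flow key are hypothetical, and the per-protocol gro_receive/gro_complete callbacks and the flush flag are left out.

```c
#define MAX_HELD_SKETCH 8   /* the message above states a limit of 8 entries */

struct held_sketch {
    int flow;           /* stand-in for the protocol's flow match */
    unsigned int len;   /* accumulated payload length */
};

struct gro_sketch {
    struct held_sketch list[MAX_HELD_SKETCH];
    int count;
    unsigned int delivered_pkts;
    unsigned int delivered_bytes;
};

/* Hands one (possibly merged) packet to the rest of the stack. */
static void deliver_sketch(struct gro_sketch *g, unsigned int len)
{
    g->delivered_pkts++;
    g->delivered_bytes += len;
}

void gro_receive_sketch(struct gro_sketch *g, int flow, unsigned int len)
{
    int i;

    for (i = 0; i < g->count; i++) {
        if (g->list[i].flow == flow) {
            /* Same flow: merge into the held packet. */
            g->list[i].len += len;
            return;
        }
    }
    if (g->count >= MAX_HELD_SKETCH) {
        /* No room to hold another flow: pass the packet through. */
        deliver_sketch(g, len);
        return;
    }
    g->list[g->count].flow = flow;
    g->list[g->count].len = len;
    g->count++;
}

/* Delivers every held packet, as happens when polling completes. */
void gro_flush_sketch(struct gro_sketch *g)
{
    int i;

    for (i = 0; i < g->count; i++)
        deliver_sketch(g, g->list[i].len);
    g->count = 0;
}
```

Two packets of one flow followed by one of another end up as two deliveries after a flush, which is the reduction in per-packet work that GRO aims for.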
2008-12-15 | net: Add frag_list support to GSO | Herbert Xu
This patch allows GSO to handle frag_list in a limited way for the purposes of allowing packets merged by GRO to be refragmented on output. Most hardware won't (and aren't expected to) support handling GRO frag_list packets directly. Therefore we will perform GSO in software for those cases. However, for drivers that can support it (such as virtual NICs) we may not have to segment the packets at all. Whether the added overhead of GRO/GSO is worthwhile for bridges and routers when weighed against the benefit of potentially increasing the MTU within the host is still an open question. However, for the case of host nodes this is undoubtedly a win. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 | net: Add frag_list support to skb_segment | Herbert Xu
This patch adds limited support for handling frag_list packets in skb_segment. The intention is to support GRO (Generic Receive Offload) packets which will be constructed by chaining normal packets using frag_list. As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>