aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp/ipoib/ipoib_cm.c
AgeCommit message (Collapse)Author
2008-07-14IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()Roland Dreier
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(), and these two callers are not synchronized against each other. However, ipoib_cm_post_receive_nonsrq() always reuses the same receive work request and scatter list structures, so multiple callers can end up stepping on each other, which leads to posting garbled work requests. Fix this by having the caller pass in the ib_recv_wr and ib_sge structures to use, and allocating new local structures in ipoib_cm_nonsrq_init_rx(). Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam Nguyen <hnguyen@de.ibm.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IPoIB: Copy small received SKBs in connected modeEli Cohen
The connected mode implementation in the IPoIB driver has a large overhead in the way SKBs are handled in the receive flow. It usually allocates an SKB with as big as was used in the currently received SKB and moves unused fragments from the old SKB to the new one. This involves a loop on all the remaining fragments and incurs overhead on the CPU. This patch, for small SKBs, allocates an SKB just large enough to contain the received data and copies to it the data from the received SKB. The newly allocated SKB is passed to the stack and the old SKB is reposted. When running netperf, UDP small messages, without this pach I get: UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 14.4.3.178 (14.4.3.178) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 114688 128 10.00 5142034 0 526.31 114688 10.00 1130489 115.71 With this patch I get both send and receive at ~315 mbps. The reason that send performance actually slows down is as follows: When using this patch, the overhead of the CPU for handling RX packets is dramatically reduced. As a result, we do not experience RNR NAK messages from the receiver which cause the connection to be closed and reopened again; when the patch is not used, the receiver cannot handle the packets fast enough so there is less time to post new buffers and hence the mentioned RNR NACKs. So what happens is that the application *thinks* it posted a certain number of packets for transmission but these packets are flushed and do not really get transmitted. Since the connection gets opened and closed many times, each time netperf gets the CPU time that otherwise would have been given to IPoIB to actually transmit the packets. This can be verified when looking at the port counters -- the output of ifconfig and the oputput of netperf (this is for the case without the patch): tx packets ========== port counter: 1,543,996 ifconfig: 1,581,426 netperf: 5,142,034 rx packets ========== netperf 1,1304,089 Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-07-14RDMA: Remove subversion $Id tagsRoland Dreier
They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29IPoIB: Use separate CQ for UD send completionsEli Cohen
Use a dedicated CQ for UD send completions. Also, do not arm the UD send CQ, which reduces the number of interrupts generated. This patch farther reduces overhead by not calling poll CQ for every posted send WR -- it does polls only when there 16 or more outstanding work requests. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IPoIB: Handle case when P_Key is deleted and re-added at same indexRoland Dreier
If a P_Key is deleted and then re-added at the same index, then IPoIB gets confused because __ipoib_ib_dev_flush() only checks whether the index is the same without checking whether the P_Key was present, so the interface is stopped when the P_Key is deleted, but the event when the P_Key is re-added gets ignored and the interface never gets restarted. Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey() everywhere in IPoIB, since none of the places that look for P_Keys are in a fast path or in non-sleeping context, and in general we want to kill off the whole caching infrastructure eventually. This also fixes consistency problems caused because some IPoIB queries were cached and some were uncached during the window where the cache was not updated. Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this problem and testing this fix. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IPoIB: Add LSO supportEli Cohen
For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set NETIF_F_TSO and use HW LSO to offload TCP segmentation. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IPoIB: Use checksum offload support if availableEli Cohen
For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and use the HCA to generate and verify IP checksums. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-12IPoIB: Allocate priv->tx_ring with vmalloc()Roland Dreier
Commit 7143740d ("IPoIB: Add send gather support") made struct ipoib_tx_buf significantly larger, since the mapping member changed from a single u64 to an array with MAX_SKB_FRAGS + 1 entries. This means that allocating tx_rings with kzalloc() may fail because there is not enough contiguous memory for the new, much bigger size. Fix this regression by allocating the rings with vmalloc() instead. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11IPoIB/cm: Set tx_wr.num_sge in connected mode post_send()Roland Dreier
Commit 7143740d ("IPoIB: Add send gather support") made it possible for tx_wr.num_sge to be != 1 -- this happens if send gather support is enabled. However, the code in the connected mode post_send() function assumes the old invariant, namely that tx_wr.num_sge is always 1. Fix this by explicitly setting tx_wr.num_sge to 1 in the CM post_send(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19IPoIB/cm: Fix ipoib_cm_dev_stop() cleanup when drain times outPradeep Satyanarayana
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()") introduced a bug in ipoib_cm_dev_stop() when the receive drain times out. In that case, the function moves all the pending rx stuff into a private list but then calls ipoib_cm_free_rx_reap_list(), which handles a different list. Fix this by moving everything to the rx_reap_list that will actually get freed up. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>. Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08IPoIB: Add send gather supportEli Cohen
This patch acts as a preparation for using checksum offload for IB devices capable of inserting/verifying checksum in IP packets. The patch does not actaully turn on NETIF_F_SG - we defer that to the patches adding checksum offload capabilities. We only add support for send gathers for datagram mode, since existing HW does not support checksum offload on connected QPs. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entriesPradeep Satyanarayana
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This patch removes that restriction by limiting the maximum MTU in connected mode to what the maximum number of SRQ SG entries allows. This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728> Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Add connected mode support for devices without SRQsPradeep Satyanarayana
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared receive queues). The current IPoIB connected mode support only works on devices that support SRQs. Fix this by adding support for using the receive queue of each connected mode receive QP. The disadvantage of this compared to using an SRQ is that it means a full queue of receives must be posted for each remote connected mode peer, which means that total memory usage is potentially much higher than when using SRQs. To manage this, add a new module parameter "max_nonsrq_conn_qp" that limits the number of connections allowed per interface. The rest of the changes are fairly straightforward: we use a table of struct ipoib_cm_rx to hold all the active connections, and put the table index of the connection in the high bits of receive WR IDs. This is needed because we cannot rely on the struct ib_wc.qp field for non-SRQ receive completions. Most of the rest of the changes just test whether or not an SRQ is available, and post receives or find received packets in the right place depending on the answer. Cleaning up dead connections actually becomes simpler, because we do not have to do the "last WQE reached" dance that is required to destroy QPs attached to an SRQ. We just move the QP to the error state and wait for all pending receives to be flushed. Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> [ Completely rewritten and split up, based on Pradeep's work. Several bugs fixed and no doubt several bugs introduced. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()Roland Dreier
Factor out the code for going through the rx_reap list of struct ipoib_cm_rx and freeing each one. This consolidates the code duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and reduces the risk of error when adding additional accounting. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Factor out ipoib_cm_create_srq()Roland Dreier
Factor out the code to create an SRQ and allocate the receive ring in ipoib_cm_dev_init() into a new function ipoib_cm_create_srq(). This will make the code neater when support for devices that don't implement SRQs is added. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Factor out ipoib_cm_free_rx_ring()Roland Dreier
Factor out the code to unmap/free skbs and free the receive ring in ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring(). This function will be called from a couple of other places when support for devices that don't implement SRQs is added. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB: Trivial formatting cleanupsRoland Dreier
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-26IPoIB/cm: Fix receive QP cleanupRoland Dreier
Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions") changed how the high-order bits of work request IDs were used, which had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a connected mode receive completion. This leads to the messages ib1: cm send completion event with wrid 1073741823 (> 64) ib1: RX drain timing out when an interface with connected mode QPs is brought down. Fix this by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in IPOIB_CM_RX_DRAIN_WRID. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19IPoIB/cm: Use common CQ for CM send completionsMichael S. Tsirkin
Use the same CQ for CM send completions as for all other IPoIB completions. This means all completions are processed via the same NAPI polling routine. This should help reduce the number of interrupts for bi-directional traffic (such as TCP) and fixes "driver is hogging interrupts" errors reported for IPoIB send side, e.g. <https://bugs.openfabrics.org/show_bug.cgi?id=508> To do this, keep a per-interface counter of outstanding send WRs, and stop the interface when this counter reaches the send queue size to avoid CQ overruns. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"Roland Dreier
It's too hard to figure out what "!likely(...)" really means, and who knows how compilers interpret the hint. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits) mlx4_core: Fix section mismatches IPoIB: Allow setting policy to ignore multicast groups IB/mthca: Mark error paths as unlikely() in post_srq_recv functions IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages. IB/ipath: Remove redundant link state checks IB/ipath: Fix IB_EVENT_PORT_ERR event IB/ipath: Better handling of unexpected GPIO interrupts IB/ipath: Maintain active time on all chips IB/ipath: Fix QHT7040 serial number check IB/ipath: Indicate a couple of chip bugs to userspace IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close IB/ipath: Remove duplicate copy of LMC IB/ipath: Add ability to set the LMC via the sysfs debugging interface IB/ipath: Optimize completion queue entry insertion and polling IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED IB/ipath: Generate flush CQE when QP is in error state IB/ipath: Remove redundant code IB/ipath: Future proof eeprom checksum code (contents reading) IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate ...
2007-10-10[IPoIB]: Convert to netdevice internal statsRoland Dreier
Use the stats member of struct netdevice in IPoIB, so we can save memory by deleting the stats member of struct ipoib_dev_priv, and save code by deleting ipoib_get_stats(). Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-09IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()Dotan Barak
Make the way QP is being created in ipoib_cm_create_tx_qp() consistent with ipoib_cm_create_rx_qp(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10IB/cm: Include HCA ACK delay in local ACK timeoutSean Hefty
The IB CM should include the HCA ACK delay when calculating the local ACK timeout value to use for RC QPs. If the HCA ACK delay is large enough relative to the packet life time, then if it is not taken into account, the calculated timeout value ends up being too small, which can result in "retry exceeded" errors. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10IPoIB/cm: Fix warning if IPV6 is not enabledRoland Dreier
Fix drivers/infiniband/ulp/ipoib/ipoib_cm.c:1151: warning: unused variable 'dev' by getting rid of the variable dev, which is only used if CONFIG_IPV6 is enabled, and replacing the one use of it with the value it is assigned, namely priv->dev. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-02IPoIB/cm: Partial error clean up unmaps wrong addressRalph Campbell
If a page can't be allocated for the frag list of a skb, the code to unmap the partially allocated list is off by one. For exaple, if 'frags' equals one, i == 0, and the alloc_page() fails, then the old loop would have unmapped mapping[1] which is uninitialized. The same would happen if the call to ib_dma_map_page() failed. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21IPoIB/cm: Remove dead definition of struct ipoib_cm_idRoland Dreier
It's completely unused. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21IPoIB/cm: Fix interoperability when MTU doesn't matchMichael S. Tsirkin
IPoIB connected mode currently rejects a connection request unless the supported MTU is >= the local netdevice MTU. This breaks interoperability with implementations that might have tweaked IPOIB_CM_MTU, and there's real no longer a reason to do so: this test is just a leftover from when we did not tweak MTU per-connection. Fix this by making the test as permissive as possible. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21IPoIB/cm: Initialize RX before moving QP to RTRMichael S. Tsirkin
Fix a crasher bug in IPoIB CM: once a QP is in the RTR state, a receive completion (or even an asynchronous error) might be observed on this QP, so we have to initialize all of our receive data structures before moving to the RTR state. As an optimization (since modify_qp might take a long time), the jiffies update done when moving RX to the passive_ids list is also left in place to reduce the chance of the RX being misdetected as stale. This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29IPoIB/cm: Fix performance regression on MellanoxMichael S. Tsirkin
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe performance regression on Mellanox cards, because keeping a QP in the error state for extended periods of time moves hardware to the slow path (until the QP is destroyed). For example, MPI latency goes from ~3 usecs to ~7 usecs. Fix this by posting a send WR on one of the QPs that are being flushed, instead of using a separate drain QP that is kept in the error state. This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>, reported and bisected by Scott Weitzenkamp at Cisco and debugged by Sasha Mikheev at Voltaire. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IPoIB/cm: Drain cq in ipoib_cm_dev_stop()Michael S. Tsirkin
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running, ipoib_cm_dev_stop() must poll the CQ itself in order to see the packets draining. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()Michael S. Tsirkin
time_after() was used backwards, so the timeout occurred immediately. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21IPoIB/cm: Fix SRQ WR leakMichael S. Tsirkin
SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on and off will, with time, leak out all WRs and then all connections will start getting RNR NAKs. Fix this in the way suggested by spec: move the QP being destroyed to the error state, wait for "Last WQE Reached" event and then post WR on a "drain QP" connected to the same CQ. Once we observe a completion on the drain QP, it's safe to call ib_destroy_qp. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IPoIB/cm: Optimize stale connection detectionMichael S. Tsirkin
In the presence of some running RX connections, we repeat queue_delayed_work calls each 4 RX WRs, which is a waste. It's enough to start stale task when a first passive connection is added, and rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding passive connections. This removes some code from RX data path. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06IPoIB: Convert to NAPIRoland Dreier
Convert the IP-over-InfiniBand network device driver over to using NAPI to handle completions for the main CQ. This covers all receives as well as datagram mode sends; send completions for connected mode connections are still handled from interrupt context. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06IB: Add CQ comp_vector supportMichael S. Tsirkin
Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06IPoIB/cm: Don't crash if remote side uses one QP for both directionsMichael S. Tsirkin
The IPoIB CM spec allows the use of a single connection in both active->passive and passive->active directions. The current Linux code uses one connection for both directions, but if another node only uses one connection for both directions, we oops when we try to look up the passive connection. Fix by checking that qp_context is non-NULL before dereferencing it. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
2007-04-30IPoIB/cm: Fix error handling in ipoib_cm_dev_open()Michael S. Tsirkin
If skb allocation fails when we start the device, we call ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to completion, so we pass an invalid pointer to ib_destroy_cm_id and get an oops. Fix by clearing cm.id on error, and testing it in cm_dev_stop(). This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-27Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits) IB: Set class_dev->dev in core for nice device symlink IB/ehca: Implement modify_port IB/umad: Clarify documentation of transaction ID IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements IB/mad: Change SMI to use enums rather than magic return codes IB/umad: Implement GRH handling for sent/received MADs IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr IB/sa: Set src_path_bits correctly in ib_init_ah_from_path() IB/ucm: Simplify ib_ucm_event() RDMA/ucma: Simplify ucma_get_event() IB/mthca: Simplify CQ cleaning in mthca_free_qp() IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory IB/mthca: Update HCA firmware revisions IB/ipath: Fix WC format drift between user and kernel space IB/ipath: Check that a UD work request's address handle is valid IB/ipath: Remove duplicate stuff from ipath_verbs.h IB/ipath: Check reserved memory keys IB/ipath: Fix unit selection when all CPU affinity bits set IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times IB/ipath: Disable IB link earlier in shutdown sequence ...
2007-04-25[SK_BUFF]: Introduce skb_reset_mac_header(skb)Arnaldo Carvalho de Melo
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-24IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacementsRoland Dreier
There are quite a few places in ipoib_cm.c where we know IRQs are enabled because we do something that sleeps in the same function, so we can convert several occurrences of spin_lock_irqsave() to a plain spin_lock_irq(). This cleans up the source a little and makes the code smaller too: add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48) function old new delta ipoib_cm_tx_reap 403 406 +3 ipoib_cm_stale_task 146 145 -1 ipoib_cm_dev_stop 173 172 -1 ipoib_cm_tx_handler 964 956 -8 ipoib_cm_rx_handler 956 937 -19 ipoib_cm_skb_reap 212 190 -22 Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18IPoIB: Remove pointless opcode field from debugging outputRoland Dreier
There's no point in printing the opcode field in the completion handling debugging output, since the type of completion is already printed at the beginning of the line. In fact the opcode field is not even defined for completions with a status other than success. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-10IPoIB/cm: Fix DMA direction typoMichael S. Tsirkin
Receive buffers need to be mapped with DMA_FROM_DEVICE. Incorrectly mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with an IOMMU. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22IB/ipoib: Fix thinko in packet length checksMichael S. Tsirkin
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB encapsulation header) when sending a packet, not 20 bytes (hardware address length) to each packet. Therefore, if connected mode is enabled so that the interface MTU is larger than the multicast MTU, IPoIB may end up trying to send too-long multicast packets. For example, multicast is broken if a message of size 2048 bytes is sent on an interface with UD MTU 2048, because 2048 is bigger than the real limit of 2044 but the code tests against the wrong limit of 2060. This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>, submitted by Scott Weitzenkamp <sweitzen@cisco.com>. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22IPoIB/cm: Fix reaping of stale connectionsMichael S. Tsirkin
The sense of the time_after_eq() test in ipoib_cm_stale_task() is reversed so that only non-stale connections are reaped. Fix this by changing to time_before_eq(). Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20IPoIB/cm: Improve small message bandwidthMichael S. Tsirkin
Avoid the overhead of freeing/reallocating and mapping/unmapping for DMA pages that have not been written to by hardware. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16IPoIB: CM error handling thinko fixMichael S. Tsirkin
ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must use dev_kfree_skb_any(), not kfree_skb(). Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16IPoIB: Only allow root to change between datagram and connected modeRoland Dreier
Change the permissions of the "mode" sysfs attribute to be S_IWUSR instead of S_IWUGO. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10IPoIB: Connected mode experimental supportMichael S. Tsirkin
The following patch adds experimental support for IPoIB connected mode, as defined by the draft from the IETF ipoib working group. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>