aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2008-02-08IPoIB: Add send gather supportEli Cohen
This patch acts as a preparation for using checksum offload for IB devices capable of inserting/verifying checksum in IP packets. The patch does not actaully turn on NETIF_F_SG - we defer that to the patches adding checksum offload capabilities. We only add support for send gathers for datagram mode, since existing HW does not support checksum offload on connected QPs. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08IPoIB: Add high DMA feature flagEli Cohen
All current InfiniBand devices can handle all DMA addresses, and it's hard to imagine anyone would be silly enough to build a new device that couldn't. Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB. This has no effect for no, but is needed when we enable gather/scatter support and checksum stateless offloads. Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08IB/mlx4: Use multiple WQ blocks to post smaller send WQEsJack Morgenstein
ConnectX HCA supports shrinking WQEs, so that a single work request can be made of multiple units of wqe_shift. This way, WRs can differ in size, and do not have to be a power of 2 in size, saving memory and speeding up send WR posting. Unfortunately, if we do this then the wqe_index field in CQEs can't be used to look up the WR ID anymore, so our implementation does this only if selective signaling is off. Further, on 32-bit platforms, we can't use vmap() to make the QP buffer virtually contigious. Thus we have to use constant-sized WRs to make sure a WR is always fully within a single page-sized chunk. Finally, we use WRs with the NOP opcode to avoid wrapping around the queue buffer in the middle of posting a WR, and we set the NoErrorCompletion bit to avoid getting completions with error for NOP WRs. However, NEC is only supported starting with firmware 2.2.232, so we use constant-sized WRs for older firmware. And, since MLX QPs only support SEND, we use constant-sized WRs in this case. When stamping during NOP posting, do stamping following setting of the NOP WQE valid bit. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-06IB/mlx4: Consolidate code to get an entry from a struct mlx4_bufRoland Dreier
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code to look up an entry is duplicated in get_cqe_from_buf() and the QP and SRQ versions of get_wqe(). Factor this out into mlx4_buf_offset(). This will also make it easier to switch over to using vmap() for buffers. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04RDMA/nes: Add a driver for NetEffect RNICsGlenn Streiff
Add a standard NIC and RDMA/iWARP driver for NetEffect 1/10Gb ethernet adapters. Signed-off-by: Glenn Streiff <gstreiff@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Return proper error codes from mthca_fmr_alloc()Olaf Kirch
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc() would return 0 (success) no matter what. This leads to crashes a little down the road, when we try to dereference eg mr->mtt, which was really ERR_PTR(-Ewhatever). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB: Avoid marking __devinitdata as constRoland Dreier
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mlx4: Actually print out the driver versionRoland Dreier
The string mlx4_ib_version was defined, but never used. Print out the version once when the first device is initialized. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/ib_mthca: Pre-link receive WQEs in Tavor modeEli Cohen
We have recently discovered that Tavor mode requires each WQE in a posted list of receive WQEs to have a valid NDA field at all times. This requirement holds true for regular QPs as well as for SRQs. This patch prelinks the receive queue in a regular QP and keeps the free list in SRQ always properly linked. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Remove checks for srq->first_free < 0Eli Cohen
The SRQ receive posting functions make sure that srq->first_free never becomes negative, so we can remove tests of whether it is negative. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/fmr_pool: Allocate page list for pool FMRs only when caching enabledOr Gerlitz
Allocate memory for the page_list field of struct ib_pool_fmr only when caching is enabled for the FMR pool, since the field is not used otherwise. This can save significant amounts of memory for large pools with caching turned off. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/srp: Retry stale connectionsDavid Dillow
When a host just goes away (crash, power loss, etc.) without tearing down its IB connections, it can get stale connection errors when it tries to reconnect to targets upon rebooting. Retrying the connection a few times will prevent sysadmins from playing the "which disk(s) went missing?" game. This would have made things slightly quicker when tracking down some of the recent bugs, but it also helps quite a bit when you've got a large number of targets hanging off a wedged server. Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04mlx4_core: Don't read reserved fields in mlx4_QUERY_ADAPTER()Jack Morgenstein
The firmware QUERY_ADAPTER command does not return vendor_id, device_id, and revision_id; eliminate these fields from the query. Initialize the rev_id field of the mlx4 device via init_node_data (MAD IFC query), as is done in the query_device verb implementation. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER()Jack Morgenstein
For memfree devices, the firmware QUERY_ADAPTER command does not return vendor_id, device_id, and revision_id; do not return these fields in the QUERY_ADAPTER function for memfree devices. Instead, for memfree devices, initialize the rev_id field of the mthca device via init_node_data (MAD IFC query), as is done in the query_device verb implementation. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IPoIB: Remove a misleading debug printOr Gerlitz
Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh structue") left a misleading debug print (n->dev would be a bond device only if boding is used). Clean it up. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IPoIB: Handle bonding failover race for connected neighbours tooOr Gerlitz
Move up the code that checks for a situation where the remote GID stored in the ipoib_neigh is different than the one present in the neighbour (handle gratuitous ARP) or that a bonding fail over has happened but the neighbour still has a pointer to an ipoib_neigh created by a different device than the current slave. This will cause the driver to apply the check also for connected mode neighbours. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr()Roland Dreier
In mthca_reg_phys_mr(), we calculate the page size for the HCA hardware to use to map the buffer list passed in by the consumer. For example, if the consumer passes in [0] addr 0x1000, size 0x1000 [1] addr 0x2000, size 0x1000 then the algorithm would come up with a page size of 0x2000 and a list of two pages, at 0x0000 and 0x2000. Usually, this would work fine since the memory region would start at an offset of 0x1000 and have a length of 0x2000. However, the old code did not take into account the alignment of the IO virtual address passed in. For example, if the consumer passed in a virtual address of 0x6000 for the above, then the offset of 0x1000 would not be used correctly because the page mask of 0x1fff would result in an offset of 0. We can fix this quite neatly by making sure that the page shift we use is no bigger than the first bit where the start of the first buffer and the IO virtual address differ. Also, we can further simplify the code by removing the special case for a single buffer by noticing that it doesn't matter if we use a page size that is too big. This allows the loop to compute the page shift to be replaced with __ffs(). Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the original bug and suggesting several ways to improve this patch. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/ehca: Add PMA supportHoang-Nam Nguyen
This patch enables ehca to redirect any PMA queries to the actual PMA QP. Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Reviewed-by: Joachim Fenkes <fenkes@de.ibm.com> Reviewed-by: Christoph Raisch <raisch@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/ehca: Update sma_attr also in case of disruptive config changeJoachim Fenkes
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/ehca: Prevent sending UD packets to QP0Joachim Fenkes
The IB spec doesn't allow packets to QP0 sent on any other VL than VL15. Hardware doesn't filter those packets on the send side, so we need to do this in the driver and firmware. As eHCA doesn't support QP0, we can just filter out all traffic going to QP0, regardless of SL or VL. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/cm: Add interim support for routed pathsSean Hefty
Paths with hop_limit > 1 indicate that the connection will be routed between IB subnets. Update the subnet local field in the CM REQ based on the hop_limit value. In addition, if the path is routed, then set the LIDs in the REQ to the permissive LIDs. This is used to indicate to the passive side that it should use the LIDs in the received local route header (LRH) associated with the REQ when programming the QP. This is a temporary work-around to the IB CM to support IB router development until the IB router specification is completed. It is not anticipated that this work-around will cause any interoperability issues with existing stacks or future stacks that will properly support IB routers when defined. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-30[SCSI] remove use_sg_chainingJames Bottomley
With the sg table code, every SCSI driver is now either chain capable or broken (or has sg_tablesize set so chaining is never activated), so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits) [IPV6] ADDRLABEL: Fix double free on label deletion. [PPP]: Sparse warning fixes. [IPV4] fib_trie: remove unneeded NULL check [IPV4] fib_trie: More whitespace cleanup. [NET_SCHED]: Use nla_policy for attribute validation in ematches [NET_SCHED]: Use nla_policy for attribute validation in actions [NET_SCHED]: Use nla_policy for attribute validation in classifiers [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers [NET_SCHED]: sch_api: introduce constant for rate table size [NET_SCHED]: Use typeful attribute parsing helpers [NET_SCHED]: Use typeful attribute construction helpers [NET_SCHED]: Use NLA_PUT_STRING for string dumping [NET_SCHED]: Use nla_nest_start/nla_nest_end [NET_SCHED]: Propagate nla_parse return value [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get [NET_SCHED]: act_api: use nlmsg_parse [NET_SCHED]: act_api: fix netlink API conversion bug [NET_SCHED]: sch_netem: use nla_parse_nested_compat [NET_SCHED]: sch_atm: fix format string warning [NETNS]: Add namespace for ICMP replying code. ...
2008-01-28[NETNS]: Add namespace parameter to ip_route_output_key.Denis V. Lunev
Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Add namespace parameter to ip_route_output_flow.Denis V. Lunev
Needed to propagate it down to the __ip_route_output_key. Signed_off_by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Add namespace parameter to ip_dev_find.Denis V. Lunev
in_dev_find() need a namespace to pass it to fib_get_table(), so add an argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV4] drivers/infiniband: Use ipv4_is_<type>Joe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28INFINIBAND: Remove 'TOPDIR' from MakefilesWANG Cong
This patch removes TOPDIR from infiniband Makefile and delete one include statement pointing to a non-existing directory Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <mshefty@ichips.intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits) [SCSI] usbstorage: use last_sector_bug flag universally [SCSI] libsas: abstract STP task status into a function [SCSI] ultrastor: clean up inline asm warnings [SCSI] aic7xxx: fix firmware build [SCSI] aacraid: fib context lock for management ioctls [SCSI] ch: remove forward declarations [SCSI] ch: fix device minor number management bug [SCSI] ch: handle class_device_create failure properly [SCSI] NCR5380: fix section mismatch [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices [SCSI] IB/iSER: add logical unit reset support [SCSI] don't use __GFP_DMA for sense buffers if not required [SCSI] use dynamically allocated sense buffer [SCSI] scsi.h: add macro for enclosure bit of inquiry data [SCSI] sd: add fix for devices with last sector access problems [SCSI] fix pcmcia compile problem [SCSI] aacraid: add Voodoo Lite class of cards. [SCSI] aacraid: add new driver features flags [SCSI] qla2xxx: Update version number to 8.02.00-k7. [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command. ...
2008-01-25RDMA/cxgb3: Fix the T3A workaround checksSteve Wise
Correctly work around T3A issues by checking "hwtype != T3A" instead of "hwtype == T3B". This will be needed for new hardware types. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Remove unnecessary castJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB: Constify seq_operations function pointer tablesJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25RDMA/cxgb3: Mark QP as privileged based on user capabilitiesSteve Wise
This is needed to support zero-stag properly. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25RDMA/cxgb3: Fix page shift calculation in build_phys_page_list()Steve Wise
The existing logic incorrectly maps this buffer list: 0: addr 0x10001000, size 0x1000 1: addr 0x10002000, size 0x1000 To this bogus page list: 0: 0x10000000 1: 0x10002000 The shift calculation must also take into account the address of the first entry masked by the page_mask as well as the last address+size rounded up to the next page size. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25RDMA/cxgb3: Flush the receive queue when closingSteve Wise
- for kernel mode cqs, call event notification handler when flushing. - flush QP when moving from RTS -> CLOSING. - fix logic to identify a kernel mode qp. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Trivial simplification of ipath_make_ud_req()Ralph Campbell
Move the increment of s_hdrwords into the existing if block that tests if we're doing a send with immediate, to save one test of the opcode. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/mthca: Update latest "native Arbel" firmware revisionRoland Dreier
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB: Remove redundant check of netif_queue_stopped() in xmit handlerKrishna Kumar
qdisc_run() now tests for queue_stopped() before calling __qdisc_run(), and the same check is done in every iteration of __qdisc_run(), so another check is not required in the driver xmit. This means that ipoib_start_xmit() no longer needs to test netif_queue_stopped(); the test was added to fix earlier kernels, where the networking stack did not guarantee that the xmit method of an LLTX driver would not be called after the queue was stopped, but current kernels do provide this guarantee. To validate, I put a debug in the TX_BUSY path which never hit with 64 threads running overnight exercising this code a few 100 million times. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Add mappings from HW register to PortInfo port physical stateRalph Campbell
Add new mappings from port physical state (a HW register value) to the IB SubnGet(PortInfo) port physical state. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Changes to support PIO bandwidth check on IBA7220Dave Olson
The IBA7220 uses a count-based triggering mechanism, and therefore can't use the same bandwidth verification mechanism as older chips. To support the 7220, allow enabling and disabling armlaunch errors on application request. Minor robustness improvements as well. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Minor cleanup of unused fields and chip-specific errorsDave Olson
Clean up some unused header fields, minor related cleanup. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: New sysfs entries to control 7220 featuresMichael Albaugh
IBA7220 includes many more configurable IB settings. Getting/setting these is now grouped into a pair of chip specific functions accessed via function pointers. Provide sysfs access to these settings. Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Add new chip-specific functions to older chips, consistent initDave Olson
This adds the new (sometimes empty) chip-specific functions to the older chips, and makes the initialization and related functions consistent across all 3 chips. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ipath: Remove unused MDIO interface codeDave Olson
This code has been unused for some time, but still had leftovers from when it was used. Signed-off-by: Dave Olson <dave.olson@qlogic.com Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ehca: Prevent RDMA-related connection failures on some eHCA2 hardwareJoachim Fenkes
Some HW revisions of eHCA2 may cause an RC connection to break if they received RDMA Reads over that connection before. This can be prevented by assuring that, after the first RDMA Read, the QP receives a new RDMA Read every few million link packets. Include code into the driver that inserts an empty (size 0) RDMA Read into the message stream every now and then if the consumer doesn't post them frequently enough. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ehca: Add "port connection autodetect mode"Hoang-Nam Nguyen
This patch enhances ehca with a capability to "autodetect" the ports being connected physically. In order to utilize that function the module option nr_ports must be set to -1 (default is 2 - two ports). This feature is experimental and will made the default later. More detail: If the user connects only one port to the switch, current code requires 1) port one to be connected and 2) module option nr_ports=1 to be given. If autodetect is enabled, ehca will not wait at creation of the GSI QP for the respective port to become active. Since firmware does not accept modify_qp() while the port is down at initialization, we need to cache all calls to modify_qp() for the SMI/GSI QP and just return a good return code. When a port is activated and we get a PORT_ACTIVE event, we replay the cached modify-qp() parms and re-trigger any posted recv WRs. Only then do we forward the PORT_ACTIVE event to registered clients. The result of this autodetect patch is that all ports will be accessible by the users. Depending on their respective cabling only those ports that are connected properly will become operable. If a user tries to modify a regular QP of a non-connected port, modify_qp() will fail. Furthermore, ibv_devinfo should show the port state accordingly. Note that this patch primarily improves the loading behaviour of ehca. If the cable is removed while the driver is operating and plugged in again, firmware will handle that properly by sending an appropriate async event. Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ehca: Define array to store SMI/GSI QPsHoang-Nam Nguyen
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp()Hoang-Nam Nguyen
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/iser: Add change_queue_depth methodErez Zilber
Add a .change_queue_depth handler to the scsi_host_template in the iSER driver. iscsi_change_queue_depth was added to iscsi_tcp in order to solve the problem of queue depth which was too high for some targets. It is also applicable for iSER. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/iser: Print information about unhandled RDMA CM eventsErez Zilber
Some RDMA CM events are not supported or not handled in iSER. This patch adds some info (printk) for the user about them. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>