Age | Commit message (Collapse) | Author |
|
The layout for struct ioat_desc_sw is non-optimal and causes an extra
cache hit for every descriptor processed. By tightening up the struct
layout and removing one item, we pull in the fields that get used in
the speedpath and get a little better performance.
Before:
-------
struct ioat_desc_sw {
struct ioat_dma_descriptor * hw; /* 0 8
*/
struct list_head node; /* 8 16
*/
int tx_cnt; /* 24 4
*/
/* XXX 4 bytes hole, try to pack */
dma_addr_t src; /* 32 8
*/
__u32 src_len; /* 40 4
*/
/* XXX 4 bytes hole, try to pack */
dma_addr_t dst; /* 48 8
*/
__u32 dst_len; /* 56 4
*/
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dma_async_tx_descriptor async_tx; /* 64 144
*/
/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
/* size: 208, cachelines: 4 */
/* sum members: 196, holes: 3, sum holes: 12 */
/* last cacheline: 16 bytes */
}; /* definitions: 1 */
After:
------
struct ioat_desc_sw {
struct ioat_dma_descriptor * hw; /* 0 8
*/
struct list_head node; /* 8 16
*/
int tx_cnt; /* 24 4
*/
__u32 len; /* 28 4
*/
dma_addr_t src; /* 32 8
*/
dma_addr_t dst; /* 40 8
*/
struct dma_async_tx_descriptor async_tx; /* 48 144
*/
/* --- cacheline 3 boundary (192 bytes) --- */
/* size: 192, cachelines: 3 */
}; /* definitions: 1 */
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current implementation assumes that a channel will only be used by one
client at a time. In order to enable channel sharing the dmaengine core is
changed to a model where clients subscribe to channel-available-events.
Instead of tracking how many channels a client wants and how many it has
received the core just broadcasts the available channels and lets the
clients optionally take a reference. The core learns about the clients'
needs at dma_event_callback time.
In support of multiple operation types, clients can specify a capability
mask to only be notified of channels that satisfy a certain set of
capabilities.
Changelog:
* removed DMA_TX_ARRAY_INIT, no longer needed
* dma_client_chan_free -> dma_chan_release: switch to global reference
counting only at device unregistration time, before it was also happening
at client unregistration time
* clients now return dma_state_client to dmaengine (ack, dup, nak)
* checkpatch.pl fixes
* fixup merge with git-ioat
Cc: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
|
|
The current dmaengine interface defines mutliple routines per operation,
i.e. dma_async_memcpy_buf_to_buf, dma_async_memcpy_buf_to_page etc. Adding
more operation types (xor, crc, etc) to this model would result in an
unmanageable number of method permutations.
Are we really going to add a set of hooks for each DMA engine
whizbang feature?
- Jeff Garzik
The descriptor creation process is refactored using the new common
dma_async_tx_descriptor structure. Instead of per driver
do_<operation>_<dest>_to_<src> methods, drivers integrate
dma_async_tx_descriptor into their private software descriptor and then
define a 'prep' routine per operation. The prep routine allocates a
descriptor and ensures that the tx_set_src, tx_set_dest, tx_submit routines
are valid. Descriptor creation and submission becomes:
struct dma_device *dev;
struct dma_chan *chan;
struct dma_async_tx_descriptor *tx;
tx = dev->device_prep_dma_<operation>(chan, len, int_flag)
tx->tx_set_src(dma_addr_t, tx, index /* for multi-source ops */)
tx->tx_set_dest(dma_addr_t, tx, index)
tx->tx_submit(tx)
In addition to the refactoring, dma_async_tx_descriptor also lays the
groundwork for definining cross-channel-operation dependencies, and a
callback facility for asynchronous notification of operation completion.
Changelog:
* drop dma mapping methods, suggested by Chris Leech
* fix ioat_dma_dependency_added, also caught by Andrew Morton
* fix dma_sync_wait, change from Andrew Morton
* uninline large functions, change from Andrew Morton
* add tx->callback = NULL to dmaengine calls to interoperate with async_tx
calls
* hookup ioat_tx_submit
* convert channel capabilities to a 'cpumask_t like' bitmap
* removed DMA_TX_ARRAY_INIT, no longer needed
* checkpatch.pl fixes
* make set_src, set_dest, and tx_submit descriptor specific methods
* fixup git-ioat merge
* move group_list and phys to dma_async_tx_descriptor
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds a new ioatdma driver
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|