USE_ASYNC_COPY in cg_singledev (Defect #723)
This feature is currently not implemented. What is the reason for that?
Also, it appears only in cg_singledev, not in cg_multidev. Shouldn't it be the other way around?
I was not aware that it is not implemented. I will have to look into that.
The feature only makes sense in the single-device scenario. The copy in question is between host and device, not between devices: the residual copy is overlapped with other calculations. In the multi-device scenario this does not make sense, because the matrix-vector multiplication already contains a "synchronous" copy. (The copy might still overlap with parts of the matrix-vector multiplication.)
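To illustrate the overlap idea, here is a minimal host-only sketch. It is not the cg_singledev code: it assumes that `std::async` stands in for an asynchronous device-to-host copy of the residual, and the names `matvec`, `copy_residual_async` and `overlapped_step` are hypothetical. The copy of `r` is started first, the independent product `q = A * p` is computed while it runs, and the caller waits for the copy before `r` may be updated.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: dense matrix-vector product q = A * p, A stored row-major (n x n).
std::vector<double> matvec(const std::vector<double>& A,
                           const std::vector<double>& p, std::size_t n) {
    std::vector<double> q(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            q[i] += A[i * n + j] * p[j];
    return q;
}

// Hypothetical stand-in for an asynchronous device-to-host copy of the residual.
// std::launch::async runs the copy on its own thread, so it overlaps with the
// caller's subsequent work. r must stay alive and unmodified until the future is read.
std::future<std::vector<double>> copy_residual_async(const std::vector<double>& r) {
    return std::async(std::launch::async, [&r] { return r; });
}

// One overlapped step: start copying r, compute q = A * p meanwhile (this work
// does not touch r), then wait for the copy before r may be modified.
std::vector<double> overlapped_step(const std::vector<double>& A,
                                    const std::vector<double>& p,
                                    const std::vector<double>& r,
                                    std::size_t n,
                                    std::vector<double>& r_host) {
    auto pending = copy_residual_async(r);    // copy starts here
    std::vector<double> q = matvec(A, p, n);  // overlaps with the copy
    r_host = pending.get();                   // synchronize before r is updated
    return q;
}
```

The essential constraint is that the overlapped work must neither read a half-copied buffer nor write to the copy's source. When the matrix-vector product itself already contains a blocking copy, as described above for the multi-device case, there is little left to overlap.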