Proper handling of bandwidth-optimized gaugefield storage (Feature #266)
For optimal performance of the dslash kernel, the gaugefield has to be stored in a resorted, SOA (structure-of-arrays) fashion. To achieve this, it is copied into a special buffer that the dslash kernel uses. As a consequence, every part of the application that uses the dslash kernel has to make sure the correct version of the gaugefield has been copied into this buffer, which is error-prone. It would therefore be better to always store the gaugefield directly in the correct form. However, this form might differ between devices, while the main gaugefield is currently device-agnostic.
An additional benefit of having all kernels work on a properly formatted gaugefield should be performance gains in those kernels, on top of no longer having to do any conversions.
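
To illustrate the layout difference discussed above, here is a minimal C++ sketch of converting a link array from an array-of-structures (AoS) layout to an SOA layout. All names (`kEntriesPerLink`, `aos_index`, `soa_index`, `aos_to_soa`) are hypothetical and not taken from the code base. It assumes each link is a 3x3 complex matrix stored as 9 entries, and it leaves out the site resorting as well as any device-specific padding or alignment.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<float>;

// Assumption: one gaugefield link is a 3x3 complex matrix, i.e. 9 entries.
constexpr std::size_t kEntriesPerLink = 9;

// AoS layout: the 9 entries of one link are contiguous in memory.
inline std::size_t aos_index(std::size_t link, std::size_t entry) {
    return link * kEntriesPerLink + entry;
}

// SOA layout: the same entry of all links is contiguous, so threads that
// each handle one link read neighbouring addresses.
inline std::size_t soa_index(std::size_t link, std::size_t entry,
                             std::size_t num_links) {
    return entry * num_links + link;
}

// Copy an AoS gaugefield into a newly allocated SOA buffer.
std::vector<Complex> aos_to_soa(const std::vector<Complex>& aos,
                                std::size_t num_links) {
    assert(aos.size() == num_links * kEntriesPerLink);
    std::vector<Complex> soa(aos.size());
    for (std::size_t link = 0; link < num_links; ++link) {
        for (std::size_t entry = 0; entry < kEntriesPerLink; ++entry) {
            soa[soa_index(link, entry, num_links)] = aos[aos_index(link, entry)];
        }
    }
    return soa;
}
```

If the gaugefield were stored through an index function like `soa_index` from the start, with the function chosen per device, this copy step and the risk of a stale buffer would go away.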