Refactor CG-M and speed it up (Feature #732)
Some work on the multi-shifted inverter was done in Feature #562, but this part of the code is still far from well implemented. The CG-M should be refactored. It may make sense to implement a class that does what is currently done in the function physics::algorithms::solvers::cg_m.
Improved CG-M performance (avoided GPU-host communication and no longer check the residuum at every iteration).
After several attempts I found two places in the code that were certainly slowing it down.
In the CG-M all equations are solved at the same time, but as soon as one of them reaches the desired precision (residuum per equation) it is no longer considered. Once all of them are below the precision, the overall residuum is checked. We know from the standard CG that it pays off not to check convergence at every iteration but only every few iterations. In the CG-M this should be applied not to the overall residuum but to the residuum per equation, since each such check is basically a call to the squarenorm kernel. This was not implemented, and it already brings a gain if used properly.
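To make the first point concrete, here is a minimal host-side sketch of a multi-shift CG that checks the per-equation residuum only every check_every iterations and stops updating an equation once it has converged. It is not the cg_m code: the names Field, CgmResult, cg_m_sketch and check_every are hypothetical, the operator is assumed to be a positive diagonal matrix, and the per-equation residuum is assumed to be the squarenorm of zeta_k * r, which may differ from what cg_m actually evaluates.

```cpp
#include <cstddef>
#include <vector>

using Field = std::vector<double>;  // hypothetical host stand-in for a device field

struct CgmResult {
    std::vector<Field> x;     // one solution per shift
    int iterations = 0;       // CG iterations performed
    int residuum_checks = 0;  // per-equation squarenorm evaluations
};

static double dot(const Field& u, const Field& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
    return s;
}

// Solves (A + sigma[k]) x[k] = b for all k, with A = diag(a), a[i] > 0, sigma[k] >= 0.
// prec is the threshold on the squared residuum of each equation.
CgmResult cg_m_sketch(const Field& a, const Field& b, const std::vector<double>& sigma,
                      double prec, int check_every, int max_iter) {
    const std::size_t n = b.size(), ns = sigma.size();
    CgmResult res;
    res.x.assign(ns, Field(n, 0.0));
    Field r = b, p = b, Ap(n);
    std::vector<Field> ps(ns, b);  // shifted search directions
    std::vector<double> zeta(ns, 1.0), zeta_prev(ns, 1.0);
    std::vector<bool> active(ns, true);
    double alpha = 0.0, beta_prev = 1.0, rr = dot(r, r);
    std::size_t n_active = ns;

    while (n_active > 0 && rr > 0.0 && res.iterations < max_iter) {
        for (std::size_t i = 0; i < n; ++i) Ap[i] = a[i] * p[i];
        const double beta = -rr / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) r[i] += beta * Ap[i];
        const double rr_new = dot(r, r);
        const double alpha_new = rr_new / rr;

        for (std::size_t k = 0; k < ns; ++k) {
            if (!active[k]) continue;  // converged equations are not touched any more
            const double zeta_next = zeta[k] * zeta_prev[k] * beta_prev /
                (beta * alpha * (zeta_prev[k] - zeta[k]) +
                 zeta_prev[k] * beta_prev * (1.0 - sigma[k] * beta));
            const double ratio = zeta_next / zeta[k];
            const double beta_k = beta * ratio;
            const double alpha_k = alpha_new * ratio * ratio;
            for (std::size_t i = 0; i < n; ++i) {
                res.x[k][i] -= beta_k * ps[k][i];
                ps[k][i] = zeta_next * r[i] + alpha_k * ps[k][i];
            }
            zeta_prev[k] = zeta[k];
            zeta[k] = zeta_next;
        }
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + alpha_new * p[i];
        alpha = alpha_new;
        beta_prev = beta;
        rr = rr_new;
        ++res.iterations;

        // Only every check_every iterations pay for the per-equation squarenorm.
        if (res.iterations % check_every != 0) continue;
        for (std::size_t k = 0; k < ns; ++k) {
            if (!active[k]) continue;
            ++res.residuum_checks;
            double s = 0.0;  // stand-in for a squarenorm kernel call on zeta_k * r
            for (std::size_t i = 0; i < n; ++i) s += (zeta[k] * r[i]) * (zeta[k] * r[i]);
            if (s < prec) { active[k] = false; --n_active; }
        }
    }
    return res;
}
```

Calling cg_m_sketch with check_every set to 1 and then to 5 on the same system should give the same solutions within the requested precision, while the second call evaluates far fewer per-equation squarenorms at the price of at most a few extra iterations.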
The second, even more important point is that I had forgotten to change a sax call like
void sax(const Staggeredfield_eo* out, const hmc_float alpha, const Staggeredfield_eo& x);
to one like
void sax(const Staggeredfield_eo* out, const Vector<hmc_float>& alpha, const int index_alpha, const Staggeredfield_eo& x);
in the case where alpha is a Vector (stored on the device). This means that I was doing an unnecessary host-device communication.
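The difference between the two overloads can be illustrated with a small host-only mock. Everything in it is hypothetical: DeviceVector merely imitates a coefficient vector living in device memory and counts how often a value is copied back to the host, and Field stands in for Staggeredfield_eo. It is a sketch of the idea, not of the actual sax implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Field = std::vector<double>;  // hypothetical stand-in for Staggeredfield_eo

// Hypothetical mock of coefficients stored on the device.
class DeviceVector {
public:
    explicit DeviceVector(std::vector<double> v) : data_(std::move(v)) {}
    // Imitates a device-to-host copy of one element and counts it.
    double read_to_host(std::size_t i) const { ++host_reads_; return data_[i]; }
    // Imitates a kernel reading the element directly from device memory.
    double device_value(std::size_t i) const { return data_[i]; }
    int host_reads() const { return host_reads_; }
private:
    std::vector<double> data_;
    mutable int host_reads_ = 0;
};

// Scalar variant: alpha must already be available on the host.
void sax(Field* out, double alpha, const Field& x) {
    for (std::size_t i = 0; i < x.size(); ++i) (*out)[i] = alpha * x[i];
}

// Vector variant: alpha stays on the device and is selected by index.
void sax(Field* out, const DeviceVector& alpha, int index_alpha, const Field& x) {
    const double a = alpha.device_value(static_cast<std::size_t>(index_alpha));
    for (std::size_t i = 0; i < x.size(); ++i) (*out)[i] = a * x[i];
}
```

In this mock, calling the scalar variant with alpha.read_to_host(k) costs one transfer per call, whereas the vector variant produces the same result without any. Inside a solver loop that transfer would be paid per equation and per iteration.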
- Start date set to 03 Jun 2015
- % Done changed from 0 to 30