heatbath kernels have register spilling. (Defect #131)
|duplicates CL2QCD - Defect #132: overrelax kernels have register spilling||New||18 May 2011|
The problem almost certainly lies in the staple. This is a full 3x3 matrix and therefore nearly exhausts the GPRs on its own.
This shows up in the "staple test", where only the staple is calculated. On Cypress with APP 2.5 one gets:
[12:30:55] DEBUG: Kernel: staple_test - 62 GPRs, 85 scratch registers, 0 bytes statically allocated local memory
The heatbath kernel may fit into the GPRs if one uses other SU(3) representations, although the staple is still the main problem.
With 1910abb5 I added a Udagger*Vdagger function for su3 matrices. This reduces the scratch register usage in the staple test from 85 to 64:
[09:06:10] DEBUG: Kernel: staple_test - 64 GPRs, 64 scratch registers, 0 bytes statically allocated local memory
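To illustrate the idea behind a fused Udagger*Vdagger product, here is a minimal host-side C sketch. It is not the code from 1910abb5: the types `Complex` and `Matrix3x3` and all function names are hypothetical, chosen only for this example. The fused version reads U and V directly using (U^dagger V^dagger)_ij = sum_k conj(U_ki) conj(V_jk), so no adjoint temporaries have to be materialized. Whether this lowers register pressure in a given kernel depends on the compiler.

```c
/* Hypothetical types for illustration; the project's own su3 types may differ. */
typedef struct { double re, im; } Complex;
typedef struct { Complex e[3][3]; } Matrix3x3;

/* Reference path, step 1: build the Hermitian conjugate as a temporary. */
Matrix3x3 adjoint(const Matrix3x3 *a)
{
    Matrix3x3 out;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            out.e[i][j].re = a->e[j][i].re;
            out.e[i][j].im = -a->e[j][i].im;
        }
    }
    return out;
}

/* Reference path, step 2: plain matrix product c = a * b. */
Matrix3x3 multiply(const Matrix3x3 *a, const Matrix3x3 *b)
{
    Matrix3x3 out;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            double re = 0.0, im = 0.0;
            for (int k = 0; k < 3; ++k) {
                const Complex x = a->e[i][k];
                const Complex y = b->e[k][j];
                re += x.re * y.re - x.im * y.im;
                im += x.re * y.im + x.im * y.re;
            }
            out.e[i][j].re = re;
            out.e[i][j].im = im;
        }
    }
    return out;
}

/* Fused path: U^dagger * V^dagger without storing either adjoint.
 * Each term is conj(U_ki) * conj(V_jk) = conj(U_ki * V_jk). */
Matrix3x3 multiply_dagger_dagger(const Matrix3x3 *u, const Matrix3x3 *v)
{
    Matrix3x3 out;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            double re = 0.0, im = 0.0;
            for (int k = 0; k < 3; ++k) {
                const Complex x = u->e[k][i];
                const Complex y = v->e[j][k];
                re += x.re * y.re - x.im * y.im;
                im -= x.re * y.im + x.im * y.re;
            }
            out.e[i][j].re = re;
            out.e[i][j].im = im;
        }
    }
    return out;
}
```

The unfused route `multiply(&ua, &va)`, with `ua = adjoint(&u)` and `va = adjoint(&v)`, needs two extra 3x3 temporaries (36 additional real numbers), while `multiply_dagger_dagger(&u, &v)` needs none.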
Also, in 5bfcc6c4 I restructured the heatbath and overrelax kernels by initializing variables at the point of use (I guess this should be done by the compiler). This saved about 40 scratch registers. However, this is probably not a real improvement.
In the end, heatbath and overrelax now use:
[09:11:05] DEBUG: Kernel: heatbath_odd - 64 GPRs, 679 scratch registers, 0 bytes statically allocated local memory
[09:01:01] DEBUG: Kernel: overrelax_odd - 64 GPRs, 665 scratch registers, 0 bytes statically allocated local memory
which is in any case around 100 scratch registers fewer than before...