heatbath kernels have register spilling. (Defect #131)


Added by Matthias Bach almost 8 years ago. Updated over 7 years ago.


Status: New
Start date: 18 May 2011
Priority: Normal
Due date:
Assignee: Matthias Bach
% Done: 0%
Category: -
Target version: -
Estimated time: 4.00 hours

Related issues

duplicates CL2QCD - Defect #132: overrelax kernels have register spilling New 18 May 2011

History

Updated by Christopher Pinke over 7 years ago

The problem almost certainly lies in the staple. It is a genuine 3x3 matrix and therefore nearly exhausts the GPRs on its own.

This is visible in the "staple test", where only the staple is calculated. On Cypress with APP 2.5 it reports:

[12:30:55] DEBUG: Kernel: staple_test - 62 GPRs, 85 scratch registers, 0 bytes statically allocated local memory

The heatbath kernel may fit into the GPRs if one uses other su3-representations, although the staple is still the main problem.
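As an illustration of what a smaller su3 representation can look like, here is a minimal sketch of one commonly used option: store only the first two rows and rebuild the third as the complex conjugate of their cross product, which holds for any SU(3) matrix. Whether this is the representation meant above is an assumption, and the type and function names (`su3_two_rows`, `su3_full`, `su3_expand`) are hypothetical, not taken from the CL2QCD sources.

```c
#include <complex.h>

/* Hypothetical compact storage: only the first two rows of an SU(3) matrix. */
typedef struct { double complex row[2][3]; } su3_two_rows;

/* Hypothetical full storage: all nine complex entries. */
typedef struct { double complex e[3][3]; } su3_full;

/* Rebuild the full matrix. For an SU(3) matrix with rows a, b, c the
 * third row satisfies c = conj(a x b), so it need not be stored. */
su3_full su3_expand(const su3_two_rows *in)
{
    const double complex *a = in->row[0];
    const double complex *b = in->row[1];
    su3_full out;

    for (int j = 0; j < 3; ++j) {
        out.e[0][j] = a[j];
        out.e[1][j] = b[j];
    }
    /* Third row: complex conjugate of the cross product of rows 0 and 1. */
    out.e[2][0] = conj(a[1] * b[2] - a[2] * b[1]);
    out.e[2][1] = conj(a[2] * b[0] - a[0] * b[2]);
    out.e[2][2] = conj(a[0] * b[1] - a[1] * b[0]);
    return out;
}
```

This trades 6 stored complex numbers per link for a few extra multiplications. It does not help with the staple itself, which is a sum of products and in general not an SU(3) matrix.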

Updated by Christopher Pinke over 7 years ago

With 1910abb5 I added a Udagger*Vdagger function for su3 matrices. This reduces the scratch register usage in the staple test from 85 to 64:

[09:06:10] DEBUG: Kernel: staple_test - 64 GPRs, 64 scratch registers, 0 bytes statically allocated local memory
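To show the idea behind a fused Udagger*Vdagger routine, here is a minimal sketch in plain C. It is not the code from 1910abb5: the type `mat3x3` and all function names are hypothetical. The fused version reads the conjugated, transposed entries directly using (U†V†)_ij = sum_k conj(U_ki) * conj(V_jk), so no temporary adjoint matrices have to be kept alive.

```c
#include <complex.h>

/* Hypothetical 3x3 complex matrix type, for illustration only. */
typedef struct { double complex e[3][3]; } mat3x3;

/* Explicit adjoint (conjugate transpose); needs a full temporary matrix. */
mat3x3 adjoint(const mat3x3 *m)
{
    mat3x3 r;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r.e[i][j] = conj(m->e[j][i]);
    return r;
}

/* Plain matrix product a * b. */
mat3x3 multiply(const mat3x3 *a, const mat3x3 *b)
{
    mat3x3 r;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double complex sum = 0.0;
            for (int k = 0; k < 3; ++k)
                sum += a->e[i][k] * b->e[k][j];
            r.e[i][j] = sum;
        }
    return r;
}

/* Fused product of the two adjoints: the conjugation and the index swap
 * happen while reading u and v, so no adjoint temporaries are formed. */
mat3x3 multiply_dagger_dagger(const mat3x3 *u, const mat3x3 *v)
{
    mat3x3 r;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double complex sum = 0.0;
            for (int k = 0; k < 3; ++k)
                sum += conj(u->e[k][i]) * conj(v->e[j][k]);
            r.e[i][j] = sum;
        }
    return r;
}
```

Calling `multiply_dagger_dagger(&u, &v)` gives the same matrix as `multiply` applied to `adjoint(&u)` and `adjoint(&v)`, but with two fewer 3x3 temporaries, which is the kind of saving that can lower register pressure.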

Also, in 5bfcc6c4 I restructured the heatbath and overrelax kernels by initializing variables at the point of use (I guess this should be done by the compiler). This saved about 40 scratch registers. However, this is probably not a real improvement.
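The restructuring pattern can be sketched in plain C as follows. This is a toy example with hypothetical function names, not the kernel code from 5bfcc6c4: both functions compute the same value, but the second declares its temporary only where it is needed, which keeps its scope (and ideally its live range) as short as possible.

```c
#include <stddef.h>

/* Temporaries declared up front: prod is in scope for the whole function. */
double sum_products_early(const double *a, const double *b, size_t n)
{
    double prod;
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        prod = a[i] * b[i];
        acc += prod;
    }
    return acc;
}

/* Temporaries initialized at the point of use: prod only exists inside
 * one loop iteration, so it cannot be kept alive by accident. */
double sum_products_late(const double *a, const double *b, size_t n)
{
    double acc = 0.0;

    for (size_t i = 0; i < n; ++i) {
        const double prod = a[i] * b[i];
        acc += prod;
    }
    return acc;
}
```

An optimizing compiler would normally be expected to produce the same code for both variants, which fits the remark above that the compiler should handle this itself.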

In the end, heatbath and overrelax now use:

[09:11:05] DEBUG: Kernel: heatbath_odd - 64 GPRs, 679 scratch registers, 0 bytes statically allocated local memory
[09:01:01] DEBUG: Kernel: overrelax_odd - 64 GPRs, 665 scratch registers, 0 bytes statically allocated local memory

which is, in any case, around 100 fewer scratch registers than before.
