Compiler-Behavior in saxsbypz kernel (Unit Test #201)


Added by Christopher Pinke almost 8 years ago.


Status: Feedback
Start date: 28 Sep 2011
Priority: Normal
Due date:
Assignee: Matthias Bach
% Done: 0%
Category: -
Target version: -

Description

Some time ago I observed strange compiler behaviour in the saxsbypz kernel.

The essential part of the code is:

spinor x_tmp = x[id_tmp];
spinor y_tmp = y[id_tmp];
spinor z_tmp = z[id_tmp];
x_tmp = spinor_times_complex(x_tmp, alpha_tmp);
y_tmp = spinor_times_complex(y_tmp, beta_tmp);
out[id_tmp] = spinor_acc_acc(y_tmp, x_tmp, z_tmp);

If this is changed to

spinor x_tmp = x[id_tmp];
x_tmp = spinor_times_complex(x_tmp, alpha_tmp);
spinor y_tmp = y[id_tmp];
y_tmp = spinor_times_complex(y_tmp, beta_tmp);
spinor z_tmp = z[id_tmp];
out[id_tmp] = spinor_acc_acc(y_tmp, x_tmp, z_tmp);

one sees an enormous change in register usage.
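Both orderings compute the same value, so any difference in register usage comes from how the compiler schedules loads and arithmetic, not from the algorithm. The following host-side C sketch illustrates this. The spinor layout (12 complex components) and the bodies of spinor_times_complex and spinor_acc_acc are assumptions made for illustration; the real definitions are not shown in this report.

```c
/* Hypothetical stand-ins: the real spinor type and helpers are not shown here. */
typedef struct { double re, im; } cplx;
#define NCOMP 12 /* assumed: 4 spin x 3 colour complex components */
typedef struct { cplx c[NCOMP]; } spinor;

static cplx cmul(cplx a, cplx b) {
    cplx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Assumed semantics: multiply every component by the complex factor. */
static spinor spinor_times_complex(spinor in, cplx f) {
    spinor out;
    for (int i = 0; i < NCOMP; ++i) out.c[i] = cmul(in.c[i], f);
    return out;
}

/* Assumed semantics: component-wise a + b + c. */
static spinor spinor_acc_acc(spinor a, spinor b, spinor c) {
    spinor out;
    for (int i = 0; i < NCOMP; ++i) {
        out.c[i].re = a.c[i].re + b.c[i].re + c.c[i].re;
        out.c[i].im = a.c[i].im + b.c[i].im + c.c[i].im;
    }
    return out;
}

/* Ordering 1: all three loads first, then the arithmetic. */
static spinor saxsbypz_1(const spinor *x, const spinor *y, const spinor *z,
                         cplx alpha, cplx beta, int id) {
    spinor x_tmp = x[id];
    spinor y_tmp = y[id];
    spinor z_tmp = z[id];
    x_tmp = spinor_times_complex(x_tmp, alpha);
    y_tmp = spinor_times_complex(y_tmp, beta);
    return spinor_acc_acc(y_tmp, x_tmp, z_tmp);
}

/* Ordering 2: each load is followed directly by its use. */
static spinor saxsbypz_2(const spinor *x, const spinor *y, const spinor *z,
                         cplx alpha, cplx beta, int id) {
    spinor x_tmp = x[id];
    x_tmp = spinor_times_complex(x_tmp, alpha);
    spinor y_tmp = y[id];
    y_tmp = spinor_times_complex(y_tmp, beta);
    spinor z_tmp = z[id];
    return spinor_acc_acc(y_tmp, x_tmp, z_tmp);
}

/* Returns 1 if both orderings give identical output on one test site. */
static int orderings_agree(void) {
    spinor x, y, z;
    cplx alpha = { 1.5, -0.5 };
    cplx beta = { -2.0, 0.25 };
    for (int i = 0; i < NCOMP; ++i) {
        x.c[i].re = i + 1;   x.c[i].im = -i;
        y.c[i].re = 0.5 * i; y.c[i].im = 2.0;
        z.c[i].re = -1.0;    z.c[i].im = 0.25 * i;
    }
    spinor a = saxsbypz_1(&x, &y, &z, alpha, beta, 0);
    spinor b = saxsbypz_2(&x, &y, &z, alpha, beta, 0);
    for (int i = 0; i < NCOMP; ++i) {
        if (a.c[i].re != b.c[i].re || a.c[i].im != b.c[i].im) return 0;
    }
    return 1;
}
```

Calling orderings_agree() from a small main should return 1, since the per-component operations are the same in both functions and only the statement order differs.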

I implemented a test case ("saxsbypz") that simply compiles both kernels in single and double precision.

The outcome is interesting, since it depends on the AMD-SDK version used:

AMD-APP 2.4:

[11:42:23] INFO: init saxsbypz kernels to see if the compiler does strange things...
[11:42:23] INFO: hmc_float = double
[11:42:24] DEBUG: Kernel: saxsbypz_1 - 64 GPRs, 17 scratch registers, 0 bytes statically allocated local memory
[11:42:25] DEBUG: Kernel: saxsbypz_2 - 34 GPRs, 0 scratch registers, 0 bytes statically allocated local memory
[11:42:25] INFO: hmc_float = single
[11:42:25] DEBUG: Kernel: saxsbypz_1 - 22 GPRs, 0 scratch registers, 0 bytes statically allocated local memory
[11:42:25] DEBUG: Kernel: saxsbypz_2 - 22 GPRs, 0 scratch registers, 0 bytes statically allocated local memory

AMD-APP 2.5:

[11:52:10] INFO: init saxsbypz kernels to see if the compiler does strange things...
[11:52:10] INFO: hmc_float = double
[11:52:11] DEBUG: Kernel: saxsbypz_1 - 40 GPRs, 0 scratch registers, 0 bytes statically allocated local memory
[11:52:12] DEBUG: Kernel: saxsbypz_2 - 40 GPRs, 0 scratch registers, 0 bytes statically allocated local memory
[11:52:12] INFO: hmc_float = single
[11:52:13] DEBUG: Kernel: saxsbypz_1 - 23 GPRs, 0 scratch registers, 0 bytes statically allocated local memory
[11:52:13] DEBUG: Kernel: saxsbypz_2 - 23 GPRs, 0 scratch registers, 0 bytes statically allocated local memory

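The overall behaviour of 2.5 is clearly better: both orderings now give the same result and no scratch registers are used. Still, the number of registers grows compared to the best case under 2.4 (saxsbypz_2 in double precision goes from 34 to 40 GPRs, and single precision from 22 to 23). This is kind of strange...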
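For orientation, here is a rough estimate of how many GPRs the three temporary spinors would occupy if all of them were fully live at once. It is only a sketch: the report names neither the spinor layout nor the GPU, so the 12 complex components per spinor and the 128-bit (16-byte) vector GPR are both assumptions, and the helper names are hypothetical.

```c
/* Rough live-register estimate. All sizes are assumptions, not taken from the report. */
enum {
    SPINOR_COMPLEX = 12, /* assumed: 4 spin x 3 colour complex components */
    GPR_BYTES = 16       /* assumed: one GPR is a 128-bit vector register */
};

/* GPRs needed to hold one spinor, rounded up to whole registers. */
static int gprs_per_spinor(int bytes_per_real) {
    int bytes = SPINOR_COMPLEX * 2 * bytes_per_real; /* re + im per component */
    return (bytes + GPR_BYTES - 1) / GPR_BYTES;
}

/* GPRs needed if `live` spinors are held in registers at the same time. */
static int gprs_for_live_spinors(int live, int bytes_per_real) {
    return live * gprs_per_spinor(bytes_per_real);
}
```

Under these assumptions, gprs_for_live_spinors(3, 8) gives 36 for double precision and gprs_for_live_spinors(3, 4) gives 18 for single precision. The 34 to 40 GPRs and 22 to 23 GPRs in the logs above are in that range, whereas 64 GPRs plus 17 scratch registers for saxsbypz_1 under 2.4 is well beyond it.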

