Scalar Product broken for larger lattices with commit:14e176e6 (Defect #461)

Added by Christopher Pinke over 6 years ago. Updated over 6 years ago.

Status: Feedback
Start date: 23 Apr 2013
Priority: Normal
Due date:
Assignee: Matthias Bach
% Done:
Target version: -


In 14e176e6 the reduction of the scalar product is broken.
This shows up only for larger lattices: in my tests it failed for Ns=Nt=12, whereas Ns=Nt=8 did not show any errors.

In that commit, the local thread size ls is set to 64 for some kernels. Simultaneously, the number of groups is introduced as an explicit parameter in scalar_product_reduction, but without adjusting it to the new group size.

In my tests, I therefore had something like ls, gs, num_groups = 64, 16384, 128. Here num_groups is no longer gs/ls, which would be 16384/64 = 256.

The kernel itself gets only ls and gs as arguments and computes the actual num_groups on the device from them. In this case that means only half of the reduction is performed (128 of the 256 groups), consistent with the values I get.
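To make the effect concrete, here is a minimal host-side C++ sketch of a two-stage reduction. It is not the CL2QCD kernel code: the function names partial_sums and reduce_partials are hypothetical, and it assumes one element per work-item and one partial sum per work-group.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Stage 1 (stand-in for the scalar product kernel): every work-group of
// size ls writes one partial sum. Simplifying assumption: one element
// per work-item, so data.size() plays the role of gs.
inline std::vector<double> partial_sums(const std::vector<double>& data,
                                        std::size_t ls) {
    std::vector<double> out;
    for (std::size_t start = 0; start < data.size(); start += ls) {
        const std::size_t end = std::min(start + ls, data.size());
        out.push_back(std::accumulate(
            data.begin() + static_cast<std::ptrdiff_t>(start),
            data.begin() + static_cast<std::ptrdiff_t>(end), 0.0));
    }
    return out;
}

// Stage 2 (stand-in for a reduction that takes num_groups as an explicit
// argument): only the first num_groups partial sums are added up.
inline double reduce_partials(const std::vector<double>& partials,
                              std::size_t num_groups) {
    const std::size_t n = std::min(num_groups, partials.size());
    return std::accumulate(
        partials.begin(),
        partials.begin() + static_cast<std::ptrdiff_t>(n), 0.0);
}
```

With 16384 entries equal to 1.0 and ls = 64, stage 1 produces 256 partial sums. Calling reduce_partials with num_groups = 256 returns 16384, while num_groups = 128 returns 8192, i.e. half of the expected value.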

To fix this, one can either add

*num_groups = (*gs)/(*ls);

to the get_work_sizes function, or re-introduce

for (int i = 1; i < get_num_groups(0); i++) {

in the kernel.
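Both options can be sketched on the host side in C++ as follows. This is an illustration only: sync_num_groups and reduce_all_groups are hypothetical names, the real get_work_sizes signature may differ, and the vector length stands in for what get_num_groups(0) reports on the device. The sketch also assumes the kernel loop starts from the first partial result and adds the remaining ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Option 1: keep num_groups consistent with ls and gs on the host.
// Hypothetical stand-in for the line proposed for get_work_sizes.
inline void sync_num_groups(const std::size_t* ls, const std::size_t* gs,
                            std::size_t* num_groups) {
    // This sketch assumes gs is a multiple of ls.
    assert(*ls > 0 && *gs % *ls == 0);
    *num_groups = (*gs) / (*ls);
}

// Option 2: do not trust a separately passed num_groups at all. The loop
// bound is the number of partial results actually present, which is the
// role get_num_groups(0) plays in the kernel.
inline double reduce_all_groups(const std::vector<double>& partials) {
    if (partials.empty()) {
        return 0.0;
    }
    double sum = partials[0];
    for (std::size_t i = 1; i < partials.size(); i++) {
        sum += partials[i];
    }
    return sum;
}
```

Starting from ls = 64, gs = 16384 and a stale num_groups = 128, sync_num_groups sets num_groups to 256. Option 2 trades the extra argument for a bound that cannot go out of sync with the launch configuration.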

Related issues

related to CL2QCD - Defect #464: Work over get_work_sizes (Feedback, 23 Apr 2013)

Associated revisions

Revision 3721479d
Added by Christopher Pinke over 6 years ago

added consistent num_groups to get_work_sizes in spinor module
adjusted scalar product reduction not to rely on num_groups argument
refs #461

Revision 26f69fa0
Added by Christopher Pinke over 6 years ago

corrected dangerous adjustment of localsize for gm squarenorm kernel
refs #461

