Merge saxpy and gamma5 (Feature #726)


Added by Christopher Pinke almost 3 years ago. Updated almost 3 years ago.


Status: In Progress
Start date: 03 Dec 2014
Priority: Normal
Due date:
Assignee: Francesca Cuteri
% Done: 0%
Category: -
Target version: -

Associated revisions

Revision 6ffb6ac7
Added by Francesca Cuteri almost 3 years ago

added first test
refs #726

Revision deec7178
Added by Francesca Cuteri almost 3 years ago

added tests for gamma5AndSaxpy
refs #726

Revision 9bc53ff8
Added by Francesca Cuteri almost 3 years ago

working on test
refs #726

Revision f2ed6bff
Added by Francesca Cuteri almost 3 years ago

added new createParameter fct. for tests
refs #726

Revision ed581a3a
Added by Francesca Cuteri almost 3 years ago

refactoring
refs #726

Revision 324383cc
Added by Francesca Cuteri almost 3 years ago

worked over tests
refs #726

Revision 355769ee
Added by Francesca Cuteri almost 3 years ago

implemented new kernel
refs #726

Revision 965866d2
Added by Francesca Cuteri almost 3 years ago

added new kernel
refs #726

Revision 3deea37e
Added by Francesca Cuteri almost 3 years ago

worked over test
refs #726

Revision 9d324f55
Added by Francesca Cuteri almost 3 years ago

worked over merged test
refs #726

Revision 2aff90a3
Added by Francesca Cuteri almost 3 years ago

worked over fermion merged test
refs #726

Revision f1057a49
Added by Francesca Cuteri almost 3 years ago

saxpy and gamma5 merged
refs #726, #717

History

Updated by Christopher Pinke almost 3 years ago

To add the merged kernel, tests have to be added to fermions_merged_kernels_test.cpp, in a similar fashion to the existing tests.

Updated by Christopher Pinke almost 3 years ago

Francesca, please start with the implementation.
To start, look at the existing tests in the file given above and at the tests for the original kernels.
The tests for the merged kernel should cover more or less all of these test cases, too.
If you have any questions, we can look at the code together.

  • Assignee changed from Christopher Pinke to Francesca Cuteri
  • Status changed from New to In Progress
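
A reference check along the lines described in this update could be sketched as follows. This is a host-side illustration only, not the project's kernel or test code: the names `saxpy`, `gamma5`, `saxpyAndGamma5`, `makeField` and `maxAbsDifference` are hypothetical, and the conventions used (saxpy as out = alpha*x + y, gamma5 = diag(1, 1, -1, -1) in spin space, gamma5 applied to the saxpy result) are assumptions that have to be matched to the actual kernels.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
// One spinor: 4 spin components x 3 colour components, spin-major (assumed layout).
using Spinor = std::array<Complex, 12>;
using Field = std::vector<Spinor>;

// Reference saxpy. ASSUMPTION: out = alpha * x + y (the sign convention may differ).
Field saxpy(Complex alpha, const Field& x, const Field& y) {
    Field out(x.size());
    for (std::size_t site = 0; site < x.size(); ++site)
        for (std::size_t c = 0; c < 12; ++c)
            out[site][c] = alpha * x[site][c] + y[site][c];
    return out;
}

// Reference gamma5, applied in place.
// ASSUMPTION: gamma5 = diag(1, 1, -1, -1), i.e. the lower two spin components flip sign.
void gamma5(Field& field) {
    for (Spinor& spinor : field)
        for (std::size_t c = 6; c < 12; ++c)
            spinor[c] = -spinor[c];
}

// Merged version: one pass over the field instead of two.
Field saxpyAndGamma5(Complex alpha, const Field& x, const Field& y) {
    Field out(x.size());
    for (std::size_t site = 0; site < x.size(); ++site)
        for (std::size_t c = 0; c < 12; ++c) {
            const Complex value = alpha * x[site][c] + y[site][c];
            out[site][c] = (c < 6) ? value : -value;
        }
    return out;
}

// Hypothetical helper: deterministic, non-uniform test field.
Field makeField(std::size_t sites, Complex seed) {
    Field field(sites);
    for (std::size_t site = 0; site < sites; ++site)
        for (std::size_t c = 0; c < 12; ++c)
            field[site][c] = seed * static_cast<double>(site + 1)
                             + Complex(static_cast<double>(c), 0.0);
    return field;
}

// Largest component-wise deviation between two fields of equal size.
double maxAbsDifference(const Field& a, const Field& b) {
    double maxDiff = 0.0;
    for (std::size_t site = 0; site < a.size(); ++site)
        for (std::size_t c = 0; c < 12; ++c)
            maxDiff = std::max(maxDiff, std::abs(a[site][c] - b[site][c]));
    return maxDiff;
}
```

A test for the merged kernel would then apply `saxpy` followed by `gamma5` to the same input, call `saxpyAndGamma5`, and require `maxAbsDifference` to stay below a small tolerance. Repeating this for the inputs used in the tests of the original kernels covers the same test cases.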

Updated by Christopher Pinke almost 3 years ago

I tested the performance of the fermionmatrix with the setup given in #717.
I simply commented out everything in the solver except for the fermionmatrix application, and recorded the performance after 2000 iterations.

The result is that the fermionmatrix (which is QplusQminus in this case) performs at ~80 Gflops:

[11:39:32] INFO:     SOLVER [CG] [002001]:    CG completed in 9302 ms @ 81.055 Gflops. Performed 2001 iterations. Performance after warmup: 80.975 Gflops

However, the dslash alone achieves ~110 Gflops. The difference of 30 Gflops is presumably caused by gamma5, which reaches a lousy 3 Gflops, and by the saxpy operation.

Given that the complete CG performs at ~70 Gflops for this setup, this seems to indicate that merging the saxpy and gamma5 kernels could indeed give a visible speedup to the inverter.
Actually, simply leaving gamma5 out of the fermion matrix gives ~8 Gflops more, which could be the benefit of the merging (in case the merging works "perfectly"):

[11:47:30] INFO:     SOLVER [CG] [002001]:    CG completed in 8462 ms @ 89.102 Gflops. Performed 2001 iterations. Performance after warmup: 89.014 Gflops.

Still, this would mean that one is losing ~20 Gflops compared to the dslash alone. If the saxpy cannot be accelerated any further, one could then think about merging the gamma5 and saxpy operations with (1+dslash)...
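
For reference, the comparison of the two runs can be reproduced from the figures in the two log lines quoted in this update. The sketch below only restates that arithmetic; the struct and function names are hypothetical and nothing here is taken from the solver code.

```cpp
#include <cassert>
#include <cmath>

// Figures copied from one "CG completed" log line.
struct CgRun {
    double wallTimeMs;  // total time reported by the solver
    double gflops;      // overall performance reported by the solver
    int iterations;     // number of iterations performed
};

// Values taken from the two log lines in this update.
const CgRun withGamma5{9302.0, 81.055, 2001};
const CgRun withoutGamma5{8462.0, 89.102, 2001};

// Average wall time per iteration in milliseconds.
double msPerIteration(const CgRun& run) {
    return run.wallTimeMs / run.iterations;
}

// Performance gained by going from `base` to `other`, in Gflops.
double gflopsGain(const CgRun& base, const CgRun& other) {
    return other.gflops - base.gflops;
}

// Wall-time speedup factor of `other` relative to `base`.
double speedup(const CgRun& base, const CgRun& other) {
    return base.wallTimeMs / other.wallTimeMs;
}
```

With these numbers, `gflopsGain(withGamma5, withoutGamma5)` comes out at about 8 Gflops and `speedup(withGamma5, withoutGamma5)` at roughly 1.10, which is the upper bound for what a perfect merge of gamma5 into the saxpy could deliver in this setup.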

Updated by Christopher Pinke almost 3 years ago

  • Tracker changed from Defect to Feature
