Implement staggered inverter and profile it (Feature #735)


Added by Alessandro Sciarra over 2 years ago. Updated over 2 years ago.


Status: In Progress
Priority: Normal
Assignee: Alessandro Sciarra
Category: -
Target version: -
Start date: 01 Jun 2015
Due date:
% Done: 0%

Description

At the moment, the inverter cannot deal with the staggered case. It should be straightforward to implement it. Once done, it would be good to profile it as it was done in #717 for the Wilson code.


LayoutProfiling.sh (931 Bytes) Alessandro Sciarra, 02 Jun 2015 10:39 am


Associated revisions

Revision 57fcfd56
Added by Alessandro Sciarra over 2 years ago

Implemented staggered version of inverter executable.
refs #735

Revision 74e1c756
Added by Alessandro Sciarra over 2 years ago

Added correlator case to inverter for staggered fermions (not yet implemented)!
refs #735

Revision 2a35ed60
Added by Alessandro Sciarra over 2 years ago

Fixed bug in staggered inverter (num_tastes needed as command line parameter).
refs #735

History

Updated by Alessandro Sciarra over 2 years ago

  • Start date set to 01 Jun 2015
  • Status changed from New to In Progress

Updated by Alessandro Sciarra over 2 years ago

I copied a thermalized configuration (mass=0.025, beta=5.12 on a 4x8^3 lattice) from a real run on the LOEWE and called the inverter executable with

./inverter --sourcefile=conf.10000 --measure_pbp=1 --sourcetype=volume --sourcecontent=gaussian \
           --num_sources=1 --pbp_measurements=1 --num_tastes=3 --ntime=4 --nspace=8 \
           --use_cpu=false --use_gpu=true --use_eo=1 --start=continue --mass=0.0250 \
           --measure_correlators=0 --solver=cg --cgmax=5000 --beta=5.1200 \
           --theta_fermion_temporal=1 --theta_fermion_spatial=0 --use_chem_pot_im=0 \
           --fermact=rooted_stagg --enable_profiling=true

in order to get the profile (note that, in the staggered case, no rational approximation is needed to calculate the pbp, but the number of tastes enters the calculation as a pre-factor).
The resulting profile is reported below.

                                     #device0            Time[mus]                Calls        Avg_Time[mus]   Avg_Time/Site[mus]                  BW[GB/s]            FLOPS[GFLOP/s]                 Re/Wr[MB]                      FLOP
                                      D_KS_eo               155223                 2496                   62                    0          26.0821652461298           9.3856276453876                  1.546875                    583680
    scalar_product_real_part_staggered_eoprec                18798                 2495                    7                    0          14.1348824343015           1.6308152463028                 0.1015625                     12287
                  saxpy_real_staggered_eoprec                15176                 2494                    6                    0          25.5789478123353          2.01939061676331                 0.1484375                     12288
           global_squarenorm_staggered_eoprec                10037                 1248                    8                    0          7.13014964630866           1.6549646308658                 0.0546875                     13310
             saxpby_cplx_arg_staggered_eoprec                 8895                 1247                    7                    0          25.2658041596402          6.02933962900506                  0.171875                     43008
             saxpby_real_vec_staggered_eoprec                 8877                 1247                    7                    0          23.0154872141489          2.58924231159175                   0.15625                     18432
                sax_real_arg_staggered_eoprec                 7598                 1247                    6                    0          17.4783511450382          0.50418320610687                 0.1015625                      3072
              saxpy_real_vec_staggered_eoprec                 7571                 1247                    6                    0          25.6363830405495          2.02392497688548                 0.1484375                     12288
                                    plaquette                 1350                    2                  675                    0          5.24743111111111         0.218453333333333              3.3779296875                    147456
                          plaquette_reduction                  997                    2                  498                    0        0.0124212637913741          37004501652376.2        0.0059051513671875      18446744073709551615
                                     polyakov                  182                    1                  182                    0          0.82145054945055          1.68228571428571               0.142578125                    306176
                           polyakov_reduction                  155                    1                  155                    0        0.0133161290322581           119011252088449        0.0019683837890625      18446744073709551615
                       convertGaugefieldToSOA                  120                    1                  120                    0                   19.6608                         0                      2.25                         0
             saxpby_real_arg_staggered_eoprec                   29                    2                   14                    0          11.2993103448276           1.2711724137931                   0.15625                     18432
              scalar_product_staggered_eoprec                   23                    2                   11                    0          9.97286956521739          2.13686956521739                  0.109375                     24574
            set_zero_spinorfield_stagg_eoprec                   13                    1                   13                    0          3.78092307692308                         0                  0.046875                         0
                                  complex_sum                    4                    1                    4                    0                     0.012                    0.0005         4.57763671875e-05                         2
        set_gaussian_spinorfield_stagg_eoprec                    0                    0                    0                    0                         0                         0                  0.046875                      3072
            set_cold_spinorfield_stagg_eoprec                    0                    0                    0                    0                         0                         0                  0.046875                      9216
                sax_real_vec_staggered_eoprec                    0                    0                    0                    0                         0                         0                 0.1015625                      3072
                    sax_real_staggered_eoprec                    0                    0                    0                    0                         0                         0                 0.1015625                      3072
              saxpy_real_arg_staggered_eoprec                    0                    0                    0                    0                         0                         0                 0.1484375                     12288
                  saxpy_cplx_staggered_eoprec                    0                    0                    0                    0                         0                         0                   0.15625                     24576
              saxpy_cplx_arg_staggered_eoprec                    0                    0                    0                    0                         0                         0                   0.15625                     24576
                 saxpby_real_staggered_eoprec                    0                    0                    0                    0                         0                         0                   0.15625                     18432
               saxpbypz_cplx_staggered_eoprec                    0                    0                    0                    0                         0                         0                   0.21875                     49152
           saxpbypz_cplx_arg_staggered_eoprec                    0                    0                    0                    0                         0                         0                   0.21875                     49152
                 saxpby_cplx_staggered_eoprec                    0                    0                    0                    0                         0                         0                  0.171875                     43008
                    sax_cplx_staggered_eoprec                    0                    0                    0                    0                         0                         0                  0.109375                     18432
                sax_cplx_arg_staggered_eoprec                    0                    0                    0                    0                         0                         0                  0.109375                     18432
                  global_squarenorm_staggered                    0                    0                    0                    0                         0                         0                  0.109375                     26622
            convert_staggered_field_to_SoA_eo                    0                    0                    0                    0                         0                         0                   0.09375                         0
          convert_staggered_field_from_SoA_eo                    0                    0                    0                    0                         0                         0                   0.09375                         0
                     convertGaugefieldFromSOA                    0                    0                    0                    0                         0                         0                      2.25                         0
                convert_from_eoprec_staggered                    0                    0                    0                    0                         0                         0                    0.1875                         0
                convert_from_eoprec_staggered                    0                    0                    0                    0                         0                         0                    0.1875                         0
                     convert_float_to_complex                    0                    0                    0                    0                         0                         0        2.288818359375e-05                         0
                          complex_subtraction                    0                    0                    0                    0                         0                         0         4.57763671875e-05                         2
                                complex_ratio                    0                    0                    0                    0                         0                         0         4.57763671875e-05                        11
                              complex_product                    0                    0                    0                    0                         0                         0         4.57763671875e-05                         6

Total time spent in kernels: 0.235048 seconds

This profile was obtained with the attached script (for readability reasons). Now, considering that the call to the inverter executable above took 3.830 s while only 0.235 s were spent in kernels, there must be a bottleneck in the host code. I will now try to find it by simply measuring the time spent in different parts of the executed host code.
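
To put numbers on the comparison above, here is a minimal Python sketch that redoes the arithmetic from the profile. The times are copied from the Time[mus] column (kernels with zero calls are left out). The helper `derived_rates` is hypothetical and rests on an assumption inferred from the numbers rather than taken from the code: that Re/Wr[MB] and FLOP are per-call values and that MB means 2^20 bytes.

```python
# Kernel times in microseconds, copied from the "Time[mus]" column of the
# profile (kernels that were never called are omitted).
kernel_times_us = {
    "D_KS_eo": 155223,
    "scalar_product_real_part_staggered_eoprec": 18798,
    "saxpy_real_staggered_eoprec": 15176,
    "global_squarenorm_staggered_eoprec": 10037,
    "saxpby_cplx_arg_staggered_eoprec": 8895,
    "saxpby_real_vec_staggered_eoprec": 8877,
    "sax_real_arg_staggered_eoprec": 7598,
    "saxpy_real_vec_staggered_eoprec": 7571,
    "plaquette": 1350,
    "plaquette_reduction": 997,
    "polyakov": 182,
    "polyakov_reduction": 155,
    "convertGaugefieldToSOA": 120,
    "saxpby_real_arg_staggered_eoprec": 29,
    "scalar_product_staggered_eoprec": 23,
    "set_zero_spinorfield_stagg_eoprec": 13,
    "complex_sum": 4,
}

WALL_TIME_S = 3.830  # wall-clock time of the inverter call quoted in the text

total_us = sum(kernel_times_us.values())
kernel_s = total_us * 1e-6
kernel_fraction = kernel_s / WALL_TIME_S        # share of wall time in kernels
outside_kernels_s = WALL_TIME_S - kernel_s      # time not accounted for by kernels
dks_share = kernel_times_us["D_KS_eo"] / total_us


def derived_rates(time_us, calls, rewr_mib_per_call, flop_per_call):
    """Return (GB/s, GFLOP/s) for one kernel.

    ASSUMPTION: Re/Wr and FLOP are per-call values and Re/Wr is in units
    of 2**20 bytes; this is inferred from the table, not from the source code.
    """
    seconds = time_us * 1e-6
    bandwidth = rewr_mib_per_call * 2**20 * calls / seconds / 1e9
    gflops = flop_per_call * calls / seconds / 1e9
    return bandwidth, gflops


# Cross-check the assumption against the D_KS_eo row of the table.
bw, gflops = derived_rates(155223, 2496, 1.546875, 583680)

print(f"total kernel time : {total_us} mus")
print(f"kernel share      : {100 * kernel_fraction:.1f} % of wall time")
print(f"outside kernels   : {outside_kernels_s:.3f} s")
print(f"D_KS_eo share     : {100 * dks_share:.1f} % of kernel time")
print(f"D_KS_eo rates     : {bw:.4f} GB/s, {gflops:.4f} GFLOP/s")
```

The per-kernel times add up to the reported total of 0.235048 s, which is only about 6% of the 3.830 s wall time; within the kernels, D_KS_eo accounts for roughly two thirds. Under the stated assumption the recomputed D_KS_eo rates agree with the BW and FLOPS columns. The FLOP value 18446744073709551615 listed for the two reduction kernels equals 2^64 - 1, so the corresponding FLOPS entries are presumably an unset or wrapped-around counter rather than a real measurement.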
