Profile the rhmc executable (Feature #733)


Added by Alessandro Sciarra over 2 years ago. Updated over 2 years ago.


Status: In Progress
Start date: 20 May 2015
Priority: Normal
Due date:
Assignee: Alessandro Sciarra
% Done: 0%
Category: -
Target version: -

Description

The performance of the rhmc executable in physical scenarios is not really clear. One should find out where most of the time is spent and which part of the code should be optimized. A good starting point could be the attached input file (rhmcInputForProfile), which is one of the first runs made with the staggered code. At the moment, around 2 minutes are needed to do one trajectory measuring the chiral condensate.


rhmcInputForProfile (939 Bytes) Alessandro Sciarra, 20 May 2015 11:28 am


Related issues

related to CL2QCD - Feature #563: Optimize D_KS even-odd staggered In Progress 12 Dec 2013

Associated revisions

Revision 3c27ca64
Added by Alessandro Sciarra over 2 years ago

Fixed name kernels in get_flop/read_write and kernel init/release for spinors_staggered.cpp file.
refs #733

Revision be0437bc
Added by Alessandro Sciarra over 2 years ago

Fixed name kernels in get_flop/read_write around in staggered code.
refs #733

History

Updated by Alessandro Sciarra over 2 years ago

Running the rhmc via time ./rhmc rhmc.input --beta=5.1200 takes

real    1m57.496s
user    1m10.435s
sys     0m32.769s

on the AMD Radeon HD 7970.

  • Description changed from The quality of the performance of the rhmc executable in physical scenarios i... to The quality of the performance of the rhmc executable in physical scenarios i...
  • File rhmcInputForProfile added

Updated by Alessandro Sciarra over 2 years ago

After applying the changeset 47443874ba9a5fe41c5ac103a0c9dfd4f926b531, the command time ./rhmc rhmc.input --beta=5.1200 takes

real    1m40.088s
user    0m54.351s
sys     0m32.815s

again on the AMD Radeon HD 7970.
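
To put the two timings above side by side, the following is a minimal sketch that converts the output of time to seconds and prints the relative change for each entry. The helper name to_seconds and the dictionary layout are illustrative choices, not part of CL2QCD. The numbers are copied from the two runs reported in this ticket.

```python
import re


def to_seconds(stamp):
    """Convert a stamp printed by `time`, such as '1m57.496s', to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the two comments in this ticket.
before = {"real": "1m57.496s", "user": "1m10.435s", "sys": "0m32.769s"}
after = {"real": "1m40.088s", "user": "0m54.351s", "sys": "0m32.815s"}

for key in before:
    b = to_seconds(before[key])
    a = to_seconds(after[key])
    # Negative percentage means the run got faster after the changeset.
    print(f"{key}: {b:.3f} s -> {a:.3f} s ({(a - b) / b:+.1%})")
```

With these inputs the wall-clock (real) time drops by roughly 15%, while the sys time is essentially unchanged.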

Updated by Alessandro Sciarra over 2 years ago

  • Start date set to 20 May 2015
  • Status changed from New to In Progress

Updated by Alessandro Sciarra over 2 years ago

The profiling has been done for one RHMC trajectory and it reads

                                    #device_0     Time[mus]         Calls      Avg_Time[mus]    Avg_Time/Site[mus]                  BW[GB/s]            FLOPS[GFLOP/s]                 Re/Wr[MB]                      FLOP
                                      D_KS_eo       6538861        106904                 61                     0          26.5183796480763          9.54259873699716                  1.546875                    583680
           global_squarenorm_staggered_eoprec        999483        132838                  7                     0          7.62140253711169          1.76898834697539                 0.0546875                     13310
             saxpby_real_vec_staggered_eoprec        894897        130186                  6                     0          23.8347812541555          2.68141289109249                   0.15625                     18432
              saxpy_real_vec_staggered_eoprec        866552        130186                  6                     0          23.3836982985441           1.8460814446219                 0.1484375                     12288
                sax_real_arg_staggered_eoprec        784883        130186                  6                     0          17.6641464473049         0.509542685979949                 0.1015625                      3072
    scalar_product_real_part_staggered_eoprec        745399        101629                  7                     0          14.5198504210497          1.67523101453047                 0.1015625                     12287
                  saxpy_real_staggered_eoprec        610641        101570                  6                     0          25.8894626466287          2.04390494578648                 0.1484375                     12288
             saxpby_cplx_arg_staggered_eoprec        384186         54553                  7                     0          25.5911456221726          6.10697793256391                  0.171875                     43008
                                  gauge_force        211264           881                239                     0           95.652953650409          92.8175268100575                    21.875                  22257664
           fermion_staggered_partial_force_eo         58439           656                 89                     0          13.9776639572888          21.5642249867383                    1.1875                   1921024
                         md_update_gaugefield         54179           880                 61                     0          46.8364849849573          96.2010756935344                      2.75                   5922816
                sax_cplx_arg_staggered_eoprec         17709          2489                  7                     0          16.1193987238128          2.59061765204133                  0.109375                     18432
                          gaugemomentum_saxpy         14363          1250                 11                     0          136.885051869387          11.4070876557822                       1.5                    131072
            create_volume_source_stagg_eoprec          3092            32                 96                     0           190910676053915           190910676053915            17592186044416      18446744073709551615
                                    plaquette          2234             7                319                     0          11.0985282005372         0.462037600716204              3.3779296875                    147456
                          plaquette_reduction          1761             7                251                     0        0.0246132879045997          73326069571815.4        0.0059051513671875      18446744073709551615
               generate_gaussian_gaugemomenta           872             1                872                     0         0.601247706422018                         0                       0.5                         0
              saxpy_cplx_arg_staggered_eoprec           833           135                  6                     0          26.5527010804322          3.98290516206483                   0.15625                     24576
              scalar_product_staggered_eoprec           398            39                 10                     0          11.2382713567839          2.40800502512563                  0.109375                     24574
                                     polyakov           282             3                 94                     0          1.59046808510638           3.2571914893617               0.142578125                    306176
                           polyakov_reduction           252             3                 84                     0        0.0245714285714286           219604096115590        0.0019683837890625      18446744073709551615
            set_zero_spinorfield_stagg_eoprec           238           374                  0                     0          77.2388571428571                         0                  0.046875                         0
             saxpby_real_arg_staggered_eoprec           195            32                  6                     0          26.8865641025641          3.02473846153846                   0.15625                     18432
                       convertGaugefieldToSOA           148             2                 74                     0          31.8823783783784                         0                      2.25                         0
        set_gaussian_spinorfield_stagg_eoprec           113             1                113                     0         0.434973451327434        0.0271858407079646                  0.046875                      3072
                                  complex_sum            53            16                  3                     0        0.0144905660377358       0.00060377358490566         4.57763671875e-05                         2
            convert_staggered_field_to_SoA_eo            51             6                  8                     0          11.5651764705882                         0                   0.09375                         0
                     convertGaugefieldFromSOA            34             1                 34                     0          69.3910588235294                         0                      2.25                         0
                     gaugemomentum_squarenorm            31             2                 15                     0          38.0531612903226           63.421935483871                    0.5625                    983040
                       set_zero_gaugemomentum             0             0                  0                     0                         0                         0                         0                         0
            set_cold_spinorfield_stagg_eoprec             0             0                  0                     0                         0                         0                  0.046875                      9216
                sax_real_vec_staggered_eoprec             0             0                  0                     0                         0                         0                 0.1015625                      3072
                    sax_real_staggered_eoprec             0             0                  0                     0                         0                         0                 0.1015625                      3072
              saxpy_real_arg_staggered_eoprec             0             0                  0                     0                         0                         0                 0.1484375                     12288
                  saxpy_cplx_staggered_eoprec             0             0                  0                     0                         0                         0                   0.15625                     24576
                 saxpby_real_staggered_eoprec             0             0                  0                     0                         0                         0                   0.15625                     18432
               saxpbypz_cplx_staggered_eoprec             0             0                  0                     0                         0                         0                   0.21875                     49152
           saxpbypz_cplx_arg_staggered_eoprec             0             0                  0                     0                         0                         0                   0.21875                     49152
                 saxpby_cplx_staggered_eoprec             0             0                  0                     0                         0                         0                  0.171875                     43008
                    sax_cplx_staggered_eoprec             0             0                  0                     0                         0                         0                  0.109375                     18432
                  global_squarenorm_staggered             0             0                  0                     0                         0                         0                  0.109375                     26622
                 gaugemomentum_convert_to_soa             0             0                  0                     0                         0                         0            17592186044416      18446744073709551615
               gaugemomentum_convert_from_soa             0             0                  0                     0                         0                         0            17592186044416      18446744073709551615
                           fermion_force_eo_3             0             0                  0                     0                         0                         0                         0                         0
                           fermion_force_eo_2             0             0                  0                     0                         0                         0                         0                         0
                           fermion_force_eo_1             0             0                  0                     0                         0                         0                         0                         0
                           fermion_force_eo_0             0             0                  0                     0                         0                         0                         0                         0
                                fermion_force             0             0                  0                     0                         0                         0                      9.25                   1789952
          convert_staggered_field_from_SoA_eo             0             0                  0                     0                         0                         0                   0.09375                         0
                convert_from_eoprec_staggered             0             0                  0                     0                         0                         0                    0.1875                         0
                convert_from_eoprec_staggered             0             0                  0                     0                         0                         0                    0.1875                         0
                     convert_float_to_complex             0             0                  0                     0                         0                         0        2.288818359375e-05                         0
                          complex_subtraction             0             0                  0                     0                         0                         0         4.57763671875e-05                         2
                                complex_ratio             0             0                  0                     0                         0                         0         4.57763671875e-05                        11
                              complex_product             0             0                  0                     0                         0                         0         4.57763671875e-05                         6

Summing the Time column gives 12.19 seconds. Since the code runs for 1 minute 40 seconds, it is clear that most of the time is spent on the CPU and not on the GPU. I will start investigating this by implementing an inverter executable for the staggered formulation and profiling it (#735). I guess most of the time is spent in the inverter and, in any case, it is a good opportunity to refactor that part of the code.
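
The sum quoted above can be reproduced with a short script. This is only a sketch: the list holds the non-zero entries of the Time[mus] column copied by hand from the profile, and the comparison assumes that the profiled trajectory is comparable to the timed run after changeset 47443874 (real 1m40.088s). The variable names are illustrative.

```python
# Non-zero entries of the Time[mus] column, in microseconds, in table order.
KERNEL_TIMES_MUS = [
    6538861, 999483, 894897, 866552, 784883, 745399, 610641, 384186,
    211264, 58439, 54179, 17709, 14363, 3092, 2234, 1761, 872, 833,
    398, 282, 252, 238, 195, 148, 113, 53, 51, 34, 31,
]

# Wall-clock ("real") time of the run, in seconds (assumed comparable run).
WALL_CLOCK_S = 100.088

# Total time spent in the profiled kernels, converted to seconds.
kernel_s = sum(KERNEL_TIMES_MUS) / 1e6

# Share of the wall-clock time accounted for by the profiled kernels.
fraction = kernel_s / WALL_CLOCK_S

print(f"time in profiled kernels: {kernel_s:.2f} s")
print(f"share of wall-clock time: {fraction:.1%}")
```

Under these assumptions the profiled kernels account for about 12% of the wall-clock time, which is the basis for the statement that most of the time is spent outside the GPU kernels.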
