Work over get_work_sizes (Defect #464)


Added by Christopher Pinke almost 6 years ago. Updated almost 6 years ago.


Status: Feedback    Start date: 23 Apr 2013
Priority: Normal    Due date:
Assignee: Matthias Bach    % Done: 20%
Category: -
Target version: -

Description

Currently, the get_work_sizes functions return ls, gs and num_groups.

However, kernels only need ls and gs as arguments and calculate num_groups dynamically.
If num_groups is nevertheless used explicitly, e.g. to create a local buffer, it may no longer match if ls or gs have been changed manually after get_work_sizes was called.

Because of this, it would be safer to modify this function to return only ls and gs. num_groups then has to be calculated where it is needed, so that it matches the arguments passed to the kernel.
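
The mismatch can be illustrated with a minimal, self-contained sketch. The names WorkSizes, num_groups_for and stale_after_manual_change are hypothetical and not part of CL2QCD; the sizes 128, 4096 and 8192 are arbitrary example values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for what a reduced get_work_sizes would hand back:
// only the local and the global work size.
struct WorkSizes {
    std::size_t ls;
    std::size_t gs;
};

// Derive the number of work-groups from the sizes that are actually
// going to be passed to the kernel.
inline std::size_t num_groups_for(const WorkSizes& ws)
{
    assert(ws.ls != 0);
    return ws.gs / ws.ls;
}

// Reproduces the failure mode described above: num_groups is taken once,
// gs is changed manually afterwards, and the cached value is now stale.
inline bool stale_after_manual_change()
{
    WorkSizes ws{128, 4096};
    const std::size_t cached = num_groups_for(ws); // value returned "up front"
    ws.gs = 8192;                                  // manual adjustment afterwards
    return cached != num_groups_for(ws);           // cached value no longer fits
}
```

Calling num_groups_for right before the buffer is created or the kernel is enqueued avoids carrying a separate num_groups value that can drift away from ls and gs.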

Matthias also suggested renaming the function to something like "provide_work_sizes".


Related issues

related to CL2QCD - Defect #461: Scalar Product broken for larger lattices with commit:14e... Feedback 23 Apr 2013

History

Updated by Christopher Pinke almost 6 years ago

I have a question about the use of num_groups as an argument to a kernel.

Would it not be safer to use OpenCL's built-in "get_num_groups(0)"? This always reflects the actual gs/ls...

  • Status changed from New to Feedback
  • Assignee set to Matthias Bach

Updated by Christopher Pinke almost 6 years ago

Alessandro and I thought of a solution to the problem:

The idea is that each kernel-calling function can stay as it is and still uses

get_work_sizes(<kernelname>, &ls, &gs, &num_groups);

However, in the module, this function is replaced by something like:

void get_work_sizes(const cl_kernel kernel, size_t * ls, size_t * gs, cl_uint * num_groups) const
{
    this->provide_work_sizes(kernel, ls, gs);
    Opencl_Module::calc_num_groups(num_groups, ls, gs);
}

with

void provide_work_sizes(const cl_kernel kernel, size_t * ls, size_t * gs) const
{
        Opencl_Module::provide_work_sizes(ls, gs);
        /*
         * Block with potentially adjusting ls and gs to specific kernels
         */
        //check that gs is a multiple of ls
        if((*gs) % (*ls) != 0){
             throw hardware::OpenclException(1, "provide_work_sizes", __FILE__, __LINE__);
        }
}

and

void calc_num_groups(cl_uint * num_groups, size_t * ls, size_t * gs) const
{
         (*num_groups) = (*gs)/(*ls);
         //check that ls * num_groups = gs is valid
         if( (*ls) * (*num_groups) != (*gs) )
               throw hardware::OpenclException(1, "calc_num_groups", __FILE__, __LINE__);
}

In the Opencl_Module, the function "get_work_sizes" is replaced by "provide_work_sizes"; one can also save one argument here.

In this way it is ensured that ls, gs and num_groups are always in the correct relation, and questionable choices like ls=17 that do not divide gs are also prevented.
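
For reference, the two checks can be tried out in isolation with a small stand-alone sketch. It is only an illustration under assumptions: the free functions check_work_sizes and work_sizes_valid are hypothetical, std::invalid_argument stands in for hardware::OpenclException, and values are passed by value instead of through pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical free-function version of the divisibility check; the real
// code would be a member function throwing hardware::OpenclException.
inline void check_work_sizes(std::size_t ls, std::size_t gs)
{
    // gs must be a non-zero multiple of ls, otherwise gs/ls work-groups
    // of size ls do not cover exactly gs work-items.
    if (ls == 0 || gs == 0 || gs % ls != 0) {
        throw std::invalid_argument("gs is not a multiple of ls");
    }
}

// num_groups is always derived from the ls and gs that were just validated.
inline std::size_t calc_num_groups(std::size_t ls, std::size_t gs)
{
    check_work_sizes(ls, gs);
    const std::size_t num_groups = gs / ls;
    assert(ls * num_groups == gs); // holds by construction after the check
    return num_groups;
}

// Convenience wrapper for callers that prefer a yes/no answer.
inline bool work_sizes_valid(std::size_t ls, std::size_t gs)
{
    try {
        check_work_sizes(ls, gs);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

With these definitions, work_sizes_valid(17, 4096) is false because 4096 is not divisible by 17, while calc_num_groups(128, 4096) yields 32. Note that ls=17 would still pass if gs happened to be a multiple of 17, so the check guards the relation between ls and gs rather than the value of ls itself.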

@Matthias: What about the exception? Is this the right one here?

  • % Done changed from 0 to 20
