Proper implementation of reductions (Feature #102)

Added by Matthias Bach over 8 years ago. Updated over 8 years ago.

Status: New
Start date: 16 Mar 2011
Priority: Normal
Due date:
Assignee: Matthias Bach
% Done:
Target version: -
Estimated time: 12.00 hours


Updated by Christopher Pinke over 8 years ago

I implemented a reduction where needed. In each kernel, the data is first collected at the local level. There are currently two versions of this step: one uses a loop, the other is explicitly unrolled. The unrolled version assumes that local_work_size is at most 128, so it has to be adjusted if that is not the case. The loop is probably the safer solution.
After that, a second kernel is called in which thread 0 collects the per-group local data at the global level.
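A minimal host-side sketch of the two-step scheme described above, assuming a power-of-two local_work_size and a sum reduction. It simulates the work-groups in Python rather than showing OpenCL code, and all function names are hypothetical. The loop version of the local step halves the stride each iteration (where the kernel would place a local barrier), and the second step is a single thread summing the partial results serially.

```python
# Hypothetical host-side simulation of the described reduction scheme.

def work_group_reduce(local_data):
    """Loop-based tree reduction over one work-group's local memory.

    Assumes len(local_data) is a power of two (the local_work_size).
    Each while-iteration is one step in which work-items with
    lid < stride add their partner's value; a barrier would follow it.
    """
    data = list(local_data)
    stride = len(data) // 2
    while stride > 0:
        for lid in range(stride):  # work-items active in this step
            data[lid] += data[lid + stride]
        # barrier(CLK_LOCAL_MEM_FENCE) would go here in the kernel
        stride //= 2
    return data[0]  # work-item 0 holds the work-group's partial sum


def first_kernel(values, local_size):
    """Each work-group loads its slice into local memory and reduces it."""
    partials = []
    for start in range(0, len(values), local_size):
        chunk = values[start:start + local_size]
        chunk += [0] * (local_size - len(chunk))  # pad the last group with 0
        partials.append(work_group_reduce(chunk))
    return partials


def second_kernel(partials):
    """Thread 0 alone sums the per-group results (serial, one work-item)."""
    total = 0
    for p in partials:
        total += p
    return total


values = list(range(1000))
partials = first_kernel(values, local_size=128)
print(len(partials), second_kernel(partials))  # prints "8 499500"
```

The serial second step is the part that does not scale: its cost grows linearly with the number of work-groups, while every other step runs in parallel.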

Updated by Matthias Bach over 8 years ago

This is not really an efficient reduction method. In addition, local_work_size should be adjusted to the kernel, not the kernel to the work size.
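One common alternative, sketched here under the assumption that this is the kind of change meant (the comment does not say): keep every stage parallel by relaunching the same tree-reduction kernel on the partial results until one value remains, and treat the work-group size as a parameter rather than hard-coding 128. In a real OpenCL kernel the loop bound would come from get_local_size(0), and the host would pick local_work_size to suit the kernel. As before, this is a Python simulation with hypothetical names, assuming a sum reduction and a power-of-two work-group size of at least 2.

```python
# Hypothetical simulation: repeated parallel passes instead of a serial final step.

def reduce_pass(values, local_size):
    """One kernel launch: every work-group tree-reduces its slice to one value."""
    assert local_size >= 2 and local_size & (local_size - 1) == 0, \
        "local_size must be a power of two >= 2"
    out = []
    for start in range(0, len(values), local_size):
        data = values[start:start + local_size]
        data += [0] * (local_size - len(data))  # pad the last group with 0
        stride = local_size // 2  # kernel: get_local_size(0) / 2
        while stride > 0:
            for lid in range(stride):
                data[lid] += data[lid + stride]
            stride //= 2
        out.append(data[0])
    return out


def reduce_all(values, local_size):
    """Relaunch the same kernel until a single value remains."""
    if not values:
        return 0
    while len(values) > 1:
        values = reduce_pass(values, local_size)
    return values[0]


print(reduce_all(list(range(1000)), local_size=64))
```

Each pass shrinks the data by a factor of local_size, so the number of launches grows only logarithmically with the input size, and the same kernel works for any power-of-two work-group size the host chooses.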
