HMC hangs for large lattices on Tahiti (Defect #366)
When using sufficiently large lattices (32^3x16 and up), the HMC gets stuck on Tahiti.
The hang seems to occur in the kernel $gauge_force_tlsym$.
The problem cannot be reproduced when running the kernel in the standalone test.
Removing the compile-time work-group size definition does not solve the problem.
When the kernel runs on only the even or only the odd sites, it does not seem to hang.
Splitting even and odd sites onto different threads makes the hang go away.
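A minimal sketch of such an even/odd split, assuming the common checkerboard convention in which a site's parity is (x + y + z + t) mod 2 and sites are numbered lexicographically with x running fastest. Each parity can then be processed by its own launch over half the lattice volume. The type and function names (`site_coords`, `site_parity`, `eo_site_coords`) are hypothetical; the report does not show how the actual kernels index sites.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical site coordinates on an NS^3 x NT lattice. */
typedef struct {
    int x, y, z, t;
} site_coords;

/* Parity of a site in the checkerboard (even/odd) decomposition. */
int site_parity(site_coords s)
{
    return (s.x + s.y + s.z + s.t) % 2;
}

/*
 * Map the i-th site of the given parity (0 = even, 1 = odd) to full
 * coordinates, with sites ordered lexicographically and x fastest.
 * Assumes NS is even, so every x-row holds NS/2 sites of each parity.
 * Valid range for i: [0, NS*NS*NS*NT/2).
 */
site_coords eo_site_coords(size_t i, int parity, int ns)
{
    assert(ns > 0 && ns % 2 == 0);
    assert(parity == 0 || parity == 1);

    const size_t half_row = (size_t)ns / 2;
    const size_t row = i / half_row;   /* index of the (y, z, t) row */
    const int k = (int)(i % half_row); /* position among same-parity x values */

    site_coords s;
    s.y = (int)(row % (size_t)ns);
    s.z = (int)((row / (size_t)ns) % (size_t)ns);
    s.t = (int)(row / ((size_t)ns * (size_t)ns));
    /* Pick the x offset so that (x + y + z + t) has the requested parity. */
    s.x = 2 * k + ((parity + s.y + s.z + s.t) & 1);
    return s;
}
```

With this layout, a 32^3x16 lattice gives 32 * 32 * 32 * 16 / 2 = 262144 sites per parity, so each of the two launches touches exactly half the lattice and no site is shared between them.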
The application also hangs in the $gaugefield_zero$ kernel.
Replacing it with $clEnqueueFillBuffer$ avoids that, but loses compatibility with OpenCL 1.0 and 1.1, since $clEnqueueFillBuffer$ was introduced in OpenCL 1.2.
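One possible way to keep older devices working is to choose the zeroing path at runtime: use $clEnqueueFillBuffer$ when the device reports OpenCL 1.2 or newer, and fall back to the $gaugefield_zero$ kernel otherwise. A minimal sketch of the version check, assuming the string comes from `clGetDeviceInfo` with `CL_DEVICE_VERSION`, which the specification formats as "OpenCL <major>.<minor> <vendor-specific information>". The helper names are hypothetical, and the actual OpenCL calls are left out so the sketch compiles without an OpenCL SDK.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if an OpenCL version string such as
 * "OpenCL 1.2 <vendor info>" reports at least major.minor, else 0.
 * Strings without the "OpenCL <major>.<minor>" prefix count as too old.
 */
int opencl_version_at_least(const char *version, int major, int minor)
{
    int got_major = 0;
    int got_minor = 0;
    if (version == NULL ||
        sscanf(version, "OpenCL %d.%d", &got_major, &got_minor) != 2)
        return 0;
    return got_major > major || (got_major == major && got_minor >= minor);
}

/*
 * Hypothetical dispatch decision: clEnqueueFillBuffer exists only from
 * OpenCL 1.2 on, so older devices would keep using gaugefield_zero.
 */
int can_use_fill_buffer(const char *device_version)
{
    return opencl_version_at_least(device_version, 1, 2);
}
```

Treating unparsable strings as "too old" errs on the side of the kernel fallback, which works on every OpenCL version.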
Major drawback of the solution so far: if the kernels are not in lockstep, the $gauge_force_tlsym$ kernel produces a NaN.
I found the cause of the NaN: a bug introduced during the fix.
It seems the hangs can be circumvented now. Performance has to be rechecked, though.
- % Done changed from 0 to 50
For lattices of size 48^3x8 the hang still occurs... :(
- Priority changed from Normal to High
- Target version deleted (
- Status changed from New to In Progress