Commit c742e79
Use SequentialReductionKernel for tree-reduction as well
1. Rename misspelled variable
2. If reduction_nelems is small, use SequentialReductionKernel
for tree-reductions, as is already done for atomic reductions
3. Tweak the scaling-down logic for a moderately sized number of
elements to reduce.
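
The selection logic described in items 2 and 3 can be sketched roughly as follows. This is an illustration only, not the code in this commit: `choose_plan`, `ReductionPlan`, `kSequentialCutoff`, and `kMinWorkGroup` are hypothetical names, the threshold values are assumptions, and the halving loop is just one plausible way to scale the work-group size down.

```cpp
#include <cstddef>

// Hypothetical names and thresholds; the real values in the commit may differ.
enum class ReductionStrategy { Sequential, Tree };

struct ReductionPlan {
    ReductionStrategy strategy;
    std::size_t wg; // work-group size; 0 means "not applicable"
};

constexpr std::size_t kSequentialCutoff = 64; // assumed "small" cutoff
constexpr std::size_t kMinWorkGroup = 32;     // assumed lower bound for wg

inline ReductionPlan choose_plan(std::size_t iter_nelems,
                                 std::size_t reduction_nelems,
                                 std::size_t max_wg)
{
    // Item 2: few elements per reduction, so each work-item can
    // reduce its own output sequentially.
    if (reduction_nelems < kSequentialCutoff) {
        return {ReductionStrategy::Sequential, 0};
    }
    // A single output element: use the largest work-group available.
    if (iter_nelems == 1) {
        return {ReductionStrategy::Tree, max_wg};
    }
    // Item 3: moderately sized reductions get a scaled-down work-group
    // so that many independent reductions can run side by side.
    std::size_t wg = max_wg;
    while (wg > kMinWorkGroup && wg > reduction_nelems) {
        wg /= 2;
    }
    return {ReductionStrategy::Tree, wg};
}
```

Under these assumed thresholds, `choose_plan(1000, 100, 1024)` selects a tree reduction with a work-group size of 64, while `choose_plan(1000, 10, 1024)` falls back to the sequential kernel.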
We should also use max_wg if iter_nelems is very small (one),
since choosing max_wg for large iter_nelems may lead to
under-utilization of the GPU.
1 parent 11ecba8 commit c742e79
1 file changed, +194 -117 lines changed