Commit 71e891c
Check in of generic reduction templates and some reductions (#1399)
* Implements necessary sycl utilities for custom reductions
* Implements dpctl.tensor.max and dpctl.tensor.min
* Adds tests for min and max
* Reductions now set max_wg to the minimum of the max work group size and 2048
- This prevents running out of resources when using local memory on CPU
* max and min NaN propagation fixed for CPU devices
- drops use of fetch_max/fetch_min for floats, which do not handle NaNs correctly
* Tweak to test_reduction_kernels
* Implements dpctl.tensor.argmax and argmin
* Tests for argmin and argmax
Also fixes argmin and argmax for scalar inputs
* Argmin and argmax now handle identities correctly
Adds a test for this behavior
Fixed a typo in argmin and argmax causing shared local memory variant to be used for more types than expected
* Replaced `std::min` with `idx_reduction_op_`
* reductions now well-behaved for size-zero arrays
- comparison and search reductions will throw an error in this case
- slips in change to align sum signature with array API spec
* removed unnecessary copies in reduction templates
* Refactors sum to use generic reduction templates
* Sum now uses a generic Python API
* Docstrings added for argmax, argmin, max, and min
* Small reduction clean-ups
Removed unnecessary copies in custom_reduce_over_group
Sequential reduction now casts before calling operator (makes behavior explicit rather than implicit)
* Added test for argmin with keepdims=True
* Added a test for raised errors in reductions
Also removed unused `_usm_types` in `test_tensor_sum`
* Removed `void` overloads from reduction utilities
These were unused by dpctl
* Added missing include, Identity to use has_known_identity
Implementation of Identity trait should call sycl::known_identity
if trait sycl::has_known_identity is a true_type.
Added IsMultiplies, and identity value for it, since sycl::known_identity
for multiplies is only defined for real-valued types.
* Adding functor factories for product over axis
* Added Python API for _prod_over_axis
* Common reduction template takes functions to test if atomics are applicable
Passing these function pointers around allows atomics to be turned off altogether
if desired.
Use a custom trait to check whether reduce_over_group can be used. This makes it
possible to work around a bug, or to switch to custom code for reduction over a group if desired.
This custom trait type works around an issue with incorrect results returned from
sycl::reduce_over_group for the sycl::multiplies operator for 64-bit integral types.
* Defined dpctl.tensor.prod
Also tweaked docstring for sum.
* Added tests for dpt.prod, removed uses of numpy
* Corrected prod docstring
Small tweaks to sum, min, and max docstrings
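
The reduction semantics described above (NaN propagation in max/min, errors from comparison reductions on size-zero input, identity results for sum and prod) can be sketched as a small pure-Python model. This is not the dpctl implementation: the helper names are hypothetical, and the choice of `ValueError` is an assumption, since the message does not say which error is raised.

```python
import math
from functools import reduce


def nan_propagating_max(a, b):
    """Binary max that returns NaN if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a >= b else b


def reduce_max(values):
    """Comparison reduction: there is no identity, so empty input is an error.

    ValueError is an assumed error type, chosen only for illustration.
    """
    if len(values) == 0:
        raise ValueError("max reduction of a size-zero array is undefined")
    return reduce(nan_propagating_max, values)


def reduce_sum(values):
    """Sum has the identity 0, which is the result for empty input."""
    return reduce(lambda x, y: x + y, values, 0)


def reduce_prod(values):
    """Product has the identity 1, which is the result for empty input."""
    return reduce(lambda x, y: x * y, values, 1)
```

A plain comparison-based maximum, such as Python's built-in `max`, silently drops a NaN that is not in the first position, which is the kind of behavior the fix avoids.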
---------
Co-authored-by: Oleksandr Pavlyk <oleksandr.pavlyk@intel.com>
1 parent caa0939 commit 71e891c
File tree
11 files changed, +3759 -434 lines changed
- dpctl
- tensor
- libtensor
- include
- kernels
- utils
- source
- tests