Commit 5b37b84

Document chunked arrays (#2102)
Add xchunked_array documentation
1 parent a377fb8 commit 5b37b84

6 files changed

+139
-6
lines changed

docs/source/api/chunked_array.rst

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
.. Copyright (c) 2016, Johan Mabille, Sylvain Corlay and Wolf Vollprecht
2+
3+
Distributed under the terms of the BSD 3-Clause License.
4+
5+
The full license is in the file LICENSE, distributed with this software.
6+
7+
chunked_array
8+
=============
9+
10+
Defined in ``xtensor/xchunked_array.hpp``
11+
12+
.. doxygenfunction:: xt::chunked_array
13+
:project: xtensor

docs/source/api/container_index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ xexpression API is actually implemented in ``xstrided_container`` and ``xcontain
1818
xiterable
1919
xarray
2020
xarray_adaptor
21+
chunked_array
2122
xtensor
2223
xtensor_adaptor
2324
xfixed

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ for details.
8787
view
8888
quickref/iterator
8989
quickref/manipulation
90+
quickref/chunked_arrays
9091

9192
.. toctree::
9293
:caption: API REFERENCE

docs/source/quickref/basic.rst

Lines changed: 14 additions & 4 deletions
@@ -13,6 +13,7 @@ Tensor types
1313
- ``xarray<T>``: tensor that can be reshaped to any number of dimensions.
1414
- ``xtensor<T, N>``: tensor with a number of dimensions set to ``N`` at compile time.
1515
- ``xtensor_fixed<T, xshape<I, J, K>>``: tensor whose shape is fixed at compile time.
16+
- ``xchunked_array<CS>``: chunked array using the ``CS`` chunk storage.
1617

1718
.. note::
1819

@@ -28,7 +29,7 @@ Tensor with dynamic shape:
2829
2930
#include "xarray.hpp"
3031
31-
xt::xarray<double>::shape_type shape = {2, 3};
32+
xt::xarray<double>::shape_type shape = {2, 3};
3233
xt::xarray<double> a0(shape);
3334
xt::xarray<double> a1(shape, 2.5);
3435
xt::xarray<double> a2 = {{1., 2., 3.}, {4., 5., 6.}};
@@ -40,12 +41,12 @@ Tensor with static number of dimensions:
4041
4142
#include "xtensor.hpp"
4243
43-
xt::xtensor<double, 2>::shape_type shape = {2, 3};
44+
xt::xtensor<double, 2>::shape_type shape = {2, 3};
4445
xt::xtensor<double, 2> a0(shape);
4546
xt::xtensor<double, 2> a1(shape, 2.5);
4647
xt::xtensor<double, 2> a2 = {{1., 2., 3.}, {4., 5., 6.}};
4748
auto a3 = xt::xtensor<double, 2>::from_shape(shape);
48-
49+
4950
Tensor with fixed shape:
5051

5152
.. code::
@@ -54,6 +55,16 @@ Tensor with fixed shape:
5455
5556
xt::xtensor_fixed<double, xt::xshape<2, 3>> a0 = {{1., 2., 3.}, {4., 5., 6.}};
5657
58+
In-memory chunked tensor with dynamic shape:
59+
60+
.. code::
61+
62+
#include "xtensor/xchunked_array.hpp"
63+
64+
std::vector<std::size_t> shape = {10, 10, 10};
65+
std::vector<std::size_t> chunk_shape = {2, 3, 4};
66+
auto a = xt::chunked_array<double>(shape, chunk_shape);
67+
5768
Output
5869
------
5970

@@ -234,4 +245,3 @@ The underlying 1D data buffer can be accessed with the ``data`` method:
234245
a.data()[4] = 8.;
235246
std::cout << a << std::endl;
236247
// Outputs {{1., 2., 3.}, {8., 5., 6.}}
237-

docs/source/quickref/chunked_arrays.rst

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
.. Copyright (c) 2016, Johan Mabille, Sylvain Corlay and Wolf Vollprecht
2+
3+
Distributed under the terms of the BSD 3-Clause License.
4+
5+
The full license is in the file LICENSE, distributed with this software.
6+
7+
Chunked arrays
8+
==============
9+
10+
Motivation
11+
----------
12+
13+
Arrays can be very large and may not fit in memory. In this case, you may not be
14+
able to use an in-memory array such as an ``xarray``. A solution to this problem
15+
is to cut up the large array into many small arrays, called chunks. Not only do
16+
the chunks fit comfortably in memory, but they can also be processed in
17+
parallel, including in a distributed environment (although this is not supported
18+
yet).
19+
20+
Formats for the storage of arrays such as `Zarr <https://zarr.readthedocs.io>`_
21+
specifically target chunked arrays. Such formats are becoming increasingly
22+
popular in the field of big data, since the chunks can be stored in the cloud.
23+
24+
In-memory chunked arrays
25+
------------------------
26+
27+
In-memory chunked arrays may not look very useful at first sight, since each chunk (and thus the
28+
whole array) is held in memory. This means that they cannot work with very large
29+
arrays, but they can be used to parallelize an algorithm by processing several
30+
chunks at the same time.
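
To make the notion of "several chunks" concrete, the following is a minimal sketch in plain standard C++ of how many chunks are needed along each axis to cover an array. The helper ``chunk_grid_shape`` is hypothetical and not part of xtensor; it assumes a regular chunk grid in which a trailing partial chunk still counts as a chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of xtensor): number of chunks along each
// axis needed to cover `shape` with chunks of shape `chunk_shape`.
inline std::vector<std::size_t>
chunk_grid_shape(const std::vector<std::size_t>& shape,
                 const std::vector<std::size_t>& chunk_shape)
{
    assert(shape.size() == chunk_shape.size());
    std::vector<std::size_t> grid(shape.size());
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        assert(chunk_shape[i] > 0);
        // Round up so that a trailing partial chunk is still counted.
        grid[i] = (shape[i] + chunk_shape[i] - 1) / chunk_shape[i];
    }
    return grid;
}
```

For a shape of ``{10, 10, 10}`` and a chunk shape of ``{2, 3, 4}``, this sketch yields a grid of ``{5, 4, 3}``, that is 60 chunks that an algorithm could in principle process independently.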
31+
32+
An in-memory chunked array has the following type:
33+
34+
.. code::
35+
36+
#include "xtensor/xchunked_array.hpp"
37+
38+
using data_type = double;
39+
// don't use this code:
40+
using inmemory_chunked_array = xt::xchunked_array<xt::xarray<xt::xarray<data_type>>>;
41+
42+
But you should not directly use this type to create a chunked array. Instead,
43+
use the ``chunked_array`` factory function:
44+
45+
.. code::
46+
47+
#include "xtensor/xchunked_array.hpp"
48+
49+
std::vector<std::size_t> shape = {10, 10, 10};
50+
std::vector<std::size_t> chunk_shape = {2, 3, 4};
51+
auto a = xt::chunked_array<double>(shape, chunk_shape);
52+
// a is an in-memory chunked array
53+
// each chunk is an xarray<double>, and chunks are held in an xarray
54+
// thus a wraps an xarray of xarray<double> elements
55+
a(3, 9, 2) = 1.; // this will address the chunk of index (1, 3, 0)
56+
// and in this chunk, the element of index (1, 0, 2)
57+
58+
Chunked arrays implement the full semantics of ``xarray``, including lazy
59+
evaluation.
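
The index arithmetic described in the comments of the example above can be sketched in plain standard C++. The helper ``split_index`` is hypothetical and not part of xtensor; it only illustrates, under the assumption of a regular chunk grid, how a global index maps to a chunk index and an index inside that chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using index_type = std::vector<std::size_t>;

// Hypothetical helper (not part of xtensor): splits a global index into
// (index of the chunk, index of the element inside that chunk).
inline std::pair<index_type, index_type>
split_index(const index_type& index, const index_type& chunk_shape)
{
    assert(index.size() == chunk_shape.size());
    index_type chunk_index(index.size());
    index_type in_chunk_index(index.size());
    for (std::size_t i = 0; i < index.size(); ++i)
    {
        assert(chunk_shape[i] > 0);
        chunk_index[i] = index[i] / chunk_shape[i];     // which chunk along axis i
        in_chunk_index[i] = index[i] % chunk_shape[i];  // offset inside that chunk
    }
    return {chunk_index, in_chunk_index};
}
```

With a chunk shape of ``{2, 3, 4}``, the global index ``(3, 9, 2)`` splits into the chunk index ``(1, 3, 0)`` and the in-chunk index ``(1, 0, 2)``, matching the comments in the example.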
60+
61+
Stored chunked arrays
62+
---------------------
63+
64+
These are arrays whose chunks are stored on a file system, allowing for
65+
persistence of data. In particular, they are used as a building block for the
66+
`xtensor-zarr <https://github.com/xtensor-stack/xtensor-zarr>`_ library.
67+
68+
For further details, please refer to the documentation
69+
of `xtensor-io <https://xtensor-io.readthedocs.io>`_.

include/xtensor/xchunked_array.hpp

Lines changed: 41 additions & 2 deletions
@@ -220,13 +220,52 @@ namespace xt
220220
template<class E>
221221
constexpr bool is_chunked(const xexpression<E>& e);
222222

223+
/**
224+
* Creates an in-memory chunked array.
225+
* This function returns an uninitialized ``xchunked_array<xarray<xarray<T>>, EXT>``.
226+
*
227+
* @tparam T The type of the elements (e.g. double)
228+
* @tparam L The layout_type of the array
229+
* @tparam EXT The type of the array extension (default: empty_extension)
230+
*
231+
* @param shape The shape of the array
232+
* @param chunk_shape The shape of a chunk
233+
* @param chunk_memory_layout The layout of each chunk (default: XTENSOR_DEFAULT_LAYOUT)
234+
*
235+
* @return a ``xchunked_array<xarray<xarray<T>>, EXT>`` with the given shape, chunk shape and memory layout.
236+
*/
223237
template <class T, layout_type L = XTENSOR_DEFAULT_LAYOUT, class EXT = empty_extension, class S>
224238
xchunked_array<xarray<xarray<T>>, EXT> chunked_array(S&& shape, S&& chunk_shape, layout_type chunk_memory_layout = XTENSOR_DEFAULT_LAYOUT);
225239

240+
/**
241+
* Creates an in-memory chunked array.
242+
* This function returns a ``xchunked_array<xarray<xarray<T>>, EXT>`` initialized from an expression, where ``T`` is the value type of the expression.
243+
*
244+
* @tparam L The layout_type of the array
245+
* @tparam EXT The type of the array extension (default: empty_extension)
246+
*
247+
* @param e The expression to initialize the chunked array from
248+
* @param chunk_shape The shape of a chunk
249+
* @param chunk_memory_layout The layout of each chunk (default: XTENSOR_DEFAULT_LAYOUT)
250+
*
251+
* @return a ``xchunked_array<xarray<xarray<T>>, EXT>`` from the given expression, with the given chunk shape and memory layout.
252+
*/
226253
template <layout_type L = XTENSOR_DEFAULT_LAYOUT, class EXT = empty_extension, class E, class S>
227254
xchunked_array<xarray<xarray<typename E::value_type>>, EXT>
228255
chunked_array(const xexpression<E>& e, S&& chunk_shape, layout_type chunk_memory_layout = XTENSOR_DEFAULT_LAYOUT);
229256

257+
/**
258+
* Creates an in-memory chunked array.
259+
* This function returns a ``xchunked_array<xarray<xarray<T>>, EXT>`` initialized from an expression, where ``T`` is the value type of the expression.
260+
*
261+
* @tparam L The layout_type of the array
262+
* @tparam EXT The type of the array extension (default: empty_extension)
263+
*
264+
* @param e The expression to initialize the chunked array from
265+
* @param chunk_memory_layout The layout of each chunk (default: XTENSOR_DEFAULT_LAYOUT)
266+
*
267+
* @return a ``xchunked_array<xarray<xarray<T>>, EXT>`` from the given expression, with the expression's chunk shape and the given memory layout.
268+
*/
230269
template <layout_type L = XTENSOR_DEFAULT_LAYOUT, class EXT = empty_extension, class E>
231270
xchunked_array<xarray<xarray<typename E::value_type>>, EXT>
232271
chunked_array(const xexpression<E>&e, layout_type chunk_memory_layout = XTENSOR_DEFAULT_LAYOUT);
@@ -398,7 +437,7 @@ namespace xt
398437
}
399438
return this->derived_cast();
400439
}
401-
440+
402441
template <class D>
403442
template <class E>
404443
inline auto xchunked_semantic<D>::operator=(const xexpression<E>& e) -> derived_type&
@@ -407,7 +446,7 @@ namespace xt
407446
get_assigner(d.chunks()).build_and_assign_temporary(e, d);
408447
return d;
409448
}
410-
449+
411450
template <class D>
412451
template <class CS>
413452
inline auto xchunked_semantic<D>::get_assigner(const CS&) const -> xchunked_assigner<temporary_type, CS>
