@@ -21,7 +21,7 @@ Solving optimal transport
The optimal transport problem between discrete distributions is often expressed
as
.. math::
-    \gamma^* = arg\min_\gamma \sum_{i,j}\gamma_{i,j}M_{i,j}
+    \gamma^* = arg\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j}

    s.t. \gamma 1 = a; \gamma^T 1 = b; \gamma \geq 0

@@ -56,15 +56,12 @@ Computing Wasserstein distance
The value of the OT solution is often more of interest than the OT matrix:

.. math::
-    W(a,b)=\min_\gamma \sum_{i,j}\gamma_{i,j}M_{i,j}
+    OT(a,b)=\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j}

    s.t. \gamma 1 = a; \gamma^T 1 = b; \gamma \geq 0

-where :math:`W(a,b)` is the `Wasserstein distance
-<https://en.wikipedia.org/wiki/Wasserstein_metric>`_ between distributions a and b
-It is a metrix that has nice statistical
-properties. It can computed from an already estimated OT matrix with
+It can be computed from an already estimated OT matrix with
:code:`np.sum(T*M)` or directly with the function :any:`ot.emd2`.

.. code:: python
@@ -73,13 +70,58 @@ properties. It can computed from an already estimated OT matrix with
    # M is the ground cost matrix
    W = ot.emd2(a, b, M)  # Wasserstein distance / EMD value

+Note that the well-known `Wasserstein distance
+<https://en.wikipedia.org/wiki/Wasserstein_metric>`_ between distributions a and
+b is defined as
+
+.. math::
+
+    W_p(a,b)=\left(\min_\gamma \sum_{i,j}\gamma_{i,j}\|x_i-y_j\|_2^p\right)^{\frac{1}{p}}
+
+    s.t. \gamma 1 = a; \gamma^T 1 = b; \gamma \geq 0
+
+This means that if you want to compute :math:`W_2`, you need to take the square
+root of the value returned by :any:`ot.emd2` when providing
+:code:`M=ot.dist(xs,xt)`, which uses the squared Euclidean distance by default.
+Computing the :math:`W_1` Wasserstein distance can be done directly with
+:any:`ot.emd2` when providing :code:`M=ot.dist(xs,xt,metric='euclidean')` to use
+the Euclidean distance.
+
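+As a quick illustration, here is a minimal sketch (the toy samples :code:`xs`,
+:code:`xt` and their sizes below are made up for this example):
+
+.. code:: python
+
+    import numpy as np
+    import ot
+
+    xs = np.random.randn(50, 2)      # hypothetical source samples
+    xt = np.random.randn(60, 2) + 1  # hypothetical target samples
+    a = ot.unif(50)                  # uniform weights on the source
+    b = ot.unif(60)                  # uniform weights on the target
+
+    # W_2: squared Euclidean ground cost (default of ot.dist), then square root
+    M = ot.dist(xs, xt)
+    W2 = np.sqrt(ot.emd2(a, b, M))
+
+    # W_1: Euclidean ground cost, no square root needed
+    M = ot.dist(xs, xt, metric='euclidean')
+    W1 = ot.emd2(a, b, M)
+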

.. hint::

    Examples of use for :any:`ot.emd2` are available in the following examples:

    - :any:`auto_examples/plot_compute_emd`

+Special cases
+^^^^^^^^^^^^^
+
+Note that the OT problem and the corresponding Wasserstein distance can in some
+special cases be computed very efficiently.
+
+For instance, when the samples are in 1D, the OT problem can be solved in
+:math:`O(n\log(n))` operations with a simple sort. In this case we provide the
+functions :any:`ot.emd_1d` and :any:`ot.emd2_1d` that return respectively the OT
+matrix and the OT value. Note that since the solution is very sparse, the
+:code:`sparse` parameter of :any:`ot.emd_1d` allows solving and returning the
+solution for very large problems. Note that in order to compute directly the
+:math:`W_p` Wasserstein distance in 1D we provide the function
+:any:`ot.wasserstein_1d` that takes :code:`p` as a parameter.
+
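+A short sketch of the 1D case (the sample values below are made up, and the
+exact keyword arguments may vary slightly between versions):
+
+.. code:: python
+
+    import numpy as np
+    import ot
+
+    x_src = np.random.randn(100)      # hypothetical 1D source samples
+    x_tgt = np.random.randn(100) + 2  # hypothetical 1D target samples
+    a = ot.unif(100)
+    b = ot.unif(100)
+
+    G = ot.emd_1d(x_src, x_tgt, a, b)     # OT matrix, computed by sorting
+    val = ot.emd2_1d(x_src, x_tgt, a, b)  # OT value
+    W2 = ot.wasserstein_1d(x_src, x_tgt, a, b, p=2)  # W_2 distance in 1D
+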
+Another special case where OT and the Monge mapping can be estimated efficiently
+is between Gaussian distributions. In this case there exists a closed-form
+solution given in Remark 2.29 in [15]_, and the Monge mapping is an affine
+function that can also be computed from the means and covariances of the source
+and target distributions. When the finite sample datasets are assumed to be
+Gaussian, we provide :any:`ot.da.OT_mapping_linear` that returns the parameters
+of this Monge mapping.
+
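+A rough sketch of how this estimator could be used (the Gaussian-like toy
+samples below are made up, and the exact signature of
+:any:`ot.da.OT_mapping_linear` may differ between versions):
+
+.. code:: python
+
+    import numpy as np
+    import ot
+
+    # hypothetical Gaussian-like source and target samples
+    xs = np.random.randn(200, 2)
+    xt = np.random.randn(200, 2).dot(np.array([[2., 0.], [0., 0.5]])) + 3
+
+    # closed-form affine Monge mapping x -> x A + b, estimated from
+    # the empirical means and covariances of the two samples
+    A, b = ot.da.OT_mapping_linear(xs, xt)
+    xs_mapped = xs.dot(A) + b
+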
+
+
Regularized Optimal Transport
-----------------------------

@@ -89,31 +131,53 @@ computational and statistical properties.
We address in this section the regularized OT problem that can be expressed as

.. math::
-    \gamma^* = arg\min_\gamma <\gamma,M>_F + reg*\Omega(\gamma)
+    \gamma^* = arg\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j} + \lambda\Omega(\gamma)

-    s.t. \gamma 1 = a
+    s.t. \gamma 1 = a; \gamma^T 1 = b; \gamma \geq 0

-    \gamma^T 1 = b

-    \gamma \geq 0
where :

- :math:`M\in\mathbb{R}_+^{m\times n}` is the metric cost matrix defining the cost to move mass from bin :math:`a_i` to bin :math:`b_j`.
- :math:`a` and :math:`b` are histograms (positive, sum to 1) that represent the weights of the samples in the source and target distributions.
- :math:`\Omega` is the regularization term.

-We disvuss in the following specific algorithms
-
+In the following we discuss the specific algorithms that can be used depending
+on the regularization term.


Entropic regularized OT
^^^^^^^^^^^^^^^^^^^^^^^

+This is the most common regularization used for optimal transport. It has been
+proposed in the ML community by Marco Cuturi in his seminal paper [2]_. This
+regularization has the following expression:
+
+.. math::
+
+    \Omega(\gamma)=\sum_{i,j}\gamma_{i,j}\log(\gamma_{i,j})
+
+The use of this regularization term in the optimization problem has a very
+strong impact. First, it makes the problem smooth, which leads to new
+optimization procedures such as L-BFGS (see :any:`ot.smooth`). Next, it makes
+the problem strictly convex, meaning that there is a unique solution. Finally,
+the solution of the resulting optimization problem can be expressed as:
+
+.. math::
+
+    \gamma_\lambda^*=\text{diag}(u)K\text{diag}(v)
+
+where :math:`u` and :math:`v` are vectors and :math:`K=\exp(-M/\lambda)`, where
+the :math:`\exp` is taken component-wise.
+
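+In practice, the scaling vectors :math:`u` and :math:`v` can be obtained with
+the Sinkhorn-Knopp fixed-point iterations. Below is a bare-bones sketch of these
+iterations (for illustration only; the library solver :any:`ot.sinkhorn`
+implements them with proper numerical stabilization):
+
+.. code:: python
+
+    import numpy as np
+
+    def sinkhorn_sketch(a, b, M, lam, n_iter=1000):
+        """Naive Sinkhorn iterations returning gamma = diag(u) K diag(v)."""
+        K = np.exp(-M / lam)      # element-wise kernel
+        u = np.ones(len(a))
+        for _ in range(n_iter):
+            v = b / (K.T @ u)     # scale columns to match marginal b
+            u = a / (K @ v)       # scale rows to match marginal a
+        return u[:, None] * K * v[None, :]
+
+The same solution can be obtained directly with :code:`ot.sinkhorn(a, b, M, lam)`.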
+

Other regularization
^^^^^^^^^^^^^^^^^^^^

-Stochastic gradient decsent
+Stochastic gradient descent
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Wasserstein Barycenters