On a general matrix-valued unbalanced optimal transport problem

Bowen Li; Jun Zou

doi:10.1017/S0956792524000901

Published online by Cambridge University Press: 24 February 2025

Bowen Li*: Affiliation:
Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong, China
Jun Zou: Affiliation:
Department of Mathematics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
*: Corresponding author: Bowen Li; Email: bowen.li@cityu.edu.hk

Abstract

We introduce a general class of transport distances $\mathrm {WB}_{\Lambda }$ over the space of positive semi-definite matrix-valued Radon measures $\mathcal {M}(\Omega, \mathbb {S}_+^n)$, called the weighted Wasserstein–Bures distance. Such a distance is defined via a generalised Benamou–Brenier formulation with a weighted action functional and an abstract matricial continuity equation, which leads to a convex optimisation problem. Some recently proposed models, including the Kantorovich–Bures distance and the Wasserstein–Fisher–Rao distance, can naturally fit into ours. We give a complete characterisation of the minimiser and explore the topological and geometrical properties of the space $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_{\Lambda })$. In particular, we show that $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_{\Lambda })$ is a complete geodesic space and exhibits a conic structure.

Keywords

Unbalanced optimal transport; Matrix-valued measure; Geodesic space; Metric cone

MSC classification

Primary: 47A56: Functions whose values are linear operators (operator and matrix valued functions, etc., including analytic and meromorphic ones); 49Q20: Variational problems in a geometric measure-theoretic setting

Secondary: 28A33: Spaces of measures, convergence of measures; 51F99: None of the above, but in this section; 58B20: Riemannian, Finsler and other geometric structures

Type: Papers
Information: European Journal of Applied Mathematics, First View, pp. 1-33

DOI: https://doi.org/10.1017/S0956792524000901
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press

1. Introduction

1.1. Classical optimal transport

Optimal transport (OT) [Reference Santambrogio87, Reference Villani89, Reference Villani90] provides a versatile framework for defining metrics and studying geometric structures on probability measures. It has been an active research area over the past decades with fruitful applications in various areas, including functional inequalities [Reference Lott and Villani68, Reference Otto and Villani76, Reference Sturm88], gradient flow [Reference Jordan, Kinderlehrer and Otto51, Reference Otto75], and more recently, image processing and machine learning [Reference Arjovsky, Chintala and Bottou3, Reference Ferradans, Papadakis, Peyré and Aujol37, Reference Frogner, Zhang, Mobahi, Araya and Poggio43]. The OT problem was first proposed by Monge in 1781 [Reference Monge72]: given probabilities $\rho _0$ and $\rho _1$ , find a measure-preserving transport map $T$ minimising

(1.1)

\begin{align} \min _{T_{\#}\rho _0 = \rho _1} \int |x - T(x)|^2\, \mathrm {d} \rho _0(x)\,. \end{align}

However, its solution (i.e., the OT map) may not exist. This question remained open for a long time until 1942 when Kantorovich introduced a relaxed problem based on the so-called transport plans [Reference Kantorovich52]:

(1.2)

\begin{align} \mathrm {W}_2^2(\rho _0,\rho _1) \,:\!=\, \min \Big \{\int |x-y|^2\, \mathrm {d}\gamma \,;\ \gamma \ \text {is a probability with}\ (\pi _\#^x \gamma, \pi _\#^y \gamma ) = (\rho _0,\rho _1)\Big \}\,, \end{align}

where $\pi _\#^x \gamma$ and $\pi _\#^y \gamma$ are the first and second marginals of $\gamma$ , respectively. The $2$ -Wasserstein distance (1.2) turns out to exhibit intriguing mathematical properties. Brenier [Reference Brenier14] proved that under mild conditions, the OT map $T$ to (1.1) exists and is uniquely given by the gradient of a convex function $\varphi$ . Thanks to the measure-preserving property of the transport map $T = \nabla \varphi$ , it is easy to see that $\varphi$ satisfies the Monge–Ampère equation, which provides a PDE-based approach for solving the OT problem (1.1). One can also show that $(\mathrm {id}, \nabla \varphi )_\# \rho _0$ gives a minimiser to (1.2). Equipped with the distance $\mathrm {W}_2(\cdot, \cdot )$ , the probability measure space becomes a geodesic space, where the geodesic is characterised by McCann’s displacement interpolation $\rho _t\,:\!=\, ((1-t)I + t \nabla \varphi )_{\#}\rho _0$ [Reference McCann71]. In Benamou and Brenier’s seminal work [Reference Benamou and Brenier8], an equivalent fluid mechanics formulation was proposed for computational purposes:

(𝒫_W₂)

\begin{equation} \mathrm {W}_2^2(\rho _0,\rho _1) = \min _{\rho, m}\left \{ \frac {1}{2}\iint \rho ^{-1}|m|^2\, \mathrm {d}t\, \mathrm {d}x\,; \ \partial _t \rho + \mathrm {div}\, m = 0 \right \}\,. \end{equation}

This dynamic point of view has since stimulated numerous follow-up studies, including the present work. We refer the interested readers to [Reference Villani89, Reference Villani90] for the precise statements of aforementioned results and a detailed overview.
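The relation between the Monge problem (1.1) and the Kantorovich problem (1.2) can be checked by hand in a very restricted setting. The sketch below assumes two empirical measures on the real line with the same number of equally weighted atoms; in that case the minimisation in (1.1) runs over permutations, and the order-preserving matching (the one-dimensional counterpart of a gradient of a convex function) is optimal. The function names and the sample points are illustrative choices, not taken from the works cited above.

```python
from itertools import permutations


def monge_cost(xs, ys, perm):
    """Average squared displacement when x_i is sent to y_{perm[i]}."""
    n = len(xs)
    return sum((xs[i] - ys[perm[i]]) ** 2 for i in range(n)) / n


def brute_force_w2_squared(xs, ys):
    """Minimise the Monge cost over all bijections (only feasible for tiny n)."""
    return min(monge_cost(xs, ys, p) for p in permutations(range(len(xs))))


def sorted_w2_squared(xs, ys):
    """Cost of the monotone (order-preserving) matching of the atoms."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Two uniform empirical measures with five atoms each (arbitrary sample data).
xs = [0.3, -1.0, 2.5, 0.9, 1.7]
ys = [4.0, 0.1, -0.5, 2.2, 1.0]

w2_sorted = sorted_w2_squared(xs, ys)
w2_brute = brute_force_w2_squared(xs, ys)
print(w2_sorted, w2_brute)
```

The brute-force search is exponential in the number of atoms and serves only as a check; the sorted matching costs a single sort.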

1.2. Unbalanced optimal transport

Although the OT theory has become a popular tool in learning theory and data science for its geometric nature and capacity for large-scale simulation, a limitation is that the associated metric is only defined for measures of equal mass, while in many applications, it is more desirable to allow measures with different masses. This leads to the problem of extending the classical OT theory to the unbalanced case. Early efforts in this direction date back to the works [Reference Kantorovich and Rubinshtein53, Reference Kantorovich and Rubinshtein54] by Kantorovich and Rubinshtein in the 1950s, where a simple static formulation with an extended Kantorovich norm was introduced. The underlying idea is to allow the mass to be sent to (or come from) a point at infinity, which was further investigated and extended in [Reference Guittet49, Reference Hanin50]. Similarly, Figalli and Gigli [Reference Figalli and Gigli39] introduced an unbalanced transportation distance via a variant of the Kantorovich formulation (1.2) that allows mass to be taken from (or given back to) the boundary of the domain. Another closely related approach is the optimal partial transport [Reference Caffarelli and McCann20, Reference Figalli38], which is also based on (1.2) but involves a relaxed constraint $(\pi _\#^x \gamma, \pi _\#^y \gamma ) \le (\rho _0,\rho _1)$ and a shifted cost $|x-y|^2-\alpha$ .

In addition to the static models, there is a large number of works devoted to defining an unbalanced OT model via a dynamic formulation in the spirit of [Reference Benamou and Brenier8]; see for example [Reference Benamou7, Reference Chizat, Peyré, Schmitzer and Vialard27, Reference Lombardi and Maitre66, Reference Maas, Rumpf, Schönlieb and Simon69, Reference Piccoli and Rossi79]. In these works, a source term and a corresponding penalisation term are introduced in the continuity equation and the action functional, respectively, in order to model the mass change. In particular, Piccoli and Rossi [Reference Piccoli and Rossi78, Reference Piccoli and Rossi79] defined a generalised Wasserstein distance by relaxing the marginal constraint $(\pi _\#^x \gamma, \pi _\#^y \gamma ) = (\rho _0,\rho _1)$ by a total variation regularisation, which turns out to be equivalent to the optimal partial transport in certain scenarios [Reference Chizat, Peyré, Schmitzer and Vialard27]. Moreover, an equivalent dynamic formulation has also been given in ref. [Reference Piccoli and Rossi79]. Later, a new transport model, called the Wasserstein–Fisher–Rao (WFR) or Hellinger–Kantorovich distance (in this work we adopt the former name), was introduced independently and almost simultaneously by three research groups with different perspectives and techniques [Reference Chizat, Peyré, Schmitzer and Vialard27, Reference Kondratyev, Monsaingeon and Vorotnikov56, Reference Liero, Mielke and Savaré64]. This model can be regarded as an inf-convolution of the Wasserstein and Fisher–Rao metric tensors, as the name suggests. In their subsequent work [Reference Chizat, Peyré, Schmitzer and Vialard29], Chizat et al. presented a class of unbalanced transport distances in a unified framework via both static and dynamic formulations, thanks to the notions of semi-couplings and Lagrangians. Meanwhile, Liero et al. [Reference Liero, Mielke and Savaré65] proposed a related optimal entropy-transport approach and discussed its properties in detail. It was proved that both the optimal partial transport and the WFR distance can be viewed as special cases of the general frameworks in refs. [Reference Chizat, Peyré, Schmitzer and Vialard29, Reference Liero, Mielke and Savaré65]. Since then, the unbalanced OT theory has been further developed in various directions, such as gradient flows [Reference Kondratyev and Vorotnikov57, Reference Kondratyev and Vorotnikov59], Sobolev inequalities [Reference Kondratyev and Vorotnikov58] and the JKO scheme [Reference Fleissner41, Reference Gallouët and Monsaingeon44]. We also want to mention a recent work [Reference Lombardini and Rossi67] by Lombardini and Rossi, which gave a negative answer to an interesting question of whether it is possible to define an unbalanced transport distance that coincides with the Wasserstein one when the measures are of equal mass.

1.3. Noncommutative optimal transport

More recently, there is also an increasing interest in generalising the OT theory to the noncommutative setting, namely, the quantum states or matrix-valued measures. The first line of research is motivated by the ergodicity of open quantum dynamics [Reference Gross48, Reference Kastoryano and Temme55, Reference Olkiewicz and Zegarlinski74]. In the seminal works [Reference Carlen and Maas21, Reference Carlen and Maas22] by Carlen and Maas, a quantum Wasserstein distance was introduced with a Benamou–Brenier dynamic formulation such that a primitive quantum Markov semigroup satisfying the detailed balance condition can be formulated as the gradient flow of the logarithmic relative entropy, which opens the door to investigating the noncommutative functional inequalities via the gradient flow techniques and the geodesic convexity; see for example [Reference Datta and Rouzé31, Reference Li and Lu62, Reference Rouzé and Datta84, Reference Wirth and Zhang93]. Meanwhile, Golse et al. proposed another quantum transport model via a generalised Monge–Kantorovich formulation, when they studied the mean-field and classical limits of the Schrödinger equation; see [Reference Golse, Mouhot and Paul45–Reference Golse and Paul47]. Other static quantum Wasserstein distances can be found in refs. [Reference Cole, Eckstein, Friedland and Życzkowski30, Reference De Palma, Marvian, Trevisan and Lloyd32, Reference De Palma and Trevisan33], just to name a few.

The second research line is driven by the advances in diffusion tensor imaging [Reference Bihan61, Reference Wandell92], where a tensor field (usually, a positive semi-definite matrix) is generated at each spatial position to encode the local diffusivity of water molecules in the brain. It gives rise to a natural question of how to compare two brain tensor fields, or mathematically how to define a reasonable distance between matrix-valued measures. Chen et al. [Reference Chen, Gangbo, Georgiou and Tannenbaum23, Reference Chen, Georgiou and Tannenbaum24] introduced a dynamic matricial Wasserstein distance for matrix-valued densities with unit mass, drawing inspiration from ref. [Reference Benamou and Brenier8] and leveraging the Lindblad equation in quantum mechanics, which was later extended to the unbalanced case [Reference Chen, Georgiou and Tannenbaum25] in a manner similar to [Reference Chizat, Peyré, Schmitzer and Vialard27]. In particular, Brenier and Vorotnikov [Reference Brenier and Vorotnikov16] recently proposed a different dynamic OT model for unbalanced matrix-valued measures called the Kantorovich–Bures metric, which is motivated by the observation in ref. [Reference Brenier15] that the incompressible Euler equation admits a dual concave maximisation problem. Regarding static formulations, Peyré et al. [Reference Peyré, Chizat, Vialard and Solomon77] introduced a quantum transport distance with entropic regularisation inspired by [Reference Liero, Mielke and Savaré65] and proposed an associated scaling algorithm that generalised the results in ref. [Reference Chizat, Peyré, Schmitzer and Vialard28]. Additionally, Ryu et al. [Reference Ryu, Chen, Li and Osher86] defined a matrix OT model of order $1$ by a Beckmann-type flux formulation and presented a scalable and parallelisable numerical method. Applications in tensor field imaging were also explored in refs. [Reference Peyré, Chizat, Vialard and Solomon77, Reference Ryu, Chen, Li and Osher86].

1.4. Contribution

The initial motivation for this work is the numerical study of the unbalanced matricial OT models proposed in refs. [Reference Brenier and Vorotnikov16, Reference Chen, Georgiou and Tannenbaum25]; see (𝒫_WB ) and (𝒫_2,FR ). We find that despite their distinct formulations, these models actually share many mathematical properties. In this work, we consider an abstract continuity equation $\partial _t \mathsf {G} + \mathsf {D} \mathsf {q} = \mathsf {R}^{{\mathrm {sym}}}$ in Definition 3.4 with $\mathsf {D}$ being a first-order constant coefficient linear differential operator such that $\mathsf {D}^*(I) = 0$ , in analogy with the one $\partial _t \mathsf {G} + 2 (\mathsf {L}^* \circ \mathsf {P})\, \mathsf {q} = 0$ for the matrix-valued optimal ballistic transport problem (cf. [Reference Vorotnikov91, (1.4)–(1.5)]). Here, $\mathsf {q}(t,x)$ can be intuitively seen as a momentum variable; $\mathsf {D} \mathsf {q}$ is the matricial analogue of the advection term $\mathrm {div}\, m$ in (𝒫_W₂ ) controlling the mass transportation in space and between components; $\mathsf {R}^{{\mathrm {sym}}}$ is the reaction part describing the variation of mass. Then, thanks to the weighted infinitesimal cost $J_\Lambda (G_t, q_t, R_t) =\frac {1}{2} (q_t \Lambda _1^\dagger ) \cdot G_t^{\dagger } (q_t \Lambda _1^\dagger ) + \frac {1}{2} (R_t \Lambda _2^\dagger ) \cdot G^{\dagger }_t (R_t \Lambda _2^\dagger )$ given in Proposition 3.1 with the weight matrices $\Lambda _1$ and $\Lambda _2$ representing the contributions of each component of $q$ and $R$ in $J_\Lambda$ , we define a general matrix-valued unbalanced OT distance $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ (𝒫) as a convex optimisation problem, similarly to the classical case (𝒫_W₂ ), which we call the weighted Wasserstein–Bures distance; see Definition 3.8. We note that the problems (𝒫_WB ) and (𝒫_2,FR ), as well as the scalar WFR distance (𝒫_WFR ), can be viewed as special instances of our model (𝒫). See Section 7 for more details.

Our main contribution is a comprehensive and self-contained study of the properties of the weighted distance $\mathrm {WB}_{\Lambda }$ on the positive semi-definite matrix-valued Radon measure space $\mathcal {M}(\Omega, \mathbb {S}_+^n)$ . We establish the a priori estimates for solutions of the abstract continuity equation (3.13) in Lemmas 3.9, 3.12 and Proposition 3.13, which consequently give the well-posedness of the model (𝒫) and a useful compactness result (Proposition 3.18). Then, by leveraging tools from convex analysis, we show the existence of the minimiser (i.e., the minimising geodesic) to (𝒫) with a characterisation of the optimality conditions; see Theorems 4.2 and 4.5. Moreover, we prove that the topology induced by $\mathrm {WB}_\Lambda (\cdot, \cdot )$ is stronger than the weak$^*$ one, and study the limit model when a weight matrix goes to zero; see Propositions 5.2 and 4.6, respectively. With the help of these results, in Theorem 5.5 and Corollary 5.7, we characterise the absolutely continuous curves with respect to the metric $\mathrm {WB}_\Lambda$ and show that $(\mathcal {M}(\Omega, \mathbb {S}_+^n), \mathrm {WB}_\Lambda )$ is a complete geodesic space. We further consider its conic structure and prove in Theorem 6.3 that the space $(\mathcal {M}(\Omega, \mathbb {S}_+^n), \mathrm {WB}_\Lambda )$ is a metric cone over $(\mathcal {M}_1, \mathrm {SWB}_\Lambda )$ , where $\mathcal {M}_1$ is a normalised matrix-valued measure space (6.2), which corresponds to a noncommutative probability space, and $\mathrm {SWB}_\Lambda$ is the spherical distance (6.1) induced by $\mathrm {WB}_\Lambda$ . Recalling the Riemannian interpretation in Corollary 5.8, we can formally view $(\mathcal {M}(\Omega, \mathbb {S}_+^n), \mathrm {WB}_\Lambda )$ as a Riemannian manifold and $\mathcal {M}_1$ as its submanifold with the induced metric $\mathrm {SWB}_\Lambda$ , which allows developing the Otto calculus in the spirit of [Reference Otto and Villani76]. These results can be readily applied to the models (𝒫_WB ) and (𝒫_2,FR ), which lay a solid mathematical foundation for the distance (𝒫_2,FR ) and complement the results in ref. [Reference Brenier and Vorotnikov16] for (𝒫_WB ) (note that our approach is quite different from theirs).

In the companion work [Reference Li and Zou63], we have designed a convergent discretisation scheme for the general model (𝒫), which directly applies to the Kantorovich–Bures distance (𝒫_WB ) [Reference Brenier and Vorotnikov16], the matricial interpolation distance (𝒫_2,FR ) [Reference Chen, Georgiou and Tannenbaum25] and the WFR metric (𝒫_WFR ) [Reference Chizat, Peyré, Schmitzer and Vialard27], thanks to the discussion in Section 7 of the present work.

1.5. Layout

The rest of this work is organised as follows. In Section 2, we give a list of basic notations that will be used throughout this work and recall some preliminary results. In Section 3, we define a class of weighted Wasserstein–Bures distances for matrix-valued measures via a dynamic formulation. Sections 4 and 5 are devoted to its topological, metric and geometric properties, while in Section 6, we discuss its conic structure. In Section 7, we connect our general model with several existing models in the literature. Some auxiliary proofs are included in Appendix A.

2. Preliminaries and notation

2.1. Notation and convention

• We denote by $\mathbb {R}^{n \times m}$ the space of $n \times m$ real matrices. If $m = n$ , we simply write it as $\mathbb {M}^{n}$ . Moreover, we use $\mathbb {S}^n$ , $\mathbb {S}_+^n$ and $\mathbb {S}^n_{++}$ to denote symmetric matrices, positive semi-definite matrices and positive definite matrices, respectively. $\mathbb {A}^n$ denotes the space of $n \times n$ antisymmetric matrices.
• We denote by $|\cdot |$ the Euclidean norm on $\mathbb {R}^n$ . We equip the matrix space $\mathbb {R}^{n \times m}$ with the Frobenius inner product $A \cdot B = \mathrm {Tr}(A^{\mathrm {T}} B)$ and the associated norm $\lVert A \rVert _{\mathrm {F}} = \sqrt { A \cdot A}$ .
• The symmetric and antisymmetric parts of $A \in \mathbb {M}^n$ are given by
(2.1) \begin{equation} A^{\mathrm {sym}} = (A + A^{\mathrm {T}})/2\,,\quad A^{\mathrm {ant}} = (A - A^{\mathrm {T}})/2\,, \end{equation}
respectively. We also write $A \preceq B$ (resp., $A \prec B$ ) for $A, B \in \mathbb {S}^n$ if $B - A \in \mathbb {S}^n_+$ (resp., $B - A \in \mathbb {S}^n_{++}$ ).
• $\mathcal {X}$ denotes a generic compact separable metric space with Borel $\sigma$ -algebra $\mathscr {B}(\mathcal {X})$ , unless otherwise specified.
• $C(\mathcal {X},\mathbb {R}^n)$ denotes the space of $\mathbb {R}^n$ -valued continuous functions on $\mathcal {X}$ with the supremum norm $\lVert \cdot \rVert _\infty$ . Its dual space, denoted by $\mathcal {M}(\mathcal {X},\mathbb {R}^n)$ , is the space of $\mathbb {R}^n$ -valued Radon measures equipped with the total variation norm $\lVert \cdot \rVert _{\mathrm {TV}}$ .
• Let $\mathcal {B}$ be a Banach space with the dual space $\mathcal {B}^*$ . We denote by $\langle \cdot, \cdot \rangle _{\mathcal {B}}$ the duality pairing between $\mathcal {B}$ and $\mathcal {B}^*$ . When $\mathcal {B} = C(\mathcal {X},\mathbb {R}^n)$ , we usually write it as $\langle \cdot, \cdot \rangle _{\mathcal {X}}$ for short. We will also consider the weak and weak$^*$ convergences on $\mathcal {B}$ and $\mathcal {B}^*$ , respectively. In particular, a sequence of measures $\{\mu _j\}$ weak$^*$ converges to $\mu \in \mathcal {M}(\mathcal {X},\mathbb {R}^n)$ if for any $\phi \in C(\mathcal {X},\mathbb {R}^n)$ , there holds $\langle \mu _j, \phi \rangle _{\mathcal {X}} \to \langle \mu, \phi \rangle _{\mathcal {X}}$ as $j \to +\infty$ .
• Let $\mathbb {R}_+\,:\!=\, [0,\infty )$ , and $\mathcal {M}(\mathcal {X},\mathbb {R}_+)$ be the space of nonnegative finite Radon measures. For $\mu \in \mathcal {M}(\mathcal {X},\mathbb {R}^n)$ , we have an associated variation measure $|\mu |\in \mathcal {M}(\mathcal {X},\mathbb {R}_+)$ such that $\mathrm {d} \mu = \sigma \mathrm {d} |\mu |$ with $|\sigma (x)| = 1$ for $|\mu |$ -a.e. $x \in \mathcal {X}$ , where $\sigma \,:\, \mathcal {X} \to \mathbb {R}^n$ is the Radon–Nikodym derivative (density) of $\mu$ with respect to $|\mu |$ [Reference Evans and Gariepy36, Reference Rudin85].
• We identify the space of matrix-valued Radon measures $\mathcal {M}(\mathcal {X},\mathbb {R}^{n \times m})$ with $\mathcal {M}(\mathcal {X},\mathbb {R}^{nm})$ by vectorisation. It is easy to see that both sets of $\mathbb {S}^n$ -valued Radon measures $\mathcal {M}(\mathcal {X},\mathbb {S}^n)$ and $\mathbb {S}^n_+$ -valued Radon measures $\mathcal {M}(\mathcal {X},\mathbb {S}_+^n)$ are closed in $\mathcal {M}(\mathcal {X},\mathbb {M}^n)$ with respect to the weak$^*$ topology [Reference Duran and Lopez-Rodriguez35, Theorem 3.5]. Moreover, we have the following characterisation:
\begin{equation*} (C(\mathcal {X}, \mathbb {S}^n))^* \simeq (C(\mathcal {X},\mathbb {M}^n) /C(\mathcal {X}, \mathbb {A}^n))^* \simeq \mathcal {M}(\mathcal {X}, \mathbb {S}^n)\,, \end{equation*}
where $\simeq$ means the isometric isomorphism and $C(\mathcal {X},\mathbb {M}^n) /C(\mathcal {X}, \mathbb {A}^n)$ is the quotient space. Indeed, we observe that $\mu \in \mathcal {M}(\mathcal {X},\mathbb {S}^n) \subset \mathcal {M}(\mathcal {X},\mathbb {M}^n) \simeq C(\mathcal {X},\mathbb {M}^n)^*$ if and only if its induced linear functional on $C(\mathcal {X},\mathbb {M}^n)$ has the kernel $C(\mathcal {X}, \mathbb {A}^n)$ , which yields, by [Reference Brezis17, Proposition 11.9],
\begin{equation*} (C(\mathcal {X},\mathbb {M}^n) /C(\mathcal {X}, \mathbb {A}^n))^* \simeq \mathcal {M}(\mathcal {X}, \mathbb {S}^n)\,. \end{equation*}
Meanwhile, $C(\mathcal {X}, \mathbb {S}^n) \simeq C(\mathcal {X},\mathbb {M}^n) /C(\mathcal {X}, \mathbb {A}^n)$ is a consequence of $\mathbb {S}^n \perp \mathbb {A}^n$ and $\mathbb {S}^n \simeq \mathbb {M}^n/\mathbb {A}^n$ .
• For $\mu \in \mathcal {M}(\mathcal {X}, \mathbb {S}_+^n)$ , we define an associated trace measure $\mathrm {Tr}\mu$ by the set function $E \to \mathrm {Tr} (\mu (E))$ , $E \in \mathscr {B}(\mathcal {X})$ . It is clear that $ 0 \preceq \mu (E) \preceq \mathrm {Tr} (\mu (E)) I$ and $ \mathrm {Tr}\mu$ is equivalent to $|\mu |$ , denoted by $\mathrm {Tr}\mu \sim |\mu |$ . That is,
(2.2) \begin{equation} |\mu | \ll \mathrm {Tr}\mu \quad \text {and} \quad \mathrm {Tr}\mu \ll |\mu |\,. \end{equation}
We will usually use $\mathrm {Tr}\mu$ as the dominant measure for $\mu \in \mathcal {M}(\mathcal {X},\mathbb {S}_+^n)$ . In addition, note that for $\lambda \in \mathcal {M}(\mathcal {X}, \mathbb {R}_+)$ with $|\mu | \ll \lambda$ , there holds $\frac {\mathrm {d} \mu }{\mathrm {d} \lambda } \in \mathbb {S}^n_+$ for $\lambda$ -a.e. $x \in \mathcal {X}$ , which is an equivalent characterisation of $\mathcal {M}(\mathcal {X}, \mathbb {S}_+^n)$ .
• We will use sans serif letterforms to denote vector-valued or matrix-valued measures, e.g., $\mathsf {A} \in \mathcal {M}(\mathcal {X}, \mathbb {M}^n)$ , while letters with serifs are reserved for their densities with respect to some reference measure, e.g., $A_\lambda \,:\!=\, \frac {\mathrm {d} \mathsf {A}}{\mathrm {d} \lambda }$ for $|\mathsf {A}| \ll \lambda$ . The symmetric and antisymmetric parts $\mathsf {A}^{\mathrm {sym}}$ and $\mathsf {A}^{\mathrm {ant}}$ of $\mathsf {A} \in \mathcal {M}(\mathcal {X}, \mathbb {M}^n)$ are defined as in (2.1).
• We identify a measure with its density with respect to the Lebesgue measure (if it exists), unless otherwise specified.
• For $\lambda \in \mathcal {M}(\mathcal {X}, \mathbb {R}_+)$ , we denote by $L^p_\lambda (\mathcal {X},\mathbb {R}^n)$ with $p \in [1, +\infty ]$ the standard space of $p$ -integrable $\mathbb {R}^n$ -valued functions. For $\mathsf {G} \in \mathcal {M}(\mathcal {X}, \mathbb {S}_+^n)$ , we consider the space of $\mathbb {R}^{n \times m}$ -valued measurable functions endowed with the semi-inner product:
(2.3) \begin{equation} \langle P, Q \rangle _{L^2_{\mathsf {G}}(\mathcal {X})} \,:\!=\, \langle \mathsf {G}, QP^{\mathrm {T}} \rangle _{\mathcal {X}} = \int _{\mathcal {X}} P \cdot (\mathrm {d} \mathsf {G}\, Q) = \int _{\mathcal {X}} P \cdot \big (G_\lambda Q \big )\, \mathrm {d} \lambda \,, \end{equation}
where $\lambda$ is a reference measure such that $|\mathsf {G}|\ll \lambda$ and $G_\lambda$ is the density. Noting that $\lVert Q \rVert _{L^2_{\mathsf {G}}(\mathcal {X})} = 0$ is equivalent to $G_\lambda Q = 0$ for $\lambda$ -a.e. $x \in \mathcal {X}$ , the kernel of the seminorm $\lVert \cdot \rVert _{L^2_{\mathsf {G}}(\mathcal {X})}$ is given by $\{Q\,;\ \mathrm {Ran}(Q) \subset \mathrm {Ker}(G_\lambda )\,, \,\lambda \text {-a.e.}\}$ . Then, we define the Hilbert space $L^2_{\mathsf {G}}(\mathcal {X}, \mathbb {R}^{n \times m})$ as the quotient space by $\mathrm {Ker}\big (\lVert \cdot \rVert _{L^2_{\mathsf {G}}(\mathcal {X})}\big )$ .
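Two of the conventions above lend themselves to a quick numerical check: the orthogonal splitting (2.1) of a matrix into symmetric and antisymmetric parts under the Frobenius inner product, and the kernel of the seminorm induced by (2.3). The sketch below assumes the simplest possible measure, a single atom $\mathsf {G} = G_0 \delta _{x}$ with $G_0 = \text {diag}(1,0)$, so that the semi-inner product reduces to $P \cdot (G_0 Q)$; all helper names and sample matrices are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def frob(A, B):
    """Frobenius inner product A . B = Tr(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def sym(A):
    """Symmetric part (A + A^T) / 2."""
    return [[(a + b) / 2 for a, b in zip(r, rt)] for r, rt in zip(A, transpose(A))]


def ant(A):
    """Antisymmetric part (A - A^T) / 2."""
    return [[(a - b) / 2 for a, b in zip(r, rt)] for r, rt in zip(A, transpose(A))]


A = [[1.0, 4.0], [-2.0, 3.0]]
S, K = sym(A), ant(A)
orthogonality = frob(S, K)                           # S^n is orthogonal to A^n
pythagoras = frob(A, A) - frob(S, S) - frob(K, K)    # hence the norms add up

# One-atom measure G = G0 * delta_x (an assumption made for illustration).
G0 = [[1.0, 0.0], [0.0, 0.0]]


def semi_inner(P, Q):
    """Semi-inner product (2.3) for the one-atom measure: P . (G0 Q)."""
    return frob(P, matmul(G0, Q))


Q_kernel = [[0.0, 0.0], [5.0, -7.0]]   # columns lie in Ker(G0) = span{e_2}
Q_other = [[2.0, 0.0], [5.0, -7.0]]    # differs from Q_kernel only in Ran(G0)
print(orthogonality, pythagoras, semi_inner(Q_kernel, Q_kernel), semi_inner(Q_other, Q_other))
```

Here `Q_kernel` is nonzero but has zero seminorm, which is why the Hilbert space is defined as a quotient.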

2.2. Preliminaries

We denote by $A^\dagger \in \mathbb {R}^{m \times n}$ the pseudoinverse of a matrix $A \in \mathbb {R}^{n \times m}$ . If $A \in \mathbb {S}^n$ has the eigendecomposition $A = O \Sigma O^{\mathrm {T}}$ , then $A^\dagger = O \Sigma ^\dagger O^{\mathrm {T}}$ with $\Sigma ^\dagger = \text {diag}(\lambda _1^{-1}, \ldots, \lambda _s^{-1},0, \ldots, 0)$ , where $O$ is an orthogonal matrix and $\Sigma = \text {diag}(\lambda _1,\ldots, \lambda _s,0, \ldots, 0)$ is a diagonal matrix with $\{\lambda _i\}$ being nonzero eigenvalues of $A$ .
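The eigendecomposition formula can be checked on a small singular example. The sketch below assumes a $2 \times 2$ matrix with a rotation as $O$ and eigenvalues $(2, 0)$, builds $A^\dagger$ by inverting only the nonzero eigenvalue, and verifies the Moore–Penrose identities $A A^\dagger A = A$ and $A^\dagger A A^\dagger = A^\dagger$. The helper names and the tolerance are illustrative choices.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def from_eig(O, eigvals):
    """Assemble O diag(eigvals) O^T."""
    return matmul(matmul(O, diag(eigvals)), transpose(O))


def pinv_eigvals(eigvals, tol=1e-12):
    """Invert the nonzero eigenvalues and keep the zero ones at zero."""
    return [1.0 / lam if abs(lam) > tol else 0.0 for lam in eigvals]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Orthogonal matrix (rotation with cos = 0.6, sin = 0.8) and a singular spectrum.
O = [[0.6, -0.8], [0.8, 0.6]]
eigvals = [2.0, 0.0]

A = from_eig(O, eigvals)
A_dag = from_eig(O, pinv_eigvals(eigvals))

err_1 = max_abs_diff(matmul(matmul(A, A_dag), A), A)              # A A^+ A = A
err_2 = max_abs_diff(matmul(matmul(A_dag, A), A_dag), A_dag)      # A^+ A A^+ = A^+
projection = matmul(A, A_dag)                                     # projection onto Ran(A)
err_sym = max_abs_diff(projection, transpose(projection))
print(err_1, err_2, err_sym)
```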

Lemma 2.1. The following properties hold:

1. If $A \succeq B \succeq 0$ and $\mathrm {Ran}(A) = \mathrm {Ran}(B)$ , then $B^{\dagger } \succeq A^{\dagger }$ .
2. The cone $\mathbb {S}^n_+$ in $\mathbb {S}^n$ is self-dual, that is, $(\mathbb {S}_+^n)^* \,:\!=\, \{B\in \mathbb {S}^n\,; \ \mathrm {Tr}(AB) \ge 0\,,\ \forall A \in \mathbb {S}^n_+ \} = \mathbb {S}^n_+$ .
3. If $A, B \succeq 0$ and $A \cdot B = 0$ , then $\mathrm {Ran} B \subset \mathrm {Ker} A$ , equivalently, $\mathrm {Ran} A \subset \mathrm {Ker} B$ .
4. For $A \in \mathbb {S}_+^n, M \in \mathbb {R}^{n \times m}$ , there holds
(2.4) \begin{equation} (A M) \cdot M \le \mathrm {Tr}(A)\lVert M \rVert _{\mathrm {F}}^2\,. \end{equation}

Remark 2.2. The range condition $\mathrm {Ran}(A) = \mathrm {Ran}(B)$ for the first statement in Lemma 2.1 above is necessary, as shown by the example $A = \text {diag}(1,1,1,0)$ and $B = \text {diag}(1,1,0,0)$ . Moreover, we remark that for $\mathsf {G} \in \mathcal {M}(\mathcal {X}, \mathbb {S}_+^n)$ , there holds $L_{\mathrm {Tr} \mathsf {G}}^2(\mathcal {X}, \mathbb {R}^n) \subset L_{\mathsf {G}}^2(\mathcal {X}, \mathbb {R}^n)$ by (2.4), while the converse is not true; see [35] for the counterexample.

Proof. We only prove the first statement, as the others are direct. We first note that the orthogonal projection onto $\mathrm {Ran}(A) = \mathrm {Ran}(B)$ is given by $\mathbb {P} = \sqrt {B}^\dagger B \sqrt {B}^\dagger = \sqrt {A}^\dagger A \sqrt {A}^\dagger$ . By $A - B \succeq 0$ , we have $\sqrt {B}^\dagger A \sqrt {B}^\dagger - \mathbb {P} \succeq 0$ , which means that all the eigenvalues of the matrix $\sqrt {B}^\dagger A \sqrt {B}^\dagger$ restricted to its invariant subspace $\mathrm {Ran}(A) = \mathrm {Ran}(B)$ are greater than or equal to one. It is easy to see that $\sqrt {B}^\dagger A \sqrt {B}^\dagger$ and $\sqrt {A} B^\dagger \sqrt {A}$ have the same eigenvalues. Hence, we find $\sqrt {A} B^\dagger \sqrt {A} - \mathbb {P} \succeq 0$ , which gives $B^\dagger \succeq A^\dagger$ by conjugating with $\sqrt {A}^\dagger$ .
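The first and fourth statements of Lemma 2.1, together with the counterexample of Remark 2.2, can be verified directly on small matrices. The sketch below restricts the first statement to diagonal matrices, an assumption under which the pseudoinverse and the order $\preceq$ both act entrywise on the diagonal; the matrices used for (2.4) are arbitrary integer examples, and all function names are illustrative.

```python
def pinv_diag(d):
    """Pseudoinverse of diag(d): invert nonzero entries, keep zeros."""
    return [1.0 / x if x != 0 else 0.0 for x in d]


def loewner_geq(a, b):
    """For diagonal matrices, diag(a) >= diag(b) in the PSD order iff a_i >= b_i."""
    return all(x >= y for x, y in zip(a, b))


# Remark 2.2: A >= B >= 0 but Ran(A) != Ran(B), and the conclusion fails.
A_bad, B_bad = [1, 1, 1, 0], [1, 1, 0, 0]
bad_conclusion = loewner_geq(pinv_diag(B_bad), pinv_diag(A_bad))

# Equal ranges: the order is reversed by the pseudoinverse, as stated.
A_ok, B_ok = [2, 3, 0], [1, 2, 0]
ok_conclusion = loewner_geq(pinv_diag(B_ok), pinv_diag(A_ok))


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def frob(A, B):
    """Frobenius inner product A . B = Tr(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Inequality (2.4) with a positive definite A and a rectangular M.
A = [[2, 1], [1, 1]]
M = [[1, 0, 2], [-1, 3, 1]]
lhs = frob(matmul(A, M), M)                 # (A M) . M
rhs = (A[0][0] + A[1][1]) * frob(M, M)      # Tr(A) * ||M||_F^2
print(bad_conclusion, ok_conclusion, lhs, rhs)
```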

The next lemma is about the measurability of matrix-valued functions.

Lemma 2.3. Let $A(x)$ be a $\mathbb {S}^n$ -valued Borel measurable function on $\mathcal {X}$ . Then, it holds that

1. The eigenvalues $\{\lambda _{A,i}(x)\}^n_{i = 1}$ of $A(x)$ in nondecreasing order are measurable, and the corresponding eigenvectors $\{u_{A,i}(x)\}^n_{i=1}$ can also be selected to be measurable and form an orthonormal basis of $\mathbb {R}^n$ for every $x \in \mathcal {X}$ .
2. The pseudoinverse $A^\dagger (x)$ of $A(x)$ is measurable, and the square root $A^{1/2}(x)$ of $A(x) \in \mathbb {S}^n_+$ is measurable.

The first and second properties are from [Reference Reid81] and [Reference Robertson and Rosenberg82] with the continuity of $A^{1/2}$ in $A \in \mathbb {S}^n_+$ , respectively. In fact, the Powers–Størmer inequality [Reference Powers and Størmer80] gives

(2.5)

\begin{align} \big \lVert \sqrt {A} - \sqrt {B}\,\big \rVert _{\mathrm {F}}^2 \le \sqrt {n} \lVert A - B \rVert _{\mathrm {F}}\,, \quad \forall A,B \in \mathbb {S}^n_+\,. \end{align}
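The inequality (2.5) can be probed numerically for $n = 2$. The sketch below assumes random matrices of the form $M M^{\mathrm {T}}$ and uses the closed form $\sqrt {A} = (A + sI)/t$ with $s = \sqrt {\det A}$ and $t = \sqrt {\mathrm {Tr} A + 2s}$, which follows from the Cayley–Hamilton theorem for nonzero $2 \times 2$ positive semi-definite matrices. The sample size, the seed and the helper names are arbitrary choices, and the experiment is a sanity check rather than a proof.

```python
import math
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def frob_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def sqrt_psd_2x2(A):
    """Square root of a nonzero 2x2 PSD matrix: (A + s I) / t."""
    (a, b), (_, d) = A
    s = math.sqrt(max(a * d - b * b, 0.0))   # clip rounding noise in det(A)
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def random_psd(rng):
    """M M^T is symmetric positive semi-definite for any real M."""
    m = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]
    return matmul(m, transpose(m))


# Fixed example: the computed root squares back to A.
A_fixed = [[2.0, 1.0], [1.0, 2.0]]
root = sqrt_psd_2x2(A_fixed)
root_error = frob_norm(sub(matmul(root, root), A_fixed))

# Largest observed ratio lhs / rhs of (2.5) over random pairs, with n = 2.
rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    A, B = random_psd(rng), random_psd(rng)
    lhs = frob_norm(sub(sqrt_psd_2x2(A), sqrt_psd_2x2(B))) ** 2
    rhs = math.sqrt(2.0) * frob_norm(sub(A, B))
    worst_ratio = max(worst_ratio, lhs / rhs)
print(root_error, worst_ratio)
```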

We finally recall some concepts and useful results from convex analysis. Let $f\,:\,X \to \mathbb {R} \cup \{+ \infty \}$ be an extended real-valued function on a Banach space $X$ . We denote by $\partial f(x)$ its subgradient at $x \in X$ and by $\mathrm {dom}(f) \,:\!=\, f^{-1}(\mathbb {R})$ its domain. We say that $f$ is proper if $\mathrm {dom}(f) \neq \varnothing$ ; and that $f$ is positively homogeneous of degree $k$ if for all $x \in X$ and $\alpha > 0$ , $f(\alpha x) = \alpha ^k f(x)$ . The conjugate function $f^*$ of $f$ is defined by

(2.6)

\begin{equation} f^*(x^*) = \sup _{x \in X} \langle x^*, x\rangle _X - f(x)\,, \quad \forall x^* \in X^*\,, \end{equation}

which is convex and lower semicontinuous with respect to the weak$^*$ topology of $X^*$ . The following two lemmas are from [Reference Barbu and Precupanu4, Proposition 2.33] and [Reference Bouchitté12, Proposition 2.5], respectively.

Lemma 2.4 (Subgradient). Let $f\,:\,X \to \mathbb {R} \cup \{+\infty \}$ be a proper convex function on a Banach space $X$ . Then, the following three properties are equivalent: (i) $x^* \in \partial f(x)$ ; (ii) $f(x) + f^*(x^*) = \langle x^*,x\rangle _X$ ; (iii) $f(x) + f^*(x^*) \le \langle x^*,x\rangle _X$ . In addition, if $f$ is lower semicontinuous, then all of these properties are equivalent to $x \in \partial f^*(x^*)$ .
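A one-dimensional example may help to fix the definitions (2.6) and Lemma 2.4. The sketch below assumes $X = \mathbb {R}$ and $f(x) = x^2/2$, for which $f^*(y) = y^2/2$ and $\partial f(x) = \{x\}$, and approximates the supremum in (2.6) by a maximum over a finite grid. It also evaluates the conjugate of the characteristic function of $[-1,1]$, which is $|y|$ and hence positively homogeneous of degree one. The grid and the function names are illustrative assumptions.

```python
def f(x):
    return 0.5 * x * x


# Finite grid standing in for X = R (an approximation, exact at the points used below).
GRID = [i / 1000.0 for i in range(-5000, 5001)]


def conjugate(func, y):
    """Approximate f^*(y) = sup_x (y * x - f(x)) by a maximum over GRID."""
    return max(y * x - func(x) for x in GRID)


def fenchel_young_gap(x, y):
    """f(x) + f^*(y) - <y, x>; zero exactly when y is a subgradient of f at x."""
    return f(x) + conjugate(f, y) - x * y


def indicator_unit_interval(x):
    """Characteristic function of [-1, 1] in the sense of convex analysis."""
    return 0.0 if -1.0 <= x <= 1.0 else float("inf")


gap_on_subgradient = fenchel_young_gap(1.5, 1.5)    # y = f'(x): equality in (ii)
gap_off_subgradient = fenchel_young_gap(0.5, 1.5)   # y != f'(x): strict inequality
support_values = [conjugate(indicator_unit_interval, y) for y in (-2.0, 0.5, 3.0)]
print(gap_on_subgradient, gap_off_subgradient, support_values)
```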

Lemma 2.5 (Fenchel–Rockafellar duality). Let $X$ and $Y$ be two Banach spaces and $L\,:\, X \to Y$ be a bounded linear operator with the adjoint $L^*\,:\, Y^* \to X^*$ . Let $f$ and $g$ be two proper lower semicontinuous convex functions defined on $X$ and $Y$ valued in $\mathbb {R} \cup \{+\infty \}$ , respectively. If there exists $x \in \mathrm {dom}(f)$ such that $g$ is continuous at $Lx$ , then

(2.7)

\begin{equation} \sup _{x \in X} - f({-}x) - g(Lx) = \inf _{y^* \in Y^*}f^*(L^*y^*) + g^*(y^*)\,, \end{equation}

and the $\inf$ in (2.7) can be attained. Moreover, the $\sup$ in (2.7) is attained at $x \in X$ if and only if there exists a $y^* \in Y^*$ such that $L x \in \partial g^*(y^*)$ and $L^* y^* \in \partial f({-}x)$ , in which case $y^*$ also achieves the $\inf$ in (2.7).
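
To illustrate Lemma 2.4 in the simplest setting, consider the following one-dimensional example, chosen here for orientation only: let $X = \mathbb {R}$ and let $f$ be equal to $0$ on $[-1,1]$ and $+\infty$ elsewhere, which is proper, convex and lower semicontinuous. By (2.6),

\begin{equation*} f^*(x^*) = \sup _{|x| \le 1} x^* x = |x^*|\,, \end{equation*}

which is positively homogeneous of degree one. For $|x| \le 1$, property (ii) of Lemma 2.4 reads $|x^*| = x^* x$, so that

\begin{equation*} \partial f(x) = \begin{cases} \{0\}, & |x| \lt 1\,,\\ [0,+\infty ), & x = 1\,,\\ ({-}\infty, 0], & x = -1\,, \end{cases} \qquad \partial f^*(x^*) = \begin{cases} \{\mathrm {sign}(x^*)\}, & x^* \neq 0\,,\\ [-1,1], & x^* = 0\,, \end{cases} \end{equation*}

and one checks directly that $x^* \in \partial f(x)$ if and only if $x \in \partial f^*(x^*)$. The same mechanism, with the interval $[-1,1]$ replaced by a closed convex set of matrices, is what produces a positively homogeneous conjugate function.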

3. Definition and basic properties

We shall introduce a new family of distances on the matrix-valued Radon measure space $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ based on a dynamic OT formulation, which will be the central object of this work.

3.1. Action functional

To define our dynamic OT model over the space of $\mathbb {S}^n_+$ -valued measures, the starting point is a weighted action functional. Let $n, k, m \in \mathbb {N}$ be positive integers and $\Lambda \,:\!=\, (\Lambda _1,\Lambda _2)$ be a pair of matrices with $\Lambda _1 \in \mathbb {S}^k_+$ and $\Lambda _2 \in \mathbb {S}_+^m$ . We define the following closed convex set:

(3.1)

\begin{align} \mathcal {O}_\Lambda = \Big \{(A,B,C) \in \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {R}^{n \times m}\,;\ A + \frac {1}{2} B \Lambda ^2_1 B^{\mathrm {T}} + \frac {1}{2} C \Lambda ^2_2 C^{\mathrm {T}} \preceq 0 \Big \}\,. \end{align}

Note that its characteristic function:

\begin{equation*} \iota _{\mathcal {O}_\Lambda } \,:\!=\, \begin{cases} 0, & (A,B,C) \in \mathcal {O}_\Lambda \,,\\ + \infty, & (A,B,C) \notin \mathcal {O}_\Lambda \,,\\ \end{cases} \end{equation*}

is proper lower semicontinuous and convex [Reference Bauschke and Combettes6, Lemma 1.24]. We denote by $J_\Lambda$ the conjugate function (2.6) of $\iota _{\mathcal {O}_\Lambda }$ and derive the explicit expressions for $J_\Lambda$ and its subgradient $\partial J_\Lambda$ .

Proposition 3.1. $J_\Lambda$ is proper, positively homogeneous of degree one, lower semicontinuous and convex with the following representation:

(3.2)

\begin{equation} J_\Lambda (X, Y, Z) = \frac {1}{2} (Y \Lambda _1^\dagger ) \cdot (X^{\dagger } Y \Lambda _1^\dagger ) + \frac {1}{2} (Z \Lambda _2^\dagger ) \cdot (X^{\dagger } Z \Lambda _2^\dagger )\,, \end{equation}

if $X \in \mathbb {S}_+^n$ , $\mathrm {Ran} (Y^{\mathrm {T}}) \subset \mathrm {Ran} ( \Lambda _1)$ , $\mathrm {Ran} (Z^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _2)$ and $\mathrm {Ran} ([Y,Z]) \subset \mathrm {Ran}(X)$ ; otherwise, $J_\Lambda (X, Y, Z) = +\infty$ . Moreover, the subgradient of $J_\Lambda$ at $(X,Y, Z) \in dom (J_\Lambda )$ is characterised by

(3.3)

\begin{equation} \partial J_\Lambda (X,Y, Z) = \Big \{(A,B,C) \in \mathcal {O}_\Lambda \,; \ Y = X B \Lambda ^2_1\,, \ Z = X C \Lambda ^2_2\,, \ X \cdot \Big (A + \frac {1}{2} B \Lambda _1^2 B^{\mathrm {T}} + \frac {1}{2} C \Lambda ^2_2 C^{\mathrm {T}} \Big ) = 0 \Big \}\,. \end{equation}

$\partial J_\Lambda (X, Y, Z)$ is a singleton if and only if $(X, Y, Z) \in \mathbb {S}^n_{++} \times \mathbb {R}^{n \times k} \times \mathbb {R}^{n \times m}$ and $\Lambda _1 \in \mathbb {S}_{++}^k$ , $\Lambda _2 \in \mathbb {S}_{++}^m$ .

Proof. The stated properties of $J_\Lambda$ follow from [Reference Bauschke and Combettes6, Proposition 14.11]. To derive the formula (3.2), by definition, we have

(3.4)

\begin{align} J_\Lambda (X, Y, Z) &= \sup _{(A,B,C) \in \mathcal {O}_\Lambda } X \cdot A + Y \cdot B + Z \cdot C \,, \end{align}

for $(X, Y, Z) \in \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {R}^{n \times m}$ . We consider the following four cases.

Case I: $X \in \mathbb {S}^n\backslash \mathbb {S}_+^n$ . We choose a vector $a \in \mathbb {R}^n$ such that $\langle a, X a \rangle \lt 0$ and set $A = - \lambda a a^{\mathrm {T}}\preceq 0$ with $\lambda \gt 0$ , $B = 0$ and $C = 0$ in (3.4). Then it follows that

\begin{equation*} J_\Lambda (X, Y, Z) \ge \sup _{\lambda \gt 0} X \cdot ({-} \lambda a a^{\mathrm {T}}) = + \infty \,. \end{equation*}

Case II: $\mathrm {Ran} (Y^{\mathrm {T}}) \not \subset \mathrm {Ran} (\Lambda _1)$ or $\mathrm {Ran} (Z^{\mathrm {T}}) \not \subset \mathrm {Ran} (\Lambda _2)$ . It suffices to consider the case $\mathrm {Ran} (Y^{\mathrm {T}}) \not \subset \mathrm {Ran} (\Lambda _1)$ , since the same argument applies to the other one. Without loss of generality, we let $Y = [y_1,\ldots, y_n]^{\mathrm {T}}$ with $y_i \in \mathbb {R}^k$ and $y_1 \notin \mathrm {Ran} (\Lambda _1)$ . Thanks to $\Lambda _1 \in \mathbb {S}^k_+$ , $y_1$ has the orthogonal decomposition:

\begin{equation*} y_1 = y_1^{(1)} + y^{(2)}_1 \quad \text {with}\ y^{(1)}_1 \in \mathrm {Ran}(\Lambda _1)\,,\ y^{(2)}_1 \neq 0 \in \mathrm {Ker} (\Lambda _1)\,. \end{equation*}

Taking $A = 0$ , $B = \lambda \big [y_1^{(2)},0\big ]^{\mathrm {T}}$ with $\lambda \gt 0$ and $C = 0$ in (3.4), we have

\begin{equation*} J_\Lambda (X, Y, Z) \ge \sup _{\lambda \gt 0} \lambda \big | y_1^{(2)} \big |^2 = + \infty \,. \end{equation*}

Case III: $\mathrm {Ran} ([Y, Z]) \not \subset \mathrm {Ran}(X)$ . It suffices to consider $\mathrm {Ran}( Y ) \not \subset \mathrm {Ran} (X)$ . We take $(A,B,C)$ in (3.4) as:

\begin{align*} A = - \frac {\lambda ^2}{2} (\mathbb {P}_{\mathrm {Ker}(X)} Y \Lambda _1) (\mathbb {P}_{\mathrm {Ker} (X)} Y \Lambda _1)^{\mathrm {T}}\,, \ B = \lambda \mathbb {P}_{\mathrm {Ker}(X)} Y\,,\ C = 0\,, \end{align*}

with $\lambda \gt 0$ , where $\mathbb {P}_{\mathrm {Ker}(X)}\,:\!=\, I - X^\dagger X$ is the orthogonal projection onto $\mathrm {Ker}(X)$ . A direct computation gives

\begin{align*} J_\Lambda (X,Y, Z) &\ge \sup _{(A,B,0) \in \mathcal {O}_\Lambda } X \cdot A + Y \cdot B \\ & \ge \sup _{\lambda \gt 0} - \frac { \lambda ^2}{2} (\mathbb {P}_{\mathrm {Ker}(X)} Y \Lambda _1)\cdot (X \mathbb {P}_{\mathrm {Ker}(X)} Y \Lambda _1) + \lambda Y \cdot ( \mathbb {P}_{\mathrm {Ker}(X)} Y) \\ & \ge \sup _{\lambda \gt 0} \lambda (\mathbb {P}_{\mathrm {Ker}(X)} Y) \cdot (\mathbb {P}_{\mathrm {Ker}(X)} Y) = + \infty \,, \end{align*}

since there holds $( \mathbb {P}_{\mathrm {Ker}(X)} Y) \cdot (\mathbb {P}_{\mathrm {Ker}(X)} Y) \gt 0$ by $\mathrm {Ran}( Y ) \not \subset \mathrm {Ran} (X)$ .

Case IV: $(X, Y,Z) \in \mathbb {S}_+^{n} \times \mathbb {R}^{n \times k} \times \mathbb {R}^{n \times m}$ with $\mathrm {Ran} (Y^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _1)$ , $\mathrm {Ran} (Z^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _2)$ and $\mathrm {Ran} ([Y, Z]) \subset \mathrm {Ran}(X)$ . For this case, we directly compute

(3.5)

\begin{align} X \cdot A + Y \cdot B + Z \cdot C = & X \cdot \Big (A + \frac {1}{2} B \Lambda ^2_1 B^{\mathrm {T}} + \frac {1}{2} C \Lambda ^2_2 C^{\mathrm {T}} \Big ) + Y \cdot B + Z \cdot C - X \cdot \Big ( \frac {1}{2} B \Lambda _1^2 B^{\mathrm {T}} + \frac {1}{2} C \Lambda ^2_2 C^{\mathrm {T}} \Big )\,, \end{align}

and

(3.6)

\begin{align} Y \cdot B + Z \cdot C - \frac {1}{2} X \cdot \big (B \Lambda _1^2 B^{\mathrm {T}} + C \Lambda ^2_2 C^{\mathrm {T}} \big ) = & - \frac {1}{2} \Big \lVert \sqrt {X}B \Lambda _1 - \sqrt {X}^{\dagger } Y \Lambda _1^\dagger \Big \rVert _{\mathrm {F}}^2 - \frac {1}{2} \Big \lVert \sqrt {X}C \Lambda _2 - \sqrt {X}^{\dagger } Z \Lambda _2^\dagger \Big \rVert _{\mathrm {F}}^2 \notag \\ & + \frac {1}{2 } \Big \lVert \sqrt {X}^{\dagger } Y \Lambda _1^\dagger \Big \rVert _{\mathrm {F}}^2 + \frac {1}{2} \Big \lVert \sqrt {X}^{\dagger } Z \Lambda _2^\dagger \Big \rVert _{\mathrm {F}}^2\,, \end{align}

where we have used

\begin{equation*} Y \cdot B + Z \cdot C = \big (\sqrt {X} \sqrt {X}^\dagger Y \Lambda _1^\dagger \Lambda _1 \big ) \cdot B + \big (\sqrt {X} \sqrt {X}^\dagger Z \Lambda _2^\dagger \Lambda _2 \big ) \cdot C \,,\end{equation*}

by the range relations: $\mathrm {Ran} (Y^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _1)$ , $\mathrm {Ran} (Z^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _2)$ and $\mathrm {Ran} ([Y,Z]) \subset \mathrm {Ran}(X)$ . Also, by (3.1), we have $ X \cdot \big (A + \frac {1}{2} B \Lambda _1^2 B^{\mathrm {T}} + \frac {1}{2} C \Lambda _2^2 C^{\mathrm {T}} \big ) \le 0$ . Hence, by (3.5) and (3.6), the maximisers of (3.4) are given by the set

(3.7)

\begin{equation} \Big \{(A,B,C) \in \mathcal {O}_\Lambda \,; \ Y = X B \Lambda ^2_1 \,, \ Z = X C \Lambda ^2_2\,, \ X \cdot \Big (A + \frac {1}{2} B \Lambda _1^2 B^{\mathrm {T}} + \frac {1}{2} C \Lambda ^2_2 C^{\mathrm {T}} \Big ) = 0 \Big \}\,, \end{equation}

and the corresponding supremum is (3.2).

Finally, to characterise the subgradient of $J_\Lambda$ , by Lemma 2.4, we have that $(A,B,C) \in \partial J_\Lambda (X,Y,Z)$ if and only if $(A,B,C) \in \mathcal {O}_\Lambda$ and $ J_\Lambda (X,Y,Z) = X \cdot A + Y \cdot B + Z \cdot C$ holds. Then, (3.3) readily follows from the above argument. For the last statement, we note that $\partial J_\Lambda (X,Y,Z)$ is a singleton if and only if the equations in (3.3) for $(A,B,C)$ are uniquely solvable, which is equivalent to $\Lambda _1 \in \mathbb {S}_{++}^k$ , $\Lambda _2 \in \mathbb {S}_{++}^m$ and $X \in \mathbb {S}_{++}^n$ .
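
For orientation, we record the scalar case $n = k = m = 1$ of Proposition 3.1, writing the weights as $\Lambda _1 = \lambda _1 \gt 0$ and $\Lambda _2 = \lambda _2 \gt 0$ (lower-case symbols introduced here only for this illustration). For $x \gt 0$ and $y, z \in \mathbb {R}$, the supremum in (3.4) is attained on the boundary $a = - \frac {1}{2} \lambda _1^2 b^2 - \frac {1}{2} \lambda _2^2 c^2$ of $\mathcal {O}_\Lambda$, and completing the square gives

\begin{equation*} J_\Lambda (x,y,z) = \sup _{b, c \in \mathbb {R}} \Big \{ y b - \frac {x \lambda _1^2}{2} b^2 + z c - \frac {x \lambda _2^2}{2} c^2 \Big \} = \frac {y^2}{2 \lambda _1^2 x} + \frac {z^2}{2 \lambda _2^2 x}\,, \end{equation*}

with the unique maximiser $b = y/(x \lambda _1^2)$, $c = z/(x \lambda _2^2)$. This agrees with (3.2) and with the relations $Y = X B \Lambda _1^2$ and $Z = X C \Lambda _2^2$ in (3.3). For $x = 0$, the value is $0$ if $y = z = 0$ and $+\infty$ otherwise, in line with the range condition $\mathrm {Ran}([Y,Z]) \subset \mathrm {Ran}(X)$.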

Similarly to the unbalanced WFR distance [Reference Chizat, Peyré, Schmitzer and Vialard27, Reference Kondratyev, Monsaingeon and Vorotnikov56, Reference Liero, Mielke and Savaré64], the variables $(X, Y, Z) \in \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {R}^{n \times m}$ in the infinitesimal cost $J_\Lambda (X, Y, Z)$ represent the mass, the momentum for the mass transportation and the source for the mass variation, respectively, in our transport problem (see Remark 3.6 and Definition 3.8). In what follows, we assume $m = n$ , since the dimensions of the mass $X \in \mathbb {S}^n$ and the source $Z \in \mathbb {R}^{n \times m}$ need to match. We shall also assume $\Lambda _2 \in \mathbb {S}^n_{++}$ to avoid technical issues (see Remark 3.10). Now, for a given triplet of measures $\mu \,:\!=\, \mathsf {(G,q, R)} \in \mathcal {M}(\mathcal {X}, \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {M}^n)$ , we define a positive measure $\mathcal {J}_{\Lambda } (\mu )$ on $\mathcal {X}$ by

(3.8)

\begin{equation} \mathcal {J}_{\Lambda }(\mu )(E)\,:\!=\, \int _E J_\Lambda \left (\frac {\mathrm {d} \mu }{\mathrm {d} \lambda }\right ) \mathrm {d} \lambda \,, \end{equation}

for a measurable set $E \in \mathscr {B}(\mathcal {X})$ , where $\lambda \in \mathcal {M}(\mathcal {X},\mathbb {R}_+)$ is a reference measure such that $|\mu | \ll \lambda$ . Thanks to the positive homogeneity of $J_\Lambda$ by Proposition 3.1, the definition (3.8) of $\mathcal {J}_{\Lambda }$ is independent of the choice of $\lambda$ . To lighten the notation, we adopt the following conventions in the rest of this work.

• We define the space $\mathbb {X} \,:\!=\, \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {M}^n$ and then write $\mathcal {M}(\mathcal {X},\mathbb {X}) = \mathcal {M}(\mathcal {X}, \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {M}^n) = C(\mathcal {X},\mathbb {X})^*$ , where $C(\mathcal {X},\mathbb {X}) = C(\mathcal {X}, \mathbb {S}^n \times \mathbb {R}^{n \times k} \times \mathbb {M}^n)$ .
• We often write $\mu$ for $\mathsf {(G,q,R)} \in \mathcal {M}(\mathcal {X},\mathbb {X})$ for short, which will be clear from the context.
• We write $\mathcal {J}_{\Lambda }(\mu )(E)$ as $\mathcal {J}_{\Lambda, E}(\mu )$ for short. Then, $\mathcal {J}_{\Lambda, \mathcal {X}}(\mu )$ denotes the total measure $\mathcal {J}_{\Lambda }(\mu )(\mathcal {X})$ .
• We denote by $(G_\lambda, q_\lambda, R_\lambda )$ the density of $\mathsf {(G,q,R)} \in \mathcal {M}(\mathcal {X},\mathbb {X})$ with respect to a reference measure $\lambda \in \mathcal {M}(\mathcal {X},\mathbb {R}_+)$ such that $|\mathsf {(G,q,R)}| \ll \lambda$ . The subscript $\lambda$ of $(G_\lambda, q_\lambda, R_\lambda )$ will often be omitted for simplicity.
• The generic positive constant $C$ involved in the estimates below may change from line to line.
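
The independence of (3.8) from the reference measure, noted above, can be verified as follows; the auxiliary measure $\lambda ''$ and the density $h$ are introduced here only for this check. Let $\lambda$ and $\lambda '$ be two reference measures with $|\mu | \ll \lambda$ and $|\mu | \ll \lambda '$, and set $\lambda '' \,:\!=\, \lambda + \lambda '$, so that $\lambda \ll \lambda ''$ with density $h \,:\!=\, \frac {\mathrm {d} \lambda }{\mathrm {d} \lambda ''} \ge 0$. Since $\frac {\mathrm {d} \mu }{\mathrm {d} \lambda ''} = h\, \frac {\mathrm {d} \mu }{\mathrm {d} \lambda }$ holds $\lambda ''$-a.e. (both sides being zero where $h = 0$), the positive homogeneity of degree one and $J_\Lambda (0,0,0) = 0$ give

\begin{equation*} \int _E J_\Lambda \left (\frac {\mathrm {d} \mu }{\mathrm {d} \lambda ''}\right ) \mathrm {d} \lambda '' = \int _E J_\Lambda \left (\frac {\mathrm {d} \mu }{\mathrm {d} \lambda }\right ) h\, \mathrm {d} \lambda '' = \int _E J_\Lambda \left (\frac {\mathrm {d} \mu }{\mathrm {d} \lambda }\right ) \mathrm {d} \lambda \,, \quad E \in \mathscr {B}(\mathcal {X})\,. \end{equation*}

The same identity holds with $\lambda '$ in place of $\lambda$, so both reference measures yield the same value, possibly $+\infty$.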

Definition 3.2. We define the $\Lambda$ -weighted action functional for a measure $\mu \in \mathcal {M}(\mathcal {X}, \mathbb {X})$ by $\mathcal {J}_{\Lambda, \mathcal {X}}(\mu )$ .

By Proposition 3.1 and the formula (3.8), we have the following useful lemma.

Lemma 3.3. For $\mu = \mathsf {(G,q,R)} \in \mathcal {M}(\mathcal {X},\mathbb {X})$ with $\mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) \lt + \infty$ , we have $\mathsf {G} \in \mathcal {M}(\mathcal {X},\mathbb {S}_+^n)$ and $|(\mathsf {q}, \mathsf {R})| \ll Tr \mathsf {G}$ with

(3.9)

\begin{equation} G_\lambda \in \mathbb {S}_+^n,\ \mathrm {Ran}\left ([q_\lambda, R_\lambda ]\right ) \subset \mathrm {Ran} \left (G_\lambda \right )\!, \ \mathrm {Ran}(q_\lambda ^{\mathrm {T}}) \subset \mathrm {Ran}( \Lambda _1), \ \mathrm {Ran}(R_\lambda ^{\mathrm {T}}) \subset \mathrm {Ran}(\Lambda _2),\quad \text {$\lambda $-a.e.}\,. \end{equation}

Proof. By $\mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) = \int _{\mathcal {X}} J_\Lambda (\mu _\lambda )\, \mathrm {d} \lambda \lt + \infty$ , $J_\Lambda (\mu _\lambda )$ is finite for $\lambda$ -a.e. $x \in \mathcal {X}$ , where $\mu _\lambda = (G_\lambda, q_\lambda, R_\lambda )$ . It means that $\mu _\lambda (x) \in dom(J_{\Lambda })$ holds $\lambda$ -a.e., which immediately gives (3.9) by Proposition 3.1. We next show the absolute continuity of $|\mathsf {q}|$ and $|\mathsf {R}|$ with respect to $Tr \mathsf {G}$ , that is, for $E \in \mathscr {B}(\mathcal {X})$ with $Tr \mathsf {G} (E) = 0$ , we have $|\mathsf {q}|(E) = |\mathsf {R}|(E) = 0$ . For this, we consider two measurable subsets $E_1$ and $E_2$ of $E$ with $E = E_1 \cup E_2$ :

\begin{equation*} E_1 = \left \{x \in E\,;\ G_\lambda (x) \in \mathbb {S}_+^n\backslash \{0\}\right \},\quad E_2 = \left \{x \in E\,;\ G_\lambda (x) = 0\right \}. \end{equation*}

By $Tr \mathsf {G}(E_1) = 0$ and $Tr G_\lambda \gt 0$ on $E_1$ everywhere, we have $\lambda (E_1) = 0$ . Then $|\mathsf {q}|(E_1) = 0$ and $|\mathsf {R}|(E_1) = 0$ follow from $|\mathsf {q}|, |\mathsf {R}| \ll \lambda$ . Moreover, by (3.9) and $G_\lambda = 0$ on $E_2$ , we have $q_\lambda (x) = 0$ and $R_\lambda (x) = 0$ for $\lambda$ -a.e. $x \in E_2$ . Then it follows that $|\mathsf {q}|(E_2) = 0$ and $|\mathsf {R}|(E_2) = 0$ . The proof is complete.

3.2. Continuity equation

Another key ingredient for the dynamic OT formulation is a matricial continuity equation; see Definition 3.4 below. Let us fix some more notation.

• Let $\Omega \subset \mathbb {R}^d$ be a compact set with a nonempty interior, a smooth boundary $\partial \Omega$ and the exterior unit normal vector $\nu = (\nu _1,\ldots, \nu _d)$ . We denote by $Q_a^b \,:\!=\, [a,b] \times \Omega \subset \mathbb {R}^{1 + d}$ with $b \gt a \ge 0$ the associated time-space domain. If $[a,b] = [0,1]$ , we simply write it as $Q$ .
• For a function $\Phi (t,x)$ on $Q_a^b$ , we write $\Phi _t(\cdot ) \,:\!=\, \Phi (t,\cdot )$ if we regard it as a family of functions $\{\Phi _t\}_{t \in [a,b]}$ in $x$ .
• We denote by $\pi ^t\,:\, (t,x) \to t$ the projection. We use the subscript $\#$ to denote the pushforward by a map. For instance, for a measure $\mu$ on $Q_a^b$ , $\pi ^t_\# \mu = \mu \circ (\pi ^t)^{-1}$ is the pushforward measure on $[a,b]$ .
• Let $X$ and $Y$ be two Banach spaces. We denote by $\mathcal {L}(X,Y)$ the space of continuous linear operators from $X$ to $Y$ (simply $\mathcal {L}(X)$ if $X = Y$ ) and by $C_c^\infty (\mathbb {R}^d,X)$ the $X$ -valued smooth functions with compact support. We also need $C^k$ -smooth functions $C^k(\Omega, X)$ , where we assume that the derivatives exist in the interior of $\Omega$ and can be continuously extended to the boundary. The norm on $C^k(\Omega, X)$ is defined by $\lVert \Phi \rVert _{k,\infty } \,:\!=\, \sum _{|\alpha | \le k} \sup _{x \in \Omega } \lVert D^\alpha \Phi (x) \rVert$ . Other similar notations are interpreted accordingly.
• We recall the indicator function of a set $A$ :
(3.10) \begin{equation} \chi _A(x) = \begin{cases} 1, & \text {if}\ x \in A\,,\\ 0, & \text {if}\ x \notin A\,. \end{cases} \end{equation}
• We use $\widehat{\cdot}$ to denote the Fourier transform of a function, or the symbol of a constant coefficient linear differential operator.

Let $\mathsf {D}^*\,:\,C_c^\infty (\mathbb {R}^d, \mathbb {S}^n) \to C_c^\infty (\mathbb {R}^d, \mathbb {R}^{n \times k})$ be a general first-order constant coefficient linear differential operator satisfying $\mathsf {D}^*(I) = 0$ . That is, for a matrix-valued function $\Phi \in C_c^\infty (\mathbb {R}^d, \mathbb {S}^n)$ with components $\{\Phi _{ij}\}_{i,j = 1}^n$ , we have

(3.11)

\begin{equation} \mathsf {D}^*(\Phi _{ij}(e_{ij} + e_{ji})) = A_0^{ij}\Phi _{ij}(x) + \sum _{l = 1}^d A_l^{ij} \partial _{x_l} \Phi _{ij}(x)\,, \quad i \le j\,, \end{equation}

for some matrices $\{A_l^{ij}\}_{l = 0}^d \subset \mathbb {R}^{n \times k}$ , and there holds $\sum _{i = 1}^n A_0^{ii} = 0$ . Here $e_{ij}$ is the $n \times n$ matrix unit with $1$ at the $(i,j)$ -entry. By Fourier transform, the operator $\mathsf {D}^*$ can be equivalently characterised by

(3.12)

\begin{equation} \mathsf {D}^* (\Phi )(x) = \int _{\mathbb {R}^d} \widehat {\mathsf {D}^*}(\xi )\big [\widehat {\Phi }(\xi )\big ] e^{\mathrm {i} \xi \cdot x}\, \mathrm {d} \xi \,, \quad \Phi \in C_c^\infty (\mathbb {R}^d, \mathbb {S}^n)\,, \end{equation}

where $\widehat {\Phi }(\xi )$ is the Fourier transform of $\Phi$ :

\begin{equation*} \widehat {\Phi }(\xi ) = \frac {1}{(2\pi )^d} \int _{\mathbb {R}^d} \Phi (x) e^{-\mathrm {i}\xi \cdot x} \,\mathrm {d} x\,, \end{equation*}

and $\widehat {\mathsf {D}^*}(\xi )\,:\, \mathbb {R}^d \to \mathcal {L}(\mathbb {S}^n,\mathbb {R}^{n \times k})$ is the symbol of $\mathsf {D}^*$ such that for any $X \in \mathbb {S}^n$ and $Y \in \mathbb {R}^{n \times k}$ , $Y \cdot \widehat {\mathsf {D}^*}(\xi )[X]$ is a first-order polynomial in $\xi$ . We write $\widehat {\mathsf {D}^*}(\xi )$ as the sum of its homogeneous components: $\widehat {\mathsf {D}^*}(\xi ) = \widehat {\mathsf {D}^*_0} + \widehat {\mathsf {D}^*_1}(\xi )$ , where $\widehat {\mathsf {D}^*_0}$ and $\widehat {\mathsf {D}^*_1}(\xi )$ are homogeneous of degree $0$ and $1$ , respectively: for $X = (X_{ij}) \in \mathbb {S}^n$ ,

\begin{equation*} \widehat {\mathsf {D}^*_0}[X] = \frac {1}{2} \sum _{i = 1}^n A_0^{ii} X_{ii} + \sum _{i \lt j} A_0^{ij} X_{ij}\,, \end{equation*}

and

\begin{equation*} \widehat {\mathsf {D}^*_1}(\xi )[X] = \frac {\mathrm {i}}{2} \sum _{l = 1}^d\sum _{i = 1}^n A_l^{ii} \xi _l X_{ii} + \mathrm {i} \sum _{l = 1}^d\sum _{i \lt j} A_l^{ij} \xi _l X_{ij}\,, \end{equation*}

with matrices $A_{l}^{ij}$ given in (3.11). Then, noting that the Fourier transform of $I$ is $\delta _0 I$ , it is easy to see that the condition $\mathsf {D}^*(I) = 0$ is equivalent to $\widehat {\mathsf {D}^*}(0)(I) = \widehat {\mathsf {D}_0^*}(I) = \frac {1}{2} \sum _{i = 1}^n A_0^{ii} = 0$ .

By abuse of notation, we define $\mathsf {D}^*\Phi$ for functions $\Phi (t,x)$ on $\mathbb {R}^{1 + d}$ by acting $\mathsf {D}^*$ on the spatial variable $x$ . Moreover, we define the operator $\mathsf {D}$ as the adjoint operator of $ - \mathsf {D}^*$ in the sense of distributions, which can be viewed as a divergence operator that maps the momentum to the mass (see equation (3.14)). We similarly denote by $\mathsf {D}_0$ and $\mathsf {D}_1$ the homogeneous parts of degree $0$ and $1$ of the operator $\mathsf {D}$ , respectively.
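
As an illustration of (3.11) and of the condition $\mathsf {D}^*(I) = 0$, consider the scalar case $n = 1$, singled out here for orientation only. Then $\Phi = \frac {1}{2} \Phi _{11}(e_{11} + e_{11})$, and (3.11) together with linearity gives

\begin{equation*} \mathsf {D}^* \Phi = \frac {1}{2} A_0^{11} \Phi + \frac {1}{2} \sum _{l = 1}^d A_l^{11} \partial _{x_l} \Phi \,, \quad A_l^{11} \in \mathbb {R}^{1 \times k}\,, \end{equation*}

so that $\mathsf {D}^*(I) = 0$ forces $A_0^{11} = 0$. With the particular choice $k = d$ and $A_l^{11} = 2 e_l^{\mathrm {T}}$, where $e_l$ denotes the $l$-th standard basis vector of $\mathbb {R}^d$ (notation assumed here), one obtains $\mathsf {D}^* = \nabla$, viewed as a row vector, and hence $\mathsf {D} = \mathrm {div}$, since

\begin{equation*} \int _{\mathbb {R}^d} ({-}\nabla \Phi ) \cdot q \, \mathrm {d} x = \int _{\mathbb {R}^d} \Phi \, \mathrm {div}\, q \, \mathrm {d} x\,, \quad \Phi \in C_c^\infty (\mathbb {R}^d, \mathbb {R})\,, \ q \in C_c^\infty (\mathbb {R}^d, \mathbb {R}^d)\,. \end{equation*}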

Example 3.1. A simple example of $\mathsf {D}$ is the entry-wise transport, in which case the mass transportation between components is forbidden. To be precise, for $\mathsf {q} \in \mathcal {M}(Q, \mathbb {R}^{n \times n \times d})$ , we regard $\mathsf {q}$ as a collection of $\mathbb {R}^d$ -valued measures $\{\mathsf {q}_{ij}\}_{i,j =1}^n \subset \mathcal {M}(Q, \mathbb {R}^{d})$ , and define

\begin{equation*} \mathsf {D}(\mathsf {q}) = (\mathrm{div} \mathsf {q})^{\mathrm {sym}} = \frac {\mathrm {div} \mathsf {q} + (\mathrm {div} \mathsf {q})^{\mathrm {T}}}{2}\,, \end{equation*}

where the standard divergence is applied to each $q_{ij}$ , i.e., $(\mathrm {div} \mathsf {q})_{ij} \,:\!=\, \mathrm {div} q_{ij}$ . Then, the adjoint $\mathsf {D}^*$ is simply given by the gradient that acts on $\Phi \in C_c^\infty (\mathbb {R}^d, \mathbb {S}^n)$ component-wise: $\mathsf {D}^* \Phi = (\nabla \Phi _{ij})_{ij}$ . More examples with discussion can be found in Section 7.

Definition 3.4. A measure $\mathsf{G} \in \mathcal {M}(Q_a^b, \mathbb {S}^n)$ connects $\mathsf {G}_a, \mathsf {G}_b \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ over the time interval $[a,b]$ , if there exists $\mathsf {(q, R)} \in \mathcal {M}(Q_a^b, \mathbb {R}^{n \times k} \times \mathbb {M}^n)$ satisfying the following general matrix-valued continuity equation:

(3.13)

\begin{equation} \int _{Q_a^b} \partial _t\Phi \cdot \mathrm {d} \mathsf {G} + \mathsf {D}^* \Phi \cdot \mathrm {d} \mathsf {q} + \Phi \cdot \mathrm {d} \mathsf {R} = \int _{\Omega } \Phi _b \cdot \mathrm {d} \mathsf {G}_b - \int _{\Omega } \Phi _a \cdot \mathrm {d} \mathsf {G}_a\,,\quad \forall \Phi \in C^1(Q_a^b,\mathbb {S}^n)\,. \end{equation}

The measures $\mathsf {G}_a$ and $\mathsf{G}_b$ are referred to as the initial and final distributions of $\mathsf {G}$ , respectively. Moreover, we denote by $\mathcal {CE}([a,b];\,\mathsf {G}_a, \mathsf {G}_b)$ the set of the measures $\mathsf {(G,q, R)} \in \mathcal {M}(Q_a^b,\mathbb {X})$ satisfying (3.13).

Remark 3.5. It is easy to derive the distributional equation of (3.13):

(3.14)

\begin{equation} \partial _t \mathsf {G} + \mathsf {D}\mathsf {q} = \mathsf {R}^{\mathrm {sym}}\,, \end{equation}

with the measure $\mathsf {q}$ satisfying a homogeneous boundary condition on $\partial \Omega$ . Indeed, assume that $\mathsf {q}$ admits a smooth density $q$ with respect to the Lebesgue measure. Note that for $\mathsf {D}^* = a + \partial _{x_i}$ with $\mathsf {D} = - a + \partial _{x_i}$ ( $a \in \mathbb {R}$ ), a direct integration by parts gives, for smooth real functions $f,g$ on $\Omega$ ,

\begin{equation*} \int _{\Omega } ((a + \partial _{x_i})f(x)) g(x) + f(x) ({-}a + \partial _{x_i}) g(x) \, \mathrm {d} x = \int _{\partial \Omega } \nu _i f(x) g(x) \, \mathrm {d} x\,. \end{equation*}

We then have, by linearity and noting $\widehat {\partial _{x_k}} = \mathrm {i} \xi _k$ , for a general $\mathsf {D}^*$ ,

\begin{equation*} \int _\Omega \mathsf {D} q \cdot \Phi + q \cdot \mathsf {D}^* \Phi \, \mathrm {d} x = \int _{\partial \Omega } q \cdot \widehat {\mathsf {D}_1^*}({-}\mathrm {i} \nu )(\Phi ) \, \mathrm {d} x = \int _{\partial \Omega } \widehat {\mathsf {D}_1}({-}\mathrm {i} \nu )(q) \cdot \Phi \, \mathrm {d} x\,, \quad \forall \Phi \in C^1(\Omega, \mathbb {S}^n)\,. \end{equation*}

It follows that the boundary condition $\widehat {\mathsf {D}_1}({-}\mathrm {i} \nu )(q) = 0$ holds for $\mathsf {q}$ satisfying (3.13). In the case of $\mathsf {D} = \mathrm {div}$ for $\mathsf {q} \in \mathcal {M}(Q, \mathbb {R}^{d})$ , we see that $\widehat {\mathsf {D}_1}({-}\mathrm {i} \nu )(q) = 0$ is the familiar no-flux boundary condition $\nu \cdot q = 0$ .

Remark 3.6. We give an intuitive interpretation of (3.14) as a continuity equation. Recall the homogeneous parts $\mathsf {D}_0$ and $\mathsf {D}_1$ of $\mathsf {D}$ with $\mathsf{D}_0 \in \mathcal {L}(\mathbb{R}^{n \times k},\mathbb{S}^n)$ and $\mathsf {D}_1$ vanishing when acting on constant functions. It allows us to split $\mathsf {D}\mathsf {q}$ into two parts: $\mathsf {D}_0\mathsf {q}$ and $\mathsf {D}_1\mathsf {q}$ , where $\mathsf {D}_0\mathsf {q}$ and $\mathsf {D}_1\mathsf {q}$ describe the mass transportation between components of $\mathsf {G}$ and the transportation in space, respectively. Moreover, the condition $\mathsf {D}^*(I) = 0$ can be regarded as a conservativity condition in the sense that if $\mathsf {R} = 0$ , then $Tr\mathsf {G}_t(\Omega ) = Tr \mathsf {G}_0(\Omega )$ for any $t$ ; see Proposition 3.13.
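
The conservativity statement in Remark 3.6 can be anticipated at the level of the endpoints directly from the weak formulation (3.13); this is only a heuristic check, and the precise statement is the one in Proposition 3.13. Taking the constant test function $\Phi \equiv I$ in (3.13), we have $\partial _t \Phi = 0$ and $\mathsf {D}^* \Phi = \mathsf {D}^*(I) = 0$, and therefore

\begin{equation*} Tr \mathsf {R}(Q_a^b) = \int _{Q_a^b} I \cdot \mathrm {d} \mathsf {R} = \int _{\Omega } I \cdot \mathrm {d} \mathsf {G}_b - \int _{\Omega } I \cdot \mathrm {d} \mathsf {G}_a = Tr \mathsf {G}_b(\Omega ) - Tr \mathsf {G}_a(\Omega )\,. \end{equation*}

In particular, if $\mathsf {R} = 0$, then $Tr \mathsf {G}_b(\Omega ) = Tr \mathsf {G}_a(\Omega )$, and in general the change of the total trace mass is accounted for by the source $\mathsf {R}$ alone, irrespective of the momentum $\mathsf {q}$.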

The following elementary lemma gives the absolute continuity of the time marginal of $\mathsf {G}$ .

Lemma 3.7. Let $\mathsf {(G,q,R)} \in \mathcal {CE}([a,b];\,\mathsf {G}_a,\mathsf {G}_b)$ with $\mathsf {G}_a, \mathsf {G}_b \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ . It holds that $\pi ^t_\# \mathsf {G} \in \mathcal {M}([a,b],\mathbb {S}^n)$ has the distributional derivative $(\pi _\#^t \mathsf {R})^{\mathrm {sym}} \in \mathcal {M}([a,b],\mathbb {S}^n)$ in $t$ . If, further, $\mathsf {G} \in \mathcal {M}(Q_a^b,\mathbb {S}_+^n)$ is a positive semi-definite matrix-valued measure over $Q_a^b$ , then $\pi ^t_\# |\mathsf {G}| \ll \mathrm {d} t$ .

Proof. It suffices to consider $[a,b] = [0,1]$ . By (3.13) with test functions $\Phi (t,x) = \phi (t) \in C_c^1((0,1),\mathbb {S}^n)$ , we have

(3.15)

\begin{equation} \int _0^1 \partial _t \phi \cdot \mathrm {d} \pi ^t_\# \mathsf {G} + \phi \cdot \mathrm {d} \pi _\#^t \mathsf {R} = 0\,, \end{equation}

which implies that $(\pi _\#^t \mathsf {R})^{\mathrm {sym}}$ is the distributional derivative of $\pi _\#^t \mathsf {G}$ . Note that $\pi ^t_\# \mathsf {G}$ and $\pi ^t_\# \mathsf {R}$ are Radon measures (since every finite Borel measure on $[0,1]$ is regular). There exists a matrix-valued bounded variation function $M(t)$ that generates the Radon measure $\pi ^t_\# \mathsf {R}$ [Reference Gerald42, Theorem 3.29]. It follows from (3.15) that

(3.16)

\begin{equation} \mathrm {d} \pi ^t_\# \mathsf {G} = (M(t)^{\mathrm {sym}} + C)\, \mathrm {d} t\,, \end{equation}

for some $C \in \mathbb {S}^n$ [Reference Gerald42, Theorem 3.36]. If $\mathsf {G} \in \mathcal {M}(Q_a^b,\mathbb {S}_+^n)$ , then (3.16) and (2.2) readily give $Tr \pi ^t_\# \mathsf {G} \sim |\pi ^t_\# \mathsf {G}| \ll \mathrm {d} t$ , which further yields $\pi ^t_\# |\mathsf {G}| \ll \mathrm {d} t$ by noting $Tr \pi ^t_\# \mathsf {G} = \pi ^t_\# Tr \mathsf {G} \sim \pi ^t_\# |\mathsf {G}|$ .

3.3. Weighted Wasserstein–Bures distance

We are now ready to define a class of distances on $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ by minimising the action functional $\mathcal {J}_{\Lambda, Q}(\mu )$ over the solutions to the continuity equation (3.13).

Definition 3.8. The weighted Wasserstein–Bures distance between $\mathsf {G}_0, \mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}_+^n)$ is defined by

(𝒫)

\begin{equation} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = \inf _{\mu \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)} \mathcal {J}_{\Lambda, Q}(\mu ). \end{equation}

We remark that the quantity $\mathcal {J}_{\Lambda, Q}(\mu )$ can be understood as the energy of the measure $\mu \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ . The following a priori estimate shows that $\mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ is nonempty and $\rm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1)$ is always finite, which means that the problem ( $\mathcal {P}$ ) is well defined.

Lemma 3.9. Given $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ , let $\lambda \in \mathcal {M}(\Omega, \mathbb {R}_+)$ be a reference measure such that $|\mathsf {G}_0|, |\mathsf {G}_1| \ll \lambda$ . Then there exists $\mu = \mathsf {(G,0,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ with finite $\mathcal {J}_{\Lambda, Q}(\mu )$ . Moreover, it holds that

(3.17)

\begin{equation} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) \le \mathrm {WB}^2_{(0,\Lambda _2)}(\mathsf {G}_0,\mathsf {G}_1) \le 2 \big \lVert \Lambda _2^{-1} \big \rVert _{\mathrm {F}}^2 \int _{\Omega } \big \lVert \sqrt {G_{1,\lambda }} - \sqrt {G_{0,\lambda }} \big \rVert _{\mathrm {F}}^2 \ \mathrm {d} \lambda \,, \end{equation}

where $G_{0,\lambda }$ and $G_{1,\lambda }$ are densities of $\mathsf {G}_0$ and $\mathsf {G}_1$ with respect to $\lambda$ .

Proof. We omit the subscript $\lambda$ of $G_{0,\lambda }$ and $G_{1,\lambda }$ for simplicity. We define measures

\begin{equation*} \mathsf {G} \,:\!=\, \left (\sqrt {G_{0}} + t \Big (\sqrt {G_{1}} - \sqrt {G_{0}} \Big ) \right )^2 \mathrm {d} t \otimes \lambda \in \mathcal {M}(Q, \mathbb {S}^n_+)\,, \end{equation*}

and

\begin{equation*} \mathsf {R} \,:\!=\, 2 \left (\sqrt {G_{0}} + t \left (\sqrt {G_{1}} - \sqrt {G_{0}} \right ) \right ) \left (\sqrt {G_{1}} - \sqrt {G_{0}}\right ) \mathrm {d} t \otimes \lambda \in \mathcal {M}(Q, \mathbb {M}^n)\,, \end{equation*}

which satisfy $\mu = \mathsf {(G,0,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ and $\mathrm {Ran} \big ( \frac {\mathrm {d} \mathsf {R}}{\mathrm {d} t \otimes \lambda } \big ) \subset \mathrm {Ran} \big ( \frac {\mathrm {d} \mathsf {G}}{\mathrm {d} t \otimes \lambda } \big )$ for $\mathrm {d} t \otimes \lambda$ -a.e. Moreover, we note

\begin{align*} \mathrm {Ran}\left (\sqrt {G_1} - \sqrt {G_0}\right )\subset \mathrm {Ran}\left (\sqrt {G_0} + t \left (\sqrt {G_1} - \sqrt {G_0}\right )\right )\,,\quad t \in (0,1)\,, \end{align*}

from the relation: $ \mathrm {Ker}\big (\sqrt {G_0} + t (\sqrt {G_1} - \sqrt {G_0})\big ) = \mathrm {Ker} \big (\sqrt {G_0}\big ) \cap \mathrm {Ker} \big (\sqrt {G_1}\big )\subset \mathrm {Ker}\big (\sqrt {G_1} - \sqrt {G_0}\big )$ . Then, we compute

(3.18)

\begin{equation} \mathcal {J}_{\Lambda, Q}(\mu ) = 2 \int _{\Omega } \Big \lVert \Big (\sqrt {G_1} - \sqrt {G_0}\Big ) \Lambda _2^{-1} \Big \rVert _{\mathrm {F}}^2\ \mathrm {d} \lambda \,, \end{equation}

for $\mu$ defined above. The proof is completed by the submultiplicativity of the Frobenius norm.
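
The computation leading to (3.18) can be spelled out as follows; the shorthand $S_t$ and $D$ is introduced here only for this purpose. Write $S_t \,:\!=\, \sqrt {G_0} + t (\sqrt {G_1} - \sqrt {G_0})$ and $D \,:\!=\, \sqrt {G_1} - \sqrt {G_0}$, so that $\mathsf {G}$ and $\mathsf {R}$ have densities $S_t^2$ and $2 S_t D$ with respect to $\mathrm {d} t \otimes \lambda$. Since $\partial _t (S_t^2) = S_t D + D S_t = (2 S_t D)^{\mathrm {sym}}$, we have $\partial _t (\Phi \cdot S_t^2) = \partial _t \Phi \cdot S_t^2 + \Phi \cdot (2 S_t D)$ for any $\Phi \in C^1(Q, \mathbb {S}^n)$, which gives (3.13) with $\mathsf {q} = 0$ after integration over $Q$. Moreover, denoting by $\mathbb {P}_{\mathrm {Ran}(S_t)} = S_t (S_t^2)^\dagger S_t$ the orthogonal projection onto $\mathrm {Ran}(S_t)$ and using $\mathrm {Ran}(D) \subset \mathrm {Ran}(S_t)$ for $t \in (0,1)$, the formula (3.2) yields

\begin{equation*} J_\Lambda (S_t^2, 0, 2 S_t D) = \frac {1}{2} \big (2 S_t D \Lambda _2^{-1}\big ) \cdot \big ((S_t^2)^\dagger \, 2 S_t D \Lambda _2^{-1}\big ) = 2 \big (D \Lambda _2^{-1}\big ) \cdot \big (\mathbb {P}_{\mathrm {Ran}(S_t)} D \Lambda _2^{-1}\big ) = 2 \big \lVert D \Lambda _2^{-1} \big \rVert _{\mathrm {F}}^2\,, \end{equation*}

which is independent of $t$. Integrating over $t \in [0,1]$ and over $\Omega$ with respect to $\lambda$ gives (3.18).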

Remark 3.10. The proof of Lemma 3.9 uses $\mathrm {Ran} (\Lambda _2) = \mathbb {R}^n$ from the assumption $\Lambda _2 \in \mathbb {S}^n_{++}$ we made before (3.8). If we only assume $\Lambda _2 \in \mathbb {S}^n_{+}$ , the distance $\mathrm {WB}_\Lambda$ may be only well-defined (i.e., finite) on a subset of $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ .

Remark 3.11. $\mathrm {WB}_{(0,\Lambda _2)}$ is the matricial Hellinger distance $d_H$ in [Reference Monsaingeon and Vorotnikov73, Definition 4.1], up to a transformation. Indeed, recalling Lemma 3.3, we have that if $\Lambda _1 = 0$ , then $\mathsf {q}$ must be zero and ( $\mathcal {P}$ ) reduces to

(3.19)

\begin{equation} \mathrm {WB}_{(0,\Lambda _2)}^2(\mathsf {G}_0,\mathsf {G}_1) = \inf \{\mathcal {J}_{(0,\Lambda _2),Q}(\mu )\,;\ \mu = \mathsf {(G,0,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\}. \end{equation}

For a given $S \in \mathbb {S}^n_{++}$ , we introduce a linear map $g_{S}(A) \,:\!=\, S A S : \mathbb {S}^n_{+} \to \mathbb {S}^n_{+}$ with the inverse $g_{S^{-1}}$ . It is easy to see that $\mathsf {(G,0,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ if and only if $(g_{\Lambda _2^{-1}}(\mathsf {G}),0, g_{\Lambda _2^{-1}}(\mathsf {R})) \in \mathcal {CE}([0,1];\,g_{\Lambda _2^{-1}}(\mathsf {G}_0),g_{\Lambda _2^{-1}}(\mathsf {G}_1))$ , and there holds $\mathcal {J}_{(0,\Lambda _2),Q}(\mathsf {(G,0,R)}) = \mathcal {J}_{(0,I),Q}(g_{\Lambda _2^{-1}}(\mathsf {G}),0, g_{\Lambda _2^{-1}}(\mathsf {R}))$ . Therefore, we have

\begin{equation*} \mathrm {WB}_{(0,\Lambda _2)}(\mathsf {G}_0,\mathsf {G}_1) = \mathrm {WB}_{(0,I)}(g_{\Lambda _2^{-1}}(\mathsf {G}_0),g_{\Lambda _2^{-1}}(\mathsf {G}_1))\,. \end{equation*}

From [Reference Monsaingeon and Vorotnikov73, Definition 4.1] and Theorem 4.5 below, one can see that $\mathrm {WB}_{(0,I)}$ is nothing else than the convex formulation of the Hellinger distance $d_H$ , up to a constant. We refer the readers to [Reference Monsaingeon and Vorotnikov73, Lemma 4.3 and Theorem 2] for the properties of the Hellinger distance and its relation with the Bures–Wasserstein distance on $\mathbb {S}^n_+$ [10].
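
The identity between the two action functionals used in Remark 3.11 can be checked at the level of densities by a change of variables; the symbols $X$, $Z$, $C$ and $C'$ below are generic matrices used only for this verification. Let $X \in \mathbb {S}^n_+$ and $Z \in \mathbb {M}^n$. Since $\Lambda _1 = 0$ and $Y = 0$, the variable $B$ plays no role in (3.4), and the optimal choice $A = - \frac {1}{2} C \Lambda _2^2 C^{\mathrm {T}}$ gives $J_{(0,\Lambda _2)}(X,0,Z) = \sup _{C \in \mathbb {M}^n} \big \{ Z \cdot C - \frac {1}{2} X \cdot (C \Lambda _2^2 C^{\mathrm {T}}) \big \}$. Substituting $C' = \Lambda _2 C \Lambda _2$, we find

\begin{align*} J_{(0,I)}\big (\Lambda _2^{-1} X \Lambda _2^{-1}, 0, \Lambda _2^{-1} Z \Lambda _2^{-1}\big ) &= \sup _{C' \in \mathbb {M}^n} \Big \{ \big (\Lambda _2^{-1} Z \Lambda _2^{-1}\big ) \cdot C' - \frac {1}{2} \big (\Lambda _2^{-1} X \Lambda _2^{-1}\big ) \cdot \big (C' C'^{\mathrm {T}}\big ) \Big \} \\ &= \sup _{C \in \mathbb {M}^n} \Big \{ Z \cdot C - \frac {1}{2} X \cdot \big (C \Lambda _2^2 C^{\mathrm {T}}\big ) \Big \} = J_{(0,\Lambda _2)}(X,0,Z)\,, \end{align*}

where both sides may be equal to $+\infty$. Applying this pointwise to the densities of $\mathsf {(G,0,R)}$ is consistent with the equality of the action functionals stated in the remark.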

3.4. A priori estimate

Thanks to Lemma 3.9, the optimisation (𝒫) can be equivalently taken over the following set:

\begin{equation*} \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\,:\!=\, \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1) \bigcap \{\mu \in \mathcal {M}(Q,\mathbb {X});\, \mathcal {J}_{\Lambda, Q}(\mu ) \lt +\infty \}\,. \end{equation*}

Before we proceed, we give some auxiliary results. First, we introduce

(3.20)

\begin{equation} \mathcal {J}_{\Lambda, \mathcal {X}}^*(\mathsf {G},u,W) \,:\!=\, \frac {1}{2} \lVert (u \Lambda _1, W \Lambda _2) \rVert ^2_{L^2_{\mathsf {G}}(\mathcal {X})}\quad \text {on}\ \mathcal {M}(\mathcal {X},\mathbb {S}_+^n) \times C(\mathcal {X}, \mathbb {R}^{n \times k} \times \mathbb {M}^n)\,, \end{equation}

where $\lVert \cdot \rVert _{L^2_{\mathsf {G}}(\mathcal {X})}$ is defined by (2.3). By an argument similar to the one for Lemma 4.1 below, we have that the conjugate function (2.6) of $\mathcal {J}^*_{\Lambda, \mathcal {X}}(\mathsf {G},u,W)$ with respect to $(u,W)$ is exactly $\mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G},\mathsf {q},\mathsf {R})$ . Moreover, there holds

(3.21)

\begin{equation} \mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G},\mathsf {q},\mathsf {R}) = \sup _{(u,W) \in L^\infty _{|\mathsf {(G,q,R)}|}(\mathcal {X}, \mathbb {R}^{n \times k} \times \mathbb {M}^n) } \langle (\mathsf {q},\mathsf {R}), (u,W) \rangle _{\mathcal {X}} - \mathcal {J}_{\Lambda, \mathcal {X}}^*(\mathsf {G},u,W)\,. \end{equation}

Since $\mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G}, \mathsf {q},\mathsf {R})$ and $\mathcal {J}_{\Lambda, \mathcal {X}}^*(\mathsf {G}, u, W)$ are homogeneous of degree $2$ in $(\mathsf {q},\mathsf {R})$ and $(u,W)$ , respectively, by (3.21), it holds that for $\mathsf {(G,q,R)} \in \mathcal {M}(\mathcal {X},\mathbb {X})$ and $(u,W) \in L^\infty _{|\mathsf {(G,q,R)}|}(\mathcal {X}, \mathbb {R}^{n \times k} \times \mathbb {M}^n)$ ,

(3.22)

\begin{align} \langle (\mathsf {q},\mathsf {R}), (u,W)\rangle _{\mathcal {X}} \le \gamma ^{-2} \mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G}, \mathsf {q}, \mathsf {R}) + \gamma ^2\mathcal {J}_{\Lambda, \mathcal {X}}^*(\mathsf {G},u, W)\,, \quad \forall \gamma \gt 0\,. \end{align}

We minimise the right-hand side of (3.22) with respect to $\gamma$ and obtain

(3.23)

\begin{align} \langle (\mathsf {q},\mathsf {R}), (u,W)\rangle _{\mathcal {X}} \le 2 \sqrt {\mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G}, \mathsf {q}, \mathsf {R}) \mathcal {J}_{\Lambda, \mathcal {X}}^*(\mathsf {G},u, W)}\,, \end{align}

where we have used non-negativity of $\mathcal {J}_{\Lambda, \mathcal {X}}$ and $\mathcal {J}_{\Lambda, \mathcal {X}}^*$ .
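For the reader's convenience, the scaling step behind (3.22) and the minimisation leading to (3.23) can be sketched as follows. The shorthand $a \,:\!=\, \mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G}, \mathsf {q}, \mathsf {R})$ and $b \,:\!=\, \mathcal {J}_{\Lambda, \mathcal {X}}^*(\mathsf {G},u, W)$ is introduced here only for illustration, and both quantities are assumed to be finite and strictly positive; when one of them vanishes, letting $\gamma \to 0$ or $\gamma \to \infty$ in (3.22) yields the same conclusion. Applying the Fenchel-Young inequality contained in (3.21) to $\gamma ^2 (u,W)$ and using the homogeneity of degree $2$ of $\mathcal {J}_{\Lambda, \mathcal {X}}^*$ , we get

\begin{align*} \gamma ^2 \langle (\mathsf {q},\mathsf {R}), (u,W)\rangle _{\mathcal {X}} \le a + \gamma ^4 b\,, \quad \text {that is,} \quad \langle (\mathsf {q},\mathsf {R}), (u,W)\rangle _{\mathcal {X}} \le \gamma ^{-2} a + \gamma ^2 b\,. \end{align*}

The function $h(\gamma ) \,:\!=\, \gamma ^{-2} a + \gamma ^2 b$ on $(0,+\infty )$ then satisfies

\begin{align*} h^{\prime}(\gamma ) = -2 \gamma ^{-3} a + 2 \gamma b = 0 \iff \gamma ^4 = \frac {a}{b}\,, \qquad h\big ((a/b)^{1/4}\big ) = \sqrt {a b} + \sqrt {a b} = 2 \sqrt {a b}\,. \end{align*}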

Second, we observe from formulas (3.2) and (3.8) and Lemmas 2.3 and 3.3 that for $\mu = (\mathsf {G},\mathsf {q},\mathsf {R}) \in \mathcal {M}(\mathcal {X},\mathbb {X})$ with $\mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) \lt +\infty$ , the functions $G_\lambda ^\dagger q_\lambda \Lambda _1^\dagger$ and $G_\lambda ^\dagger R_\lambda \Lambda _2^{-1}$ are well defined, Borel measurable and independent of the reference measure $\lambda$ (hence we omit the subscript $\lambda$ in the sequel for simplicity), and there holds

(3.24)

\begin{align} \mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) = \frac {1}{2} \lVert G^\dagger q \Lambda _1^\dagger \rVert ^2_{L^2_{\mathsf {G}}(\mathcal {X})} + \frac {1}{2} \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L^2_{\mathsf {G}}(\mathcal {X})} \lt +\infty \,. \end{align}

We now give useful a priori bounds for measures $\mathsf {q}$ and $\mathsf {R}$ .

Lemma 3.12. For $\mu = (\mathsf {G},\mathsf {q},\mathsf {R}) \in \mathcal {M}(\mathcal {X},\mathbb {X})$ with $\mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) \lt +\infty$ , it holds that for $E \in \mathscr{B}(\mathcal {X})$ ,

(3.25)

\begin{align} |\mathsf {q}|(E) \le \sqrt {Tr \mathsf {G} (E)}\, \lVert \Lambda _1 \rVert _{\mathrm {F}} \lVert G^\dagger q \Lambda _1^\dagger \rVert _{L^2_{\mathsf {G}}(E)}\,,\quad |\mathsf {R}|(E) \le \sqrt {Tr \mathsf {G} (E)}\, \lVert \Lambda _2 \rVert _{\mathrm {F}} \lVert G^\dagger R \Lambda _2^{-1} \rVert _{L^2_{\mathsf {G}}(E)}\,. \end{align}

Proof. Recall that there exist bounded measurable functions $\sigma _{q}$ and $\sigma _{R}$ with $ \lVert \sigma _{q} \rVert _{\mathrm {F}} = \lVert \sigma _{R} \rVert _{\mathrm {F}} = 1$ such that $\mathrm {d} \mathsf {q} = \sigma _{q}\, \mathrm {d} |\mathsf {q}|$ and $\mathrm {d} \mathsf {R} = \sigma _{R} \,\mathrm {d} |\mathsf {R}|$ . Taking $\mathsf {R} = 0$ and $(u,W) = (\chi _E \sigma _q, 0)$ in (3.23) for $E \in \mathscr {B}(\mathcal {X})$ , we obtain

\begin{align*} |\mathsf {q}|(E) = \int _E u \cdot \mathrm {d} \mathsf {q} \le 2 \sqrt {\mathcal {J}_{\Lambda, E}(\mathsf {G}, \mathsf {q}, 0)\mathcal {J}_{\Lambda, E}^{*}(\mathsf {G},u, 0)} \le \sqrt {Tr \mathsf {G} (E) \lVert \Lambda _1 \rVert _{\mathrm {F}}^2} \lVert G^\dagger q \Lambda _1^\dagger \rVert _{L^2_{\mathsf {G}}(E)} \,, \end{align*}

by (3.24) and the following estimate derived from (3.20) and (2.4):

\begin{align*} \mathcal {J}_{\Lambda, E}^*(\mathsf {G},u, 0) \le \frac {1}{2} Tr \mathsf {G}(E) \lVert \Lambda _1 \rVert _{\mathrm {F}}^2\,. \end{align*}

Similarly, by taking $\mathsf {q} = 0$ and $(u,W) = (0, \chi _E \sigma _R)$ in (3.23), we obtain the estimate for $\mathsf {R}$ in (3.25).

With the help of the above lemma, the following proposition holds.

Proposition 3.13. Let $\mu = \mathsf {(G,q,R)}\in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ with $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ . Then,

(i) $\mathsf {G} \in \mathcal {M}(Q,\mathbb {S}^n_+)$ and $\pi _\#^t |\mathsf {G}| \ll \mathrm {d} t$ . Moreover, $\mu$ can be disintegrated as
(3.26) \begin{equation} \mu = \int _0^1 \delta _t \otimes (\mathsf {G}_t, \mathsf {q}_t, \mathsf {R}_t)\, \mathrm {d} t\,, \end{equation}
where $(\mathsf {G}_t, \mathsf {q}_t, \mathsf {R}_t) \in \mathcal {M}(\Omega, \mathbb {X})$ for $\mathrm {d} t$ -a.e. $t \in [0,1]$ .
(ii) There exists a weak^* continuous curve $\big \{\widetilde {\mathsf {G}}_t\big \}_{t \in [0,1]}$ in $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ such that $\mathsf {G}_t = \widetilde {\mathsf {G}}_t$ for a.e. $t \in [0,1]$ and, for any interval $[t_0,t_1] \subset [0,1]$ , it holds that
(3.27) \begin{equation} \int _{Q_{t_0}^{t_1}} \partial _t\Phi \cdot \mathrm {d} \mathsf {G} + \mathsf {D}^* \Phi \cdot \mathrm {d} \mathsf {q} + \Phi \cdot \mathrm {d} \mathsf {R} = \int _{\Omega } \Phi _{t_1} \cdot \mathrm {d} \widetilde {\mathsf {G}}_{t_1} - \int _{\Omega } \Phi _{t_0} \cdot \mathrm {d} \widetilde {\mathsf {G}}_{t_0}\,, \quad \forall \Phi \in C^1(Q_{t_0}^{t_1},\mathbb {S}^n)\,. \end{equation}
Moreover, there holds, for some $C \gt 0$ ,
(3.28) \begin{align} Tr \widetilde {\mathsf {G}}_t(\Omega ) \le C \left (Tr \mathsf {G}_0 (\Omega ) + \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L_{\mathsf {G}}^2(Q)}\lVert \Lambda _2 \rVert _{\mathrm {F}}^2\right ),\quad \forall t \in [0,1]\,. \end{align}

Remark 3.14. By Proposition 3.13, we can identify a measure $\mu = (\mathsf {G}, \mathsf {q},\mathsf {R})\in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ with a family of measures $\{\mu _t = (\mathsf {G}_t, \mathsf {q}_t,\mathsf {R}_t)\}_{t \in [0,1]}$ in $\mathcal {M}(\Omega, \mathbb {X})$ via the disintegration (3.26), where $\mathsf {G}_t$ is weak^* continuous. We also remark that one can alternatively define the matrix-valued continuity equation (3.13) by testing against functions $\Phi \in C^1(Q,\mathbb {S}^n)$ compactly supported in $(0,1)\times \Omega$ as in [1, Chapter 8] (in this case the right-hand side of (3.13) vanishes), and consider its solution $\mu = (\mathsf {G}, \mathsf {q},\mathsf {R}) \in \mathcal {M}(Q,\mathbb {X})$ with finite energy $\mathcal {J}_{\Lambda, Q}(\mu ) \lt +\infty$ . In this setting, a similar analysis by disintegration shows that $\mathsf {G}$ still has the weak^* continuous representation $\{\mathsf {G}_t\}_{t \in [0,1]}$ , and then the initial and final distributions $\mathsf {G}_0$ and $\mathsf {G}_1$ can be obtained from the limits as $t \to 0$ and $t \to 1$ of $\mathsf {G}_t$ , respectively. In this work, we always stick to Definition 3.4 with temporal boundary conditions $\mathsf {G}_0$ and $\mathsf {G}_1$ to avoid any confusion.

Proof. (i) First, note from [Reference Ambrosio, Gigli and Savaré1, Theorem 5.3.1] that $\mu$ can be disintegrated with respect to $\nu = \pi ^t_\# |\mu |$ as $\mu = \int _0^1 \delta _t \otimes \mu _t\, \mathrm {d} \nu$ , where $\mu _t \in \mathcal {M}(\Omega, \mathbb {X})$ for $\nu$ -a.e. $t \in [0,1]$ . Then, by Lemmas 3.3 and 3.7, we have $\mathsf {G} \in \mathcal {M}(Q,\mathbb {S}^n_+)$ and $\nu \ll \pi ^t_\# | \mathsf {G}| \ll \mathrm {d} t$ on $[0,1]$ , which allows us to define $\widetilde {\mu }_t \,:\!=\, \mu _t \frac {\mathrm {d} \nu }{\mathrm {d} t}$ and disintegrate $\mu$ as $\mu = \int _0^1 \delta _t \otimes \widetilde {\mu }_t\, \mathrm {d} t$ .

(ii) Consider test functions $\Phi = a(t)\Psi (x)$ in (3.13) with $a(t) \in C_c^1((0,1),\mathbb {R})$ and $\Psi (x) \in C^1(\Omega, \mathbb {S}^n)$ . Then, by (3.26), $\int _{\Omega } \Psi \cdot \mathrm {d} \mathsf {G}_t$ is absolutely continuous in $t$ with the weak derivative:

(3.29)

\begin{equation} \partial _t \langle \mathsf {G}_t, \Psi \rangle _{\Omega } = \langle \mathsf {q}_t, \mathsf {D}^* \Psi \rangle _{\Omega } + \langle \mathsf {R}_t, \Psi \rangle _{\Omega } \,. \end{equation}

Letting $\Psi = I$ in (3.29), we obtain $\partial _t Tr \mathsf {G}_t(\Omega ) = Tr \mathsf {R}^{\mathrm {sym}}_t(\Omega )$ a.e. by $\mathsf {D}^*(I) = 0$ , which implies that there exists a nonnegative function $m(t) \in C([0,1],\mathbb {R})$ such that $Tr \mathsf {G}_t (\Omega ) = m(t)$ a.e. on $[0,1]$ and

(3.30)

\begin{equation} m(t) - m(s) = \int _s^t Tr \mathsf {R}^{\mathrm {sym}}_{\tau }(\Omega )\, \mathrm {d} \tau \,, \quad \forall 0\le s \le t \le 1\,. \end{equation}

By Lemma 3.12, it follows from (3.30) that, for some $C \gt 0$ ,

(3.31)

\begin{equation} |m(t) - m(s)| \le C |\mathsf {R}|(Q) \le C \sqrt {Tr \mathsf {G} (Q)} \lVert \Lambda _2 \rVert _{\mathrm {F}} \lVert G^\dagger R \Lambda _2^{-1} \rVert _{L^2_{\mathsf {G}}(Q)}\,. \end{equation}

We choose $t_0$ such that $m(t_0) = \max _{t \in [0,1]} m(t)$ . Then (3.31) implies

\begin{equation*} m(t_0) \le m(0) + C \sqrt {m(t_0)} \lVert \Lambda _2 \rVert _{\mathrm {F}} \lVert G^\dagger R \Lambda _2^{-1} \rVert _{L^2_{\mathsf {G}}(Q)}\,, \end{equation*}

which further gives, by an elementary calculation,

(3.32)

\begin{align} \Big (m(t_0)^{1/2} - \frac {C}{2}\lVert G^\dagger R \Lambda _2^{-1} \rVert _{L_{\mathsf {G}}^2(Q)} \lVert \Lambda _2 \rVert _{\mathrm {F}}\Big )^2 \le m(0) + \frac {C^2}{4} \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L_{\mathsf {G}}^2(Q)} \lVert \Lambda _2 \rVert _{\mathrm {F}}^2\,. \end{align}

Then we have

(3.33)

\begin{equation} m(t) \le C \big (m(0) + \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L_{\mathsf {G}}^2(Q)}\lVert \Lambda _2 \rVert _{\mathrm {F}}^2\big )\,. \end{equation}
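The elementary calculation between (3.31) and (3.33) can be sketched as follows, with the shorthand $K \,:\!=\, \lVert G^\dagger R \Lambda _2^{-1} \rVert _{L_{\mathsf {G}}^2(Q)} \lVert \Lambda _2 \rVert _{\mathrm {F}}$ introduced here only for illustration. First, the disintegration (3.26) gives $Tr \mathsf {G}(Q) = \int _0^1 m(t)\, \mathrm {d} t \le m(t_0)$ , which is how $\sqrt {m(t_0)}$ enters the bound on $m(t_0)$ . Next, taking square roots in (3.32) and using $(x + y)^2 \le 2 x^2 + 2 y^2$ , we find

\begin{align*} m(t_0)^{1/2} \le \frac {C}{2} K + \Big (m(0) + \frac {C^2}{4} K^2\Big )^{1/2}\,, \qquad m(t_0) \le \frac {C^2}{2} K^2 + 2 m(0) + \frac {C^2}{2} K^2 = 2 m(0) + C^2 K^2\,. \end{align*}

Since $m(t) \le m(t_0)$ for all $t \in [0,1]$ , this is (3.33) with the generic constant $\max \{2, C^2\}$ .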

With the above estimates, the existence of a weak^* continuous representative of $\mathsf {G}_t$ and the formula (3.27) can be proved similarly to [Reference Ambrosio, Gigli and Savaré1, Lemma 8.1.2]. We sketch the argument for completeness. By (3.25) and (3.33), as well as (3.29), there exists a subset $E \subset [0,1]$ of Lebesgue measure zero such that $Tr \mathsf {G}_t (\Omega ) = m(t)$ on $[0,1]\backslash E$ , and there holds, for any $t,s \in [0,1]\backslash E$ with $s \lt t$ and $\Psi \in C^1(\Omega, \mathbb {S}^n)$ ,

(3.34)

\begin{align} | \langle \mathsf {G}_t, \Psi \rangle _{\Omega } - \langle \mathsf {G}_s, \Psi \rangle _{\Omega }| & \le C \lVert \Psi \rVert _{1,\infty } \big (|\mathsf {q}|(Q_s^t) + |\mathsf {R}|(Q_s^t)\big ) \\ & \le C |t - s|^{1/2} \big (m(0) + \lVert G^\dagger q \Lambda _1^\dagger \rVert ^2_{L_{\mathsf {G}}^2(Q)} \lVert \Lambda _1 \rVert _{\mathrm {F}}^2 + \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L_{\mathsf {G}}^2(Q)}\lVert \Lambda _2 \rVert _{\mathrm {F}}^2\big ) \lVert \Psi \rVert _{1,\infty }\,.\nonumber \end{align}
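The second inequality in (3.34) can be checked along the following lines; this is a sketch that uses only (3.25), (3.26), (3.33) and the elementary bound $x y \le (x^2 + y^2)/2$ . Applying (3.25) with $E = Q_s^t$ , together with $Tr \mathsf {G}(Q_s^t) = \int _s^t m(\tau )\, \mathrm {d} \tau \le |t - s| \sup _{\tau } m(\tau )$ , gives

\begin{align*} |\mathsf {q}|(Q_s^t) & \le |t - s|^{1/2} \Big (\sup _{\tau \in [0,1]} m(\tau )\Big )^{1/2} \lVert \Lambda _1 \rVert _{\mathrm {F}} \lVert G^\dagger q \Lambda _1^\dagger \rVert _{L^2_{\mathsf {G}}(Q)} \\ & \le \frac {1}{2} |t - s|^{1/2} \Big (\sup _{\tau \in [0,1]} m(\tau ) + \lVert \Lambda _1 \rVert _{\mathrm {F}}^2 \lVert G^\dagger q \Lambda _1^\dagger \rVert ^2_{L^2_{\mathsf {G}}(Q)}\Big )\,, \end{align*}

and the same argument applies to $|\mathsf {R}|(Q_s^t)$ with $\Lambda _2$ and $G^\dagger R \Lambda _2^{-1}$ in place of $\Lambda _1$ and $G^\dagger q \Lambda _1^\dagger$ . Bounding $\sup _{\tau } m(\tau )$ by (3.33) then yields the right-hand side of (3.34).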

The estimate (3.34) allows us to uniquely extend $\{\mathsf {G}_t\}_{t \in [0,1]\backslash E}$ to a weak^* continuous curve $\{\widetilde {\mathsf {G}}_t\}_{t \in [0,1]}$ in $C^1(\Omega, \mathbb {S}^n)^*$ . Then, by the density of $C^1(\Omega, \mathbb {S}^n)$ in $C(\Omega, \mathbb {S}^n)$ and the boundedness (3.33) of $\{Tr \widetilde {\mathsf {G}}_t (\Omega )\}_{t \in [0,1]}$ , the curve $\{\widetilde {\mathsf {G}}_t\}_{t \in [0,1]}$ is also weak^* continuous in $\mathcal {M}(\Omega, \mathbb {S}^n)$ . The formula (3.27) follows from taking test functions $\Phi _\varepsilon (t,x) = \eta _\varepsilon (t)\Phi (t,x)$ in (3.13), where $\Phi \in C^1(Q,\mathbb {S}^n)$ and $\eta _\varepsilon \in C_c^\infty ((t_0,t_1),\mathbb {R})$ with $0 \le \eta _\varepsilon \le 1$ , $\lim _{\varepsilon \to 0}\eta _\varepsilon (t) = \chi _{(t_0,t_1)}(t)$ pointwise and $\lim _{\varepsilon \to 0}\eta^{\prime}_\varepsilon = \delta _{t_0} - \delta _{t_1}$ in the distributional sense. Recalling $Tr \mathsf {G}_t (\Omega ) = m(t)$ a.e., by the weak^* continuity of $\widetilde {\mathsf {G}}_t$ and the continuity of $m$ , we have $Tr \widetilde {\mathsf {G}}_t (\Omega ) = m(t)$ for every $t \in [0,1]$ . Then, the estimate (3.28) follows from (3.33).

3.5. Time and space scaling

By writing $\mathcal {J}_{\Lambda, Q}(\mu ) = \int _0^1\mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathrm {d} t$ for $\mu \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ , the following lemma is a simple consequence of a change of variables.

Lemma 3.15. Let $\mu \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ . It holds that

1. Let $\mathsf {s}(t)\,:\,[0,1] \to [a,b]$ be a strictly increasing absolutely continuous map with an absolutely continuous inverse: $\mathsf {t} = \mathsf {s}^{-1}$ . Then $\widetilde {\mu } \,:\!=\, \int _a^b \delta _s \otimes (\mathsf {G}_{\mathsf {t}(s)}, \mathsf {t}^{\prime}(s) \mathsf {q}_{\mathsf {t}(s)}, \mathsf {t}^{\prime}(s) \mathsf {R}_{\mathsf {t}(s)})\, \mathrm {d} s \in \mathcal {CE}([a,b];\, \mathsf {G}_0,\mathsf {G}_1)$ . Moreover, we have
(3.35) \begin{align} \int _0^1 \mathsf {t}^{\prime}(\mathsf {s}(t)) \mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathrm {d} t = \int _a^b \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_s) \,\mathrm {d} s\,. \end{align}
2. Let $T$ be a diffeomorphism on $\mathbb {R}^d$ mapping from $\Omega$ to $T(\Omega )$ and suppose that there exists $\mathcal {T}_{\mathsf{D}^{*}}(x)\,:\, \Omega \to \mathcal {L}(\mathbb {R}^{n \times k})$ such that for $\Phi \in C_c^\infty (\mathbb {R}^d, \mathbb {S}^n)$ ,
(3.36) \begin{align} \mathcal {T}_{\mathsf{D}^{*}}[(\mathsf{D}^{*} \Phi )\circ T] \,:\!=\, \mathsf{D}^{*} (\Phi \circ T)\,. \end{align}
Then $\widetilde {\mu } \,:\!=\, \int _0^1 \delta _t \otimes T_{\#} (\mathsf {G}_{t}, \mathcal {T}_{\mathsf {D}} \mathsf {q}_{t}, \mathsf {R}_{t})\, \mathrm {d} t \in \mathcal {CE}([0,1];\, T_{\#}\mathsf {G}_0, T_{\#}\mathsf {G}_1)$ on $T(\Omega )$ , where $T_{\#}(\cdot )$ denotes the pushforward measure by $T$ , and $\mathcal {T}_{\mathsf {D}}$ is the transpose of $\mathcal {T}_{\mathsf {D}^*}$ defined via $ (\mathcal {T}_{\mathsf {D}} q) \cdot p = q \cdot (\mathcal {T}_{\mathsf {D}^*} p)\,, \ \forall p,q \in \mathbb {R}^{n \times k}$ .
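The identity (3.35) in the first statement can be verified by the following short computation, given here as a sketch. It uses the homogeneity of degree $2$ of $\mathcal {J}_{\Lambda, \Omega }$ in $(\mathsf {q},\mathsf {R})$ , the substitution $s = \mathsf {s}(t)$ , and the relation $\mathsf {s}^{\prime}(t)\, \mathsf {t}^{\prime}(\mathsf {s}(t)) = 1$ for a.e. $t$ , which is assumed to hold under the absolute continuity of $\mathsf {s}$ and $\mathsf {t}$ :

\begin{align*} \int _a^b \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_s) \,\mathrm {d} s = \int _a^b \mathsf {t}^{\prime}(s)^2 \mathcal {J}_{\Lambda, \Omega }(\mu _{\mathsf {t}(s)}) \,\mathrm {d} s = \int _0^1 \mathsf {t}^{\prime}(\mathsf {s}(t))^2 \mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathsf {s}^{\prime}(t) \,\mathrm {d} t = \int _0^1 \mathsf {t}^{\prime}(\mathsf {s}(t)) \mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathrm {d} t\,. \end{align*}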

Remark 3.16. The condition (3.36) is nontrivial and necessary for the second statement. Indeed, there holds

\begin{align*} \mathsf{D}^{*} (\Phi \circ T) = \int _{\mathbb {R}^d} \widehat {\mathsf {D}^*}(\xi \cdot \nabla T(x))\big [\widehat {\Phi }(\xi )\big ]e^{\mathrm {i} \xi \cdot T(x)}\, \mathrm {d} \xi, \end{align*}

by Fourier transform, where $(\xi \cdot \nabla T(x))_j = \xi \cdot \partial _j T(x)$ . It follows that (3.36) is equivalent to a separation of variables: $\widehat {\mathsf {D}^*}(\xi \cdot \nabla T(x)) = \mathcal {T}_{\mathsf{D}^{*}}(x) \circ \widehat {\mathsf {D}^*}(\xi )$ . A sufficient condition for (3.36) is that $\widehat {\mathsf {D}^*}$ is homogeneous of degree $0$ , or homogeneous of degree $1$ with $T(x) = a x + b$ for $a \neq 0 \in \mathbb {R}$ and $b \in \mathbb {R}^d$ , which is enough for our purposes.

Remark 3.17. We connect the weight matrix $\Lambda _1$ and the space scaling. Let us consider $\mu \in \mathcal {CE}_\infty ([0,1]; \mathsf {G}_0, \mathsf {G}_1)$ and $\mathsf {D}^*$ be homogeneous of degree one for simplicity. Define $T(x) = a x\,:\, \Omega \to a \Omega$ and $\mathcal {T}_{\mathsf {D}} = a I$ . By Lemma 3.15, we have $\widetilde {\mu } \,:\!=\, \int _0^1 \delta _t \otimes T_{\#} (\mathsf {G}_{t}, a \mathsf {q}_{t}, \mathsf {R}_{t}) \, \mathrm {d} t \in \mathcal {CE}_\infty ([0,1];\, T_{\#}\mathsf {G}_0, T_{\#}\mathsf {G}_1)$ . Then, a direct computation gives

\begin{align*} \mathcal {J}_{\Lambda, [0,1] \times a \Omega }(\widetilde {\mu }) = \int _0^1 \mathcal {J}_{(a^{-1} \Lambda _1,\Lambda _2), a \Omega }(T_{\#}(\mathsf {G}_t,\mathsf {q}_t,\mathsf {R}_t)) \,\mathrm {d} t = \int _0^1 \mathcal {J}_{(a^{-1} \Lambda _1,\Lambda _2), \Omega }(\mu _t) \,\mathrm {d} t = \mathcal {J}_{(a^{-1} \Lambda _1,\Lambda _2), Q}(\mu ). \end{align*}
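The first equality in the display above rests on a pointwise identity that can be read off from the representation (3.24); the following is a sketch under the assumption $a \neq 0$ , so that $(a^{-1} \Lambda _1)^\dagger = a \Lambda _1^\dagger$ . For $(\mathsf {G},\mathsf {q},\mathsf {R})$ with finite energy,

\begin{align*} \mathcal {J}_{\Lambda, \mathcal {X}}(\mathsf {G}, a \mathsf {q}, \mathsf {R}) = \frac {1}{2} \lVert G^\dagger q\, (a \Lambda _1^\dagger ) \rVert ^2_{L^2_{\mathsf {G}}(\mathcal {X})} + \frac {1}{2} \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L^2_{\mathsf {G}}(\mathcal {X})} = \mathcal {J}_{(a^{-1} \Lambda _1,\Lambda _2), \mathcal {X}}(\mathsf {G}, \mathsf {q}, \mathsf {R})\,. \end{align*}

The second equality in the display reflects that the action functional is unchanged when all three components are pushed forward by the same map $T$ , since the densities with respect to a common reference measure are simply composed with $T^{-1}$ .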

Using Lemma 3.15 with $\mathsf {s}(t) = (b - a) t + a \,:\, [0,1] \to [a,b]$ , $b \gt a \gt 0$ , we see that for $\mu \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ , there exists $\widetilde {\mu } \in \mathcal {CE}_\infty ([a,b];\mathsf {G}_0,\mathsf {G}_1)$ such that

\begin{align*} \int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathrm {d} t = (b - a) \int _a^b \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_t)\, \mathrm {d} t\,, \end{align*}

and vice versa, which gives the equivalent characterisation of $\mathrm {WB}_{\Lambda }$ :

(𝒫^′)

\begin{align} \mathrm {WB}^2_\Lambda (\mathsf {G}_0,\mathsf {G}_1) = \inf _{\mathcal {CE}_\infty ([a,b];\mathsf {G}_0,\mathsf {G}_1)} (b - a) \int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t) \,\mathrm {d} t\,, \quad \mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}_+^n)\,. \end{align}

3.6. Compactness

We end the discussion of basic properties of $\mathcal {CE}_\infty ([0,1];\, \mathsf {G}_0, \mathsf {G}_1)$ with a compactness result.

Proposition 3.18. Let $\mu ^n = (\mathsf {G}^n,\mathsf {q}^n,\mathsf {R}^n) \in \mathcal {CE}_\infty ([0,1];\, \mathsf {G}^n_0, \mathsf {G}^n_1)$ , $n \ge 1$ , be a sequence of measures satisfying

(3.37)

\begin{equation} m\,:\!=\, \sup _{n \in \mathbb {N}} Tr (\mathsf {G}_0^n) \lt +\infty \,, \quad M\,:\!=\, \sup _{n \in \mathbb {N}} \mathcal {J}_{\Lambda, Q}(\mu ^n) \lt +\infty \,. \end{equation}

Then there exists a subsequence, still denoted by $\mu ^n$ , and a measure $\mu = (\mathsf {G},\mathsf {q},\mathsf {R}) \in \mathcal {CE}_\infty ([0,1]; \mathsf {G}_0, \mathsf {G}_1)$ such that for every $t\in [0,1]$ , $\mathsf {G}^n_t$ weak^* converges to $\mathsf {G}_t$ in $\mathcal {M}(\Omega, \mathbb {S}^{n})$ , and $(\mathsf {q}^n, \mathsf {R}^n)$ weak^* converges to $\mathsf {(q,R)}$ in $\mathcal {M}(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ . Moreover, it holds that, for $0\le a \lt b \le 1$ ,

(3.38)

\begin{align} \mathcal {J}_{\Lambda, Q_a^b}(\mu ) \le \liminf _{n \to \infty } \mathcal {J}_{\Lambda, Q_a^b}(\mu ^n)\,. \end{align}

Proof. By (3.37), up to a subsequence, we can let $\mathsf {G}^n_0$ weak^* converge to some $\mathsf {G}_0 \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ . It is also clear from a priori estimates (3.25) and (3.28), as well as the assumption (3.37), that $\{\mu ^n\}_{n \in \mathbb {N}}$ is bounded in $\mathcal {M}(Q,\mathbb {X})$ . Hence, there exists a subsequence of $\{\mu ^n\}_{n \in \mathbb {N}}$ , still indexed by $n$ , weak^* converging to some $\mu \in \mathcal {M}(Q,\mathbb {X})$ . We next prove that the restriction of $\mu ^n$ on $Q_a^b$ , i.e., $\mu ^n|_{Q_a^b}$ , weak^* converges to $\mu |_{Q_a^b}$ in $\mathcal {M}(Q_a^b,\mathbb {X})$ for any $ 0 \le a \le b \le 1$ . For this, again by (3.25) and (3.28), we have, for some $C \gt 0$ ,

(3.39)

\begin{equation} |\mu ^n|([t_0,t_1] \times \Omega ) \le C|t_1 - t_0|^{1/2}\,, \quad \forall 0 \le t_0 \le t_1 \le 1\,, \end{equation}

which also holds for $\mu$ . Let $\eta (t)$ be a smooth function, compactly supported in $[a,b]$ , with $|\eta (t)| \le 1$ and $\eta = 1$ on $[a+\varepsilon, b - \varepsilon ]$ for some small $\varepsilon$ . Then, for any $\Xi \in C(Q_a^b,\mathbb {X})$ , we define $\widetilde {\Xi }(t,x) = \eta (t) \Xi (t,x) \in C(Q,\mathbb {X})$ . The following estimate readily follows from the properties of $\eta$ and the estimate (3.39):

\begin{equation*} \big |\langle \mu ^n, \Xi \rangle _{Q_a^b} - \langle \mu, \Xi \rangle _{Q_a^b}\big | \le \big | \big \langle \mu ^n, \widetilde {\Xi }\big \rangle _{Q} - \big \langle \mu, \widetilde {\Xi }\big \rangle _{Q} \big | + C \varepsilon ^{1/2}\,. \end{equation*}

Since $\mu ^n$ weak^* converges to $\mu$ in $\mathcal {M}(Q,\mathbb {X})$ and $\varepsilon$ is arbitrary, we have $\big |\langle \mu ^n, \Xi \rangle _{Q_a^b} - \langle \mu, \Xi \rangle _{Q_a^b}\big | \to 0$ as $n \to \infty$ for $\Xi \in C(Q_a^b,\mathbb {X})$ . Then, (3.38) follows from the lower semicontinuity of $\mathcal {J}_{\Lambda, Q_a^b}(\mu )$ . We now show the weak^* convergence of $\mathsf {G}^n_t$ for every $t\in [0,1]$ . We note, by taking $\Phi (s,x) = \chi _{[0,t]}(s) \Psi (x)$ in (3.27) with $\Psi (x)\in C^1(\Omega, \mathbb {S}^n)$ ,

\begin{equation*} \int _0^t \Big ( \int _\Omega \mathsf {D}^* \Psi \cdot \mathrm {d} \mathsf {q}_s^n + \int _\Omega \Psi \cdot \mathrm {d} \mathsf {R}_s^n \Big ) \mathrm {d} s = \int _{\Omega } \Psi \cdot \mathrm {d} \mathsf {G}^n_{t} - \int _{\Omega } \Psi \cdot \mathrm {d} \mathsf {G}^n_{0}\,, \quad \forall \Psi \in C^1(\Omega, \mathbb {S}^n)\,. \end{equation*}

Then, using the weak^* convergences of $\mathsf {G}^n_0$ in $\mathcal {M}(\Omega, \mathbb {S}^n)$ and $(\mathsf {q}^n,\mathsf {R}^n)|_{Q_0^t}$ in $\mathcal {M}(Q_0^t,\mathbb {R}^{n \times k}\times \mathbb {M}^n)$ , we get the convergence of $ \langle \mathsf {G}^n_{t}, \Psi \rangle _{\Omega }$ as $n \to \infty$ . The proof is completed by the density of $C^1(\Omega, \mathbb {S}^n)$ in $C(\Omega, \mathbb {S}^n)$ and the uniform boundedness of $Tr \mathsf {G}^n_t (\Omega )$ with respect to $n$ from (3.28).

4. Properties of weighted Wasserstein–Bures metrics

This section is devoted to the investigation of the convex optimisation problem (𝒫). We shall first show the existence of the minimiser and derive the corresponding optimality condition. We then explore its primal-dual formulations in more detail, which will lead to a Riemannian interpretation of $\mathrm {WB}_{\Lambda }$ in Section 5. Finally, we consider the dependence of $\mathrm {WB}_{\Lambda }$ on the weight matrix $\Lambda$ .

4.1. Existence of minimiser and optimality condition

For our purpose, let us first define the Lagrangian of (𝒫) with the multiplier $\Phi \in C^1(Q, \mathbb {S}^n)$ :

\begin{align*} \mathcal {L}(\mu, \Phi ) \,:\!=\, \mathcal {J}_{\Lambda, Q}(\mu ) - \langle \mu, (\partial _t \Phi, \mathsf {D}^* \Phi, \Phi ) \rangle _Q + \langle \mathsf {G}_1, \Phi _1 \rangle _\Omega - \langle \mathsf {G}_0, \Phi _0 \rangle _\Omega \,, \end{align*}

which allows us to write

\begin{equation*} \mathrm {WB}^2_\Lambda (\mathsf {G}_0,\mathsf {G}_1) = \inf _{\mu \in \mathcal {M}(Q,\mathbb {X})} \sup _{\Phi \in C^1(Q,\mathbb {S}^n)} \mathcal {L}(\mu, \Phi )\,. \end{equation*}

By changing the order of $\sup$ and $\inf$ , a formal calculation via integration by parts gives the dual problem:

(4.1)

\begin{align} \mathrm {WB}_{\Lambda }^2(\mathsf {G}_0,\mathsf {G}_1) & \ge \sup _\Phi \inf _\mu \mathcal {L}(\mu, \Phi ) \notag \\ & = \sup _{\Phi } \Big \{ \langle \mathsf {G}_1, \Phi _1 \rangle _\Omega - \langle \mathsf {G}_0, \Phi _0 \rangle _\Omega \,; \ \partial _t \Phi + \frac {1}{2} (\mathsf {D}^* \Phi ) \Lambda _1^2 (\mathsf {D}^* \Phi )^{\mathrm {T}} + \frac {1}{2} \Phi \Lambda _2^2 \Phi \preceq 0 \Big \}\,. \end{align}

We next use the Fenchel-Rockafellar theorem (Lemma 2.5) to show that the duality gap is zero, which will also give the existence of the minimiser to (𝒫) and the optimality conditions. For this, we define

(4.2)

\begin{equation} C(Q,\mathcal {O}_\Lambda ) \,:\!=\, \{\varphi \in C(Q,\mathbb {X})\,;\ \varphi (x) \in \mathcal {O}_\Lambda \,, \ \forall x \in Q\}\,, \end{equation}

with $\mathcal {O}_\Lambda$ given in (3.1), which is a closed convex subset of $C(Q,\mathbb {X})$ . We then define lower semicontinuous convex functions: $f(\Phi ) = \langle \mathsf {G}_1, \Phi _1 \rangle _\Omega - \langle \mathsf {G}_0, \Phi _0 \rangle _\Omega$ for $\Phi \in C^1(Q,\mathbb {S}^n)$ and $g(\Xi ) = \iota _{C(Q,\mathcal {O}_\Lambda )}(\Xi )$ for $\Xi \in C(Q,\mathbb {X})$ . We also introduce the bounded linear operator: $L\,:\, \Phi \in C^1(Q, \mathbb {S}^n) \to (\partial _t \Phi, \mathsf {D}^* \Phi, \Phi ) \in C(Q,\mathbb {X})$ with the dual operator $L^*$ . These notions help us to write (4.1) as $ \sup \{f(\Phi ) - g(L \Phi )\,;\ \Phi \in C^1(Q,\mathbb {S}^n)\}\,.$

We now verify the condition in Lemma 2.5. We consider $\Phi = -\varepsilon t I + \frac {\varepsilon }{2} I \in C^1(Q,\mathbb {S}^n)$ . It is clear that $f(\Phi )$ is finite and $L \Phi = ({-} \varepsilon I, 0, -\varepsilon t I + \frac {\varepsilon }{2} I)$ by $\mathsf {D}^*(I) = 0$ . By a simple calculation, we have

\begin{align*} \partial _t \Phi + \frac {1}{2} (\mathsf {D}^*\Phi ) \Lambda _1^2 (\mathsf {D}^* \Phi )^{\mathrm {T}} + \frac {1}{2} \Phi \Lambda _2^2 \Phi &= - \varepsilon I+ \frac {1}{2}\varepsilon ^2 \Big({-} t + \frac {1}{2}\Big)^2 \Lambda _2^2 \preceq - \varepsilon I+ \frac {1}{8} \varepsilon ^2 \Lambda _2^2\,, \end{align*}

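which implies that for small enough $\varepsilon$ (for instance, any $0 \lt \varepsilon \lt 8/\lVert \Lambda _2 \rVert _{\mathrm {F}}^2$ suffices, since $\Lambda _2^2 \preceq \lVert \Lambda _2 \rVert _{\mathrm {F}}^2 I$ ) and any $(t,x)\in Q$ , $(L \Phi )(t,x)$ is in the interior of $\mathcal {O}_\Lambda$ and hence $g$ is continuous at $L \Phi$ . Then Lemma 2.5 readily gives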

(4.3)

\begin{equation} \min _{\mu \in \mathcal {M}(Q,\mathbb {X})}f^*(L^*\mu ) + g^*(\mu ) = \sup _{\Phi \in C^1(Q,\mathbb {S}^n)} f(\Phi ) - g(L \Phi )\,, \end{equation}

where $f^*(L^*\mu ) = \sup \{\langle \mu, L\Phi \rangle _Q - f(\Phi )\,;\ \Phi \in C^1(Q,\mathbb {S}^n)\}$ can be easily computed as $\iota _{\mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)}$ by linearity of $f$ , while $g^*(\mu )$ is nothing else than $\mathcal {J}_{\Lambda, Q}(\mu )$ by the following lemma, which is a direct application of general results [Reference Bouchitté and Valadier13, Reference Rockafellar83]. We sketch the proof in Appendix A for completeness.
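The computation of $f^*(L^*\mu )$ mentioned above can be written out as follows; this is a sketch which assumes that membership in $\mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ means exactly that the weak formulation (3.13) holds for all $\Phi \in C^1(Q,\mathbb {S}^n)$ . The expression inside the supremum is linear in $\Phi$ , so it either vanishes identically or can be made arbitrarily large by replacing $\Phi$ with $c\, \Phi$ and letting $c \to \pm \infty$ :

\begin{align*} f^*(L^*\mu ) = \sup _{\Phi \in C^1(Q,\mathbb {S}^n)} \big \{ \langle \mu, (\partial _t \Phi, \mathsf {D}^* \Phi, \Phi ) \rangle _Q - \langle \mathsf {G}_1, \Phi _1 \rangle _\Omega + \langle \mathsf {G}_0, \Phi _0 \rangle _\Omega \big \} = \begin{cases} 0\,, & \mu \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\,, \\ +\infty \,, & \text {otherwise}\,. \end{cases} \end{align*}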

Lemma 4.1. Let $\mathcal {X}$ be a compact separable metric space and $C(\mathcal {X},\mathcal {O}_\Lambda )$ be defined in (4.2). Then, we have

(4.4)

\begin{equation} \iota ^*_{C(\mathcal {X},\mathcal {O}_\Lambda )} = \sup _{\Xi \in L_{|\mu |}^\infty (\mathcal {X},\mathcal {O}_\Lambda )}\langle \mu, \Xi \rangle _{\mathcal {X}} = \mathcal {J}_{\Lambda, \mathcal {X}} (\mu )\,,\quad \text {for}\ \mu \in \mathcal {M}(\mathcal {X},\mathbb {X})\,, \end{equation}

which is proper convex and lower semicontinuous with respect to the weak^* topology of $\mathcal {M}(\mathcal {X},\mathbb {X})$ . Moreover, the subgradient $\partial \mathcal {J}_{\Lambda, \mathcal {X}}(\mu )$ in $C(\mathcal {X},\mathbb {X})$ is given as follows:

(4.5)

\begin{equation} \partial \mathcal {J}_{\Lambda, \mathcal {X}}(\mu )|_{C(\mathcal {X},\mathbb {X})} = \left \{\Xi \in C(\mathcal {X}, \mathcal {O}_\Lambda )\,; \ \Xi (x) \in \partial J_\Lambda (\mu _\lambda )(x)\,, \ \lambda \text{-a.e.}\right \}\,, \end{equation}

which is independent of the choice of the reference measure $\lambda$ such that $|\mu | \ll \lambda$ .

By the above arguments, we have shown the following result.

Theorem 4.2. The optimisation problem ( 𝒫 ) always admits a minimiser $\mu \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ and a dual formulation with zero duality gap:

(4.6)

\begin{equation} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = \sup _{\Phi \in C^1(Q,\mathbb {S}^n)} \left \{ \langle \mathsf {G}_1, \Phi _1 \rangle _{\Omega } - \langle \mathsf {G}_0, \Phi _0 \rangle _\Omega - \iota _{C(Q,\mathcal {O}_\Lambda )}(\partial _t \Phi, \mathsf {D}^* \Phi, \Phi ) \right \}\,, \end{equation}

where the $\sup$ is attained at $\Phi \in C^1(Q,\mathbb {S}^n)$ if and only if there exists $\mu = \mathsf {(G,q,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ such that

(4.7)

\begin{align} q_\lambda = G_\lambda (\mathsf {D}^* \Phi ) \Lambda _1^2 \,, \quad R_\lambda = G_\lambda \Phi \Lambda _2^2\,, \end{align}

and

(4.8)

\begin{equation} G_\lambda \cdot \Big (\partial _t \Phi + \frac {1}{2} (\mathsf {D}^* \Phi ) \Lambda _1^2 (\mathsf {D}^* \Phi )^{\mathrm {T}} + \frac {1}{2} \Phi \Lambda _2^2 \Phi \Big ) = 0 \,, \end{equation}

for $\lambda$ -a.e. $(t,x) \in Q$ . In this case, $\mu$ is also the minimiser to the problem ( $\mathcal {P}$ ).

As a consequence of Lemma 4.1 and the dual formulation (4.6), we have the sublinearity and the weak^* lower semicontinuity of $\mathrm {WB}^2_{\Lambda }(\cdot, \cdot )$ .

Corollary 4.3. $\mathrm {WB}^2_{\Lambda }(\cdot, \cdot )$ is sublinear: for $\alpha \gt 0$ , $\mathsf {G}_0,\mathsf {G}_1,\widetilde {\mathsf {G}}_0,\widetilde {\mathsf {G}}_1 \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ , there holds

(4.9)

\begin{align} \mathrm {WB}^2_{\Lambda }\big (\alpha \mathsf {G}_0, \alpha \mathsf {G}_1\big ) = \alpha \mathrm {WB}^2_{\Lambda }\big (\mathsf {G}_0, \mathsf {G}_1\big )\,,\quad \mathrm {WB}^2_{\Lambda }\big (\mathsf {G}_0 +\widetilde {\mathsf {G}}_0, \mathsf {G}_1 + \widetilde {\mathsf {G}}_1\big ) \le \mathrm {WB}^2_{\Lambda }\big (\mathsf {G}_0, \mathsf {G}_1\big ) + \mathrm {WB}^2_{\Lambda }\big (\widetilde {\mathsf {G}}_0, \widetilde {\mathsf {G}}_1\big )\,. \end{align}

Moreover, $\mathrm {WB}_{\Lambda }$ is lower semicontinuous with respect to the weak^* topology, that is, for any sequences $\{\mathsf {G}^n_0\}_{n \in \mathbb {N}}$ and $\{\mathsf {G}^n_1\}_{n \in \mathbb {N}}$ in $\mathcal {M}(\Omega, \mathbb {S}_+^n)$ that weak^* converge to measures $\mathsf {G}_0, \mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}_+^n)$ , respectively, there holds

(4.10)

\begin{align} \mathrm {WB}_{\Lambda } (\mathsf {G}_0,\mathsf {G}_1) \le \liminf _{n \to \infty } \mathrm {WB}_{\Lambda } (\mathsf {G}^n_0,\mathsf {G}^n_1)\,. \end{align}

Proof. Noting that $\mathcal {J}_{\Lambda, Q}(\mu )$ is positively homogeneous and convex, and hence sublinear, the sublinearity of $\mathrm {WB}^2_{\Lambda }(\cdot, \cdot )$ follows from definition ( $\mathcal {P}$ ) and the linearity of the continuity equation. For the weak^* lower semicontinuity, by (4.6), for any $\Phi \in C^1(Q,\mathbb {S}^n)$ with $\iota _{C(Q,\mathcal {O}_\Lambda )}(\partial _t \Phi, \mathsf {D}^* \Phi, \Phi ) = 0$ , there holds

(4.11)

\begin{align} \liminf _{n \to \infty } \mathrm {WB}^2_{\Lambda }(\mathsf {G}^n_0,\mathsf {G}^n_1) \ge \liminf _{n \to \infty } \langle \mathsf {G}^n_1, \Phi _1 \rangle _{\Omega } - \langle \mathsf {G}^n_0, \Phi _0 \rangle _\Omega = \langle \mathsf {G}_1, \Phi _1 \rangle _{\Omega } - \langle \mathsf {G}_0, \Phi _0 \rangle _\Omega \,, \end{align}

by the weak^* convergence of $\mathsf {G}_0^n$ and $\mathsf {G}_1^n$ . Then (4.10) follows by taking the $\sup$ of (4.11) over admissible $\Phi$ .
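The scaling behind the sublinearity (4.9) can be made explicit as follows; this is a sketch relying on the representation (3.24) and on the assumption, taken from the definition (2.3), that $\lVert \cdot \rVert ^2_{L^2_{\mathsf {G}}}$ is linear in $\mathsf {G}$ . If $\mu = \mathsf {(G,q,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ and $\alpha \gt 0$ , then $\alpha \mu \in \mathcal {CE}([0,1];\,\alpha \mathsf {G}_0,\alpha \mathsf {G}_1)$ by linearity, and since $(\alpha G)^\dagger (\alpha q) = G^\dagger q$ and $(\alpha G)^\dagger (\alpha R) = G^\dagger R$ ,

\begin{align*} \mathcal {J}_{\Lambda, Q}(\alpha \mu ) = \frac {1}{2} \lVert G^\dagger q \Lambda _1^\dagger \rVert ^2_{L^2_{\alpha \mathsf {G}}(Q)} + \frac {1}{2} \lVert G^\dagger R \Lambda _2^{-1} \rVert ^2_{L^2_{\alpha \mathsf {G}}(Q)} = \alpha\, \mathcal {J}_{\Lambda, Q}(\mu )\,. \end{align*}

Taking the infimum gives $\mathrm {WB}^2_{\Lambda }(\alpha \mathsf {G}_0, \alpha \mathsf {G}_1) \le \alpha \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0, \mathsf {G}_1)$ , and the reverse inequality follows by applying the same bound with $\alpha ^{-1}$ . The subadditivity follows in the same way from $\mu + \widetilde {\mu } \in \mathcal {CE}([0,1];\,\mathsf {G}_0 + \widetilde {\mathsf {G}}_0, \mathsf {G}_1 + \widetilde {\mathsf {G}}_1)$ and $\mathcal {J}_{\Lambda, Q}(\mu + \widetilde {\mu }) \le \mathcal {J}_{\Lambda, Q}(\mu ) + \mathcal {J}_{\Lambda, Q}(\widetilde {\mu })$ .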

In addition, we have the following explicit characterisation of the minimiser (i.e., geodesic; see Corollary 5.7) to ( $\mathcal {P}$ ) for inflating measures from optimality conditions (4.7) and (4.8), which extends [Reference Brenier and Vorotnikov16, Theorem 5] with a much simpler argument. For $\mathsf {G} \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ and $A \in \mathbb {S}_+^n$ , we denote by $\mathsf {G}^A$ the inflating measure $A \mathsf {G} A \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ .

Proposition 4.4. For $\mathsf {G} \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ and matrices $A_0, A_1 \in \mathbb {S}_+^n$ , we have

(4.12)

\begin{equation} \mathrm {WB}_\Lambda ^2 \big (\mathsf {G}^{A_0},\mathsf {G}^{A_1}\big ) = 2 Tr \big (\Lambda _2^{-1}(A_1 - A_0)\mathsf {G}(\Omega )(A_1 - A_0)\Lambda _2^{-1} \big )\,, \end{equation}

with the minimiser $(\mathsf {G}_*,\mathsf {q}_*,\mathsf {R}_*) \,:\!=\, (\mathsf {G}^{A_t}, 0, 2 A_t \mathsf {G} (A_1 - A_0)) \in \mathcal {M}(Q, \mathbb {X})$ , where $A_t \,:\!=\, tA_1 + (1- t)A_0$ for $t \in [0,1]$ .
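As a quick sanity check of (4.12), one may look at the scalar case; the symbols $\ell$ , $a_i$ and $m_i$ below are illustrative notation not used elsewhere in the paper. Let $n = 1$ , $\Lambda _2 = \ell \gt 0$ , let $\mathsf {G}$ be a nonnegative scalar measure and $A_i = a_i \ge 0$ for $i = 0,1$ , so that $\mathsf {G}^{A_i} = a_i^2 \mathsf {G}$ has total mass $m_i \,:\!=\, a_i^2 \mathsf {G}(\Omega )$ . Then (4.12) reads

\begin{align*} \mathrm {WB}_\Lambda ^2 \big (a_0^2 \mathsf {G}, a_1^2 \mathsf {G}\big ) = \frac {2}{\ell ^2} (a_1 - a_0)^2 \mathsf {G}(\Omega ) = \frac {2}{\ell ^2} \big (\sqrt {m_1} - \sqrt {m_0}\big )^2\,, \end{align*}

which has the form of a squared Hellinger-type distance between the two total masses, in line with Remark 3.11, and does not involve $\Lambda _1$ .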

Proof. Let us first assume that $A_0$ and $A_1$ are invertible. By a direct calculation, we have

\begin{equation*} \partial _t \mathsf {G}^{A_t} = (A_1 - A_0) \mathsf {G} A_t + A_t \mathsf {G} (A_1 - A_0) \,. \end{equation*}

We define $\Phi = 2 A_t^{-1} (A_1- A_0)\Lambda _2^{-2}$ and find $\mathsf {R}_* = \mathsf {G}^{A_t} \Phi \Lambda _2^2$ . It is also easy to see that $(\mathsf {G}_*,\mathsf {q}_*,\mathsf {R}_*)$ defined above is in the set $\mathcal {CE}\big ([0,1]; \mathsf {G}^{A_0},\mathsf {G}^{A_1}\big )$ . Moreover, recalling $ ((A + \varepsilon H)^{-1} - A^{-1})/\varepsilon \to - A^{-1}H A^{-1}$ as $\varepsilon \to 0$ for invertible $A$ and $H \in \mathbb {M}^n$ [9], we have

\begin{equation*} \partial _t \Phi = - 2 A_t^{-1} (A_1 - A_0)A_t^{-1}(A_1- A_0) \Lambda _2^{-2} = - \Phi \Lambda _2^2 \Phi /2\,. \end{equation*}
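As a quick check of the two equalities above, using only the definition of $\Phi$ : the derivative formula with $A = A_t$ and $H = A_1 - A_0$ gives $\partial _t A_t^{-1} = - A_t^{-1}(A_1 - A_0)A_t^{-1}$ , which yields the first equality, while direct substitution gives the second one:

\begin{equation*} -\frac {1}{2}\Phi \Lambda _2^2 \Phi = -\frac {1}{2}\big (2 A_t^{-1}(A_1 - A_0)\Lambda _2^{-2}\big )\Lambda _2^2 \big (2 A_t^{-1}(A_1 - A_0)\Lambda _2^{-2}\big ) = - 2 A_t^{-1}(A_1 - A_0)A_t^{-1}(A_1 - A_0)\Lambda _2^{-2}\,. \end{equation*}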

By the above computations, we have verified the optimality conditions (4.7) and (4.8), which means that the measure $(\mathsf {G}_*,\mathsf {q}_*,\mathsf {R}_*)$ is the desired minimiser. Then, we can further compute

\begin{equation*} \mathrm {WB}_\Lambda ^2\big (\mathsf {G}^{A_0}, \mathsf {G}^{A_1}\big ) = \frac {1}{2} \int _0^1 \int _\Omega (\Phi \Lambda _2) \cdot \mathrm {d} \mathsf {G}^{A_t} (\Phi \Lambda _2)\, \mathrm {d} t = 2 ((A_1 - A_0)\Lambda _2^{-1})\cdot \mathsf {G}(\Omega )(A_1 - A_0)\Lambda _2^{-1}\,. \end{equation*}

For general $A_0,A_1 \in \mathbb {S}_+^n$ , we first see that $\mu _* \,:\!=\, (\mathsf {G}^{A_t}, 0, 2 A_t \mathsf {G} (A_1 - A_0))$ as above still satisfies the continuity equation and, thanks to $\mathrm {Ran}(A_1 - A_0) \subset \mathrm {Ran}(A_t)$ , its associated action functional $\mathcal {J}_{\Lambda, Q}(\mu _*)$ equals the right-hand side of (4.12), which also means $ \mathrm {WB}_\Lambda ^2(\mathsf {G}^{A_0}, \mathsf {G}^{A_1}) \le \mathcal {J}_{\Lambda, Q}(\mu _*)$ . To finish the proof, it suffices to show that the equality holds. For this, we consider $A_i^\varepsilon = A_i + \varepsilon I \in \mathbb {S}_{++}^n$ for $i = 0,1$ . Then, by the triangle inequality for $\mathrm {WB}_\Lambda$ (see Proposition 5.2 below) and Lemma 3.9, we have $\mathrm {WB}_\Lambda (\mathsf {G}^{A^\varepsilon _0}, \mathsf {G}^{A^\varepsilon _1}) \to \mathrm {WB}_\Lambda (\mathsf {G}^{A_0}, \mathsf {G}^{A_1})$ as $\varepsilon \to 0$ . The proof is completed by

\begin{align*} \mathrm {WB}^2_\Lambda \big (\mathsf {G}^{A^\varepsilon _0}, \mathsf {G}^{A^\varepsilon _1}\big ) = & 2\, \mathrm {Tr} \big (\Lambda _2^{-1}(A^\varepsilon _1 - A^\varepsilon _0)\mathsf {G}(\Omega )(A^\varepsilon _1 - A^\varepsilon _0)\Lambda _2^{-1} \big )\\ & \to 2\, \mathrm {Tr} \big (\Lambda _2^{-1}(A_1 - A_0)\mathsf {G}(\Omega )(A_1 - A_0)\Lambda _2^{-1} \big ) = \mathcal {J}_{\Lambda, Q}(\mu _*) \,, \quad \varepsilon \to 0\,. \end{align*}
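To illustrate formula (4.12), consider the following scalar sketch, in which the symbols $a_0, a_1, \lambda _2, m_0, m_1$ are introduced only for this example. Take $n = 1$ , so that $\mathsf {G}$ is a nonnegative Radon measure, $A_i = a_i \ge 0$ and $\Lambda _2 = \lambda _2 \gt 0$ . Then $\mathsf {G}^{A_i} = a_i^2 \mathsf {G}$ has total mass $m_i \,:\!=\, a_i^2 \mathsf {G}(\Omega )$ , and (4.12) reduces to

\begin{equation*} \mathrm {WB}_\Lambda ^2 \big (a_0^2 \mathsf {G}, a_1^2 \mathsf {G}\big ) = \frac {2 (a_1 - a_0)^2\, \mathsf {G}(\Omega )}{\lambda _2^2} = \frac {2}{\lambda _2^2}\big (\sqrt {m_1} - \sqrt {m_0}\big )^2\,. \end{equation*}

In this special case, the distance depends on the two measures only through the square roots of their total masses, which is reminiscent of a Hellinger-type distance.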

4.2. Primal-dual formulations

We proceed to study in more depth the optimality conditions by viewing $\mathsf {G}$ as the main variable and $\mathsf {(q,R)}$ as the control variable, which will be useful in Section 5. We first observe

(4.13)

\begin{align} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) & = \inf _{\mathsf {G}} \inf _{\mathsf {q},\mathsf {R}} \left \{ \mathcal {J}_{\Lambda, Q}(\mu ) \,;\ \mu = (\mathsf {G}, \mathsf {q}, \mathsf {R}) \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1) \right \}\,, \end{align}

by taking the inf in ( $\mathcal {P}$ ) over $\mathsf {G}$ and $(\mathsf {q},\mathsf {R})$ separately. Recall the formulation (3.24) of $\mathcal {J}_{\Lambda, Q}(\mu )$ , which motivates us to introduce a weighted semi-inner product:

(4.14)

\begin{align} \big \langle (u,W), (u^{\prime},W^{\prime}) \big \rangle _{L^2_{\mathsf {G},\Lambda }(Q)} \,:\!=\, \big \langle u \Lambda _1^{\dagger }, u^{\prime} \Lambda _1^{\dagger } \big \rangle _{L^2_{\mathsf {G}}(Q)} + \big \langle W \Lambda _2^{-1}, W^{\prime} \Lambda _2^{-1} \big \rangle _{L^2_{\mathsf {G}}(Q)} \,, \end{align}

and the associated seminorm $\lVert \cdot \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}$ on the space of measurable functions valued in $\mathbb {R}^{n \times k} \times \mathbb {M}^n$ . The corresponding Hilbert space, denoted by $L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ , is defined as the quotient space by the subspace $\mathrm {Ker}\big (\lVert \cdot \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}\big )$ . Hence, we can rewrite (3.24) as $\mathcal {J}_{\Lambda, Q}(\mu ) = \lVert (G^\dagger q,G^\dagger R) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}/2$ . Moreover, we define the set

(4.15)

\begin{equation} \mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\,:\!=\, \{\mathsf {G} \in \mathcal {M}(Q,\mathbb {S}^n)\,; \ \exists \mathsf {(q,R)} \in \mathcal {M}(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n) \ \text {s.t.}\ \mathsf {(G,q,R)} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\}\,, \end{equation}

and the associated energy functional: for $\mathsf {G} \in \mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ ,

(4.16)

\begin{align} \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G}) \,:\!=\, \inf _{\mathsf {(q,R)}} \Big \{\frac {1}{2} \lVert (G^\dagger q, G^\dagger R ) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,;\ \mathsf {(G,q,R)} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1) \Big \}\,. \end{align}

We will see in Remark 5.6 that $\mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ is closely related to the set of absolutely continuous curves in the metric space $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ . With the help of these notions, (4.13) can be reformulated in a compact form:

(4.17)

\begin{equation} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = \inf _{\mathsf {G}\in \mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)} \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G})\,. \end{equation}

Similarly to (3.24), by Lemma 3.3, we also note that for $\mathsf {(G,q,R)} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ , the weak formulation (3.13) can be written as

(4.18)

\begin{equation} \big \langle \big (\mathsf {D}^* \Phi \Lambda ^2_1, \Phi \Lambda _2^2 \big ), \big (G^\dagger q, G^\dagger R\big ) \big \rangle _{L^2_{\mathsf {G}, \Lambda }(Q)} = l_{\mathsf {G}}(\Phi )\,, \quad \forall \Phi \in C^1(Q,\mathbb {S}^n)\,, \end{equation}

where $l_{\mathsf {G}}(\cdot )$ for $\mathsf {G} \in \mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ is a linear functional on $C^1(Q,\mathbb {S}^n)$ defined by

(4.19)

\begin{equation} l_{\mathsf {G}}(\Phi ) = \langle \mathsf {G}_1, \Phi _1 \rangle _{\Omega } - \langle \mathsf {G}_0, \Phi _0 \rangle _{\Omega } - \langle \mathsf {G}, \partial _t\Phi \rangle _Q \,. \end{equation}

Define an injective map $\Pi \,:\, \Phi \to (\mathsf {D}^* \Phi \Lambda _1^2, \Phi \Lambda _2^2)$ for $\Phi \in C^1(Q,\mathbb {S}^n)$ and denote $\widetilde {l}_{\mathsf {G}}\,:\!=\, l_{\mathsf {G}} \circ \Pi ^{-1}$ on the image of $\Pi$ . In view of (4.18), the functional $\widetilde {l}_{\mathsf {G}}$ can be uniquely extended to the space

(4.20)

\begin{equation} H_{\mathsf {G},\Lambda }(\mathsf {D}^*)\,:\!=\, \overline {\left \{ \Pi (\Phi ) \,;\ \Phi \in C^1(Q,\mathbb {S}^n)\right \}}^{\lVert \cdot \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}}\,, \end{equation}

with the norm estimate

(4.21)

\begin{equation} \lVert \widetilde {l}_{\mathsf {G}} \rVert _{H^*_{\mathsf {G},\Lambda }(\mathsf {D}^*)} \le \lVert (G^\dagger q, G^\dagger R) \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}\,. \end{equation}

We emphasise that such an extension is independent of the choice of $\mathsf {(q,R)}$ that satisfies $\mathsf {(G,q,R)} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ .

Next, we show that (4.16) admits a unique minimiser $\mathsf {(q,R)}$ that satisfies the equality in (4.21). Note that $(u,W)$ and $(u \mathbb {P}_{\Lambda _1}, W \mathbb {P}_{\Lambda _2})$ are equivalent in $L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ , where $\mathbb {P}_{\Lambda _i}$ is the orthogonal projection to $\mathrm {Ran}(\Lambda _i)$ . Hence, for any $(u,W) \in L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ , we can assume $\mathrm {Ran}(u^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _1)$ and $\mathrm {Ran}(W^{\mathrm {T}}) \subset \mathrm {Ran} (\Lambda _2)$ . Then, it holds that any $L^2_{\mathsf {G},\Lambda }$ -field $(u,W)$ satisfying $ \langle (\mathsf {D}^* \Phi \Lambda _1^2, \Phi \Lambda _2^2), (u,W)\rangle _{L^2_{\mathsf {G},\Lambda }(Q)} = l_{\mathsf {G}}(\Phi )$ , $\forall \Phi \in C^1(Q,\mathbb {S}^n)$ , induces a measure $\mathsf {(q,R)} \,:\!=\, (\mathsf {G} u,\mathsf {G}W)$ such that $\mathsf {(G,q,R)} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ . This observation implies that $\mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G})$ is actually a uniquely solvable minimum norm problem with an affine constraint:

(4.22)

\begin{align} \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G}) &= \inf \Big \{\frac {1}{2} \lVert (u,W) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,;\ (u,W) \in L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)\ \text {such that} \notag \\ \big \langle (\mathsf {D}^* \Phi \Lambda _1^2, \Phi \Lambda _2^2), (u,W) \big \rangle _{L^2_{\mathsf {G},\Lambda }(Q)} &= l_{\mathsf {G}}(\Phi )\,, \ \forall \Phi \in C^1(Q,\mathbb {S}^n) \Big \}\,. \end{align}

The unique minimiser $(u_*,W_*)$ to (4.22) is given by the orthogonal projection of $0$ on the constraint set, equivalently, the Riesz representation of the functional $\widetilde {l}_{\mathsf {G}}$ on the space $H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$ . It then follows that $(\mathsf {q}_*,\mathsf {R}_*) \,:\!=\, (\mathsf {G} u_*,\mathsf {G}W_*)$ is the desired minimiser to (4.16) and there holds

(4.23)

\begin{align} \lVert \widetilde {l}_{\mathsf {G}} \rVert _{H^*_{\mathsf {G},\Lambda }(\mathsf {D}^*)} & = \lVert (u_*,W_*) \rVert _{L^2_{\mathsf {G},\Lambda }(Q)} = \lVert (G^\dagger q_*, G^\dagger R_*) \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}\,. \end{align}
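The minimality of $(u_*,W_*)$ can be seen from the following brief sketch, where the symbol $z$ is introduced only for illustration. Let $(u,W)$ be any field satisfying the constraint in (4.22) and set $z \,:\!=\, (u,W) - (u_*,W_*)$ . Since both fields satisfy the same constraint, we have $\langle \Pi (\Phi ), z \rangle _{L^2_{\mathsf {G},\Lambda }(Q)} = 0$ for all $\Phi \in C^1(Q,\mathbb {S}^n)$ , i.e., $z$ is orthogonal to $H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$ . As $(u_*,W_*) \in H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$ , the Pythagorean identity gives

\begin{equation*} \lVert (u,W) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} = \lVert (u_*,W_*) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} + \lVert z \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} \ge \lVert (u_*,W_*) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,, \end{equation*}

with equality if and only if $z = 0$ in $L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ .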

We summarise the above facts in the following useful result.

Theorem 4.5. $\mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1)$ has the following representation:

\begin{equation*} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = \inf _{\mathsf {G}\in \mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)} \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G})\quad \text {with}\quad \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G}) = \frac {1}{2}\lVert (u_*, W_*) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,, \end{equation*}

where $(u_*, W_*)$ is the Riesz representation of $\widetilde {l}_{\mathsf {G}}$ in $H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$ that uniquely solves the minimum norm problem (4.22).

Moreover, $\mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G})$ admits the following dual formulation:

(4.24)

\begin{equation} \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G}) = \sup \Big \{l_{\mathsf {G}}(\Phi ) - \frac {1}{2}\lVert (\mathsf {D}^* \Phi \Lambda _1^2,\Phi \Lambda _2^2) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,; \ \Phi \in C^1(Q,\mathbb {S}^n) \Big \}\,. \end{equation}

Proof. It suffices to derive the dual formulation (4.24) of $\mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}$ . For this, we first note

\begin{equation*} \frac {1}{2}\lVert (u,W) \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}^2 = \sup _{(u^{\prime},W^{\prime}) \in L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)} \langle (u,W), (u^{\prime},W^{\prime})\rangle _{L^2_{\mathsf {G},\Lambda }(Q)} - \frac {1}{2}\lVert (u^{\prime},W^{\prime}) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} \,, \end{equation*}

which further implies, by $(u_*, W_*) \in H_{\mathsf {G},\Lambda }(\mathsf {D}^*) \subset L^2_{\mathsf {G},\Lambda }(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ , for any $\Phi \in C^1(Q,\mathbb {S}^n)$ ,

(4.25)

\begin{align} \mathcal {J}^{\Lambda }_{\mathsf {G}_0,\mathsf {G}_1}(\mathsf {G}) = \frac {1}{2}\lVert (u_*, W_*) \rVert _{L^2_{\mathsf {G},\Lambda }(Q)}^2 &\ge \langle (u_*, W_*), \big (\mathsf {D}^* \Phi \Lambda ^2_1, \Phi \Lambda _2^2 \big )\rangle _{L^2_{\mathsf {G},\Lambda }(Q)} - \frac {1}{2}\lVert ( \mathsf {D}^* \Phi \Lambda _1^2,\Phi \Lambda _2^2) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\notag \\ & = l_{\mathsf {G}}(\Phi ) - \frac {1}{2}\lVert ( \mathsf {D}^* \Phi \Lambda _1^2,\Phi \Lambda _2^2) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,. \end{align}

Then, recalling (4.20) and choosing a sequence $\{(\mathsf {D}^* \Phi _n \Lambda _1^2, \Phi _n \Lambda _2^2)\}$ with $\Phi _n \in C^1(Q,\mathbb {S}^n)$ in (4.25) that approximates $(u_*, W_*)$ gives the desired (4.24).
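The approximation step rests on an elementary Hilbert-space identity, sketched here with $v$ denoting a generic element of $H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$ (a symbol used only for this illustration). Since $\widetilde {l}_{\mathsf {G}}(v) = \langle (u_*,W_*), v \rangle _{L^2_{\mathsf {G},\Lambda }(Q)}$ , completing the square gives

\begin{equation*} \widetilde {l}_{\mathsf {G}}(v) - \frac {1}{2}\lVert v \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} = \frac {1}{2}\lVert (u_*,W_*) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} - \frac {1}{2}\lVert v - (u_*,W_*) \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)}\,. \end{equation*}

Taking $v = (\mathsf {D}^* \Phi _n \Lambda _1^2, \Phi _n \Lambda _2^2) \to (u_*,W_*)$ , the last term vanishes in the limit, so the lower bound in (4.25) is attained asymptotically.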

4.3. Varying weight matrices

We regard $\mathrm {WB}_{\Lambda }$ as a family of distances indexed by $\Lambda$ and investigate the behaviours of $\mathrm {WB}_{\Lambda }$ and its minimiser when $\Lambda$ varies, in particular, when $|\Lambda _1|$ or $|\Lambda _2|$ tends to zero or infinity. We give a partial answer to this question in the following proposition. For ease of exposition, we introduce

(4.26)

\begin{align} \mathcal {J}^q_{\Lambda _1}(\mu ) = \mathcal {J}_{\Lambda, Q}(\mathsf {(G,q,0)}) \,, \quad \mathcal {J}^R_{\Lambda _2}(\mu ) = \mathcal {J}_{\Lambda, Q}(\mathsf {(G,0,R)}) \quad \text {for}\ \mu \in \mathcal {M}(Q,\mathbb {X})\,. \end{align}

Proposition 4.6. Let $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ and $\mu _{*,\Lambda }$ denote the minimiser to $\mathrm {WB}^2_\Lambda (\mathsf {G}_0,\mathsf {G}_1)$ ( $\mathcal {P}$ ). It holds that $\mathrm {WB}^2_{(\Lambda _1,\Lambda _2)}(\mathsf {G}_0,\mathsf {G}_1) \to \mathrm {WB}^2_{(0,\Lambda _2)}(\mathsf {G}_0,\mathsf {G}_1)$ as $\lVert \Lambda _1 \rVert _{\mathrm {F}} \to 0$ , and for any sequence $\{ \Lambda _{1,j}\}_{j \in \mathbb {N}} \subset \mathbb {S}^k_+$ with $\lVert \Lambda _{1,j} \rVert _{\mathrm {F}} \to 0$ , the associated minimiser $\mu _{*,(\Lambda _{1,j},\Lambda _2)}$ , up to a subsequence, weak^* converges to a minimiser $\mu _*$ to $\mathrm {WB}^2_{(0,\Lambda _2)}(\mathsf {G}_0,\mathsf {G}_1)$ .

Proof. We first claim that $\lVert \Lambda _1 \rVert _{\mathrm {F}}^2\mathcal {J}^q_{\Lambda _1}(\mu _{*,\Lambda })$ and $\mathcal {J}^R_{\Lambda _2}(\mu _{*,\Lambda })$ are bounded when $\lVert \Lambda _1 \rVert _{\mathrm {F}} \to 0$ , which, by estimates (3.25) and (3.28), implies that $\mu _{*,\Lambda }$ is bounded in $\mathcal {M}(Q,\mathbb {X})$ . For this, we consider the set

(4.27)

\begin{align} \mathcal {CE}_{\Lambda _1, q} \,:\!=\, \arg \min \{\mathcal {J}^q_{\Lambda _1}(\mu )\,; \ \mu \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\}\,. \end{align}

Similarly to the proof of Lemma 3.9, we have that $\mathcal {CE}_{\Lambda _1, q}$ is nonempty and contains at least one element with $\mathsf {q} = 0$ and $\min \{\mathcal {J}^q_{\Lambda _1}(\mu )\,; \ \mu \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\} = 0$ . Since $\mu _{*,\Lambda }$ minimises $\mathcal {J}_{\Lambda, Q}(\cdot )$ , it follows that

(4.28)

\begin{equation} \mathcal {J}_{\Lambda, Q}(\mu _{*,\Lambda }) = \mathcal {J}^q_{\Lambda _1}(\mu _{*,\Lambda }) + \mathcal {J}^R_{\Lambda _2}(\mu _{*,\Lambda })\le \mathcal {J}_{\Lambda, Q}(\mu ) = \mathcal {J}^R_{\Lambda _2}(\mu )\,, \quad \forall \mu = \mathsf {(G,0,R)}\in \mathcal {CE}_{\Lambda _1, q}\,. \end{equation}

Noting $\{\mathsf {(G,0,R)}\in \mathcal {CE}_{\Lambda _1, q}\} = \{\mathsf {(G,0,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\}$ , (4.28) yields that $\mathcal {J}^R_{\Lambda _2}(\mu _{*,\Lambda })$ is bounded by a constant independent of $\Lambda _1$ . Moreover, multiplying both sides of (4.28) by $\lVert \Lambda _1 \rVert _{\mathrm {F}}^2$ and then letting $\lVert \Lambda _1 \rVert _{\mathrm {F}} \to 0$ , we obtain

(4.29)

\begin{equation} \lim _{\lVert \Lambda _1 \rVert _{\mathrm {F}} \to 0} \lVert \Lambda _1 \rVert _{\mathrm {F}}^2\,\mathcal {J}^q_{\Lambda _1}(\mu _{*,\Lambda }) = 0\,. \end{equation}
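In more detail, this limit can be justified by the following short sketch, assuming as in the argument above that some fixed $\mu = \mathsf {(G,0,R)} \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ has finite action $\mathcal {J}^R_{\Lambda _2}(\mu )$ . Since $\mathcal {J}^R_{\Lambda _2}(\mu _{*,\Lambda }) \ge 0$ , (4.28) gives $\mathcal {J}^q_{\Lambda _1}(\mu _{*,\Lambda }) \le \mathcal {J}^R_{\Lambda _2}(\mu )$ , where the right-hand side does not depend on $\Lambda _1$ . Hence,

\begin{equation*} 0 \le \lVert \Lambda _1 \rVert _{\mathrm {F}}^2\, \mathcal {J}^q_{\Lambda _1}(\mu _{*,\Lambda }) \le \lVert \Lambda _1 \rVert _{\mathrm {F}}^2\, \mathcal {J}^R_{\Lambda _2}(\mu ) \to 0 \quad \text {as}\ \lVert \Lambda _1 \rVert _{\mathrm {F}} \to 0\,. \end{equation*}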

Then the boundedness of $\lVert \Lambda _1 \rVert _{\mathrm {F}}^2\mathcal {J}^q_{\Lambda _1}(\mu _{*,\Lambda })$ for small enough $\lVert \Lambda _1 \rVert _{\mathrm {F}}$ follows, which completes the proof of the claim.

By the boundedness of $\lVert \mu _{*,\Lambda } \rVert _{\mathrm {TV}}$ as $\lVert \Lambda _1 \rVert _{\mathrm {F}} \to 0$ , we are allowed to take a subsequence $\{\Lambda _{1,j}\}_{j \in \mathbb {N}}$ in $\mathbb {S}_+^k$ such that the minimiser $\mu _{*,\widetilde {\Lambda }_j}$ with $\widetilde {\Lambda }_j = (\Lambda _{1,j},\Lambda _2)$ weak^* converges to a measure $\mu _* \in \mathcal {M}(Q,\mathbb {X})$ when $j \to \infty$ , which clearly satisfies $\mu _* \in \mathcal {CE}([0,1];\, \mathsf {G}_0,\mathsf {G}_1)$ . Then, by the weak^* lower semicontinuity of $\mathcal {J}^R_{\Lambda _2}$ and (4.28), we have

(4.30)

\begin{equation} \mathcal {J}^R_{\Lambda _2}(\mu _*) \le \liminf _{j\to \infty } \mathcal {J}^R_{\Lambda _2}(\mu _{*,\widetilde {\Lambda }_j}) \le \limsup _{j\to \infty } \mathrm {WB}^2_{\widetilde {\Lambda }_j}(\mathsf {G}_0,\mathsf {G}_1) \le \inf \{\mathcal {J}^R_{\Lambda _2}(\mu )\,;\ \mu = \mathsf {(G,0,R)}\in \mathcal {CE}_{\Lambda _1, q}\}\,. \end{equation}

The right-hand side of (4.30) is recognised as $\mathrm {WB}^2_{(0,\Lambda _2)}(\mathsf {G}_0,\mathsf {G}_1)$ and the inf is attained; see Remark 3.11 and Theorem 4.2. Also, by (3.25) and (4.29), it holds that the limit measure $\mu _* \in \mathcal {CE}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ is of the form $\mu _* = (\mathsf{G}_{*}, \mathsf{0}, \mathsf{R}_{*})$ . The proof is completed by (4.30).

Proposition 4.6 above tells us that the measure $\mathsf {q}$ is forced to be nearly zero if the transportation part is given too much weight (i.e., $\lVert \Lambda _1 \rVert _{\mathrm {F}}$ is small, cf. (3.24)) or, equivalently, if the problem is on a large scale (cf. Remark 3.17). It is also possible and interesting to consider other limiting regimes, e.g., $\lVert \Lambda _1 \rVert _{\mathrm {F}} \to \infty$ , $\lVert \Lambda _2 \rVert _{\mathrm {F}} \to 0$ , or letting only some of the eigenvalues of $\Lambda _i$ vanish, which, however, is beyond the scope of this work.

5. Geometric properties and Riemannian interpretation

In this section, we shall study the space $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ equipped with the distance $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ from the metric point of view. In particular, we will prove that $(\mathcal {M}(\Omega, \mathbb {S}^n_+),\mathrm {WB}_{\Lambda })$ is a complete geodesic space with a Riemannian interpretation. We first show that $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ is indeed a metric on $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ , which is a simple corollary of the following characterisation of $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ by standard reparameterisation techniques (cf. [1, Lemma 1.1.4] or [34, Theorem 5.4]). We denote by $\widetilde {\mathcal {CE}}([a,b];\mathsf {G}_0,\mathsf {G}_1)$ the set of measures $\mu \in \mathcal {CE}([a,b];\mathsf {G}_0,\mathsf {G}_1)$ that can be disintegrated as $\mu = \int _a^b \delta _t \otimes \mu _t\, \mathrm {d} t$ . It is clear that $\mathcal {CE}_\infty \subset \widetilde {\mathcal {CE}} \subset \mathcal {CE}$ .

Lemma 5.1. For $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}_+^n)$ and $b > a > 0$ , there holds

(5.1)

\begin{align} \mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = \inf _{\mu \in \widetilde {\mathcal {CE}}([a,b];\mathsf {G}_0,\mathsf {G}_1)} \int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} \, \mathrm {d} t \,. \end{align}

Moreover, the minimiser to the problem ( $\mathcal {P}^{\prime}$ ) gives a constant-speed minimiser $\mu$ to (5.1), which satisfies

(5.2)

\begin{align} (b - a) \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} =\mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) \quad \text {for}\ a.e.\, t \in [a,b]\,. \end{align}
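The role of the constant-speed condition (5.2) can be illustrated by the Cauchy–Schwarz inequality, a standard fact recorded here as a sketch: for any $\mu \in \widetilde {\mathcal {CE}}([a,b];\mathsf {G}_0,\mathsf {G}_1)$ with $\int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathrm {d} t \lt + \infty$ ,

\begin{equation*} \Big (\int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}\, \mathrm {d} t\Big )^2 \le (b - a) \int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t)\, \mathrm {d} t\,, \end{equation*}

with equality if and only if $t \mapsto \mathcal {J}_{\Lambda, \Omega }(\mu _t)$ is constant for a.e. $t \in [a,b]$ . In other words, the squared length is bounded by $(b-a)$ times the energy, and the bound is sharp exactly for constant-speed curves.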

The proof is provided in Appendix A for completeness. The above lemma is an analogue of a well-known geometric fact that minimising the energy of a parametric curve is the same as minimising its length with constant-speed constraint [40]. The following result summarises some fundamental properties of $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ .

Proposition 5.2. $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ is a complete metric space. Moreover, the topology induced by the metric $\mathrm {WB}_{\Lambda }$ is stronger than the weak^* one, i.e., $\lim _{n \to \infty }\mathrm {WB}_{\Lambda }(\mathsf {G}^n,\mathsf {G}) = 0$ implies the weak^* convergence of $\mathsf {G}^n$ to $\mathsf {G}$ .

Remark 5.3. We should emphasise that "stronger" in Proposition 5.2 above means "at least as strong as". In the special case of the WFR distance ( $\mathcal {P}_{\mathrm {WFR}}$ ), one can show [65, Theorem 7.15] that $\mathrm {WFR}(\cdot, \cdot )$ metrises the weak^* topology on $\mathcal {M}(\Omega, \mathbb {R}_+)$ . However, the exact characterisation of the topology induced by a general metric $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ is still open. In addition, given the multi-component nature of our matrix-valued transport problem, one can expect that there may be some interesting connections between our model $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ and the multimaterial transport problem [11, 70], which deals with the simultaneous transportation of vector-valued measures along a network or graph and can exhibit branching behaviour. The detailed investigation of these problems is beyond the scope of this work and is left for future research.

The proof of Proposition 5.2 needs the a priori estimates (3.25) and (3.28), and the following lemma, which is a direct consequence of Lemma 3.9.

Lemma 5.4. A subset of $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ is bounded with respect to the distance $\mathrm {WB}_\Lambda$ if and only if it is bounded with respect to the total variation norm. Hence, a bounded set in $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_\Lambda )$ is weak^* relatively compact.

Proof of Proposition 5.2. First, note that $\mathrm {WB}_{\Lambda }$ is a function from $\mathcal {M}(\Omega, \mathbb {S}^n_+) \times \mathcal {M}(\Omega, \mathbb {S}^n_+)$ to $[0,+\infty )$ . It is also easy to check $\mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = 0$ for $\mathsf {G}_0 = \mathsf {G}_1$ by considering the constant curve $\mathsf {G}_t = \mathsf {G}_0$ with $\mathsf {q} = \mathsf {R} = 0$ , the symmetry $\mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = \mathrm {WB}_{\Lambda }(\mathsf {G}_1,\mathsf {G}_0)$ by Lemma 3.15 and the triangle inequality by (5.1). Then, to show that $\mathrm {WB}_{\Lambda }$ is a metric, it suffices to prove that $\mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = 0$ implies $\mathsf {G}_0 = \mathsf {G}_1$ .

For this, suppose that $\mu = \mathsf {(G,q,R)}$ is a minimiser to ( $\mathcal {P}$ ) with $\mathcal {J}_{\Lambda, Q}(\mu ) = 0$ . Recalling the formula (3.24), we have $\mathsf {(q,R)} = 0$ . Then, taking test functions $\Phi (t,x) = \Psi (x)$ with $\Psi (x) \in C^1(\Omega, \mathbb {S}^n)$ in (3.13), we find $\langle \mathsf {G}_1 - \mathsf {G}_0, \Psi \rangle _{\Omega } = 0$ , $\forall \Psi \in C^1(\Omega, \mathbb {S}^n)$ , which implies $\mathsf {G}_0 = \mathsf {G}_1$ . Next, we show that the metric space $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ is complete. Let $\{\mathsf {G}^n\}_{n \in \mathbb {N}}$ be a Cauchy sequence in $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ , and hence also bounded in $\mathrm {WB}_{\Lambda }$ . By Lemma 5.4, we have that $\mathsf {G}^n$ , up to a subsequence, weak^* converges to a measure $\mathsf {G} \in \mathcal {M}(\Omega, \mathbb {S}_+^n)$ . Then, by Corollary 4.3 and the fact that $\{\mathsf {G}^n\}$ is a Cauchy sequence, for small $\varepsilon > 0$ and large enough $m$ , there holds

\begin{align*} \varepsilon \ge \liminf _{n \to \infty }\mathrm {WB}_{\Lambda }(\mathsf {G}^n,\mathsf {G}^m) \ge \mathrm {WB}_{\Lambda }(\mathsf {G},\mathsf {G}^m)\,, \end{align*}

which immediately gives $\mathrm {WB}_{\Lambda }(\mathsf {G},\mathsf {G}^m) \to 0$ as $m \to \infty$ . To finish, we show that $\mathsf {G}^n$ weak^* converges to $\mathsf {G}$ if $\mathsf {G}^n$ converges to $\mathsf {G}$ in $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ . To do so, it suffices to note that by a similar argument as above, every subsequence of $\mathsf {G}^n$ has a weak^* convergent sub-subsequence to $\mathsf {G}$ , which readily gives the weak^* convergence of $\mathsf {G}^n$ to $\mathsf {G}$ .

The main aim of this section is to show that $(\mathcal {M}(\Omega, \mathbb {S}^n_+),\mathrm {WB}_{\Lambda })$ is a geodesic space and then equip it with some differential structure that is consistent with the metric structure, in the spirit of [1, 34].

For the reader’s convenience, we recall some basic concepts for the analysis in metric spaces [2]. Let $(X,d)$ be a metric space and $\{\omega _t\}_{t \in [a, b]}$ be a curve in $(X,d)$ (i.e., a continuous map from $[a,b]$ to $X$ ). We say that it is absolutely continuous if there exists an $L^1$ -function $g$ such that $d(\omega _t,\omega _s) \le \int _s^t g(r) \,\mathrm {d} r$ for any $a \le s \le t \le b$ . Moreover, the curve is said to have finite $p$ -energy if $g \in L^p([a,b],\mathbb {R})$ .

The metric derivative $|\omega _t^{\prime}|$ of $\{\omega _t\}_{t \in [a, b]}$ at the time point $t$ is defined by $|\omega^{\prime}_t| \,:\!=\, \lim _{\delta \to 0}|\delta |^{-1} d(\omega _{t + \delta },\omega _t)$ , if the limit exists. It can be shown [1, Theorem 1.1.2] that for an absolutely continuous curve $\omega _t$ , the metric derivative $|\omega^{\prime}_t|$ is well-defined for a.e. $t \in [a,b]$ and satisfies $|\omega^{\prime}_t| \le g(t)$ .

The length $\mathrm {L}(\omega _t)$ of an absolutely continuous curve $\{\omega _t\}_{t \in [a,b]}$ is defined as $\mathrm { L}(\omega _t) = \int _a^b |\omega^{\prime}_t| \,\mathrm {d} t$ , which is invariant with respect to the reparameterisation. Then, $(X,d)$ is a geodesic space if for any $x,y \in X$ , there holds

(5.3)

\begin{align} d(x,y) = \min \{\mathrm {L}(\omega _t); \ \{\omega _t\}_{t \in [0,1]}\ \text {is absolutely continuous with}\ \omega _0 = x\,, \omega _1 = y \}, \end{align}

where the minimiser exists and is called the (minimising) geodesic between $x$ and $y$ . Recall [1, Lemma 1.1.4] that any absolutely continuous curve can be reparameterised as a Lipschitz one with constant metric derivative $|\omega^{\prime}_t| = \mathrm {L}(\omega _t)$ a.e. Hence, we can always assume that the geodesic is constant-speed (i.e., $|\omega _t^{\prime}|$ is constant a.e.). Then, it is clear from definition (5.3) that a curve $\{\omega _t\}_{t \in [0,1]}$ is a constant-speed geodesic if and only if it satisfies $d(\omega _s,\omega _t) = |t -s |d (\omega _0,\omega _1)$ for any $0 < s < t < 1$ .
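As an illustration of this criterion based on Proposition 4.4 (a sketch under the hypotheses of that proposition), consider the curve $t \mapsto \mathsf {G}^{A_t}$ with $A_t = tA_1 + (1-t)A_0$ . For $0 \le s \le t \le 1$ , we have $A_s, A_t \in \mathbb {S}_+^n$ and $A_t - A_s = (t - s)(A_1 - A_0)$ , so applying (4.12) to the pair $(A_s, A_t)$ gives

\begin{equation*} \mathrm {WB}_\Lambda ^2 \big (\mathsf {G}^{A_s},\mathsf {G}^{A_t}\big ) = 2 (t-s)^2\, \mathrm {Tr} \big (\Lambda _2^{-1}(A_1 - A_0)\mathsf {G}(\Omega )(A_1 - A_0)\Lambda _2^{-1} \big ) = (t-s)^2\, \mathrm {WB}_\Lambda ^2 \big (\mathsf {G}^{A_0},\mathsf {G}^{A_1}\big )\,. \end{equation*}

Taking square roots yields $\mathrm {WB}_\Lambda (\mathsf {G}^{A_s},\mathsf {G}^{A_t}) = |t-s|\, \mathrm {WB}_\Lambda (\mathsf {G}^{A_0},\mathsf {G}^{A_1})$ , which is exactly the constant-speed geodesic condition for this curve.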

From the above concepts, we see that for our purpose, a key step is to characterise the absolutely continuous curves in the metric space $(\mathcal {M}(\Omega, \mathbb {S}^n_+),\mathrm {WB}_{\Lambda })$ , which is given by the following theorem extended from [34, Theorem 5.17].

Theorem 5.5. A curve $\{\mathsf {G}_t\}_{t \in [a,b]}$ , $b > a > 0$ , is absolutely continuous with respect to the metric $\mathrm {WB}_{\Lambda }$ if and only if there exists $(\mathsf {q},\mathsf {R}) \in \mathcal {M}(Q,\mathbb {R}^{n \times k} \times \mathbb {M}^n)$ such that $\mu = (\mathsf {G},\mathsf {q},\mathsf {R}) \in \widetilde {\mathcal {CE}}([a,b];\mathsf {G}_0,\mathsf {G}_1)$ and

(5.4)

\begin{equation} \int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}\, \mathrm {d} t < + \infty \,. \end{equation}

In this case, the metric derivative $|\mathsf {G}_t^{\prime}|$ satisfies

(5.5)

\begin{equation} |\mathsf {G}_t^{\prime}| \le \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}\quad \text {for}\ a.e.\,t\in [a,b], \end{equation}

and there exists a unique $(\mathsf {q}_{*}, \mathsf {R}_{*})$ such that the equality in (5.5) holds a.e., where the uniqueness is in the sense of equivalence class: $\mathsf {(q,R)} \sim (\mathsf {q}^{\prime},\mathsf {R}^{\prime})$ if and only if $\mathcal {J}_{\Lambda, Q_a^b}((\mathsf {G}, \mathsf {q-q}^{\prime},\mathsf {R-R}^{\prime})) = 0$ . If $\mathsf {G}_t$ has finite $2$ -energy, then $(\mathsf {q}_{*}, \mathsf {R}_{*}) = (\mathsf {G}u_*, \mathsf {G}W_*)$ with the $L^2_{\mathsf {G},\Lambda }$ -field $(u_*,W_*)$ given in Theorem 4.5.

Remark 5.6. As a corollary of Theorem 5.5, we have that $\mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ in (4.15) is nothing but the set of absolutely continuous curves with finite $2$ -energy.

Proof. It suffices to consider the case $[a,b] = [0,1]$ . We first consider the trivial if part. For $\mu \in \widetilde {\mathcal {CE}}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ with the property (5.4), it follows from (5.1) that

\begin{align*} \mathrm {WB}_{\Lambda }(\mathsf {G}_s,\mathsf {G}_t) \le \int _s^t \mathcal {J}_{\Lambda, \Omega }(\mu _\tau )^{1/2}\, \mathrm {d} \tau \quad \forall 0 \le s\le t \le 1\,, \end{align*}

which, by definition, readily implies that $\{\mathsf {G}_t\}_{t \in [0,1]}$ is absolutely continuous and (5.5) holds. We now consider the only if part. Let $\{\mathsf {G}_t\}_{t \in [0,1]}$ be an absolutely continuous curve, which, by reparameterisation, can be further assumed to be Lipschitz with the Lipschitz constant denoted by $\mathrm {Lip}(\mathsf {G}_t)$ . We will approximate it by piecewise constant-speed curves. We fix an integer $N \in \mathbb {N}$ with the step size $\tau = 2^{-N}$ . Let $\{\mu _t^{k,N}\}_{t \in [(k-1)\tau, k \tau ]}$ be a minimiser to ( $\mathcal {P}^{\prime}$ ) with $[a,b] = [(k - 1)\tau, k \tau ]$ , which satisfies

(5.6)

\begin{align} \tau ^{1/2} \mathcal {J}_{\Lambda, \Omega }(\mu ^{k,N}_t)^{1/2} = \tau ^{-1/2}\mathrm {WB}_{\Lambda }(\mathsf {G}_{(k-1)\tau }, \mathsf {G}_{k \tau }) \le \Big (\int _{(k-1)\tau }^{k\tau } |\mathsf {G}_t^{\prime}|^2 \,\mathrm {d} t\Big )^{1/2}\,, \quad a.e.\ t \in [(k-1)\tau, k\tau ]\,, \end{align}

by Lemma 5.1 and the absolute continuity of $\mathsf {G}_t$ . We glue the curves $\big \{\mu ^{k,N}_t\big \}_{t \in [(k-1)\tau, k\tau ]}$ with $k = 1,\ldots, 2^N$ and obtain a new one $\{\mu ^N_t = (\mathsf {G}_t^N,\mathsf {q}_t^N,\mathsf {R}_t^N)\}_{t \in [0,1]} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ .
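The two relations in (5.6) can be unpacked as follows (a brief sketch). The equality is (5.2) on the interval $[(k-1)\tau, k\tau ]$ of length $\tau$ , divided by $\tau ^{1/2}$ . The inequality combines a standard property of the metric derivative of absolutely continuous curves, namely $d(\omega _s,\omega _t) \le \int _s^t |\omega _r^{\prime}|\, \mathrm {d} r$ , with the Cauchy–Schwarz inequality:

\begin{equation*} \mathrm {WB}_{\Lambda }(\mathsf {G}_{(k-1)\tau }, \mathsf {G}_{k \tau }) \le \int _{(k-1)\tau }^{k\tau } |\mathsf {G}_t^{\prime}| \,\mathrm {d} t \le \tau ^{1/2} \Big (\int _{(k-1)\tau }^{k\tau } |\mathsf {G}_t^{\prime}|^2 \,\mathrm {d} t\Big )^{1/2}\,. \end{equation*}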

Next, note that for any $(a,b) \subset [0,1]$ , there exist $k_1^N, k_2^N \in \mathbb {N}$ with $N$ large enough such that $[(k^N_1 + 1)\tau, (k_2^N - 1) \tau ] \subset (a,b) \subset [k^N_1 \tau, k_2^N \tau ]$ . By squaring (5.6) and summing it from $k = k_1^N + 1$ to $ k = k_2^N$ , there holds

(5.7)

\begin{align} \int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t^N)\, \mathrm {d} t \le \sum _{k = k^N_1 + 1}^{k^N_2} \int _{(k-1)\tau }^{k\tau } \mathcal {J}_{\Lambda, \Omega }(\mu ^{k, N}_t)\, \mathrm {d} t\le \int _{a}^{b} |\mathsf {G}_t^{\prime}|^2\, \mathrm {d} t + 2 \tau \mathrm {Lip}(\mathsf {G}_t)^2\,. \end{align}
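The origin of the remainder term $2 \tau \mathrm {Lip}(\mathsf {G}_t)^2$ can be sketched as follows. Squaring (5.6) and using that $\mathcal {J}_{\Lambda, \Omega }(\mu ^{k,N}_t)$ is constant for a.e. $t$ in each subinterval gives $\int _{(k-1)\tau }^{k\tau } \mathcal {J}_{\Lambda, \Omega }(\mu ^{k, N}_t)\, \mathrm {d} t \le \int _{(k-1)\tau }^{k\tau } |\mathsf {G}_t^{\prime}|^2\, \mathrm {d} t$ . Summing over $k$ produces the integral of $|\mathsf {G}_t^{\prime}|^2$ over $[k_1^N \tau, k_2^N \tau ]$ . Since $[k_1^N \tau, k_2^N \tau ] \setminus (a,b)$ is contained in $[k_1^N \tau, (k_1^N + 1)\tau ] \cup [(k_2^N - 1)\tau, k_2^N \tau ]$ , a set of measure at most $2\tau$ , and $|\mathsf {G}_t^{\prime}| \le \mathrm {Lip}(\mathsf {G}_t)$ a.e., we obtain

\begin{equation*} \int _{k_1^N \tau }^{k_2^N \tau } |\mathsf {G}_t^{\prime}|^2\, \mathrm {d} t \le \int _{a}^{b} |\mathsf {G}_t^{\prime}|^2\, \mathrm {d} t + 2 \tau \mathrm {Lip}(\mathsf {G}_t)^2\,. \end{equation*}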

By taking $a = 0$ , $b = 1$ in (5.7), we observe that $\int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mu _t^N) \,\mathrm {d} t$ is uniformly bounded in $N$ . By Proposition 3.18, up to a subsequence, $\{\mu ^N_t\}_{t \in [0,1]}$ weak^* converges to a measure $\widetilde {\mu } = (\widetilde {\mathsf {G}},\widetilde {\mathsf {q}},\widetilde {\mathsf {R}}) \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ . Moreover, it follows from (3.38) and (5.7) that, for $[a,b] \subset [0,1]$ ,

(5.8)

\begin{align} \int _a^b \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_t)\, \mathrm {d} t \le \liminf _{N \to +\infty } \int _a^b \mathcal {J}_{\Lambda, \Omega }(\mu _t^N)\, \mathrm {d} t \le \int _{a}^{b} |\mathsf {G}_t^{\prime}|^2\, \mathrm {d} t \,. \end{align}

We now show $\widetilde {\mathsf {G}}_t = \mathsf {G}_t$ for $0 \le t \le 1$. Note that for any $t \in [0,1]$, there exists a sequence of integers $k_N$ such that $s_N = k_N 2^{-N} \to t$ as $N \to \infty$, which implies that $\mathsf {G}^N_{s_N} = \mathsf {G}_{s_N}$ weak* converges to $\widetilde {\mathsf {G}}_t$ by Proposition 3.18. Meanwhile, $\mathsf {G}_{s_N}$ weak* converges to $\mathsf {G}_t$ by the continuity of $\mathsf {G}_t$. We hence have $\widetilde {\mathsf {G}}_t = \mathsf {G}_t$. Then, it follows from (5.8) that

\begin{equation*} \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_t) = \mathcal {J}_{\Lambda, \Omega }(\mathsf {G}_t,\widetilde {\mathsf {q}}_t,\widetilde {\mathsf {R}}_t) \le |\mathsf {G}_t^{\prime}|^2\,, \end{equation*}

by the Lebesgue differentiation theorem. The proof of the only if direction is completed by noting that (5.4) and (5.5) are invariant with respect to the parameterisation. The uniqueness of $(\mathsf {q}_*, \mathsf {R}_*)$ follows from the linearity of the continuity equation in the variable $(\mathsf {q},\mathsf {R})$ and the strict convexity of the $L^2_{\mathsf {G}}$-norm.
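The passage from the integral bound (5.8) to the pointwise bound can be made explicit as follows (a sketch; it uses that both integrands are in $L^1(0,1)$, which holds here because $\mathsf {G}_t$ is Lipschitz). For a.e. $t \in (0,1)$, namely at every common Lebesgue point of the two integrands, applying (5.8) on $[t-h,t+h]$ and letting $h \to 0$ gives

\begin{align*} \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_t) = \lim _{h \to 0} \frac {1}{2h} \int _{t-h}^{t+h} \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }_s)\, \mathrm {d} s \le \lim _{h \to 0} \frac {1}{2h} \int _{t-h}^{t+h} |\mathsf {G}_s^{\prime}|^2\, \mathrm {d} s = |\mathsf {G}_t^{\prime}|^2\,. \end{align*}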

We finally show that when $\mathsf {G}_t$ is absolutely continuous with finite $2$-energy, $\mu \,:\!=\, (\mathsf {G}, \mathsf {G}u_{*}, \mathsf {G}W_{*}) \in \mathcal{CE}_{\infty} ([0,1];\,\mathsf{G}_0,\mathsf{G}_1)$ satisfies $\mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} \le |\mathsf {G}_t^{\prime}|$ for a.e. $t \in [0,1]$, where $(u_*,W_*)$ is given in Theorem 4.5 (i.e., the Riesz representation of $\widetilde {l}_{\mathsf {G}}$ in $H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$). Let $(a,b) \subset [0,1]$, let $\eta \in C_c^\infty ((a,b))$ with $0 \le \eta \le 1$, and let $\{(\mathsf {D}^* \Phi _n \Lambda _1^2, \Phi _n \Lambda _2^2)\}$ with $\Phi _n \in C^1(Q,\mathbb {S}^n)$ be a sequence approximating $(u_*, W_*)$. Then, by using (4.18) and noting $\mathsf {D}^* (\eta ^2 \Phi ) = \eta ^2 \mathsf {D}^* (\Phi )$, we have

(5.9)

\begin{align} &\left \lVert (\eta u_*, \eta W_*) \right \rVert ^2_{L^2_{\mathsf {G},\Lambda }(Q)} = \lim _{n \to + \infty } \big \langle (\eta ^2 u_*, \eta ^2 W_*), (\mathsf {D}^*\Phi _n \Lambda _1^2, \Phi _n \Lambda _2^2) \big \rangle _{L^2_{\mathsf {G},\Lambda }(Q)} = \lim _{n \to + \infty } l_{\mathsf {G}}(\eta ^2 \Phi _n)\,. \end{align}

By the only if part proved above, there exists some $\mathsf {(q,R)}$ such that

(5.10)

\begin{align} \left |l_{\mathsf {G}}(\eta ^2 \Phi _n)\right | &\le \left \lVert (G^\dagger q, G^\dagger R) \right \rVert _{L^2_{\mathsf {G},\Lambda }(Q_a^b)} \left \lVert (\mathsf {D}^* \eta ^2 \Phi _n, \eta ^2 \Phi _n) \right \rVert _{L^2_{\mathsf {G},\Lambda }(Q_a^b)}\notag \\ & \le \Big (\int _a^b |\mathsf {G}_t^{\prime}|^2 \, \mathrm {d} t \Big )^{1/2} \left \lVert ( \mathsf {D}^* \Phi _n, \Phi _n) \right \rVert _{L^2_{\mathsf {G},\Lambda }(Q_a^b)}\,. \end{align}

Combining (5.9) with (5.10) and letting $\eta$ approximate $\chi _{[a,b]}$ , we obtain

(5.11)

\begin{equation} \left \lVert ( u_*, W_*) \right \rVert _{L^2_{\mathsf {G},\Lambda }(Q_a^b)} \le \Big (\int _a^b |\mathsf {G}_t^{\prime}|^2 \,\mathrm {d} t \Big )^{1/2}\,. \end{equation}

Then, by the Lebesgue differentiation theorem again, the inequality (5.11) gives the desired $\mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} \le |\mathsf {G}_t^{\prime}|$ for the measure $\mu = (\mathsf {G}, \mathsf {G}u_*, \mathsf {G}W_*)$. The proof is complete.

From Lemma 5.1 and Theorem 5.5, we have

(5.12)

\begin{align} \mathrm {WB}_\Lambda (\mathsf {G}_0,\mathsf {G}_1) & = \inf _{\mathsf {G}} \inf _{\mathsf {(q,R)}} \Big \{\int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} \,\mathrm {d} t\,;\ \mu = \mathsf {(G,q,R)} \in \widetilde {\mathcal {CE}}([0,1];\,\mathsf {G}_0,\mathsf {G}_1) \Big \}\notag \\ & = \inf _{\mathsf {G}} \Big \{ \int _0^1 |\mathsf {G}_t^{\prime}| \, \mathrm {d} t\,; \ \{\mathsf {G}_t\}_{t \in [0,1]} \ \text {is absolutely continuous with}\ \mathsf {G}_t|_{t = 0} = \mathsf {G}_0\,, \mathsf {G}_t|_{t = 1} = \mathsf {G}_1 \Big \}\,. \end{align}

Note that if $\{\mu _t\}_{t \in [0,1]} \in \mathcal {CE}_\infty ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ minimises ( $\mathcal {P}$ ), then for any $0 \le a \lt b \le 1$ , $\{\mu _t\}_{t \in [a,b]}$ is a minimiser to ( $\mathcal {P}^{\prime}$ ) with $\mathsf {G}_0 = \mathsf {G}_t|_{t = a}$ and $\mathsf {G}_1 = \mathsf {G}_t|_{t = b}$ . Recalling the constant-speed property (5.2) of the minimiser $\mu = \mathsf {(G,q,R)}$ , we readily see that the associated $\{\mathsf {G}_t\}_{t \in [0,1]}$ is the desired constant-speed geodesic:

(5.13)

\begin{align} \mathrm {WB}_{\Lambda }(\mathsf {G}_s,\mathsf {G}_t) = |t - s| \mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1)\,, \quad \forall 0 \le s \le t \le 1\,. \end{align}

It allows us to conclude that the $\inf$ in (5.12) is attained, and the main result follows.

Corollary 5.7. $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ is a geodesic space. The constant-speed geodesic connecting $\mathsf {G}_0, \mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}^n_+)$ is given by the minimiser to ( $\mathcal {P}$ ).

Another important application of Theorem 5.5 is that we can view the set of $\mathbb {S}^n_+$-valued measures as a pseudo-Riemannian manifold, following [Reference Ambrosio, Gigli and Savaré1, Proposition 8.4.5]. We define the tangent space at each $\mathsf {G} \in \mathcal {M}(\Omega, \mathbb {S}_+^n)$ by

(5.14)

\begin{align} \mathrm{Tan}(\mathsf{G})\,:\!=\, \big \{ & \mathsf{(q,R)} \in \mathcal {M}(\Omega, \mathbb {R}^{n \times k} \times \mathbb {M}^n)\,; \ \mathcal {J}_{\Lambda, \Omega }({\mu }) \lt \infty \ \, \text {with}\ \, \mu = \mathsf {(G,q,R)} \in \mathcal {M}(\Omega, \mathbb {X})\,; \notag \\ & \mathcal {J}_{\Lambda, \Omega }(\mu ) \le \mathcal {J}_{\Lambda, \Omega }((\mathsf {G,q}+ \widehat {\mathsf {q}}, \mathsf {R} + \widehat {\mathsf {R}}))\,,\ \forall (\widehat {\mathsf {q}}, \widehat {\mathsf {R}}) \ \text {satisfying} \ \mathsf {D} \widehat{\mathsf {q}} =\widehat{\mathsf{R}}^{\mathrm{sym}} \big \}\,. \end{align}

From Theorem 5.5, we have that among all the measures $\mathsf {(q,R)}$ generating $\{\mathsf {G}_t\}_{t \in [0,1]}$ by the continuity equation, there is a unique one $(\mathsf {q}_*, \mathsf {R}_*)$ with minimal $\mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}$ given by $|\mathsf {G}_t^{\prime}|$ for a.e. $t \in [0,1]$, that is, $(\mathsf {q}_{*,t},\mathsf {R}_{*,t}) \in Tan(\mathsf {G}_t)$ a.e. by (5.14). We also introduce the space $Tan_{field}(\mathsf {G})$, similar to $H_{\mathsf {G},\Lambda }(\mathsf {D}^*)$ in (4.20):

\begin{equation*} Tan_{field}(\mathsf {G}) = \overline {\left \{(\mathsf {D}^* \Phi \Lambda _1^2, \Phi \Lambda _2^2)\,;\ \Phi \in C^1(\Omega, \mathbb {S}^n)\right \}}^{\lVert \cdot \rVert _{L^2_{\mathsf {G},\Lambda }(\Omega )}}\,. \end{equation*}

Then, similarly to the argument for Theorem 4.5, the tangent space $Tan(\mathsf {G})$ can be characterised as follows:

(5.15)

\begin{equation} \mathsf {(q,R)} \in Tan(\mathsf {G})\quad \text {if and only if}\quad \mathsf {(q,R)}= \mathsf {G}(u,W)\ \text {with}\ (u,W) \in Tan_{field}(\mathsf {G})\,. \end{equation}

We summarise the above discussions in the following corollary, which provides a Riemannian interpretation of the transport distance $\mathrm {WB}_\Lambda (\cdot, \cdot )$ .

Corollary 5.8. Let $\{\mathsf {G}_t\}_{t \in [0,1]}$ be an absolutely continuous curve in $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda )$ and $\{(\mathsf {q}_t,\mathsf {R}_t)\}_{t \in [0,1]}$ be the family of measures in $\mathcal {M}(\Omega, \mathbb {R}^{n \times k} \times \mathbb {M}^n)$ such that $\mu = (\mathsf {G},\mathsf {q},\mathsf {R}) \in \mathcal {CE} ([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ and $\mathcal {J}_{\Lambda, \Omega }(\mu _t)$ is finite a.e. Then $|\mathsf {G}_t^{\prime}| = \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}$ holds for a.e. $t \in [0,1]$ if and only if $(\mathsf {q}_t, \mathsf {R}_t) \in Tan(\mathsf {G}_t)$ a.e., where $Tan(\mathsf {G})$ is defined in (5.14) and characterised by (5.15). Moreover, for absolutely continuous $\mathsf {G}_t$ with finite 2-energy (i.e., $\mathsf {G} \in \mathcal {AC}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$), let $(u_*,W_*)$ be the unique minimiser to (4.22). Then, there holds $(u_{*,t},W_{*,t}) \in Tan_{field}(\mathsf {G}_t)$ a.e.

6. Cone space and spherical distance

In this section, we discuss the conic structure of our weighted transport distance $\mathrm {WB}_\Lambda$ , which extends the results in [Reference Brenier and Vorotnikov16, Section 4] and [Reference Monsaingeon and Vorotnikov73, Section 5]. The starting point is a spherical distance associated with $\mathrm {WB}_\Lambda$ :

(6.1)

\begin{align} \mathrm {SWB}_{\Lambda }^2(\mathsf {G}_0,\mathsf {G}_1) = \inf \big \{\mathcal {J}_{\Lambda, Q}(\mu )\,;\ \mu \in \widetilde {\mathcal {CE}}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)\,, Tr_\Lambda \mathsf {G}_t(\Omega ) = 1\big \}\,,\quad \text {for}\ \mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}_1\,, \end{align}

where $Tr_\Lambda (X) \,:\!=\, Tr\big (\widetilde {\Lambda }_2^{-1}X\widetilde {\Lambda }_2^{-1}\big )$ with $\widetilde {\Lambda }_2 = n \Lambda _2/Tr(\Lambda _2)$ is the scaled trace and

(6.2)

\begin{equation} \mathcal {M}_1\,:\!=\, \{\mathsf {G} \in \mathcal {M}(\Omega, \mathbb {S}_+^n)\,;\ Tr_\Lambda \mathsf {G}(\Omega ) = 1\}\,. \end{equation}

We will prove that $(\mathcal {M}_1, \mathrm {SWB}_{\Lambda })$ is a complete geodesic space and $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda )$ can be viewed as its metric cone. Let us first recall some basic concepts [Reference Burago, Burago and Ivanov19, Reference Laschos and Mielke60]. We consider a metric space $(X,d_X)$ with diameter $\mathrm {diam}(X) = \sup _{x,y\in X}d_X(x,y) \le \pi$. The associated cone is defined as the quotient $\mathfrak {C}(X) \,:\!=\, \big (X \times [0,\infty )\big ) / \big (X \times \{0\}\big )$ with the metric

(6.3)

\begin{equation} d^2_{\mathfrak {C}(X)}([x_0,r_0],[x_1,r_1]) \,:\!=\, r_0^2 + r_1^2 - 2 r_0 r_1 \cos (d_X(x_0,x_1))\,, \end{equation}

where a point in $\mathfrak {C}(X)$ is of the form $[x,r]$ with $x \in X$ and $r \ge 0$ and satisfies the equivalence relation $[x_0,0] \sim [x_1,0]$. It can be proved that for $x_0, x_1 \in X$ with $0 \lt d_X(x_0, x_1) \lt \pi$ and $r_0,r_1 \gt 0$, there is a one-to-one correspondence between the geodesics for $d_{\mathfrak {C}(X)}([x_0,r_0],[x_1,r_1])$ and for $d_{X}(x_0,x_1)$; see [Reference Laschos and Mielke60, Theorem 2.6]. In particular, we have the following useful lemmas from [Reference Brenier and Vorotnikov16, Lemma 4.4] and [Reference Laschos and Mielke60, Theorem 2.2], respectively.

Lemma 6.1. If $X$ is a length space, then the distance $d_X(x_0,x_1)$ can be characterised by

\begin{align*} d_X(x_0,x_1) = \inf \Big \{\int _0^1 \big |[x_t,1]^{\prime}\big |_{\mathfrak {C}(X)}\,\mathrm {d} t\,;\ [x_t,1]\ \text {is absolutely continuous and connects}\ [x_0,1]\ \text {and}\ [x_1,1]\Big \}\,, \end{align*}

where $|[x_t,1]^{\prime}|_{\mathfrak {C}(X)}$ is the metric derivative in the space $(\mathfrak {C}(X),d_{\mathfrak {C}(X)})$ .

Lemma 6.2. Let $\mathfrak {C}(X)$ be the cone as above and $(\mathfrak {C}(X), d)$ be a metric space for some metric $d$ . If there holds

(6.4)

\begin{equation} d^2([x_0,r_0], [x_1, r_1]) = r_0 r_1 d^2([x_0,1],[x_1,1]) + (r_0 - r_1)^2\,, \end{equation}

and $0 \lt d^2([x_0,1],[x_1,1]) \le 4$ for $x_0 \neq x_1$ , then $d_X(x_0,x_1)\,:\!=\, \arccos (1 - d^2([x_0,1], [x_1, 1])/2)$ is a metric on $X$ such that (6.3) holds, equivalently, $(\mathfrak {C}(X), d)$ is a metric cone over $(X,d_X)$ .
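The equivalence between the scaling property (6.4) and the cone formula (6.3) rests on the identity $r_0^2 + r_1^2 - 2 r_0 r_1 (1 - d^2/2) = r_0 r_1 d^2 + (r_0 - r_1)^2$. The following Python sketch checks it numerically; the function names are hypothetical and chosen only for illustration, and the snippet is a sanity check under these assumptions rather than part of the proof.

```python
import math

def cone_dist_sq(r0, r1, d_x):
    """Squared cone distance as in (6.3), for radii r0, r1 >= 0 and base distance d_x in [0, pi]."""
    return r0**2 + r1**2 - 2.0 * r0 * r1 * math.cos(d_x)

def scaled_dist_sq(r0, r1, d_unit_sq):
    """Right-hand side of the scaling property (6.4), given d^2([x0,1],[x1,1])."""
    return r0 * r1 * d_unit_sq + (r0 - r1)**2

def base_dist(d_unit_sq):
    """Base distance recovered as in Lemma 6.2: arccos(1 - d^2/2), requires 0 <= d^2 <= 4."""
    if not 0.0 <= d_unit_sq <= 4.0:
        raise ValueError("d^2([x0,1],[x1,1]) must lie in [0, 4]")
    return math.acos(1.0 - d_unit_sq / 2.0)

# Illustrative values (hypothetical): radii and a squared unit-level distance.
r0, r1, d2 = 0.5, 2.0, 1.3
print(cone_dist_sq(r0, r1, base_dist(d2)), scaled_dist_sq(r0, r1, d2))
```

The bound $d^2([x_0,1],[x_1,1]) \le 4$ in Lemma 6.2 is exactly what keeps the argument of `math.acos` inside $[-1,1]$.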

We are now ready to consider the conic properties of $(\mathcal {M}(\Omega, \mathbb {S}^n_+), \mathrm {WB}_{\Lambda })$ . For this, we set $r \,:\!=\, \sqrt {Tr_\Lambda (\mathsf {G}(\Omega ))} \ge 0$ for a measure $\mathsf {G} \in \mathcal {M}(\Omega, \mathbb {S}_+^n)$ and identify $\mathsf {G}$ with $[\mathsf {G}/r^2,r] \in \mathfrak {C}(\mathcal {M}_1)$ .

Theorem 6.3. Suppose that there holds $\mathsf {D}^*(\Lambda _2^{-2}) = 0$ and let $c \,:\!=\, \sqrt {2}n/Tr(\Lambda _2)$ . Then, $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda /c)$ is a metric cone over $(\mathcal {M}_1, \mathrm {SWB}_\Lambda /c)$ , namely, for $\mathsf {G}_0, \mathsf {G}_1 \in \mathcal {M}_1$ and $r_0,r_1 \ge 0$ ,

(6.5)

\begin{equation} \mathrm {WB}_{\Lambda }^2(r_0^2 \mathsf {G}_0, r_1^2 \mathsf {G}_1)/c^2 = r_0^2 + r_1^2 - 2 r_0 r_1 \cos (\mathrm {SWB}_\Lambda (\mathsf {G}_0, \mathsf {G}_1)/c)\,, \end{equation}

and $(\mathcal {M}_1, \mathrm {SWB}_\Lambda /c)$ is a complete geodesic space with $\mathrm {diam}(\mathcal {M}_1) \le \pi$ .

Proof. We first prove that $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda /c)$ is a metric cone over $(\mathcal {M}_1, d)$ for some metric $d$. For this, we note from (3.18) in the proof of Lemma 3.9 that

\begin{equation*} \mathrm {WB}^2_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) \le 2 \int _{\Omega } \Big \lVert \Big (\sqrt {G_1} - \sqrt {G_0}\Big ) \Lambda _2^{-1} \Big \rVert _{\mathrm {F}}^2\ \mathrm {d} \lambda \le 4 \big (n/Tr(\Lambda _2)\big )^2 \big (Tr_{\Lambda } \mathsf {G}_0(\Omega ) + Tr_{\Lambda }\mathsf {G}_1(\Omega )\big )\,, \end{equation*}

which yields $\mathrm {WB}_{\Lambda }^2(\mathsf {G}_0,\mathsf {G}_1) \le 4 c^2$ for $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}_1$. By Lemma 6.2, it suffices to check the scaling property (6.4):

(6.6)

\begin{equation} \mathrm {WB}_{\Lambda }^2(r_0^2 \mathsf {G}_0, r_1^2 \mathsf {G}_1)/c^2 = r_0 r_1 \mathrm {WB}_{\Lambda }^2(\mathsf {G}_0, \mathsf {G}_1)/c^2 + (r_0 - r_1)^2\,, \end{equation}

for $\mathsf {G}_0, \mathsf {G}_1 \in \mathcal {M}_1$ and $r_0,r_1 \ge 0$ to show that $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda /c)$ is a metric cone. Note that (6.6) for the case of $r_0 = 0$ or $r_1 = 0$ follows from Proposition 4.4. Thus, we can assume $r_0, r_1 \gt 0$. Let $\{\mu _t = (\mathsf {G}_t,\mathsf {q}_t,\mathsf {R}_t)\}_{t \in [0,1]} \in \widetilde {\mathcal {CE}}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ be an admissible curve. We define scalar functions $b(t) = r_0 + (r_1 - r_0)t$ and $a(t) \,:\!=\, t r_1 /b(t)$. It is clear that $a(t)$ is strictly increasing with inverse denoted by $t(a)$. We then define $\widetilde {\mathsf {G}}_t = b(t)^2\mathsf {G}_{a(t)}$ with

\begin{equation*} \widetilde {\mathsf {q}}_t = a^{\prime}(t)b(t)^2 \mathsf {q}_{a(t)}\,,\quad \widetilde {\mathsf {R}}_t = a^{\prime}(t)b(t)^2 \mathsf {R}_{a(t)} + 2b(t)(r_1 - r_0)\mathsf {G}_{a(t)}\,, \end{equation*}

which satisfies the continuity equation with end points $r_0^2 \mathsf {G}_0$ and $r_1^2 \mathsf {G}_1$ . We now compute

(6.7)

\begin{align} \mathcal {J}_{\Lambda, Q}\big (\widetilde {\mathsf {G}},\widetilde {\mathsf {q}},\widetilde {\mathsf {R}}\big ) = & \int _0^1 a^{\prime}(t(a))\, b(t(a))^2 \mathcal {J}_{\Lambda, \Omega }(\mathsf {G}_a, \mathsf {q}_a, \mathsf {R}_a) \, \mathrm {d} a + c^2 (r_1 - r_0)^2 \int _0^1 Tr_\Lambda \mathsf {G}_{a(t)}(\Omega )\, \mathrm {d} t \\ & + c^2 \int _0^1 b(t(a))\, (r_1 - r_0) Tr_\Lambda \mathsf {R}_{a}(\Omega )\, \mathrm {d} a\,. \notag \end{align}

The last two terms in (6.7) can be simplified by (3.13) on $[0,1]$ with test function $\Phi _s = b(t(s)) \,\Lambda _2^{-2}$ :

\begin{equation*} \int _{0}^1 t^{\prime}(a) (r_1 - r_0) Tr_\Lambda \mathsf {G}_a(\Omega ) + b(t(a)) Tr_\Lambda \mathsf {R}_a(\Omega ) \,\mathrm {d} a = r_1 Tr_\Lambda \mathsf {G}_1(\Omega ) - r_0 Tr_\Lambda \mathsf {G}_0(\Omega )\,, \end{equation*}

which implies, thanks to $Tr_\Lambda \mathsf {G}_0(\Omega ) = Tr_\Lambda \mathsf {G}_1(\Omega ) = 1$ ,

(6.8)

\begin{equation} \int _{0}^1 (r_1 - r_0)^2 Tr_\Lambda \mathsf {G}_{a(t)}(\Omega )\, \mathrm {d} t + \int _0^1 b(t(a)) (r_1 - r_0) Tr_\Lambda \mathsf {R}_a(\Omega ) \,\mathrm {d} a = (r_1 - r_0)^2\,. \end{equation}
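To see how (6.8) arises from the preceding identity, one can multiply that identity by $(r_1 - r_0)$, use $Tr_\Lambda \mathsf {G}_0(\Omega ) = Tr_\Lambda \mathsf {G}_1(\Omega ) = 1$ on the right-hand side, and change variables $a = a(t)$ in the first integral, so that $t^{\prime}(a)\,\mathrm {d} a = \mathrm {d} t$ with $a(0) = 0$ and $a(1) = 1$. Written out as a sketch:

\begin{align*} (r_1 - r_0)^2 \int _0^1 t^{\prime}(a)\, Tr_\Lambda \mathsf {G}_a(\Omega )\, \mathrm {d} a &= (r_1 - r_0)^2 \int _0^1 Tr_\Lambda \mathsf {G}_{a(t)}(\Omega )\, \mathrm {d} t\,, \\ (r_1 - r_0)\big (r_1 Tr_\Lambda \mathsf {G}_1(\Omega ) - r_0 Tr_\Lambda \mathsf {G}_0(\Omega )\big ) &= (r_1 - r_0)^2\,. \end{align*}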

Therefore, by noting $a^{\prime}(t) b(t)^2 = r_0 r_1$ and using (6.8), it follows that

\begin{align*} \mathcal {J}_{\Lambda, Q}\big (\widetilde {\mathsf {G}},\widetilde {\mathsf {q}},\widetilde {\mathsf {R}}\big ) = r_0 r_1 \int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mathsf {G}_a, \mathsf {q}_a, \mathsf {R}_a) \, \mathrm {d} a + c^2 (r_1 - r_0)^2\,, \end{align*}

which readily gives $\mathrm {WB}_{\Lambda }^2(r_0^2 \mathsf {G}_0, r_1^2 \mathsf {G}_1)/c^2 \le r_0 r_1 \mathrm {WB}_{\Lambda }^2(\mathsf {G}_0, \mathsf {G}_1)/c^2 + (r_0 - r_1)^2$ . The other direction can be proved similarly. We have proved the existence of $(\mathcal {M}_1, d)$ such that $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda /c)$ is the associated metric cone.
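Two elementary computations used in the argument above can be spelled out. The sketch below assumes the continuity equation is written in the form $\partial _t \mathsf {G} + \mathsf {D} \mathsf {q} = \mathsf {R}^{\mathrm {sym}}$, consistent with (7.6); recall $b^{\prime}(t) = r_1 - r_0$. First, the identity $a^{\prime}(t) b(t)^2 = r_0 r_1$ follows from the quotient rule:

\begin{align*} a^{\prime}(t) = \frac {r_1 b(t) - t r_1 (r_1 - r_0)}{b(t)^2} = \frac {r_1 \big (r_0 + (r_1 - r_0) t - (r_1 - r_0) t\big )}{b(t)^2} = \frac {r_0 r_1}{b(t)^2}\,. \end{align*}

Second, the rescaled curve satisfies the continuity equation by the chain rule and the symmetry of $\mathsf {G}$:

\begin{align*} \partial _t \widetilde {\mathsf {G}}_t &= 2 b(t)(r_1 - r_0)\, \mathsf {G}_{a(t)} + a^{\prime}(t) b(t)^2\, (\partial _a \mathsf {G}_a)\big |_{a = a(t)} \\ &= 2 b(t)(r_1 - r_0)\, \mathsf {G}_{a(t)} + a^{\prime}(t) b(t)^2 \big (- \mathsf {D} \mathsf {q}_{a(t)} + \mathsf {R}_{a(t)}^{\mathrm {sym}}\big ) = - \mathsf {D} \widetilde {\mathsf {q}}_t + \widetilde {\mathsf {R}}_t^{\mathrm {sym}}\,. \end{align*}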

We now show that the metric $d$ on $\mathcal {M}_1$ is given by $\mathrm {SWB}_\Lambda /c$ .

By Corollary 5.7 and [Reference Bridson and Haefliger18, Corollary 5.11], we have that $(\mathcal {M}_1,d)$ is a geodesic space, which, by Lemma 6.1, gives, for $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}_1$,

\begin{equation*} d(\mathsf {G}_0,\mathsf {G}_1) = \inf \Big \{\int _0^1 |\mathsf {G}_t^{\prime}|\, \mathrm {d} t\,;\ \mathsf {G}_t\ \text {is absolutely continuous in} \ (\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda /c)\ \text {with}\ \mathsf {G}_t \in \mathcal {M}_1 \Big \}\,. \end{equation*}

It then follows from Theorem 5.5 and definition (6.1) that $d(\mathsf {G}_0,\mathsf {G}_1) = \mathrm {SWB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1)/c$ and hence (6.5) holds. Recalling $\mathrm {WB}_{\Lambda }^2(\mathsf {G}_0,\mathsf {G}_1)/c^2 \le 4$ for $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}_1$, (6.5) gives $0 \le \mathrm {SWB}_\Lambda (\mathsf {G}_0, \mathsf {G}_1)/c \le \pi$. Finally, for the completeness of $(\mathcal {M}_1, \mathrm {SWB}_\Lambda /c)$, it suffices to note that $\mathrm {SWB}_\Lambda$ and $\mathrm {WB}_\Lambda$ are topologically equivalent on $\mathcal {M}_1$, again by (6.5), and $\mathcal {M}_1$ is a closed set in $(\mathcal {M}(\Omega, \mathbb {S}_+^n),\mathrm {WB}_\Lambda )$ by Proposition 5.2.
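The topological equivalence invoked above can be made quantitative. As a sketch, taking $r_0 = r_1 = 1$ in (6.5) and using the half-angle identity $2 - 2\cos \theta = 4 \sin ^2(\theta /2)$ gives, for $\mathsf {G}_0,\mathsf {G}_1 \in \mathcal {M}_1$,

\begin{align*} \mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) = 2c \sin \Big (\frac {\mathrm {SWB}_\Lambda (\mathsf {G}_0, \mathsf {G}_1)}{2c}\Big )\,, \qquad \text {and hence} \qquad \frac {2}{\pi }\, \mathrm {SWB}_\Lambda (\mathsf {G}_0, \mathsf {G}_1) \le \mathrm {WB}_{\Lambda }(\mathsf {G}_0,\mathsf {G}_1) \le \mathrm {SWB}_\Lambda (\mathsf {G}_0, \mathsf {G}_1)\,, \end{align*}

where the two-sided bound uses $\frac {2}{\pi }\theta \le \sin \theta \le \theta$ for $\theta = \mathrm {SWB}_\Lambda (\mathsf {G}_0, \mathsf {G}_1)/(2c) \in [0,\pi /2]$. In particular, the two distances are bi-Lipschitz equivalent on $\mathcal {M}_1$.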

7. Example and discussion

In this section, we detail the connections between our model ( $\mathcal {P}$ ) and the existing ones.

Example 7.1. (Kantorovich–Bures metric [16]). We set the dimension parameters $n = m =d$ and $k = 1$ and the weight matrices $\Lambda _i = I$ for $i = 1, 2$ in (3.1) and consider the differential operator $\mathsf {D} = \nabla _s$ for the continuity equation (3.13), where $ \nabla _s$ is the symmetric gradient defined by $\nabla _s(q) = \frac {1}{2}(\nabla q + (\nabla q)^{\mathrm {T}})$ for a smooth vector field $q \in C_c^\infty (\mathbb {R}^d,\mathbb {R}^d)$ . Then, ( $\mathcal {P}$ ) gives the convex formulation of the Kantorovich–Bures metric $d_{KB}$ on $\mathcal {M}(\Omega, \mathbb {S}_+^d)$ [16, Definition 2.1]:

(𝒫_WB)

\begin{align} \mathrm {WB}^2_{(I,I)}(\mathsf {G}_0,\mathsf {G}_1)&= \frac {1}{2}d^2_{KB}(\mathsf {G}_0,\mathsf {G}_1) = \inf \big \{\mathcal {J}_{\Lambda, Q}(\mu ) \,;\ \mu = \mathsf {(G,q,R)} \in \mathcal {M}(Q,\mathbb {X})\ \text {satisfies} \notag \\ \partial _t \mathsf {G} &= \{ - \nabla \mathsf {q}_t + \mathsf {R}_t\}^{\mathrm {sym}} \ \text {with}\ \mathsf {G}_t|_{t = 0} = \mathsf {G}_0\,,\ \mathsf {G}_t|_{t = 1} = \mathsf {G}_1 \big \}\,, \end{align}

for $\mathsf {G}_0, \mathsf {G}_1 \in \mathcal {M}(\Omega, \mathbb {S}^d_+)$ , where $\mathcal {J}_{\Lambda, Q}(\mu )$ with $\Lambda = (I,I)$ is given by (3.24):

\begin{equation*} \mathcal {J}_{\Lambda, Q}(\mu ) = \frac {1}{2} \lVert G^\dagger q \rVert ^2_{L^2_{\mathsf {G}}(Q)} + \frac {1}{2} \lVert G^\dagger R \rVert ^2_{L^2_{\mathsf {G}}(Q)}\,. \end{equation*}

Example 7.2. (Wasserstein–Fisher–Rao metric [27, 56, 64]). If we set $n = m = 1$ , $k = d$ and $\Lambda _1 = \sqrt {\alpha } I$ , $\Lambda _2 = \sqrt {\beta }I$ with $\alpha, \beta \gt 0$ , and consider the differential operator $\mathsf {D} = \mathrm {div}$ , then ( $\mathcal {P}$ ) gives the Wasserstein–Fisher–Rao metric [64, (3.1)]: for given distributions $\rho _0, \rho _1 \in \mathcal {M}(\Omega, \mathbb {R}_+)$ ,

(𝒫_WFR)

\begin{align} \mathrm {WFR}^2(\rho _0, \rho _1) = \inf \Big \{\int _0^1 \int _\Omega \rho ^{\dagger }\Big (\frac {1}{2\alpha }|q|^2 + \frac {1}{2\beta } r^2\Big )\,\mathrm {d} x\,\mathrm {d} t \,;\ \partial _t \rho + \mathrm {div} \,q = r \ \text {with}\ \rho _t|_{t = 0} = \rho _0\,,\ \rho _t|_{t = 1} = \rho _1 \Big \}\,. \end{align}
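For concreteness, the pointwise integrand in ($\mathcal {P}_{\mathrm {WFR}}$) can be evaluated as in the following Python sketch. The function name is hypothetical, and the handling of the degenerate case $\rho = 0$ (zero cost if $(q,r) = 0$, infinite cost otherwise) is an assumption about the convention behind the pseudo-inverse $\rho ^\dagger$, not something stated in this example.

```python
import math

def wfr_action_density(rho, q, r, alpha, beta):
    """Pointwise integrand rho^dagger * (|q|^2 / (2 alpha) + r^2 / (2 beta)).

    rho: nonnegative density value, q: momentum vector (sequence of floats),
    r: scalar source term, alpha, beta: positive weights.
    Assumed convention: the value is +inf when rho == 0 but (q, r) != 0.
    """
    if rho < 0 or alpha <= 0 or beta <= 0:
        raise ValueError("need rho >= 0 and alpha, beta > 0")
    kinetic = sum(qi * qi for qi in q) / (2.0 * alpha)
    reaction = r * r / (2.0 * beta)
    if rho == 0:
        return 0.0 if (kinetic == 0.0 and reaction == 0.0) else math.inf
    return (kinetic + reaction) / rho

# Hypothetical sample values for illustration only.
print(wfr_action_density(2.0, [1.0, 1.0], 2.0, 1.0, 1.0))
```

The integrand is positively one-homogeneous in $(\rho, q, r)$, which is the scalar counterpart of the homogeneity of the action functional used in the cone construction of Section 6.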

Example 7.3. (Matricial interpolation distance [25]). Let $N$ be a positive integer and $(\mathbb {M}^n)^N$ denote the space of block-row vectors $(A_1,\ldots, A_N)$ with $A_i \in \mathbb {M}^n$ . The spaces $(\mathbb {S}^n)^N$ and $(\mathbb {A}^n)^N$ are defined similarly. For $M \in (\mathbb {M}^n)^N$ , we define its component transpose by $M^t \,:\!=\, (M_1^{\mathrm {T}},\ldots, M_N^{\mathrm {T}})$ . We fix a sequence of symmetric matrices $\{L_k\}_{k=1}^N \subset \mathbb {S}^n$ and define the linear operator $\nabla _L : \mathbb {S}^n \to (\mathbb {A}^n)^N$ by $(\nabla _L X)_k = L_k X - XL_k$ . We denote by $\nabla _L^*$ its dual operator with respect to the Frobenius inner product. We now let $k = n (d + N)$ and write $\mathsf {q} \in \mathcal {M}(Q, \mathbb {R}^{n \times k})$ for $[\mathsf {q}_0, \mathsf {q}_1]$ with $\mathsf {q}_0 \in \mathcal {M}(Q, (\mathbb {M}^n)^d)$ and $\mathsf {q}_1 \in \mathcal {M}(Q,(\mathbb {M}^n)^N)$ . With the above notions, we define

\begin{equation*} \mathsf {D}\,\mathsf {q} \,:\!=\, \frac {1}{2}\mathrm {div} (\mathsf {q}_0 + \mathsf {q}_0^t) - \frac {1}{2}\nabla _L^* (\mathsf {q}_1 - \mathsf {q}_1^t)\,. \end{equation*}

Then, it is clear that ( $\mathcal {P}$ ) with weight matrices $\Lambda _i = I$ for $i = 1,2$ gives the model in [25, (5.7a)–(5.7c)]:

(𝒫_2,FR)

\begin{align} \mathrm {W}_{2,\mathrm {FR}}(\mathsf {G}_0, \mathsf {G}_1)^2 &= \frac {1}{2} \inf \big \{ \lVert G^\dagger q_0 \rVert ^2_{L^2_{\mathsf {G}}(Q)} + \lVert G^\dagger q_1 \rVert ^2_{L^2_{\mathsf {G}}(Q)} + \lVert G^\dagger R \rVert ^2_{L^2_{\mathsf {G}}(Q)}\,; \nonumber \\ \partial _t \mathsf {G} &= - \frac {1}{2}\mathrm {div} (\mathsf {q}_0 + \mathsf {q}_0^t) + \frac {1}{2}\nabla _L^* (\mathsf {q}_1 - \mathsf {q}_1^t) + \mathsf {R}^{\mathrm {sym}}\ \text {with}\ \mathsf {G}_t|_{t = 0} = \mathsf {G}_0\,,\ \mathsf {G}_t|_{t = 1} = \mathsf {G}_1 \big \}\,. \end{align}
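The operator $\nabla _L$ in Example 7.3 and its dual can be checked on small matrices. The Python sketch below uses plain nested lists; all function names and the sample matrices are hypothetical. It relies on the elementary computation that, for symmetric $L_k$ and $X$, one has $\langle L_k X - X L_k, M_k \rangle = \langle X, L_k M_k - M_k L_k \rangle$ in the Frobenius inner product, so that $\nabla _L^* M = \sum _k (L_k M_k - M_k L_k)$ is a natural candidate for the dual operator.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frob(A, B):
    """Frobenius inner product Tr(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def grad_L(Ls, X):
    """(nabla_L X)_k = L_k X - X L_k for each k."""
    return [sub(matmul(L, X), matmul(X, L)) for L in Ls]

def grad_L_adjoint(Ls, Ms):
    """Candidate dual operator: sum over k of (L_k M_k - M_k L_k)."""
    n = len(Ls[0])
    total = [[0] * n for _ in range(n)]
    for L, M in zip(Ls, Ms):
        total = add(total, sub(matmul(L, M), matmul(M, L)))
    return total

# Hypothetical sample data: symmetric L_k and X, antisymmetric M_k.
Ls = [[[1, 2], [2, 3]], [[0, 1], [1, -1]]]
X = [[2, 1], [1, 5]]
Ms = [[[0, 1], [-1, 0]], [[0, -2], [2, 0]]]
lhs = sum(frob(G, M) for G, M in zip(grad_L(Ls, X), Ms))
rhs = frob(X, grad_L_adjoint(Ls, Ms))
print(lhs, rhs)
```

Each block $(\nabla _L X)_k$ is antisymmetric whenever $L_k$ and $X$ are symmetric, which is why $\nabla _L$ maps $\mathbb {S}^n$ into $(\mathbb {A}^n)^N$.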

We next relate our model ( $\mathcal {P}$ ) to the matrix-valued optimal ballistic transport problems in refs. [Reference Brenier15, Reference Vorotnikov91]. As reviewed in the introduction, Brenier [Reference Brenier15] recently attempted to find the weak solution of the incompressible Euler equation on the domain $[0,T] \times \Omega \subset \mathbb {R}^{1 + d}$ (we omit the initial and boundary conditions for simplicity):

(7.1)

\begin{equation} \partial _t v + \mathrm {div}\,(v \otimes v) + \nabla p = 0\,,\quad \mathrm {div}\, v = 0\,, \end{equation}

by minimising the kinetic energy $\int _0^T\int _\Omega |v(t,x)|^2 \, \mathrm {d} x\, \mathrm {d} t$, where $v$ is an $\mathbb {R}^n$-valued vector field and $p$ is a scalar function. It turns out that this problem admits a concave maximisation dual problem, to which the relaxed solution always exists under very mild assumptions. Such an approach was extended by Vorotnikov [Reference Vorotnikov91] in an abstract functional analytic framework that includes a broad class of PDEs with quadratic nonlinearity as examples, such as the Hamilton–Jacobi equation, the template matching equation, and the multidimensional Camassa–Holm equation. More precisely, [Reference Vorotnikov91] considered the following abstract Euler equation on $[0,T] \times \Omega$:

(7.2)

\begin{equation} \partial _t v = \mathsf {P} \circ \mathsf {L}\, (v \otimes v)\,,\quad v(0,\cdot ) = v_0 \in \mathsf {P}(L^2(\Omega, \mathbb {R}^n))\,, \end{equation}

where $\mathsf {P}$ is an orthogonal projection and $\mathsf {L}: L^2(\Omega, \mathbb {S}^n) \to L^2(\Omega, \mathbb {R}^n)$ is a (closed densely defined) linear operator. One can see that for $\mathsf {L} = - \mathrm {div}$ and $\mathsf {P}$ being the Leray projection, the problem (7.2) reduces to (7.1). The dual problem associated with the weak solution of (7.2) with minimal kinetic energy reads as follows:

(7.3)

\begin{align} \sup \Big \{\int _0^T\int _\Omega v_0 \cdot q - \frac {1}{2} q \cdot G^\dagger q \ \mathrm {d} x\, \mathrm {d} t\,;\ \partial _t G + 2 (\mathsf {L}^* \circ \mathsf {P})\,q = 0\ \text {with}\ G(T) = I \Big \}\,, \end{align}

where $G$ and $q$ are $\mathbb {S}^n_{+}$-valued and $\mathbb {R}^n$-valued fields, respectively. Note that the Hamilton–Jacobi equation $\partial _t \psi + \frac {1}{2} |\nabla \psi |^2 = 0$ can be reformulated as $\partial _t v + \frac {1}{2} \nabla Tr(v \otimes v) = 0$ by letting $v = \nabla \psi$, which is a special case of (7.2) with $\mathsf {P} = I$ and $\mathsf {L} = - \frac {1}{2} \nabla Tr$. The corresponding dual maximisation problem is given by

(7.4)

\begin{align} \sup \Big \{-\int _\Omega \psi _0 \rho _0 \, \mathrm {d} x - \frac {1}{2} \int _0^T\int _\Omega \rho ^\dagger |q|^2 \, \mathrm {d} x\, \mathrm {d} t\,;\ \partial _t \rho + \mathrm {div} \, q = 0\ \text {with}\ \rho (T) = 1 \Big \}\,, \end{align}

which closely relates to the ballistic transport problem [Reference Barton and Ghoussoub5]. In view of (7.3) and (7.4), one may regard

(7.5)

\begin{equation} \partial _t G + 2 (\mathsf {L}^* \circ \mathsf {P})\,q = 0 \end{equation}

as a matricial continuity equation, and our model (3.14) can hence be viewed as an unbalanced variant of (7.5). Then, the conservativity condition $\mathsf {D}^*(I) = 0$ for (7.5) is simply $\mathsf {P} \circ \mathsf {L}(I) = 0$, which has been used to guarantee the existence of a measure-valued solution to (7.3); see [Reference Vorotnikov91, Theorem 4.6]. Thanks to the above observations, one may expect that each meaningful choice of $\mathsf {L}$ and $\mathsf {P}$ in [Reference Vorotnikov91, Section 6] can generate a reasonable distance ($\mathcal {P}$) with $\mathsf {D} = 2 (\mathsf {L}^* \circ \mathsf {P})$. For instance, setting $n = d$, $\mathsf {P} = I$ and $\mathsf {L} = - \mathrm {div} - \frac {1}{2} \nabla Tr$ in (7.2) gives the template matching equation $\partial _t v + \mathrm {div}\, (v \otimes v) + \frac {1}{2} \nabla |v|^2 = 0$ and a distance ($\mathcal {P}$) with $\mathsf {D} = 2 (\mathsf {L}^* \circ \mathsf {P})$:

(7.6)

\begin{equation} \inf \big \{\mathcal {J}_{\Lambda, Q}\mathsf {(G,q,R)} \,;\ \partial _t \mathsf {G} + 2 \nabla _s \mathsf {q} + \mathrm {div} \mathsf {q} I = \mathsf {R}_t^{\mathrm {sym}} \ \text {with}\ \mathsf {G}_t|_{t = 0} = \mathsf {G}_0\,,\, \mathsf {G}_t|_{t = 1} = \mathsf {G}_1 \big \}\,. \end{equation}
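The operator appearing in (7.6) can be recovered by a formal integration by parts. The following is a sketch under the assumption that $S$ is a smooth $\mathbb {S}^d$-valued field and $q$ is a smooth vector field compactly supported in $\Omega$, so that boundary terms vanish:

\begin{align*} \langle \mathsf {L} S, q \rangle _{L^2(\Omega )} &= \int _\Omega \Big (- \mathrm {div}\, S - \frac {1}{2} \nabla Tr(S)\Big ) \cdot q \, \mathrm {d} x = \int _\Omega S \cdot \nabla q + \frac {1}{2} Tr(S)\, \mathrm {div}\, q \, \mathrm {d} x \\ &= \int _\Omega S \cdot \Big (\nabla _s q + \frac {1}{2} (\mathrm {div}\, q)\, I\Big ) \, \mathrm {d} x\,, \end{align*}

where the last step uses the symmetry of $S$ and $Tr(S) = S \cdot I$. Hence $\mathsf {L}^* q = \nabla _s q + \frac {1}{2} (\mathrm {div}\, q) I$ and $\mathsf {D} \mathsf {q} = 2 \mathsf {L}^* \mathsf {q} = 2 \nabla _s \mathsf {q} + (\mathrm {div}\, \mathsf {q}) I$, in agreement with (7.6). Moreover, $\mathsf {L}(I) = 0$ since both $\mathrm {div}\, I$ and $\nabla Tr(I)$ vanish, so the conservativity condition $\mathsf {D}^*(I) = 0$ holds for this choice.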

Remark 7.1. An important question is how to compare these matrix-valued OT models ( $\mathcal {P}_{\mathrm {WB}}$ ), ( $\mathcal {P}_{2,\mathrm {FR}}$ ), and (7.6) (as well as others in the literature), which requires a deeper theoretical analysis and is completely open, to the best of our knowledge.

8. Concluding remarks

We have proposed a general class of unbalanced matrix-valued OT distances $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ over the space $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ , called the weighted Wasserstein–Bures metric. The definition relies on a dynamic formulation and convex analysis. We have shown that $\mathcal {M}(\Omega, \mathbb {S}^n_+)$ equipped with the metric $\mathrm {WB}_{\Lambda }(\cdot, \cdot )$ is a complete geodesic space, and it can be viewed as a metric cone. In the follow-up work [Reference Li and Zou63], we have considered the convergence of the discrete approximation of the transport model ( $\mathcal {P}$ ). Our results provide a unified framework for unbalanced transport distances on matrix-valued measures and directly apply to various existing models such as the Kantorovich–Bures distance ( $\mathcal {P}_{\mathrm {WB}}$ ), the matricial interpolation distance ( $\mathcal {P}_{2,\mathrm {FR}}$ ) and the WFR one ( $\mathcal {P}_{\mathrm {WFR}}$ ). Meanwhile, it paves the way for practical applications, in particular, diffusion tensor imaging as in refs. [Reference Chen, Haber, Yamamoto, Georgiou and Tannenbaum26, Reference Peyré, Chizat, Vialard and Solomon77, Reference Ryu, Chen, Li and Osher86].

Acknowledgements

The authors would like to thank the anonymous referees and editors for their careful reading and constructive comments and suggestions, which have helped us improve this work.

Financial Support

The work of Bowen Li is supported in part by National Key R&D Program of China (project 2024YFA1016000). Jun Zou was substantially supported by Hong Kong RGC General Research Fund (projects 14308322 and 14306921) and NSFC/Hong Kong RGC Joint Research Scheme 2022/23 (project N_CUHK465/22).

Competing interests

The authors declare none.

Appendix A: Auxiliary proofs

Proof of Lemma 4.1. For $\mu \in \mathcal {M}(\mathcal {X},\mathbb {X})$ , by definition, we have $ \iota _{C(\mathcal {X},\mathcal {O}_\Lambda )}^*(\mu ) = \sup \{\langle \mu, \Xi \rangle _{\mathcal {X}}\,;\, \Xi \in C(\mathcal {X},\mathcal {O}_\Lambda ) \}\,.$ To show that the admissible set $C(\mathcal {X},\mathcal {O}_\Lambda )$ can be relaxed to $L^\infty _{|\mu |} (\mathcal {X},\mathcal {O}_\Lambda )$ , it suffices to prove

(A.1)

\begin{equation} \sup _{ \Xi \in L_{|\mu |}^\infty (\mathcal {X},\mathcal {O}_\Lambda )}\langle \mu, \Xi \rangle _{\mathcal {X}} \le \sup _{ \Xi \in C(\mathcal {X},\mathcal {O}_\Lambda )}\langle \mu, \Xi \rangle _{\mathcal {X}}\,. \end{equation}

For this, we consider an essentially bounded measurable field $\Xi \in L_{|\mu |}^\infty (\mathcal {X},\mathcal {O}_\Lambda )$ . Without loss of generality, we assume that it is bounded by $\lVert \Xi \rVert _\infty$ everywhere. By Lusin’s theorem, for any $\varepsilon \gt 0$ , there exists a continuous field with compact support $\widetilde {\Xi }$ such that

(A.2)

\begin{equation} |\mu |(\{x \in \mathcal {X}\,; \ \Xi (x) \neq \widetilde {\Xi }(x) \}) \le \varepsilon \,. \end{equation}

Define $\mathbb {P}_{\mathcal {O}_\Lambda }$ as the $L^2$ -projection from $\mathbb {X}$ to the closed convex set $\mathcal {O}_\Lambda$ . By abuse of notation, we still denote by $\widetilde {\Xi }$ the composite function $\mathbb {P}_{\mathcal {O}_\Lambda } \circ \widetilde {\Xi } \in C(\mathcal {X},\mathcal {O}_\Lambda )$ . It is clear that $\lVert \widetilde {\Xi } \rVert _\infty \le \lVert \Xi \rVert _\infty$ , and (A.2) still holds. Then it follows that $ | \langle \mu, \Xi \rangle _{\mathcal {X}} - \langle \mu, \widetilde {\Xi } \rangle _{\mathcal {X}}| \le 2 \varepsilon \lVert \Xi \rVert _\infty \,,$ which further implies

\begin{equation*} \langle \mu, \Xi \rangle _{\mathcal {X}} \le \langle \mu, \widetilde {\Xi }\rangle _{\mathcal {X}} + 2 \varepsilon \lVert \Xi \rVert _\infty \le \sup _{ \Xi \in C(\mathcal {X},\mathcal {O}_\Lambda )}\langle \mu, \Xi \rangle _{\mathcal {X}} + 2 \varepsilon \lVert \Xi \rVert _\infty \,. \end{equation*}
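The bound $| \langle \mu, \Xi \rangle _{\mathcal {X}} - \langle \mu, \widetilde {\Xi } \rangle _{\mathcal {X}}| \le 2 \varepsilon \lVert \Xi \rVert _\infty$ used above can be written out as follows (a sketch relying only on (A.2) and $\lVert \widetilde {\Xi } \rVert _\infty \le \lVert \Xi \rVert _\infty$):

\begin{align*} | \langle \mu, \Xi \rangle _{\mathcal {X}} - \langle \mu, \widetilde {\Xi } \rangle _{\mathcal {X}}| \le \int _{\{\Xi \neq \widetilde {\Xi }\}} |\Xi - \widetilde {\Xi }| \, \mathrm {d} |\mu | \le \big (\lVert \Xi \rVert _\infty + \lVert \widetilde {\Xi } \rVert _\infty \big )\, |\mu |\big (\{x \in \mathcal {X}\,;\ \Xi (x) \neq \widetilde {\Xi }(x)\}\big ) \le 2 \varepsilon \lVert \Xi \rVert _\infty \,. \end{align*}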

Since $\varepsilon$ is arbitrary, we have proved the claim (A.1). Thus, we can take the pointwise $\sup$ in (4.4) and obtain the desired $\iota _{C(\mathcal {X},\mathcal {O}_\Lambda )}^*(\mu ) = \mathcal {J}_{\Lambda, \mathcal {X}}(\mu )$ by Proposition 3.1. Next, we characterise the subgradient $\partial \mathcal {J}_{\Lambda, \mathcal {X}}(\mu )$. By Lemma 2.4, we have $\Xi \in \partial \mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) \cap C(\mathcal {X},\mathbb {X})$ if and only if $ \langle \mu, \Xi \rangle _{\mathcal {X}} = \iota _{C(\mathcal {X},\mathcal {O}_\Lambda )}(\Xi ) + \mathcal {J}_{\Lambda, \mathcal {X}}(\mu ) \,,$ which yields $\Xi \in C(\mathcal {X},\mathcal {O}_\Lambda )$ and

(A.3)

\begin{align} \int _{\mathcal {X}} \mu _\lambda \cdot \Xi - J_\Lambda (\mu _\lambda )\, \mathrm {d} \lambda = 0 \,, \end{align}

where $\lambda$ is a reference measure such that $|\mu | \ll \lambda$ and $\mu _\lambda$ is the density of $\mu$ with respect to $\lambda$. We note from $J_\Lambda = \iota ^*_{\mathcal {O}_\Lambda }$ and $\Xi (x) \in \mathcal {O}_\Lambda$ that $ \mu _\lambda \cdot \Xi - J_\Lambda (\mu _\lambda ) \le 0$ holds $\lambda$-a.e.; by (A.3), equality in fact holds $\lambda$-a.e. Then (4.5) follows.
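The pointwise inequality and the a.e. equality used in this last step can be checked directly from the definition of the support function. As a sketch, with $g$ a shorthand introduced here for the integrand in (A.3):

\begin{equation*} \mu _\lambda (x) \cdot \Xi (x) \le \sup _{A \in \mathcal {O}_\Lambda } \mu _\lambda (x) \cdot A = \iota ^*_{\mathcal {O}_\Lambda }(\mu _\lambda (x)) = J_\Lambda (\mu _\lambda (x))\,, \qquad g \,:\!=\, \mu _\lambda \cdot \Xi - J_\Lambda (\mu _\lambda ) \le 0\,. \end{equation*}

Since $g \le 0$ and $\int _{\mathcal {X}} g \, \mathrm {d} \lambda = 0$ by (A.3), we get $\int _{\mathcal {X}} |g| \, \mathrm {d} \lambda = - \int _{\mathcal {X}} g \, \mathrm {d} \lambda = 0$, and hence $g = 0$ $\lambda$-a.e.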

Proof of Lemma 5.1. It suffices to consider $[a,b] = [0,1]$. We denote by $\widetilde {\mathrm {WB}}_\Lambda$ the right-hand side of (5.1). By Hölder’s inequality and recalling ( $\mathcal {P}$ ) with the admissible set $\widetilde {\mathcal {CE}}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$, we have $\widetilde {\mathrm {WB}}_\Lambda \le \mathrm {WB}_\Lambda$. For the other direction, we consider $\{\mu _t\}_{t \in [0,1]} \in \widetilde {\mathcal {CE}}([0,1];\,\mathsf {G}_0,\mathsf {G}_1)$ and reparameterise it by the $\varepsilon$-arc length function $s = \mathsf {s}_\varepsilon (t)$:

\begin{equation*} s = \mathsf {s}_\varepsilon (t) = \int _0^t \Big (\mathcal {J}_{\Lambda, \Omega }(\mu _\tau )^{1/2} + \varepsilon \Big )\, \mathrm {d} \tau \,:\, [0,1] \to [0, L(\mu _t) + \varepsilon ]\,, \end{equation*}

where $L(\mu _t)\,:\!=\, \int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mu _\tau )^{1/2}\, \mathrm {d} \tau$. It is clear that $\mathsf {s}_\varepsilon (t)$ is strictly increasing and absolutely continuous and has an absolutely continuous inverse. Then, by Lemma 3.15 and writing $\widetilde {\mu }^\varepsilon _s = \mu _{\mathsf {s}_\varepsilon ^{-1}(s)}$ for short, we have

(A.4)

\begin{align} \mathrm {WB}^2_\Lambda (\mathsf {G}_0,\mathsf {G}_1) \le (L(\mu _t) + \varepsilon ) \int ^{L(\mu _t) + \varepsilon }_0 \mathcal {J}_{\Lambda, \Omega }(\widetilde {\mu }^\varepsilon _s)\, \mathrm {d} s = (L(\mu _t) + \varepsilon ) \int ^1_0 \frac {\mathcal {J}_{\Lambda, \Omega }(\mu _t)}{\mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} + \varepsilon }\, \mathrm {d} t\,, \end{align}

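where the inequality is by ( $\mathcal {P}^{\prime}$ ) with $[a,b] = [0, L(\mu _t) + \varepsilon ]$. Since $\mathcal {J}_{\Lambda, \Omega }(\mu _t) / (\mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2} + \varepsilon ) \le \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}$, the right-hand side of (A.4) is bounded by $(L(\mu _t) + \varepsilon ) L(\mu _t)$. Letting $\varepsilon \to 0$ in (A.4) and recalling that the curve $\{\mu _t\}_{t \in [0,1]}$ is arbitrary, we find $\mathrm {WB}_{\Lambda } \le \widetilde {\mathrm {WB}}_\Lambda$. If, moreover, $\mu$ minimises ( $\mathcal {P}$ ), then this inequality gives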

\begin{align*} \mathrm {WB}_\Lambda (\mathsf {G}_0,\mathsf {G}_1) = \Big (\int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mu _t) \, \mathrm {d} t \Big )^{1/2} \le \int _0^1 \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}\, \mathrm {d} t \,, \end{align*}

which, by the equality case of Hölder’s inequality, implies that $\mathcal {J}_{\Lambda, \Omega }(\mu _t)$ is constant for a.e. $t \in [0,1]$. Then (5.2) immediately follows.
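The constancy claim can be made explicit with a short variance computation. As a sketch, write $f(t) \,:\!=\, \mathcal {J}_{\Lambda, \Omega }(\mu _t)^{1/2}$ and $\bar {f} \,:\!=\, \int _0^1 f(t)\, \mathrm {d} t$ (both are notation introduced here for illustration), and assume $f \in L^2(0,1)$, which holds when the action of $\mu$ is finite:

\begin{equation*} 0 \le \int _0^1 \big (f(t) - \bar {f}\,\big )^2\, \mathrm {d} t = \int _0^1 f(t)^2\, \mathrm {d} t - \Big (\int _0^1 f(t)\, \mathrm {d} t \Big )^2 \,. \end{equation*}

For a minimiser, the displayed inequality above gives $\int _0^1 f^2\, \mathrm {d} t \le (\int _0^1 f\, \mathrm {d} t)^2$, so the right-hand side is also non-positive and therefore vanishes. Hence $f = \bar {f}$ a.e., that is, $\mathcal {J}_{\Lambda, \Omega }(\mu _t) = \bar {f}^{\,2} = \mathrm {WB}^2_\Lambda (\mathsf {G}_0,\mathsf {G}_1)$ for a.e. $t \in [0,1]$.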
