1 Introduction and main results
Let $G = \langle S \rangle $ denote the rank-r free group with generating set $S = \{s_{1}, \ldots , s_{r}\}$ and identity e, and let $(X,\mu ,T)$ be a measure-preserving G-system, that is, T is a homomorphism from G to the automorphism group of the standard probability space $(X,\mu )$ . We will not need to make explicit use of the $\sigma $ -algebra on X, so we leave it unnamed.
An observable on X is a measurable map with domain X. In this paper the codomain will be a finite set endowed with the discrete $\sigma $ -algebra; in this case we call the map a finite observable and the codomain an alphabet.
Any observable $\alpha {\colon}\! X \to \mathtt {A}$ induces a map $\alpha ^{G} {\colon}\! X \to \mathtt {A}^{G}$ by setting
We call the $\mathtt {A}$ -coloring $\alpha ^{G}(x)$ of G the itinerary of x, since it records the observations that will be made over the entire orbit of x under the action of G. We also similarly define the map $\alpha ^{H} {\colon}\! X \to \mathtt {A}^{H}$ for any subset H of G. We abbreviate $\alpha ^{n} := \alpha ^{\mathrm {B}(e,n)}$ , where $\mathrm {B}(e,n)$ is the closed ball of radius n centered at the identity in G, which is endowed with the word-length metric. If $\beta {\colon}\! X \to \mathtt {B}$ is a second finite observable, we denote by $\alpha \beta {\colon}\! X \to \mathtt {A} \times \mathtt {B}$ the map $\alpha \beta (x) = (\alpha (x), \beta (x))$ .
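Since the constructions below are indexed by the balls $\mathrm {B}(e,n)$ , it may be helpful to compute one explicitly. The following minimal sketch (our own illustration, not from the paper; the integer encoding of reduced words is an assumption of the sketch) enumerates $\mathrm {B}(e,n)$ in the rank-r free group.

```python
def ball(r, n):
    """Closed ball B(e, n) in the rank-r free group with the word-length
    metric, as a set of reduced words.  Generator s_i is encoded as the
    integer i and its inverse as -i; a word is reduced when no letter is
    immediately followed by its inverse."""
    gens = [i for i in range(1, r + 1)] + [-i for i in range(1, r + 1)]
    sphere = {()}   # reduced words of the current length
    result = {()}   # all reduced words of length <= n
    for _ in range(n):
        sphere = {w + (g,) for w in sphere for g in gens
                  if not (w and w[-1] == -g)}
        result |= sphere
    return result
```

For $r \geq 2$ this gives $\lvert \mathrm {B}(e,n) \rvert = 1 + 2r((2r-1)^{n}-1)/(2r-2)$ ; for example $\lvert \mathrm {B}(e,2) \rvert = 17$ when $r = 2$ .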
The (Shannon) entropy of a finite observable $\alpha {\colon}\! X \to \mathtt {A}$ is defined by
$$\mathrm {H}_{\mu }(\alpha ) := - \sum _{\mathtt {a} \in \mathtt {A}} \alpha _{*}\mu (\mathtt {a}) \log \alpha _{*}\mu (\mathtt {a}),$$
where $\alpha _{*} \mu \in \operatorname {\mathrm {Prob}}(\mathtt {A})$ is the pushforward measure, with the convention $0 \log 0 = 0$ . The entropy of $\alpha $ can be interpreted as the expected amount of information revealed by observing $\alpha $ , assuming its distribution $\alpha _{*} \mu $ is known.
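As a small illustration (our own, with a hypothetical four-atom space), the entropy of a pushforward can be computed directly; the $0 \log 0 = 0$ convention corresponds to skipping null atoms.

```python
import math
from collections import Counter

def pushforward(alpha, atoms, weights):
    """Distribution alpha_* mu of a finite observable alpha on a measure
    space given by finitely many atoms with their masses."""
    dist = Counter()
    for x, w in zip(atoms, weights):
        dist[alpha(x)] += w
    return dist

def shannon_entropy(dist):
    """H(mu) = -sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)
```

For instance, a two-valued observable whose pushforward is uniform has entropy $\log 2$ .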
An early application of Shannon’s entropy to ergodic theory was its use by Kolmogorov and Sinai to show that there exist non-isomorphic Bernoulli shifts over $\mathbb {Z}$ . A Bernoulli shift over $\mathbb {Z}$ is a system of the form $(\mathtt {A}^{\mathbb {Z}}, \mu ^{\mathbb {Z}}, S)$ for some alphabet $\mathtt {A}$ and $\mu \in \operatorname {\mathrm {Prob}}(\mathtt {A})$ ; S is the shift action of $\mathbb {Z}$ . They did this by defining an entropy rate for $\mathbb {Z}$ -systems, which can be interpreted as the average information per unit time revealed by observing the system. For a Bernoulli shift $(\mathtt {A}^{\mathbb {Z}}, \mu ^{\mathbb {Z}}, S)$ , the entropy rate is simply the ‘base entropy’ $\mathrm {H}_{\mu }(\alpha )$ , where $\alpha {\colon}\! \mathtt {A}^{\mathbb {Z}} \to \mathtt {A}$ is the ‘time zero’ observable.
Isomorphism invariance of the Kolmogorov–Sinai entropy rate is typically proven using the fact that entropy rate is non-increasing under factor maps (which are surjective homomorphisms of measure-preserving systems). This fact can be interpreted as stating that a system cannot simulate another system that is ‘more random’.
The entropy rate was soon generalized to systems acted on by an arbitrary amenable group (such as $\mathbb {Z}^{d}$ ). Extending beyond amenable groups proved more difficult, and in fact it was found to be impossible for such an extension to preserve all desirable properties of the Kolmogorov–Sinai entropy rate. In particular, an entropy rate for non-amenable group actions which assigns Bernoulli shifts their base entropy cannot be non-increasing under factor maps [Reference Ornstein and Weiss13, Appendix C].
The first invariant to distinguish between Bernoulli shifts over free groups is Lewis Bowen’s f-invariant. Following [Reference Bowen2], this can be defined by
$$F_{\mu }(T, \alpha ) := (1-2r) \mathrm {H}_{\mu }(\alpha ) + \sum _{i=1}^{r} \mathrm {H}_{\mu }(\alpha \, (\alpha \circ T_{s_{i}})), \qquad f_{\mu }(T, \alpha ) := \inf _{k \in \mathbb {N}} F_{\mu }(T, \alpha ^{k}),$$
where $\alpha \circ T_{s_{i}}$ denotes the observable $x \mapsto \alpha (T_{s_{i}} x)$ .
The main theorem of [Reference Bowen3] is that $f_{\mu }(T, \alpha )$ depends on the observable $\alpha $ only through the $\sigma $ -algebra it generates. In particular, the common value of $f_{\mu } (T, \alpha )$ among all $\alpha $ which generate the $\sigma $ -algebra of the measurable space X (assuming such $\alpha $ exist) is a measure-conjugacy invariant of the system $(X, \mu , T)$ . In the same paper, Bowen showed that the f-invariant of a Bernoulli shift is the Shannon entropy of the base measure; in particular, Bernoulli shifts with different base entropies are non-isomorphic.
In [Reference Bowen2], Bowen gave an alternate formula for the f-invariant, which we now introduce.
For any homomorphism $\sigma {\colon}\! G \to \operatorname {\mathrm {Sym}}(n)$ we have a G-system $([n], \operatorname {\mathrm {Unif}}(n), \sigma )$ , and we can consider a labeling $\mathbf {x} \in \mathtt {A}^{n}$ as an $\mathtt {A}$ -valued observable on this system. We denote the law of its itinerary by $P^{\sigma }_{\mathbf {x}} = \mathbf {x}^{G}_{*}\operatorname {\mathrm {Unif}}(n)$ and call this the empirical distribution of $\mathbf {x}$ . We say that $\mathbf {x}$ is a good model for $\alpha $ over $\sigma $ if it is difficult to distinguish the G-systems $(X,\mu , T)$ and $([n], \operatorname {\mathrm {Unif}}(n), \sigma )$ via their respective observables $\alpha $ and $\mathbf {x}$ . To make this precise, we denote
which is a set of good models for $\alpha $ over $\sigma $ if $\mathcal {O}$ is a weak $^{*}$ -open neighborhood of $\alpha ^{G}_{*}\mu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^{G})$ ; the particular set $\mathcal {O}$ quantifies how good the models are. The alphabet $\mathtt {A}$ is given the discrete topology and $\mathtt {A}^{G}$ the product topology, so ‘weak $^{*}$ -close’ means marginals on some finite sets are close in total variation norm.
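The empirical distribution and its finite marginals are directly computable. In the sketch below (our own; $\sigma $ is stored as one permutation per generator, group elements are tuples of generator indices applied right to left, and the total variation normalization $\tfrac {1}{2}\sum \lvert \mu - \nu \rvert $ is an assumption of the sketch), we compute the marginal of $P^{\sigma }_{\mathbf {x}}$ on a finite subset of G.

```python
from collections import Counter
from fractions import Fraction

def empirical_marginal(sigma, x, H):
    """Marginal on A^H of the empirical distribution P^sigma_x: push
    Unif([n]) forward through v -> (x(sigma(g) v))_{g in H}.  sigma is a
    list of permutations, one per generator; an element of H is a tuple of
    generator indices (negative for inverses), applied right to left."""
    def apply(word, v):
        for g in reversed(word):
            p = sigma[abs(g) - 1]
            v = p[v] if g > 0 else p.index(v)
        return v
    n = len(x)
    counts = Counter(tuple(x[apply(g, v)] for g in H) for v in range(n))
    return {pattern: Fraction(c, n) for pattern, c in counts.items()}

def total_variation(mu, nu):
    """Total variation distance with the normalization (1/2) sum |mu - nu|."""
    keys = set(mu) | set(nu)
    return sum(abs(mu.get(k, 0) - nu.get(k, 0)) for k in keys) / 2
```

Comparing such marginals over a family of finite subsets is exactly how weak $^{*}$ -closeness is tested.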
For each $n \in \mathbb {N}$ , let $\mathtt {s}_{n} = \operatorname {\mathrm {Unif}}(\operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n)))$ . Bowen showed in [Reference Bowen2] that the f-invariant is given by
$$f_{\mu }(T, \alpha ) = \inf _{\mathcal {O}} \limsup _{n \to \infty } \frac {1}{n} \log \mathbb {E}_{\sigma \sim \mathtt {s}_{n}} \lvert \{ \mathbf {x} \in \mathtt {A}^{n} : P^{\sigma }_{\mathbf {x}} \in \mathcal {O} \} \rvert ,$$
where the infimum is over weak $^{*}$ -open neighborhoods $\mathcal {O}$ of $\alpha ^{G}_{*}\mu $ .
To make an analogy with statistical physics, we can think of $\alpha ^{G}_{*} \mu $ as a macroscopic statistical distribution of the state of a system; then the f-invariant is the exponential growth rate of the expected number of ‘microstates’ that are consistent with these statistics. What we here call good models are often called microstates for this reason.
More generally, given any random or deterministic sofic approximation $\Sigma = \{\mathtt {s}_{n}\}_{n=1}^{\infty }$ , we can define the sofic entropy relative to $\Sigma $ by
Here each $\mathtt {s}_{n}$ is a probability measure on the set of functions $G \to \operatorname {\mathrm {Sym}}(n)$ which is supported on functions which are approximately free homomorphisms.
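Because G is free, a homomorphism $G \to \operatorname {\mathrm {Sym}}(n)$ is freely determined by the images of $s_{1}, \ldots , s_{r}$ , so sampling from $\operatorname {\mathrm {Unif}}(\operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n)))$ amounts to drawing r independent uniform permutations. A minimal sketch (our own conventions for encoding words and permutations):

```python
import random

def sample_uniform_hom(r, n, rng=random):
    """A uniform element of Hom(G, Sym(n)) for the rank-r free group G:
    r independent uniform permutations of [n], one per generator; the
    image of any other group element is then determined."""
    sigma = []
    for _ in range(r):
        perm = list(range(n))
        rng.shuffle(perm)
        sigma.append(perm)
    return sigma

def act(sigma, word, v):
    """Apply sigma(word) to v, where word is a tuple of generator indices
    (positive i for s_i, negative for its inverse), applied right to left."""
    for g in reversed(word):
        p = sigma[abs(g) - 1]
        v = p[v] if g > 0 else p.index(v)
    return v
```

In particular $\sigma (s_{i} s_{i}^{-1})$ acts as the identity, as it must for a homomorphism.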
This paper is motivated by a desire to better understand the dependence of sofic entropy on the sofic approximation $\Sigma $ . For any choice of $\Sigma $ , the sofic entropy agrees with Kolmogorov–Sinai entropy if the acting group is amenable [Reference Bowen6] and with the Shannon entropy of the base if the system is a Bernoulli shift [Reference Bowen4]. For some systems, the sofic entropy can be finite relative to some sofic approximations and $-\infty $ relative to others. It is unknown whether two deterministic sofic approximations can yield different finite entropy values for the same system.
In this paper, we express the entropy relative to a type of stochastic block model in terms of the relative f-invariant, which we now introduce.
If $\alpha ,\beta $ are two finite observables with codomains $\mathtt {A},\mathtt {B}$ , the conditional entropy is
$$\mathrm {H}_{\mu }(\alpha \mid \beta ) := \mathrm {H}_{\mu }(\alpha \beta ) - \mathrm {H}_{\mu }(\beta ).$$
This can be interpreted as the expected amount of information revealed by observing $\alpha $ if both the value of $\beta $ and the joint distribution of $\alpha $ and $\beta $ are known. The relative f-invariant is defined by
Both the infimum and supremum can be replaced by limits; this follows from Lemma 3.2 below. It follows from Corollary 3.5 that we could also directly define
as long as $f_{\mu }(T, \beta )> -\infty $ .
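The conditional entropy above can be computed from the joint distribution of the pair $(\alpha , \beta )$ via the chain-rule identity $\mathrm {H}(\alpha \mid \beta ) = \mathrm {H}(\alpha \beta ) - \mathrm {H}(\beta )$ ; a small sketch (our own):

```python
import math

def entropy(dist):
    """H(mu) = -sum p log p with 0 log 0 = 0; dist maps outcomes to masses."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(alpha | beta) = H(alpha beta) - H(beta), where joint is the
    distribution of the pair (a, b) as a dict {(a, b): prob}."""
    marginal_b = {}
    for (a, b), p in joint.items():
        marginal_b[b] = marginal_b.get(b, 0) + p
    return entropy(joint) - entropy(marginal_b)
```

When $\alpha $ and $\beta $ are independent this reduces to $\mathrm {H}(\alpha )$ , and when $\alpha $ is $\beta $ -measurable it vanishes.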
We now define the relevant type of stochastic block model. If H is a finite subset of G, we denote by $d^{H}(\mu , \nu )$ the total variation distance between the marginals of $\mu $ and $\nu $ on $\mathtt {A}^{H}$ . Our convention for the total variation distance between measures $\mu , \nu \in \operatorname {\mathrm {Prob}}(\mathtt {A})$ is
For each $k \in \mathbb {N}$ we define a pseudometric on $\operatorname {\mathrm {Prob}}(\mathtt {A}^{G})$ by
Note that $\{d_{k}^{*}\}_{k \in \mathbb {N}}$ together generate the weak $^{*}$ topology on $\operatorname {\mathrm {Prob}}(\mathtt {A}^{G})$ . These generalize the function $d_{\sigma }^{*}$ from [Reference Bowen2], which corresponds to the case $k=0$ . For $\mathcal {O} = \{\nu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^{G}) : d^{*}_{k}(\alpha ^{G}_{*} \mu , \nu ) < \varepsilon \}$ we write
$$\Omega ^{*}_{k}(\sigma , \alpha , \varepsilon ) := \{ \mathbf {x} \in \mathtt {A}^{n} : d^{*}_{k}(\alpha ^{G}_{*}\mu , P^{\sigma }_{\mathbf {x}}) < \varepsilon \}.$$
Our stochastic block model is now defined as follows: given $\mathbf {y}_{0} \in \mathtt {B}^{n}$ , $\sigma _{0} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ , and $k \in \mathbb {N}$ , let
The labeling $\mathbf {y}_{0}$ partitions the elements of $[n]$ into $\lvert \mathtt {B} \rvert $ communities, and we can think of the random homomorphism $\sigma $ as a random choice of directed edges between and within the communities. Certain statistics of these random edge choices are determined by the reference homomorphism $\sigma _{0}$ ; note that for $k>0$ these statistics are more precise than those specified by a standard stochastic block model. In §2 we define weights, which are the objects used to record the relevant statistics.
1.1 Main results
Our main theorems show that the relative f-invariant can be interpreted as the growth rate of the expected number of ways to extend a planted good model for $\beta $ to a good model for $\alpha \beta $ , over a stochastic block model which has statistics determined by $\beta $ and its planted model.
We first prove that if $\beta ^{G}_{*}\mu $ is Markov then we can use a stochastic block model which only takes into account ‘one-step statistics’.
Theorem A. Let $\alpha {\colon}\! X \to \mathtt {A}$ and $\beta {\colon}\! X \to \mathtt {B}$ be finite observables, and for each n let $\mathbf {y}_{n} \in \mathtt {B}^{n}$ and $\sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ be such that
$$\lim _{n \to \infty } d_{0}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*}\mu ) = 0.$$
Suppose that $\beta ^{G}_{*} \mu $ is a Markov measure. With $\mathtt {s}_{n} = \mathtt {SBM}(\sigma _{n},\mathbf {y}_{n}, 0)$ , we have
Proposition A. The assumptions of Theorem A are non-vacuous; that is, for any finite observable $\beta {\colon}\! X \to \mathtt {B}$ there exist sequences $\{\mathbf {y}_{n} \in \mathtt {B}^{n}\}_{n=1}^{\infty }$ and $\{ \sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n)) \}_{n=1}^{\infty }$ such that $\lim _{n \to \infty } d_{0}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*}\mu ) = 0$ .
This follows from the fact that free group actions are ‘sofic’, which is proven for example in [Reference Dykema, Kerr and Pichot10, Reference Păunescu14, Reference Popa15]. A more elementary proof is given in §4 below.
If $\beta ^{G}_{*} \mu $ is not Markov, then the same formula holds with a more precise type of stochastic block model.
Theorem B. Let $\alpha {\colon}\! X \to \mathtt {A}$ and $\beta {\colon}\! X \to \mathtt {B}$ be finite observables. Let $m_{n}$ approach infinity as n goes to infinity while satisfying $m_{n} = o(\log \log n)$ . For each n, let $\mathbf {y}_{n} \in \mathtt {B}^{n}$ and $\sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ be such that
$$d_{m_{n}}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*}\mu ) = O ( {1}/{\log n} ).$$
Suppose that $f_{\mu }(T, \beta )> -\infty $ . With $\mathtt {s}_{n} = \mathtt {SBM}(\sigma _{n}, \mathbf {y}_{n}, m_{n})$ ,
Proposition B. The assumptions of Theorem B are non-vacuous; that is, for any finite observable $\beta {\colon}\! X \to \mathtt {B}$ and any sequence $\{m_{n} \in \mathbb {N}\}_{n=1}^{\infty }$ approaching infinity while satisfying $m_{n} = o(\log \log n)$ , there exist sequences $\{\mathbf {y}_{n} \in \mathtt {B}^{n}\}_{n=1}^{\infty }$ and $\{ \sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n)) \}_{n=1}^{\infty }$ such that $d_{m_{n}}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*}\mu ) = O ( {1}/{\log n} )$ .
Using Theorem B, we prove the following formula for the growth rate of the expected number of good models over a stochastic block model. This can be compared to the variational principle in [Reference Kerr and Li12], and has a similar proof.
Theorem C. Let $\mathtt {s}_{n}, \alpha , \beta $ be as in the statement of Theorem B. Then
Here $\mathsf {J}(\alpha _{*}^{G} \mu ,\, \beta _{*}^{G} \mu )$ is the set of joinings of the G-systems $(\mathtt {A}^{G}, \alpha ^{G}_{*}\mu , S)$ and $(\mathtt {B}^{G}, \beta ^{G}_{*}\mu , S)$ , that is, shift-invariant probability measures on $(\mathtt {A} \times \mathtt {B})^{G}$ whose $\mathtt {A}^{G}, \mathtt {B}^{G}$ marginals are $\alpha _{*}^{G} \mu ,\, \beta _{*}^{G} \mu $ , respectively. S denotes the shift action of G. We use $\mathtt {a} {\colon}\! (\mathtt {A} \times \mathtt {B})^{G} \to \mathtt {A}$ and $\mathtt {b} {\colon}\! (\mathtt {A} \times \mathtt {B})^{G} \to \mathtt {B}$ to denote the maps which observe the $\mathtt {A}$ (respectively, $\mathtt {B}$ ) label at the identity.
Note that the supremum is always greater than or equal to $f_{\mu } (T, \alpha )$ , with equality attained by the product joining; this means that the expected number of good models for $\alpha $ over a block model with built-in good models for any $\beta $ is at least the expected number of good models over a uniformly random homomorphism. It is possible for the supremum to be strictly larger, however. For example, suppose $f_{\mu } (T, \alpha ) < 0$ and $\alpha = \beta $ , and let $\lambda $ be the diagonal joining. Then
1.2 Related work
The expressions appearing on the right-hand sides of Theorems A and B are very closely related to Ben Hayes’ definition of ‘relative sofic entropy in the presence’ [Reference Hayes11, Definition 2.5]. Some differences are that we consider expected numbers of good models over random sofic approximations, and that Hayes takes a supremum inside the logarithm over which good model is to be extended, while we fix a sequence $\{\mathbf {y}_{n}\}$ of planted good models. Hayes also does not restrict to shift systems as we do here.
In [Reference Coja-Oghlan, Hahn-Klimroth, Loick, Müller, Panagiotou, Pasch, Bläser and Monmege8], the free energy (that is, the limit of normalized log partition functions) over a type of stochastic block model is shown to satisfy a variational principle; see Propositions 3.6 and 3.7 of that paper.
1.3 Random sofic approximations
As noted above, the f-invariant is closely related to another invariant of measure-preserving systems called sofic entropy, which was introduced by Bowen in [Reference Bowen4].
A homomorphism $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ is called $(D,\delta )$ -sofic for some finite $D \subset G$ and $\delta > 0$ if
$$\frac {1}{n} \lvert \{ v \in [n] : \sigma (g) v = v \} \rvert < \delta \quad \text {for all } g \in D \setminus \{e\}.$$
A sequence of homomorphisms $\Sigma = (\sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n)) )_{n \in \mathbb {N}}$ is called a sofic approximation if, for every $(D,\delta )$ , the homomorphism $\sigma _{n}$ is $(D,\delta )$ -sofic for all large enough n.
The sofic entropy relative to $\Sigma $ is the exponential growth rate of the number of good models over $\sigma _{n}$ . Specifically, for any finite observable $\alpha $ on X we have
This is an isomorphism invariant of the system $(X, \mu , T)$ if $\alpha $ is any generating observable, that is, if the $\sigma $ -algebra of the measurable space X is the coarsest one which is shift-invariant and $\alpha $ -measurable.
By analogy with this expression, we might call the sequences of random homomorphisms appearing in expressions above ‘random sofic approximations’. The following proposition provides further justification for this terminology.
Proposition 1.1. If $(\mathtt {s}_{n})$ is any of the sequences appearing in Theorems A, B, and C, then for any $(D,\delta )$ there exists $\varepsilon>0$ such that
for all large enough n.
In particular, if $\sigma _{1} \sim \mathtt {s}_{1}$ , $\sigma _{2} \sim \mathtt {s}_{2}$ etc. are independent then $(\sigma _{n})$ is a sofic approximation with probability 1.
1.4 Organization
In §2 we define weights and discuss some of their useful properties. In §3 we prove a few basic results about the functions f and F. Some of the results of these two sections are used in §4 to show that the assumptions of the main theorems are not vacuous. In §5 we show how the function F is related to the number of homomorphism-labeling pairs $(\sigma , \mathbf {y})$ that realize a given weight, which is the main ingredient of the proofs of Theorems A and B given in the next two sections. In §8 we show how to deduce Theorem C from Theorem B. Section 9 contains a proof of Proposition 1.1. The final section contains a proof of Lemma 2.3, which asserts that a weight can be approximated by a denominator-n weight with a specified marginal.
2 Weights
If $\alpha {\colon}\! X \to \mathtt {A}$ is a finite observable, for $a,a^{\prime } \in \mathtt {A}$ and $i \in [r]$ let
and also denote
For $\mathbf {x} \in \mathtt {A}^{n}$ and $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ let
and $W_{\sigma , \mathbf {x}}(a) = P^{\sigma , \{e\}}_{\mathbf {x}} (a)$ . This could equivalently be defined as a special case of the previous construction, with $\sigma $ specifying an action on $X = [n]$ with an observable $\mathbf {x} {\colon}\! [n] \to \mathtt {A}$ .
More abstractly, any $W \in ( \operatorname {\mathrm {Prob}}(\mathtt {A}^{2}) )^{r}$ is called an $\mathtt {A}$ -weight if
$$\sum _{a^{\prime } \in \mathtt {A}} W(a, a^{\prime }; i) = \sum _{a^{\prime } \in \mathtt {A}} W(a^{\prime }, a; j)$$
for all $i,j \in [r]$ and $a \in \mathtt {A}$ . For each $a \in \mathtt {A}$ we denote this common value by $W(a)$ . Note that the objects $W_{\alpha }$ and $W_{\sigma , \mathbf {x}}$ defined above satisfy this condition.
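To make the consistency condition concrete, the following sketch (our own; the convention $W_{\sigma , \mathbf {x}}(a, a^{\prime }; i) = \lvert \{ v : \mathbf {x}(v) = a,\ \mathbf {x}(\sigma (s_{i})v) = a^{\prime }\} \rvert / n$ is an assumption matching the description of $P^{\sigma , \{e, s_{i}\}}_{\mathbf {x}}$ ) computes the edge measures of $W_{\sigma , \mathbf {x}}$ and checks that all $2r$ marginals agree.

```python
from collections import Counter
from fractions import Fraction

def edge_measures(sigma, x):
    """Edge measures of W_{sigma,x} under the assumed convention
    W(a, a'; i) = |{v : x(v) = a and x(sigma(s_i) v) = a'}| / n."""
    n = len(x)
    return [
        {pair: Fraction(c, n)
         for pair, c in Counter((x[v], x[p[v]]) for v in range(n)).items()}
        for p in sigma
    ]

def marginal(edge_measure, coordinate):
    """Marginal of an edge measure on coordinate 0 (first) or 1 (second)."""
    out = Counter()
    for pair, p in edge_measure.items():
        out[pair[coordinate]] += p
    return dict(out)
```

Because each $\sigma (s_{i})$ is a bijection of $[n]$ , all $2r$ marginals coincide, and the common value is the vertex measure $W(\cdot )$ .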
We say that W has denominator n if $n \cdot W(a,a^{\prime };i) \in \mathbb {N}$ for all $a,a^{\prime },i$ .
The measures $W(\cdot ,\cdot; i)$ for $i \in [r]$ are called the edge measures of W, and $W(\cdot )$ is called the vertex measure.
For any alphabet $\mathtt {A}$ , we use the metric on $\mathtt {A}$ -weights defined by
We can use weights to count good models up to equivalence under the pseudometrics $d^{*}_{k}$ using the following proposition.
Proposition 2.1. If $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ and $\mathbf {x} \in \mathtt {A}^{n}$ , then for any observable $\alpha {\colon}\! X \to \mathtt {A}$ ,
Note this implies also that
Proof. By definition of the distance between weights,
For many ‘incompatible’ pairs $\mathbf {a},\mathbf {a}^{\prime }$ , both terms will be zero: suppose $g \in \mathrm {B}(e,k) \cap \mathrm {B}(s_{i},k)$ , so that $g s_{i}^{-1} \in \mathrm {B}(e,k)$ . If the second term in the absolute value is non-zero, then for some $x \in X$ we have $\alpha ^{k}(x) = \mathbf {a}$ and $\alpha ^{k}(T_{s_{i}}x) = \mathbf {a}^{\prime }$ , and therefore
The same argument shows that $\mathbf {a}^{\prime }_{g s_{i}^{-1}} = \mathbf {a}_{g}$ for all $g \in \mathrm {B}(e,k) \cap \mathrm {B}(s_{i},k)$ whenever the first term is non-zero. Therefore we can restrict the sum to pairs $\mathbf {a}, \mathbf {a}^{\prime }$ with $\mathbf {a}^{\prime }_{g s_{i}^{-1}} = \mathbf {a}_{g}$ for all $g \in \mathrm {B}(e,k) \cap \mathrm {B}(s_{i},k)$ . Equivalently, we can sum over all $\mathbf {A} \in \mathtt {A}^{\mathrm {B}(e,k) \cup \mathrm {B}(s_{i},k)}$ to get
It will be useful to consider the pushforward map induced by a map between alphabets: if $\pi {\colon}\! \mathtt {A} \to \mathtt {B}$ is a measurable map and W is an $\mathtt {A}$ -weight, then $\pi W$ is the $\mathtt {B}$ -weight given by
Note that this implies that the vertex measure of $\pi W$ is the pushforward under $\pi $ of the vertex measure of W.
For example, let $\pi _{\mathtt {B}} {\colon}\! \mathtt {A} \times \mathtt {B} \to \mathtt {B}$ be the projection map. If W is an $\mathtt {A} \times \mathtt {B}$ -weight then $\pi _{\mathtt {B}} W$ is given by
We call this the $\mathtt {B}$ -marginal of W.
All weights in the present paper will be over alphabets of the form $\mathtt {A}^{\mathrm {B}(e,k)} \times \mathtt {B}^{\mathrm {B}(e,k^{\prime })}$ . We use this fact to introduce some simplified notation for projections.
• $\pi _{A}$ denotes projection onto the entire $\mathtt {A}$ factor $\mathtt {A}^{\mathrm {B}(e,k)}$ ; $\pi _{B}$ is used similarly.
• For $m<k$ and $m^{\prime }<k^{\prime }$ , $\pi _{m,m^{\prime }}$ denotes projection onto $\mathtt {A}^{\mathrm {B}(e,m)} \times \mathtt {B}^{\mathrm {B}(e,m^{\prime })}$ .
• $\pi _{m}$ denotes the projection $\mathtt {A}^{\mathrm {B}(e,k)} \to \mathtt {A}^{\mathrm {B}(e,m)}$ , except that if $m=0$ we write $\pi _{e}$ .
We define $F(W)$ for an abstract weight W by
$$F(W) := (1-2r) \mathrm {H}(W(\cdot )) + \sum _{i=1}^{r} \mathrm {H}(W(\cdot , \cdot ; i)),$$
where H is the Shannon entropy. Note that this is consistent with the above definitions in that, for example,
We can revisit the definition of our version of the stochastic block model using weights. Let $H \subset G$ and let W be a denominator- $n$ $\mathtt {B}^{\mathrm {B}(e,k)}$ -weight. Suppose there exist $\mathbf {y} \in \mathtt {B}^{n}$ and $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ such that $W = W_{\sigma , \mathbf {y}^{k}}$ . Then
so we can also denote this distribution by $\mathtt {SBM}(\mathbf {y}, W)$ . Specifying the distribution by a weight rather than a specific homomorphism will occasionally be more convenient.
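As a concrete (and deliberately naive) reading of this reformulation, the sketch below samples from $\mathtt {SBM}(\sigma _{0}, \mathbf {y}_{0}, 0)$ by rejection, under the assumption that this distribution is uniform over homomorphisms $\sigma $ with $W_{\sigma , \mathbf {y}_{0}} = W_{\sigma _{0}, \mathbf {y}_{0}}$ . This matches the weight-based description above but the normalization is our own assumption, and rejection sampling is viable only for very small n.

```python
import random
from collections import Counter

def weight_counts(sigma, y):
    """Integer edge counts of W_{sigma,y}: one Counter of (label, label)
    pairs per generator, counting v -> (y(v), y(sigma(s_i) v))."""
    return [Counter((y[v], y[p[v]]) for v in range(len(y))) for p in sigma]

def sample_sbm(sigma0, y0, rng=random, max_tries=100000):
    """Rejection sampler for SBM(sigma0, y0, 0), ASSUMING it is the uniform
    distribution on homomorphisms sigma with W_{sigma,y0} = W_{sigma0,y0}."""
    target = weight_counts(sigma0, y0)
    n = len(y0)
    for _ in range(max_tries):
        sigma = []
        for _ in sigma0:
            p = list(range(n))
            rng.shuffle(p)
            sigma.append(p)
        if weight_counts(sigma, y0) == target:
            return sigma
    raise RuntimeError("no sample found; n is too large for rejection")
```

A practical sampler would instead draw the edges within and between the communities cut out by $\mathbf {y}_{0}$ directly, but the rejection form makes the conditioning explicit.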
2.1 Constructing weights and good models
We borrow the first result of this type from [Reference Bowen2]; it allows us to find a denominator-n approximation to a given weight.
Lemma 2.2. (Lemma 2.3 of [Reference Bowen2])
There is a constant C such that for any $\mathtt {A}$ -weight W there is a denominator- $n$ $\mathtt {A}$ -weight within distance $C \lvert \mathtt {A} \rvert ^{2} r/n$ of W.
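The proof in [Reference Bowen2] is constructive; as a standalone illustration of the kind of rounding involved, the sketch below (our own, using largest-remainder rounding, not necessarily Bowen's construction) rounds a single probability vector to denominator n while preserving total mass.

```python
from fractions import Fraction

def round_to_denominator(dist, n):
    """Round a finite probability vector to one with denominator n and total
    mass 1, via largest-remainder rounding; the l1 error is at most
    (number of atoms) / n."""
    floors = {k: Fraction(int(p * n), n) for k, p in dist.items()}
    deficit = round(n * (1 - sum(floors.values())))
    # hand the remaining 1/n units to the atoms with the largest remainders
    by_rem = sorted(dist, key=lambda k: dist[k] - floors[k], reverse=True)
    for k in by_rem[:deficit]:
        floors[k] += Fraction(1, n)
    return floors
```

Rounding a weight is more delicate since the edge and vertex measures must stay consistent, which is the content of the lemma.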
The following lemma allows us not only to construct a denominator-n approximation to a given weight, but also to specify a marginal of this approximation:
Lemma 2.3. Let W be an $\mathtt {A} \times \mathtt {B}$ -weight. If $W_{\mathtt {B}}$ is a $\mathtt {B}$ -weight of denominator n with $d(W_{\mathtt {B}}, \pi _{\mathtt {B}} W) < \delta $ then there is an $\mathtt {A} \times \mathtt {B}$ -weight $W_{\mathtt {A}\mathtt {B}}$ with denominator n such that $\pi _{\mathtt {B}} W_{\mathtt {A}\mathtt {B}} = W_{\mathtt {B}}$ and $d(W_{\mathtt {A}\mathtt {B}}, W) < 265r ( \lvert \mathtt {A} \times \mathtt {B} \rvert ^{2} / n + \delta )$ .
The construction is fairly involved, so it is postponed to §10. The constant 265 is not intended to be optimal.
The definition of a weight $W_{\sigma , \mathbf {x}^{k}}$ in terms of a homomorphism $\sigma $ and a labeling $\mathbf {x}$ is straightforward. However, we will also need to know whether a given weight can be realized in this way. The next two results address this inverse problem.
Proposition 2.4. If W is a denominator- $n$ $\mathtt {A}$ -weight, then there exist $\mathbf {x} \in \mathtt {A}^{n}$ and $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ such that $W = W_{\sigma , \mathbf {x}}$ .
Proof. This is implied by Proposition 2.1 of [Reference Bowen2].
Unfortunately, this does not imply that for every denominator- $n$ $\mathtt {A}^{\mathrm {B}(e,k)}$ -weight W there is some $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ and $\mathbf {x} \in \mathtt {A}^{n}$ such that $W = W_{\sigma , \mathbf {x}^{k}}$ ; instead it provides $\mathbf {X} \in (\mathtt {A}^{\mathrm {B}(e,k)})^{n}$ such that $W = W_{\sigma , \mathbf {X}}$ .
However, if we already know that W is close to a weight of the form $W_{\alpha ^{k}}$ for some observable $\alpha $ , then the following proposition shows that W is also close to a weight of the form $W_{\sigma , \mathbf {x}^{k}}$ .
Proposition 2.5. Let $\alpha {\colon}\! X \to \mathtt {A}$ , $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ , and $\mathbf {X} \in (\mathtt {A}^{\mathrm {B}(e,k)})^{n}$ be such that $d(W_{\sigma , \mathbf {X}}, W_{\alpha ^{k}}) \leq \varepsilon $ for some $\varepsilon \geq 0$ . Writing $\mathbf {x} = \pi _{e} \mathbf {X} \in \mathtt {A}^{n}$ , we have
An immediate consequence is that $\mathbf {X} \in \Omega _{0}^{*}(\sigma , \alpha ^{k}, \varepsilon )$ implies $\pi _{e} \mathbf {X} \in \Omega _{k}^{*}(\sigma , \alpha , c \varepsilon )$ where $c = 1 + 2r \lvert \mathrm {B}(e,k) \rvert $ ; cf. Claim 2 in the proof of Proposition 3.2 of [Reference Bowen2].
Proof. Claim 4 in the proof of Proposition 3.2 of [Reference Bowen2] implies that
It follows that for any $i \in [r]$ ,
so
The following corollary of the first part of the proof will be useful later. It says that if the weight $W_{\sigma , \mathbf {X}}$ generated by some $\mathbf {X} \in (\mathtt {A}^{\mathrm {B}(e,k)})^{n}$ and $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ is exactly attainable in some sense, then $\mathbf {X}$ can be exactly recovered from $\sigma $ and the projection $\pi _{e} \mathbf {X} \in \mathtt {A}^{n}$ .
Corollary 2.6. Suppose that $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ and $\mathbf {X} \in (\mathtt {A}^{\mathrm {B}(e,k)})^{n}$ are such that either
(1) $W_{\sigma , \mathbf {X}} = W_{\alpha ^{k}}$ for some $\alpha {\colon}\! X \to \mathtt {A}$ , or
(2) $W_{\sigma , \mathbf {X}} = W_{\sigma _{0}, \mathbf {x}_{0}^{k}}$ for some $\sigma _{0} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(m))$ and $\mathbf {x}_{0} \in \mathtt {A}^{m}$ .
Then $(\pi _{e} \mathbf {X})^{k} = \mathbf {X}$ .
Note that $(\pi _{e} \mathbf {X})^{k}$ is the k-neighborhood labeling generated from $\pi _{e} \mathbf {X}$ using $\sigma $ , rather than $\sigma _{0}$ or some other homomorphism.
Proof. In the first case, we are in the setting of the previous proposition with $\varepsilon =0$ , so the first inequality of its proof gives the claimed result.
The second case is actually the same; this is only obscured somewhat by the notation. We are in the setting of the previous proposition with the space $X = [m]$ having a G-action specified by $\sigma _{0}$ and a finite observable $\mathbf {x}_{0} {\colon}\! [m] \to \mathtt {A}$ .
3 Properties of F and f
Lemma 3.1. (Continuity as weight function)
If $W_{1}, W_{2}$ are $\mathtt {A}$ -weights with $d(W_{1}, W_{2}) \leq \varepsilon \leq 1$ then
where $\mathrm {H}(p)$ denotes the entropy of the probability measure $(p, 1-p) \in \operatorname {\mathrm {Prob}}(\{0,1\})$ .
Proof. We use Fano’s inequality in the following form (equation (2.139) of [Reference Cover and Thomas9]). Suppose $X,Y$ are $\mathtt {A}$ -valued random variables defined on the same probability space and let $p_{e} = \mathbb {P}(X \ne Y)$ be their probability of disagreement. Then
$$\mathrm {H}(X \mid Y) \leq \mathrm {H}(p_{e}) + p_{e} \log (\lvert \mathtt {A} \rvert - 1).$$
Using the chain rule and non-negativity of Shannon entropy, we can deduce that
Let $\mu _{1}, \mu _{2} \in \operatorname {\mathrm {Prob}}(\mathtt {A})$ be the respective distributions of $X_{1},X_{2}$ . Because $\| \mu _{1} - \mu _{2} \|_{\operatorname {\mathrm {TV}}}$ is the minimum value of $\mathbb {P}(X_{1} \ne X_{2})$ over all possible couplings, if $\| \mu _{1} - \mu _{2} \|_{\operatorname {\mathrm {TV}}} < \varepsilon $ then
The assumed bound $d(W_{1}, W_{2}) \leq \varepsilon $ implies that each vertex and edge measure of $W_{1}$ is within total variation distance $\varepsilon $ of its counterpart in $W_{2}$ , so
Let $\alpha {\colon}\! X \to \mathtt {A}$ and $\beta {\colon}\! X \to \mathtt {B}$ be observables. We say that $\beta $ is a coarsening of $\alpha $ if each part of the partition of X induced by $\beta $ is a union of parts of the partition induced by $\alpha $ (up to null sets). Equivalently, there is some function $g {\colon}\! \mathtt {A} \to \mathtt {B}$ such that $\beta = g \circ \alpha $ almost surely. In this situation we can also call $\alpha $ a refinement of $\beta $ .
A useful property of the Shannon entropy $\mathrm {H}_{\mu }(\alpha )$ is monotonicity under refinement. The function F does not share this property, but it is monotone under the following particular kind of refinement introduced in [Reference Bowen3].
We say that $\beta $ is a simple splitting of $\alpha $ if there is some $s \in \{s_{1}^{\pm 1}, \ldots , s_{r}^{\pm 1}\}$ and a coarsening $\tilde {\alpha }$ of $\alpha $ such that, up to null sets, the partition induced by $\beta $ is the coarsest common refinement of the partitions induced by $\alpha $ and $\tilde {\alpha } \circ T_{s}$ .
We say that $\beta $ is a splitting of $\alpha $ if there are observables $\alpha = \beta _{0}, \beta _{1}, \ldots , \beta _{n} = \beta $ such that $\beta _{i}$ is a simple splitting of $\beta _{i-1}$ for $i = 1, 2, \ldots , n$ . We will use the following monotonicity properties of the relative version of F.
Lemma 3.2. (Monotonicity under splitting)
(1) If $\alpha _{1}$ is a splitting of $\alpha _{2}$ then $F(\alpha _{1} | \beta ) \leq F(\alpha _{2} | \beta )$ .
(2) If $\beta _{1}$ is a splitting of $\beta _{2}$ then $F(\alpha | \beta _{1}) \geq F(\alpha | \beta _{2})$ .
Proof. (1) This is essentially Proposition 5.1 of [Reference Bowen3]; conditioning on $\beta $ makes no difference to the proof.
(2) The proof is based on the proof of part (1), but in place of the chain rule for conditional entropy we use the following bound:
We will also use the following consequence of the previous bound:
It suffices to check the case where $\beta _{1}$ is a simple splitting of $\beta _{2}$ . Let $t \in \{s_{1}^{\pm 1}, \ldots , s_{r}^{\pm 1} \}$ and let $\tilde {\beta }$ be a coarsening of $\beta _{2}$ such that the partition induced by $\beta _{1}$ is the same as the coarsest common refinement of the partitions induced by $\beta _{2}$ and $\tilde {\beta } \circ T_{t}$ up to null sets. Then, using the two bounds just derived,
But
so we can remove the t term from the sum to get
One corollary is the following convenient formula.
Corollary 3.3. Let $\alpha , \beta $ be finite observables such that $\beta ^{G}_{*}\mu $ is a Markov measure. Then $F_{\mu }(T, \alpha ^{k_{1}} \mid \beta ^{k_{2}})$ is independent of $k_{2}$ . In particular,
Proof. By the previous proposition, for any $k \leq k_{2}$ we have
On the other hand, by Theorem 6.1 of [Reference Bowen5] $F_{\mu }(T, \beta ^{k}) = F_{\mu } (T, \beta ^{k_{2}})$ so
Applying monotonicity under splitting to the first term on the right gives
This establishes independence of $k_{2}$ ; the formula for f follows.
Proposition 3.4. Let $\alpha , \beta $ be finite observables. Then for any $k \in \mathbb {N}$ ,
It follows that
Proof. By Lemma 3.2, $F_{\mu }(T, \alpha ^{k} \mid \beta ) \leq F_{\mu }(T, \alpha \mid \beta )$ . Using elementary properties of Shannon entropy, we have
By T-invariance of $\mu $ we have
so the first inequality follows.
For any $k_{1}, k_{2} \in \mathbb {N}$ this gives
so the second inequality follows upon taking the supremum over $k_{2}$ then the infimum over $k_{1}$ .
We can use this bound to give a proof of the chain rule for the relative f-invariant, a version of which first appeared in [Reference Bowen5] (there it is called the Abramov–Rokhlin formula; see also [Reference Bowen and Gutman7]).
Corollary 3.5. (Chain rule)
$$f_{\mu }(T, \alpha \beta ) = f_{\mu }(T, \beta ) + f_{\mu }(T, \alpha \mid \beta ).$$
Proof. By definition of the relative version of F and the chain rule for conditional entropy, for each $k_{1}, k_{2}$ we have
By Lemma 3.2 each term is monotone in $k_{2}$ , so the limits as $k_{2} \to \infty $ exist. By Proposition 3.4 all terms are bounded above (recall we only consider finite observables, so in particular all observables have finite entropy), so we can split the limit across the sum on the right to get
Taking $k_{1}$ to infinity gives the result.
4 Non-vacuity of main theorems
4.1 Theorem A
Here we prove Proposition A, which asserts the non-vacuity of Theorem A. Given $\beta {\colon}\! X \to \mathtt {B}$ , we need to show that there exist $\mathbf {y}_{n} \in \mathtt {B}^{n}$ and $\sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ such that $\lim _{n \to \infty } d_{0}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*} \mu ) = 0$ .
By Lemma 2.2, there is a sequence $\{W_{n}\}_{n=1}^{\infty }$ of $\mathtt {B}$ -weights such that $W_{n}$ has denominator n for each n and $d(W_{n}, W_{\beta }) = o(1)$ . By Proposition 2.4, for each n we can pick $\mathbf {y}_{n},\sigma _{n}$ such that $W_{\sigma _{n}, \mathbf {y}_{n}} = W_{n}$ . Since $d_{0}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*} \mu ) = d(W_{\sigma _{n}, \mathbf {y}_{n}}, W_{\beta })$ , these suffice.
4.2 Theorems B and C
Here we prove Proposition B, which asserts the non-vacuity of Theorem B (and by extension Theorem C, since the assumptions are the same).
Let $m_{n} \to \infty $ as $n \to \infty $ with $m_{n} = o(\log \log n)$ , and let $\beta {\colon}\! X \to \mathtt {B}$ be a finite observable. We need to show that there exist $\mathbf {y}_{n} \in \mathtt {B}^{n}$ and $\sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ such that $d_{m_{n}}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*} \mu ) = O({1}/{\log n})$ .
By Lemma 2.2, there is a sequence $\{W_{n}\}_{n=1}^{\infty }$ of weights such that $W_{n}$ is a denominator- $n\ \mathtt {B}^{\mathrm {B}(e,m_{n})}$ -weight for each n and $d(W_{n}, W_{\beta ^{m_{n}}}) = O({\lvert \mathtt {B}^{\mathrm {B}(e,m_{n})} \rvert ^{2}}/{n})$ . By Proposition 2.4, for each n we can pick $\mathbf {Y}_{n},\sigma _{n}$ such that $W_{\sigma _{n}, \mathbf {Y}_{n}} = W_{n}$ . Let $\mathbf {y}_{n} = \pi _{e} \mathbf {Y}_{n}$ . By Proposition 2.5,
5 Counting lemmas
For a $\mathtt {B}$ -weight W, let $Z_{n}(W)$ denote the number of pairs $(\sigma , \mathbf {y}) \in \operatorname {\mathrm {Hom}}(G,\operatorname {\mathrm {Sym}}(n)) \times \mathtt {B}^{n}$ such that $W_{\sigma ,\mathbf {y}} = W$ .
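To illustrate this counting quantity, the following sketch computes $Z_{n}(W)$ for every attainable weight by brute force on a toy example. It assumes the definition of $W_{\sigma ,\mathbf {y}}$ from Section 2 as the empirical vertex and per-generator edge label frequencies; the encoding of a weight as nested tuples is a hypothetical stand-in, not the paper's notation.

```python
from itertools import permutations, product
from collections import Counter

# Assumed encoding (hypothetical stand-in for the definitions of Section 2):
# the weight W_{sigma,y} records the empirical vertex label frequencies and,
# for each generator i, the frequencies of edge label pairs (y_v, y_{sigma_i(v)}).
def weight(sigmas, y, n):
    vertex = tuple(sorted(Counter(y).items()))
    edges = tuple(
        tuple(sorted(Counter((y[v], y[sigma[v]]) for v in range(n)).items()))
        for sigma in sigmas
    )
    return (vertex, edges)

# Brute-force Z_n(W) over all pairs (sigma, y) for a toy case:
# n = 3, rank r = 1 (so Hom(G, Sym(n)) is just Sym(3)), alphabet B = {0, 1}.
n, r, B = 3, 1, (0, 1)
Z = Counter()
for sigmas in product(permutations(range(n)), repeat=r):
    for y in product(B, repeat=n):
        Z[weight(sigmas, y, n)] += 1

# Summing Z_n(W) over all attainable weights recovers |Sym(n)|^r * |B|^n.
assert sum(Z.values()) == 6 ** r * 2 ** n  # 48 pairs in total
```

Summing $Z_{n}(W)$ over all weights recovers the total number of pairs $\lvert \operatorname {\mathrm {Sym}}(n) \rvert ^{r} \lvert \mathtt {B} \rvert ^{n}$ , which is the normalization implicit in the expectation $\mathbb {E}_{\sigma }$ below.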
Proposition 5.1. If W is a $\mathtt {B}$ -weight with denominator n then
Proof. We write
where $\mathbb {E}_{\sigma }$ denotes the expectation over a uniform choice of $\sigma \in \operatorname {\mathrm {Hom}}(G,\operatorname {\mathrm {Sym}}(n))$ .
Proposition 2.1 of [2] states that
Lemma 2.2 of the same paper gives an estimate of this quantity, but for our purposes we need to be more careful about how the estimate depends on the size of the alphabet.
We use the version of Stirling’s approximation given by
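(The display here is omitted above; a standard hedged reconstruction, with the upper constant chosen to match the factors $3^{S}$ appearing below, is the following.)

```latex
% Hedged reconstruction of the omitted display; the upper constant 3
% matches the factors 3^S in the estimates below:
\sqrt{2\pi k}\,\Bigl(\frac{k}{e}\Bigr)^{k}
  \;\le\; k! \;\le\;
  3\sqrt{k}\,\Bigl(\frac{k}{e}\Bigr)^{k}
```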
valid for $k \geq 1$ . To estimate the products that appear in the expectation, we will need to omit all factors which equal $0! = 1$ since Stirling’s approximation is not valid for these. To do this carefully, let
and for each $i \in [r]$ let
For the numerator of the above expectation we get
and a lower bound which is identical except missing the first factor. For the denominator, let $S = \sum _{i \in [r]}\lvert \mathtt {B}^{\prime }_{i} \rvert $ . We get
and again we have a lower bound which is identical except missing the first factor $3^{S}$ . Therefore the quotient is bounded above by
and below by
Since W has denominator n, we have
and
Therefore $Z_{n}(W)$ satisfies
Since $S \leq r \lvert \mathtt {B} \rvert ^{2}$ and $\lvert \mathtt {B}^{\prime } \rvert \leq \lvert \mathtt {B} \rvert $ , we conclude that
and the stated inequality follows.
The following proposition establishes the connection between the relative version of F and expected numbers of good models over stochastic block models.
Proposition 5.2. Given any denominator- $n\ (\mathtt {A} \times \mathtt {B}^{\mathrm {B}(e,k)})$ -weight $W_{\mathtt {A}\mathtt {B}}$ , let $W_{\mathtt {B}}$ denote the $\mathtt {B}^{\mathrm {B}(e,k)}$ -weight $\pi _{\mathtt {B}} W_{\mathtt {A}\mathtt {B}}$ . Let $\mathbf {y} \in \mathtt {B}^{n}$ be a fixed labeling with $p_{\mathbf {y}} = \pi _{e} W_{\mathtt {B}}(\cdot )$ , and let
assuming $W_{\mathtt {B}}$ is such that the desired support is non-empty. Then
In particular,
Lemma 5.3. Let $W_{\mathtt {A}\mathtt {B}}$ be an $\mathtt {A} \times \mathtt {B}^{\mathrm {B}(e,k)}$ -weight of denominator n. Then
Proof. Suppose $\lvert \{ (\sigma , \mathbf {x}, \mathbf {y}) : W_{\sigma , (\mathbf {x}, \mathbf {y}^{k})} = W_{\mathtt {A}\mathtt {B}}\} \rvert \ne 0$ ; we then need to show
The inequality $\leq $ is clear, since we have an injection $(\sigma , \mathbf {x}, \mathbf {y}) \mapsto (\sigma , \mathbf {x}, \mathbf {y}^{k})$ .
The converse inequality holds because $(\sigma , \mathbf {x}, \mathbf {Y}) \mapsto (\sigma , \mathbf {x}, \pi _{e} \mathbf {Y})$ is an injection from the set on the right to the set on the left. This follows from Corollary 2.6.
Proof of Proposition 5.2
Let
and let $\tilde {\mu }_{2}$ be its marginal on the ‘ ${\tilde {\mathbf y}}$ ’-coordinate. This marginal is supported on $\{ {\tilde {\mathbf y}} : p_{{\tilde {\mathbf y}}} = \pi _{e} W_{\mathtt {B}}(\cdot ) \}$ . Note that $\tilde \mu $ conditioned on any particular ${\tilde {\mathbf y}}$ in the support of $\tilde {\mu }_{2}$ is $\mathtt {SBM}({\tilde {\mathbf y}}, W_{\mathtt {B}})$ , and that
is the same for each ${\tilde {\mathbf y}}$ in the support of $\tilde {\mu }_{2}$ , with one choice being $\mathbf {y}$ from the proposition statement. This is because any two choices have the same label frequencies and hence are related by a permutation of $[n]$ . With the choice ${\tilde {\mathbf y}} = \mathbf {y}$ the expectation is $\mathcal {E}$ , so
Note that our assumption that the desired support of $\tilde {\mu }_{2}$ is non-empty allows us to rule out the ‘0’ case in the application of Lemma 5.3.
The rest of the result then follows from our estimates on $Z_{n}$ in Proposition 5.1.
6 Proof of Theorem A
6.1 Upper bound
Note that we will not rely on the Markov assumption for the upper bound.
For each $k \in \mathbb {N}$ ,
Write
and assume that n is large enough that $m_{n} \geq k$ .
Writing $\mathcal {W}_{n}(\alpha \beta ,k, \varepsilon )$ for the set of all denominator-n weights W with $d(W, W_{(\alpha \beta )^{k}}) < \varepsilon $ ,
since if $W_{\sigma , \mathbf {y}_{n}^{k}} \ne \pi _{\mathtt {B}} W$ then $W_{\sigma , (\mathbf {X}, \mathbf {y}_{n}^{k})} \ne W$ . But $\mathtt {s}_{n}$ conditioned on $\{W_{\sigma , \mathbf {y}_{n}^{k}} = \pi _{\mathtt {B}} W\}$ is $\mathtt {SBM}(\mathbf {y}_{n}, \pi _{\mathtt {B}} W)$ , so we can bound the expectation above using Proposition 5.2, getting
Note $(9n)^{r\lvert \mathtt {B}^{\mathrm {B}(e,k)} \rvert ^{2}(\lvert \mathtt {A}^{\mathrm {B}(e,k)} \rvert +1)} \leq e^{o_{n \to \infty }(n)}$ . Fix $\delta> 0$ . By continuity of F (Lemma 3.1), for all small enough $\varepsilon $ (possibly depending on k), we have
Bounding each probability by 1, we get
But
so this implies
for any $k_{2} \geq k$ , by monotonicity under splitting. Taking the limit as $k_{2} \to \infty $ followed by the infimum over $\varepsilon $ (which takes $\delta $ to 0) and k gives
Since
for every k, this completes the upper bound.
6.2 Lower bound
Fix $k \in \mathbb {N}$ . To estimate
we bound below using the expected size of
This is not a true lower bound but, by equation (7.1) below, there are constants $C,d,c$ independent of n such that
The ‘error’ factor has an exponential growth rate which vanishes as $\varepsilon \to 0$ , so it will not be a problem.
We now find a lower bound for the expectation of $\lvert \mathcal {X}_{k} \rvert $ . Applying Proposition 5.2 as above, we have
For any $\delta>0$ , for small enough $\varepsilon>0$ (independent of n), by continuity of F this is at least
We give a lower bound for the sum by first rewriting it as
Fix $\eta> 0$ . By Lemma 2.3, for all large enough n the $\mathtt {B}$ -weight $W_{\sigma _{n}, \mathbf {y}_{n}}$ can be extended to a $\mathtt {B}^{\mathrm {B}(e,k)}$ -weight $W_{\mathtt {B}}$ with $d(W_{\mathtt {B}}, W_{\beta ^{k}}) \leq \eta $ ; to apply the lemma we can think of the extended weight $W_{\mathtt {B}}$ as having alphabet $\mathtt {B}^{\mathrm {B}(e,k) \setminus \{e\}} \times \mathtt {B}$ , and recall that we assume $\lim _{n \to \infty } d(W_{\sigma _{n}, \mathbf {y}_{n}}, W_{\beta }) = 0$ . Choose $\sigma , \mathbf {Y}$ such that $W_{\sigma , \mathbf {Y}} = W_{\mathtt {B}}$ . Since $\pi _{e} W_{\mathtt {B}} = W_{\sigma _{n}, \mathbf {y}_{n}}$ , it must be that $\pi _{e} \mathbf {Y}$ is a permutation of $\mathbf {y}_{n}$ : they must assign labels with the same frequencies since
Therefore we can choose $\sigma , \mathbf {Y}$ such that $\pi _{e} \mathbf {Y} = \mathbf {y}_{n}$ .
Let $\widetilde {W_{\mathtt {B}}} = W_{\sigma , \mathbf {y}_{n}^{k}} = W_{\sigma , (\pi _{e}\mathbf {Y})^{k}}$ . By Proposition 2.5,
So, as long as $\eta $ is small enough and n is large enough (depending on $\varepsilon ,k$ ), by Lemma 2.3,
Now consider the probability appearing in the $\widetilde {W_{\mathtt {B}}}$ term:
By symmetry in the choice of $\mathbf {y}$ with the correct letter frequencies (any two $\mathbf {y}$ with the same $p_{\mathbf {y}}$ are related by a permutation of $[n]$ , so have the same number of $\sigma $ which give a particular weight), we can write this as
By continuity of F, we then get
for all large enough n and small enough $\eta $ (again depending on $k,\varepsilon $ ), with $\delta> 0$ the same as chosen above. Since $\beta ^{G}_{*} \mu $ is a Markov measure, $F_{\mu }(\beta ^{k}) = F_{\mu }(\beta )$ [5, Theorem 6.1].
Putting this all together, for any $k \in \mathbb {N}$ , for all $\delta>0$ , we have
for all large enough n and small enough $\varepsilon>0$ .
It follows that for any $k \in \mathbb {N}$ ,
Taking the limit as $k \to \infty $ gives the desired bound, using Corollary 3.3 and that the family of pseudometrics $\{d^{*}_{k} : k \in \mathbb {N}\}$ generates the weak$^{*}$ topology.
7 Proof of Theorem B
Let $W_{n} = W_{\sigma _{n}, \mathbf {y}_{n}^{m_{n}}}$ , so that
Note that, by definition of $\mathtt {s}_{n}$ ,
Lemma 7.1. With $W_{n}$ as just defined in terms of $m_{n}$ , $\sigma _{n}$ , and $\mathbf {y}_{n}$ , we have
Proof. The assumption in the theorem statement that $d_{m_{n}}^{*}(P_{\mathbf {y}_{n}}^{\sigma _{n}}, \beta ^{G}_{*}\mu ) = O ( {1}/{\log n} )$ implies the existence of a constant C such that
By Lemma 3.1 we have
using that $m_{n} = o(\log \log n)$ . Since $m_{n} \to \infty $ as $n \to \infty $, we have $f_{\mu } (T, \beta ) = \lim _{n \to \infty } F(W_{\beta ^{m_{n}}})$ , so the result follows.
Lemma 7.2. If $m_{n} = o(\log \log n)$ , then for any $k> 0$ and $\varepsilon> 0$ we have $\lvert \mathtt {B}^{\mathrm {B}(e,m_{n})} \rvert ^{k} = o(n^{\varepsilon })$ .
Proof. This is certainly true if $\lvert \mathtt {B} \rvert = 1$ ; assume therefore that $\lvert \mathtt {B} \rvert \geq 2$ .
Our assumption $m_{n} = o(\log \log n)$ guarantees that
for all large enough n. Therefore
This inequality can be rearranged to give
Since $\varepsilon>0$ is arbitrary, the result follows.
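Since the displayed estimates in this proof are omitted above, here is a hedged sketch of the computation, using only the crude bound that every element of $\mathrm {B}(e,m)$ is a word of length at most $m$ in the $2r$ letters $s_{1}^{\pm 1}, \ldots , s_{r}^{\pm 1}$:

```latex
% Hedged sketch of the omitted estimates in the proof of Lemma 7.2:
\lvert \mathrm{B}(e,m) \rvert \le \sum_{j=0}^{m} (2r)^{j} \le 2\,(2r)^{m},
\qquad\text{hence}\qquad
\log \bigl( \lvert \mathtt{B}^{\mathrm{B}(e,m_{n})} \rvert^{k} \bigr)
  \le 2k \log \lvert \mathtt{B} \rvert \cdot e^{m_{n} \log 2r}.
% Since m_n = o(log log n), we get e^{m_n log 2r} = (log n)^{o(1)} = o(log n),
% so |B^{B(e,m_n)}|^k <= e^{o(log n)} = n^{o(1)} = o(n^eps) for every eps > 0.
```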
In the remainder of this section we prove Theorem B by first proving that the right-hand side is an upper bound for the left, then proving that it is also a lower bound.
7.1 Upper bound
Just as in the proof of the upper bound in Theorem A, for each $k \in \mathbb {N}$ and $\varepsilon>0$ we have
where
We assume that n is large enough that $m_{n} \geq k$ .
Since $\mathtt {s}_{n}$ is $\mathtt {SBM}(\sigma _{n}, \mathbf {y}_{n}, m_{n})$ rather than $\mathtt {SBM}(\sigma _{n}, \mathbf {y}_{n}, k)$ , we cannot apply Proposition 5.2 directly to this expression. We get around this as follows. Let
All elements of this set are denominator- $n\ \mathtt {A}^{\mathrm {B}(e,m)} \times \mathtt {B}^{\mathrm {B}(e,m^{\prime })}$ -weights; we avoid the question of exactly which weights are in this set, but call such weights attainable. For $k \leq m$ and $k^{\prime } \leq m^{\prime }$ let
denote the set of such weights whose appropriate marginal is within $\varepsilon $ of the $(\mathtt {A}^{\mathrm {B}(e,k)} \times \mathtt {B}^{\mathrm {B}(e,k^{\prime })})$ -weight $W_{\alpha ^{k} {} \beta ^{k^{\prime }}}$ . For now we take $m=k=k^{\prime }$ , but we will need more generality below. Then
so we can apply Proposition 5.2 to get
By Lemma 7.2 we have $(9n)^{r\lvert \mathtt {B}^{\mathrm {B}(e,m_{n})} \rvert ^{2}(\lvert \mathtt {A}^{\mathrm {B}(e,k)} \rvert +1)} \leq e^{o_{n \to \infty }(n)}$ . Using this and Lemma 7.1, we have
where the little o is uniform over all terms in the sum. Here we use the assumption that $f_{\mu }(T, \beta )$ is finite.
By definition of $\mathcal {W}_{n}(k, m_{n})$ , for any $W \in \mathcal {W}_{n}(k, m_{n}; \alpha {}\beta , k,k; \varepsilon )$ we can pick $\sigma \in \operatorname {\mathrm {Hom}}(G,\operatorname {\mathrm {Sym}}(n))$ , $\mathbf {X} \in (\mathtt {A}^{\mathrm {B}(e,k)})^{n}$ , and $\mathbf {y} \in \mathtt {B}^{n}$ so that $W = W_{\sigma , (\mathbf {X}, \mathbf {y}^{m_{n}})}$ . Then since $\mathbf {X} {} \mathbf {y}^{m_{n}}$ is a splitting of $\mathbf {X} {} \mathbf {y}^{k}$ , by Lemma 3.2 we have
where here $F_{\operatorname {\mathrm {Unif}}([n])}(\sigma , \mathbf {X} {} \mathbf {y}^{m_{n}})$ is F of the observable $\mathbf {X} {} \mathbf {y}^{m_{n}}$ on the measure-preserving system $([n], \operatorname {\mathrm {Unif}}([n]), \sigma )$ (we shift to this notation from weights in order to apply the splitting lemma). By continuity of F, for all small enough $\varepsilon $ (depending on k) we have
Along with the above, this implies that
Bounding all terms in the sum by 1, we get
Using Lemma 7.2, we have
so this implies
Taking the infimum over $\varepsilon $ and k, and using the chain rule for f (Corollary 3.5, again using the assumption that $f_{\mu }(T, \beta )$ is finite), gives
Since
for every k, this completes the upper bound.
7.2 Lower bound
In this section we denote
(note that the dependence on $n$ is implicitly specified by $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ and $\mathbf {y} \in \mathtt {B}^{n}$ ), and with $\Sigma = \{\mathtt {s}_{n}\}_{n=1}^{\infty }$ ,
The following two claims are used to relate the sizes of the sets defined above.
Claim 1. Let $k \leq \min (k_{1},k_{2})$ . For any $\sigma ,\mathbf {y}$ , we have
where $c = 1+\lvert \mathrm {B}(e,k) \rvert $ .
Proof. If $(\mathbf {X}, \mathbf {y}^{k_{2}}) \in \Omega _{0}^{*}(\sigma , \alpha ^{k_{1}} {} \beta ^{k_{2}}, \varepsilon )$ , then
this follows from the fact that total variation distance is non-increasing under pushforwards. Applying Proposition 2.5, we get
Claim 2. Fix $\sigma , \mathbf {y}$ , and $k \leq \min (k_{1}, k_{2})$ . As established in the previous claim, we can consider $\pi _{e}$ as a map from $ \mathcal {X}_{k_{1}, k_{2}} (\sigma , \alpha {}\beta , \varepsilon \mid \mathbf {y})$ to $\Omega _{k}^{*}(\sigma , \alpha {}\beta , c\varepsilon \mid \mathbf {y})$ . There are constants $C,d$ independent of n such that $\pi _{e}$ is at most $C \exp (nd\varepsilon + n \mathrm {H}(2 \lvert \mathrm {B}(e,k) \rvert \varepsilon ) )$ -to-one.
Proof. If $\Omega _{k}^{*}(\sigma , \alpha {}\beta , c\varepsilon \mid \mathbf {y})$ is empty, then the claim is vacuously true. Otherwise, fix $\mathbf {x} \in \Omega _{k}^{*}(\sigma , \alpha {}\beta , c\varepsilon \mid \mathbf {y})$ . If $\mathbf {X} \in \pi _{e}^{-1}\{\mathbf {x}\}$ , then $\pi _{e} (\mathbf {X}, \mathbf {y}^{k}) = (\mathbf {x}, \mathbf {y})$ . Claim 3 in the proof of Proposition 3.2 of [2] gives an upper bound of the desired form for the number of such pairs $(\mathbf {X}, \mathbf {y}^{k})$ , and therefore for the number of such $\mathbf {X}$ .
Claim 2 implies that
where $C,d$ are independent of n.
We now find a lower bound for the expectation of $\lvert \mathcal {X} \rvert $ . Fix $k_{1}, k_{2} \in \mathbb {N}$ , and suppose n is large enough that $m_{n} \geq \max (k_{1}, k_{2})$ . Using Proposition 5.2 and Lemma 7.2, we have
We bound the infimum below as follows. Given any $W \in \mathcal {W}_{n}(k_{1}, m_{n}; \alpha {}\beta , k_{1}, k_{2}; \varepsilon )$ , we can let $\mathbf {X}, \mathbf {y}, \sigma $ be such that $W = W_{\sigma , (\mathbf {X}, \mathbf {y}^{m_{n}})}$ . Then by Lemma 3.2 and continuity of F,
for any $\delta>0$ for all small enough $\varepsilon $ (with ‘small enough’ dependent only on $k_{1}, k_{2}$ ). This implies that the infimum is bounded below by
We bound the sum below by first rewriting it as
The following claim, then, implies that the sum is bounded below by 1.
Claim 3. For all large enough n,
Proof. By Lemma 2.3, if
and $d(W_{\sigma , \mathbf {y}_{n}^{m_{n}}}, W_{\beta ^{m_{n}}}) < {\varepsilon }/{530 r}$ then there is an $(\mathtt {A}^{\mathrm {B}(e,k_{1})} \times \mathtt {B}^{\mathrm {B}(e,m_{n})})$ -weight W with $\pi _{\mathtt {B}} W = W_{\sigma , \mathbf {y}_{n}^{m_{n}}}$ and $d(W, W_{\alpha ^{k_{1}} {} \beta ^{m_{n}}}) < \varepsilon $ . By definition of $\mathtt {s}_{n}$ and Lemma 7.2, both conditions are met for all large enough n.
The claim will follow if we show that W is attainable.
With W as chosen above, by Proposition 2.4 we can choose $\tilde \sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ , ${\tilde {\mathbf X}} \in (\mathtt {A}^{\mathrm {B}(e,k_{1})})^{n}$ , and ${\tilde {\mathbf Y}} \in (\mathtt {B}^{\mathrm {B}(e,m_{n})})^{n}$ such that $W = W_{\tilde \sigma , ({\tilde {\mathbf X}}, {\tilde {\mathbf Y}})}$ .
Let ${\tilde {\mathbf y}} = \pi _{e} {\tilde {\mathbf Y}} \in \mathtt {B}^{n}$ . To complete the proof we show that ${\tilde {\mathbf y}}^{m_{n}} = {\tilde {\mathbf Y}}$ , that is,
for all $i \in [n]$ and $g \in \mathrm {B}(e,m_{n})$ . We prove this by induction on the word length $\lvert g \rvert $ .
The base case $\lvert g \rvert = 0$ (that is, $g=e$ ) follows immediately from the definition of ${\tilde {\mathbf y}}$ .
For the inductive step, write $g = ht$ with $\lvert h \rvert = \lvert g \rvert -1$ and $t \in \{s_{1}^{\pm 1}, \ldots , s_{r}^{\pm 1}\}$ . Then, assuming the result holds for h,
Now since $W_{\tilde \sigma , {\tilde {\mathbf Y}}} = W_{\sigma _{n}, \mathbf {y}_{n}^{m_{n}}}$ , we can pick $j \in [n]$ such that
This implies
Hence, for all large enough n, we have
and therefore
Combining this lower bound with equation (7.1) and the definition of $\mathrm {h}_{\Sigma ,\mu } (T, \alpha \mid \beta : k, c \varepsilon )$ , we get
Taking the infimum in $\varepsilon $ then letting $\delta $ go to zero gives
for $k \leq \min (k_{1}, k_{2})$ . First take $k_{2} \to \infty $ , then $k_{1} \to \infty $ , then take the infimum over k. We get
where the last line follows because the collection of pseudometrics $\{ d_{k}^{*} : k \in \mathbb {N}\}$ generates the weak$^{*}$ topology on $\operatorname {\mathrm {Prob}}((\mathtt {A} \times \mathtt {B})^{G})$ .
8 Proof of Theorem C
By analogy with sofic entropy, we denote $\Sigma := \{\mathtt {s}_{n}\}_{n=1}^{\infty }$ and denote the left-hand side of the formula in the theorem statement by $\mathrm {h}_{\Sigma ,\mu }(T, \alpha )$ .
Endow $\operatorname {\mathrm {Prob}}(\mathtt {A}^{G})$ with the metric
Note that this induces the weak* topology (where $\mathtt {A}$ is given the discrete topology and $\mathtt {A}^{G}$ the product topology).
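The displayed metric above is omitted; one standard candidate consistent with the note that it induces the weak* topology is the following (a hedged reconstruction, not claimed verbatim):

```latex
% Hedged reconstruction of the omitted metric: sum the pseudometrics d_k^*
% (total variation distance between marginals on A^{B(e,k)}) with geometric
% weights, so that convergence in d is exactly weak* convergence:
d(\mu, \nu) \;=\; \sum_{k=0}^{\infty} 2^{-k}\, d_{k}^{*}(\mu, \nu).
```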
Writing $\mu _{\mathtt {A}} = \alpha ^{G}_{*}\mu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^{G})$ , we then have
We will similarly denote $\mu _{\mathtt {B}} = \beta ^{G}_{*} \mu \in \operatorname {\mathrm {Prob}}(\mathtt {B}^{G})$ .
8.1 Lower bound
Let $\lambda \in \operatorname {\mathrm {Prob}}((\mathtt {A} \times \mathtt {B})^{G})$ be any joining of (the shift systems with respective measures) $\mu _{\mathtt {A}}$ and $\mu _{\mathtt {B}}$ . Then, for any $\mathbf {x} \in \mathtt {A}^{n}$ and $\mathbf {y} \in \mathtt {B}^{n}$ , we have
where d is defined on $\operatorname {\mathrm {Prob}}((\mathtt {A} \times \mathtt {B})^{G})$ analogously to the definition given on $\operatorname {\mathrm {Prob}}(\mathtt {A}^{G})$ above. This inequality holds because total variation distance is non-increasing under pushforwards. Consequently,
Taking the supremum over joinings $\lambda $ gives the lower bound.
8.2 Upper bound
For $\varepsilon>0$ , let
be the set of shift-invariant ‘approximate joinings’ of $\mu _{\mathtt {A}}$ and $\mu _{\mathtt {B}}$ . Since $\operatorname {\mathrm {Prob}}((\mathtt {A} \times \mathtt {B})^{G})$ is compact, for each $\varepsilon>0$ there exist $\lambda _{1}, \ldots , \lambda _{m} \in \mathsf {J}_{\varepsilon }$ such that
By definition of $\mathtt {s}_{n}$ we have $\mathbb {P}_{\sigma \sim \mathtt {s}_{n}} ( d(P_{\mathbf {y}_{n}}^{\sigma }, \mu _{\mathtt {B}}) < \varepsilon ) = 1$ for all large enough n. Therefore,
Note that the entire expression in the infimum is decreasing as $\varepsilon \to 0$ , so we may replace the infimum with a limit. Rather than taking a continuous limit we write
For each m, pick $\lambda _{m} \in \mathsf {J}_{1/m}$ to get within $1/m$ of the supremum. Then the right-hand side is equal to
Let $\lambda _{m_{j}}$ be a subsequence with weak* limit $\lambda _{0}$ . By weak* continuity of pushforwards under projection we have $\lambda _{0} \in \mathsf {J}(\mu _{\mathtt {A}}, \mu _{\mathtt {B}})$ . Now, for any $\delta>0$ , for all large enough j we have both $1/m_{j} < \delta /2$ and $d(\lambda _{m_{j}}, \lambda _{0}) < \delta /2$ , so by the triangle inequality
It follows that the expression in $(*)$ , and hence $\mathrm {h}_{\Sigma ,\mu }(T, \alpha )$ , is bounded above by
Taking the infimum over $\delta $ shows that
9 Proof of Proposition 1.1
All sequences of interest are of the form
with $\mathbf {y}_{n} \in \mathtt {B}^{n}$ , $\sigma _{n} \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ , $m_{n} = o(\log \log n)$ , and where $W_{n}$ is the $\mathtt {B}^{\mathrm {B}(e,m_{n})}$ -weight $W_{\sigma _{n}, \mathbf {y}_{n}^{m_{n}}}$ . In the case of Theorem A we simply have $m_{n} = 0$ for all n.
The proposition will follow from the following lemma.
Lemma 9.1. Let $\zeta _{n}$ denote the uniform measure on $\operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ . Then, for any finite $D \subset G$ and $\delta> 0$ , there exists $\varepsilon>0$ such that
for all large enough n.
This can be proven by making superficial changes to the proof of the similar result [1, Lemma 3.1].
To prove Proposition 1.1, it now suffices to show that, for any $\varepsilon> 0$ ,
for all large enough n. To do this, first note that the left-hand side here depends only on the vector $p_{\mathbf {y}_{n}} \in \operatorname {\mathrm {Prob}}(\mathtt {B})$ of letter frequencies. Therefore,
But by Proposition 2.5, if $\sigma \in \operatorname {\mathrm {Hom}}(G, \operatorname {\mathrm {Sym}}(n))$ and $\mathbf {Y} \in (\mathtt {B}^{\mathrm {B}(e,m_{n})})^{n}$ are such that $W_{\sigma , \mathbf {Y}} = W_{n} = W_{\sigma _{n}, \mathbf {y}_{n}^{m_{n}}}$ , then the projection $\mathbf {Y}_{e} \in \mathtt {B}^{n}$ satisfies $(\mathbf {Y}_{e})^{m_{n}} = \mathbf {Y}$ . Therefore, for each $\sigma $ ,
Hence,
Combining these last few statements, we see that
We can ignore the first factor here since it only decays exponentially fast. By Proposition 5.1,
The third factor is clearly not a problem and can also be ignored. For the first factor,
using Lemma 7.2. For the second factor, first note that by definition of $F(W_{n})$ we have
So
again using Lemma 7.2. This implies that for every $\varepsilon>0$ we have
for all large enough n, which implies the result.
10 Proof of Lemma 2.3
We show how to construct a denominator-n weight $W_{\mathtt {A}\mathtt {B}}$ that has a given $\mathtt {B}$ -marginal $W_{\mathtt {B}}$ and is close to a given $(\mathtt {A} \times \mathtt {B})$ -weight W whose $\mathtt {B}$ -marginal $\pi _{\mathtt {B}} W$ is close to $W_{\mathtt {B}}$ . As in the theorem statement, we assume
To minimize the appearance of factors of $\tfrac 12$ , in this section we work with the $\ell ^{1}$ distance on weights, which is twice the distance defined above. Therefore the previous assumption becomes
We fix distinguished elements $a_{0} \in \mathtt {A}$ and $b_{0} \in \mathtt {B}$ which will be referred to throughout this section.
10.1 The vertex measure
We first define the weight’s vertex measure by
See Table 1.
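The displayed definition referred to above is omitted; judging from the estimates that follow, it is presumably of the following form (a hedged reconstruction): round down away from the distinguished letter $a_{0}$ , and let the $a_{0}$ terms absorb the discrepancy so that the $\mathtt {B}$ -marginal is exactly $W_{\mathtt {B}}$ .

```latex
% Hedged reconstruction of the omitted definition of the vertex measure:
W_{\mathtt{AB}}((a,b)) = \frac{\lfloor n\, W((a,b)) \rfloor}{n}
  \quad (a \ne a_{0}),
\qquad
W_{\mathtt{AB}}((a_{0},b)) = W_{\mathtt{B}}(b)
  - \sum_{a \ne a_{0}} W_{\mathtt{AB}}((a,b)).
```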
Note that $\lvert W_{\mathtt {A}\mathtt {B}}((a,b)) - W((a,b)) \rvert \leq 1/n$ for $a \ne a_{0}$ and
Therefore the $\ell ^{1}$ distance between the vertex measures is
10.1.1 Nonnegativity
The terms defined by rounding down W using the floor function are guaranteed to be non-negative, but the others are not. In the following we show how to repair any negativity.
Let $-R/n$ denote the sum of all negative terms in the vertex measure. Since W contains only non-negative terms, we have
Therefore
Suppose there is some $b \in \mathtt {B}$ such that $W_{\mathtt {A}\mathtt {B}}((a_{0},b)) < 0$ . Since $W_{\mathtt {A}\mathtt {B}}$ has denominator n, we must have $W_{\mathtt {A}\mathtt {B}}((a_{0},b)) \leq -1/n$ . By construction, we have
so there exists some $a^{+} \in \mathtt {A}$ with $W_{\mathtt {A}\mathtt {B}}((a^{+},b)) \geq 1/n$ . Increase $W_{\mathtt {A}\mathtt {B}}((a_{0},b))$ by $1/n$ and decrease $W_{\mathtt {A}\mathtt {B}}((a^{+},b))$ by $1/n$ .
The number of times we must repeat this step before all terms are non-negative is exactly R, and each step moves the measure by $\ell ^{1}$ distance $2/n$ ; therefore the final edited vertex measure is distance at most $2R/n$ from the original $W_{\mathtt {A}\mathtt {B}}$ . If we now let $W_{\mathtt {A}\mathtt {B}}$ denote the new, non-negative vertex measure, by the above bound on $R/n$ we get
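The repair step above is a small combinatorial algorithm, and it can be sketched as runnable code. In this minimal sketch (function and variable names are illustrative, not from the paper) entries are stored as integer numerators over the common denominator $n$ , so moving mass $1/n$ means moving one unit:

```python
# Minimal sketch of the nonnegativity repair loop: entries are integer
# numerators over the common denominator n, so each step moves one unit.
def repair_nonnegativity(M):
    """Make every entry of M non-negative while preserving each column sum
    (the B-marginal): each step moves one unit of mass within a column from
    a donor entry with M[a_plus][b] >= 1 to a negative entry."""
    num_rows, num_cols = len(M), len(M[0])
    steps = 0
    for b in range(num_cols):
        for a in range(num_rows):
            while M[a][b] < 0:
                # a donor exists whenever the column sum is non-negative
                a_plus = next(i for i in range(num_rows) if M[i][b] >= 1)
                M[a][b] += 1
                M[a_plus][b] -= 1
                steps += 1
    return steps

# Toy example: one negative entry; the column sums are preserved.
M = [[-1, 2], [3, 0]]
cols = [sum(row[b] for row in M) for b in range(2)]
steps = repair_nonnegativity(M)
assert steps == 1 and M == [[0, 2], [2, 0]]
assert [sum(row[b] for row in M) for b in range(2)] == cols
```

Each iteration reduces the total negativity by exactly $1/n$ while moving the measure by $\ell ^{1}$ distance $2/n$ , which is the accounting used in the bound above.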
10.2 The $\mathtt {B}$ half-marginal
For the purposes of this construction we use the $\mathtt {B}$ ‘half-marginal’, which we denote by
This is an element of $\operatorname {\mathrm {Prob}}( (\mathtt {B} \times (\mathtt {A} \times \mathtt {B}) )^{r} )$ .
Before constructing the edge measure of $W_{\mathtt {A}\mathtt {B}}$ , in this section we first construct what will be its half-marginal.
For each $i \in [r]$ , $b,b^{\prime } \in \mathtt {B}$ , and $a^{\prime } \in \mathtt {A}$ , we define
See Table 2 for a representation of which terms are defined by each equation.
The definition of the terms in (10.3) ensures that
This will ensure that $W_{\mathtt {A}\mathtt {B}}$ has the correct vertex measure. Note also that, by line (10.2),
Using this and definition (10.3), we also get
This will ensure that the $\mathtt {B}$ -marginal of $W_{\mathtt {A}\mathtt {B}}$ is $W_{\mathtt {B}}$ .
We show now that the half-marginal $W_{\mathtt {A}\mathtt {B}}(\cdot , (\cdot , \cdot ); i)$ is $\ell ^{1}$ -close to $W(\cdot , (\cdot , \cdot ); i)$ by considering separately the contributions to the $\ell ^{1}$ distance from terms defined using equations (10.1), (10.2), and (10.3).
(10.1) terms: Each of the terms of $W_{\mathtt {A}\mathtt {B}}$ defined using the floor in equation (10.1) is distance at most $1/n$ from the corresponding term of W; therefore the total contribution of these terms to the $\ell ^{1}$ distance is
(10.2) terms: By the triangle inequality,
The total contribution of such terms is therefore
(10.3) terms: Again applying the triangle inequality,
Summing over all $a \in \mathtt {A}$ , $b^{\prime } \in \mathtt {B}$ and $i \in [r]$ , we see that the total contribution of such terms is bounded by
Adding up the contributions of the three types of terms, we see that the $\ell ^{1}$ distance between the half-marginals of W and $W_{\mathtt {A}\mathtt {B}}$ is bounded by
10.2.1 Nonnegativity
Again, the preceding construction does not guarantee that all terms are non-negative. In the following we describe how to correct negativity.
Let $-R/n$ be the sum of all negative terms of the half-marginal. As above, we get
Suppose there is some $b_{-} \in \mathtt {B}$ , $(a^{\prime }_{-},b^{\prime }_{-}) \in \mathtt {A} \times \mathtt {B}$ , and $i \in [r]$ such that $W_{\mathtt {A}\mathtt {B}}(b_{-}, (a^{\prime }_{-},b^{\prime }_{-}); i) < 0$ . Then $W_{\mathtt {A}\mathtt {B}}(b_{-}, (a^{\prime }_{-},b^{\prime }_{-}); i) \leq -1/n$ . Since
and
there exist $a^{\prime }_{+} \in \mathtt {A}$ and $b_{+} \in \mathtt {B}$ such that
Decrease both of these terms by $1/n$ , and increase both $W_{\mathtt {A}\mathtt {B}}(b_{-}, (a^{\prime }_{-},b^{\prime }_{-}); i)$ and $W_{\mathtt {A}\mathtt {B}}(b_{+}, (a^{\prime }_{+},b^{\prime }_{-}); i)$ by $1/n$ . This moves the half-marginal by $\ell ^{1}$ distance $4/n$ :
This step must be done at most R times to eliminate all negative entries, so the final half-marginal satisfies
10.3 The edge measure
Finally, we define the edge measure of $W_{\mathtt {A}\mathtt {B}}$ by
See Table 3.
It follows from this definition that $W_{\mathtt {A}\mathtt {B}}$ is a (signed) weight with $\mathtt {B}$ -marginal $W_{\mathtt {B}}$ .
We now check that $W_{\mathtt {A}\mathtt {B}}$ is $\ell ^{1}$ -close to W. We consider separately the contribution to the $\ell ^{1}$ distance of terms defined in equations (10.4), (10.5), and (10.6).
(10.4) terms: Each term of $W_{\mathtt {A}\mathtt {B}}$ defined using the floor function in equation (10.4) is distance at most $1/n$ from the corresponding W term. The total contribution of these terms to the $\ell ^{1}$ distance is therefore at most $\lvert \mathtt {A} \rvert ^{2} \lvert \mathtt {B} \rvert ^{2} r/n$ .
(10.5) terms: Applying the triangle inequality to terms defined in equation (10.5),
By the $\ell ^{1}$ bound on the distance between the half-marginals, the total contribution of all such terms is therefore at most
(10.6) terms: Applying the triangle inequality to terms defined in equation (10.6),
Therefore the total contribution of all such terms is
Summing up the contributions from terms of all three types, we get that
10.3.1 Nonnegativity
We can modify a solution with negative entries to obtain a non-negative one, much as above. Let $-R/n$ be the sum of all negative entries; then
Suppose there is some entry
We want to increment this term by $1/n$ without affecting the vertex measure or the $\mathtt {B}$ marginal. Since
there exists some $(a^{\prime }_{+}, b^{\prime }_{+}) \in \mathtt {A} \times \mathtt {B}$ such that $W_{\mathtt {A}\mathtt {B}}((a_{-}, b_{-}), (a^{\prime }_{+}, b^{\prime }_{+}); i) \geq 1/n$ ; similarly, since
there exists some $a_{+}$ such that $W_{\mathtt {A}\mathtt {B}}((a_{+}, b_{-}), (a^{\prime }_{-}, b^{\prime }_{-}); i) \geq 1/n$ . Increase
by $1/n$ , and decrease
by $1/n$ . This moves the weight by $\ell ^{1}$ distance $4/n$ .
Since R is the maximum number of times we need to do this before there are no more negative entries, the final weight satisfies
To simplify, we write
or
Acknowledgements
Thanks go to Tim Austin for suggesting that results like Theorems B and C should hold, for many helpful discussions, and for providing comments on earlier drafts. Thanks also go to Ben Hayes for sharing helpful references to the operator algebras literature, and to Lewis Bowen and Brandon Seward for helpful conversations. Thanks go to an anonymous referee for many helpful comments on an earlier draft. This material is based upon work supported by the National Science Foundation under grant no. DMS-1855694.