1. Introduction
The friendship paradox, introduced by Feld in [Reference Feld15], states roughly that your friends have, on average, more friends than you do (for an explicit statement, see Theorem 1.1). Following the work of [Reference Kramer, Cutler and Radcliffe24], which extends the friendship paradox to multiple steps (i.e., iterated friendships), we quantify an explicit connection between long-range degree correlation, degree variability, and the degree-wise effect of additional steps for random walks on a graph. Results for random paths are also considered.
Throughout, we suppose $G=(\mathcal{V},\mathcal{E})$ is a connected graph with node set $\mathcal{V}=\{v_1,v_2,\ldots,v_n\}$ and undirected edge set $\mathcal{E}$, and define the degree function $\mathrm{d}$ so that, for $v\in \mathcal{V}$, $\mathrm{d}(v)$ is the degree of $v$ (i.e., the number of neighbors of $v$) in $G$. Moreover, we denote by $\mathbf{A}=[A_{i,j}]$ the associated $n\times n$ adjacency matrix; the degree of a node $v_i$ can then be computed via
$$\mathrm{d}(v_i)=\sum_{j=1}^{n}A_{i,j}.$$
As we will be interested in the expected degree over random sequences of nodes in the graph, it will be convenient to consider a time-homogeneous random walk $\mathbf{X}=(X_0,X_1,\ldots)$ dictated by a transition matrix, $\mathbf{P}=[P_{i,j}]$, with
$$P_{i,j}=\mathbb{P}(X_{k+1}=v_j\mid X_k=v_i)=\frac{A_{i,j}}{\mathrm{d}(v_i)}$$
for $1\leq i,j\leq n$ and $k\geq 0$. Importantly, we will assume throughout that $X_0$ is uniformly selected from $\mathcal{V}$.
We first restate the friendship paradox formalized in [Reference Feld15].
Theorem 1.1. ([Reference Feld15]) Suppose $X_0$ is a node selected uniformly at random from $\mathcal{V}$, and $E=\{V,W\}$ is an edge selected uniformly at random from $\mathcal{E}$. Then, selecting $Y_1$ from the two nodes in $E$, each with probability one half,
$$\mathbb{E}(\mathrm{d}(Y_1))\geq \mathbb{E}(\mathrm{d}(X_0)).$$
Similarly, several authors have employed the following counterpart for random walks with a uniformly selected initial node $X_0$ (see, for instance, [Reference Berenhaut and Jiang3, Reference Cao and Ross6, Reference Jackson21]).
Theorem 1.2. Suppose $\mathbf{X}=(X_0,X_1,X_2,\ldots)$ is a simple random walk on the graph $G$, where $X_0$ is selected uniformly at random from $\mathcal{V}$. Then
$$\mathbb{E}(\mathrm{d}(X_1))\geq \mathbb{E}(\mathrm{d}(X_0)).$$
The following two results regarding multiple step walks and paths can be found in [Reference Kramer, Cutler and Radcliffe24].
Theorem 1.3. ([Reference Kramer, Cutler and Radcliffe24]) Suppose $\mathbf{X}=(X_0,X_1,X_2,\ldots)$ is a simple random walk on the graph $G$, where $X_0$ is selected uniformly at random from $\mathcal{V}$. Then, for $k\geq 0$,
$$\mathbb{E}(\mathrm{d}(X_k))\geq \mathbb{E}(\mathrm{d}(X_0)).$$
Theorem 1.4. ([Reference Kramer, Cutler and Radcliffe24]) Suppose $k\geq 1$ is odd, $X_0$ is selected uniformly at random from $\mathcal{V}$, and $W=\{Y_0,Y_1,\ldots,Y_k\}$ is a path selected uniformly at random from the set of all length-$k$ paths on $G$. Then, we have
$$\mathbb{E}(\mathrm{d}(Y_k))\geq \mathbb{E}(\mathrm{d}(X_0)).$$
For further recent theoretical results regarding the friendship paradox, see for instance [Reference Berenhaut, Jiang, McNab and Krizay4, Reference Cantwell, Kirkley and Newman5, Reference Dizon-Ross and Ross12, Reference Pal, Yu, Novick, Swami and Bar-Noy31].
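As an illustration, the walk-based inequality $\mathbb{E}(\mathrm{d}(X_k))\geq \mathbb{E}(\mathrm{d}(X_0))$ of Theorems 1.2 and 1.3 can be checked exactly on a small graph. The sketch below is not taken from the cited works: the five-node graph (a triangle with a two-edge tail) and the helper names `adjacency` and `expected_degrees` are assumptions made purely for the example.

```python
from fractions import Fraction

# Hypothetical example graph (not from the paper): triangle 0-1-2 with a tail 2-3-4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N_NODES = 5


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix as a list of rows."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def expected_degrees(A, k_max):
    """Exact values E d(X_0), ..., E d(X_k_max) for a simple random walk, uniform start."""
    n = len(A)
    d = [sum(row) for row in A]
    dist = [Fraction(1, n)] * n  # distribution of X_0
    means = []
    for _ in range(k_max + 1):
        means.append(sum(p * di for p, di in zip(dist, d)))
        # One step of the walk: new[j] = sum_i dist[i] * A[i][j] / d[i]
        dist = [sum(dist[i] * A[i][j] / d[i] for i in range(n)) for j in range(n)]
    return means


means = expected_degrees(adjacency(N_NODES, EDGES), 6)
for k, m in enumerate(means):
    print(k, m, float(m))
```

For this graph the expected degree rises from 2 at step 0 to 11/5 at step 1, and every later step stays at or above 2. Exact rational arithmetic is used so that the comparison is not affected by rounding.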
In place of comparisons with $\mathbb{E}(\mathrm{d}(X_0))$, as in the results above, the primary concern of this paper is the study of the degree-wise benefit of additional steps on the graph (for both random walks and random paths). Motivation for such considerations is provided by the recent success of applications of the one-step friendship paradox (see [Reference Christakis and Fowler8, Reference Cohen, Havlin and Ben-Avraham9, Reference Eom and Jo14, Reference Garcia-Herranz, Moro, Cebrian, Christakis and Fowler18, Reference Herrera, Srinivasan, Brownstein, Galvani and Meyers20, Reference Kim, Hwong, Stafford, Hughes, O’Malley, Fowler and Christakis23, Reference Singer34]).
Several authors have alluded to a weakened expression of the friendship paradox in networks exhibiting positive degree correlation over edges (see for instance [Reference Cantwell, Kirkley and Newman5, Reference Lee, Lee, Eom, Holme and Jo25, Reference Momeni and Rabbat28, Reference Pal, Yu, Novick, Swami and Bar-Noy31]). Before formally stating our results, it will be convenient to consider longer-range degree correlation. It is common for networks to exhibit sharing of similar characteristics across edges (i.e., “homophily,” “birds of a feather flock together,” “assortativity”; see [Reference Newman29]). In social networks, in particular, individuals may prefer to be associated with others sharing, for instance, similar age, education, or occupations. In the case where the characteristic of interest is nodal degree, the Pearson correlation $\rho=\rho(G)$ measures the tendency of nodes in a network to be associated with others sharing similar (or different) degrees. Specifically, for the graph $G$, if $E=\{V,W\}$ is an edge selected uniformly at random, then
$$\rho=\mathrm{Corr}(\mathrm{d}(V),\mathrm{d}(W))=\frac{\mathrm{Cov}(\mathrm{d}(V),\mathrm{d}(W))}{\sqrt{\mathrm{Var}(\mathrm{d}(V))\,\mathrm{Var}(\mathrm{d}(W))}}. \tag{5}$$
A positive degree correlation, ρ > 0, indicates a tendency for high-degree nodes in the graph to connect to other high-degree nodes and similarly for low-degree nodes.
Values of ρ have been studied in various classes of networks (see [Reference Chen and Olvera-Cravioto7, Reference Foster, Foster, Grassberger and Paczuski16, Reference Li, Wang and Van Mieghem26, Reference Piraveenan, Prokopenko and Zomaya33]). For connections between assortativity and other topological network properties, see [Reference D’Agostino, Scala, Zlatić and Caldarelli11, Reference Ellens, Spieksma, Van Mieghem, Jamakovic and Kooij13, Reference Van Mieghem, Ge, Schumm, Trajanovski and Wang36, Reference Van Mieghem, Wang, Ge, Tang and Kuipers37]. For work considering assortativity in the context of the friendship paradox, see for instance [Reference Jo and Eom22, Reference Lee, Lee, Eom, Holme and Jo25, Reference Momeni and Rabbat28, Reference Pal, Yu, Novick, Swami and Bar-Noy31]. See [Reference Noldus and Van Mieghem30] and the references therein for some further work on assortativity.
Although much previous research has focused on degree correlation among nearest neighbors, some recent considerations of the concept of assortative mixing beyond first neighbors can be found in [Reference Allen-Perkins, Pastor and Estrada1, Reference Arcagni, Grassi, Stefani and Torriero2, Reference Fujiki, Takaguchi and Yakubo17, Reference Mayo, Abdelzaher and Ghosh27] and can in a general sense be referred to as long-range degree correlation (see [Reference Arcagni, Grassi, Stefani and Torriero2]).
Assortativity, as defined in [Reference Newman29], can be interpreted as a measure of the correlation between nodal degrees based on the first-order adjacency matrix. Here, we define the kth-order path-based degree correlation as the Pearson coefficient, measuring the correlation between degrees for the two end points of a randomly chosen path with length k.
Definition 1.5. For $k \geq 0$, let $Y_k^-$ and $Y_k^+$ be the beginning and terminal nodes for a path selected uniformly at random from the set of all length-$k$ paths on $G$. The $k$th-order (path-based) degree correlation $\rho_{0,k}$ is given by
$$\rho_{0,k}=\mathrm{Corr}(\mathrm{d}(Y_k^-),\mathrm{d}(Y_k^+)).$$
In parallel with the definition above, we also consider the walk-based degree correlation as that for the starting node, $X_0$, of a random walk $\mathbf{X}=(X_0,X_1,X_2,\ldots)$ (with uniform initial distribution) and the node reached at the walk’s $k$th step, $X_k$. Note that the use of the uniform distribution here is motivated by the desired applications. This is in contrast to recent work regarding long-range correlation, wherein the initial distribution is stationary for the corresponding Markov chain (see [Reference Arcagni, Grassi, Stefani and Torriero2, Reference Gutiérrez-Gómez and Delvenne19, Reference Peel, Delvenne and Lambiotte32]).
Definition 1.6. Consider a random walk, $\mathbf{X}=(X_0, X_1, X_2, \ldots)$, on the graph $G$ (with uniform initial distribution). The $k$th-order (walk-based) degree correlation, $\phi_{0,k}=\phi_{0,k}(\mathbf{X})$, is given by
$$\phi_{0,k}=\mathrm{Corr}(\mathrm{d}(X_0),\mathrm{d}(X_k)).$$
Note. As an aside, it may be of some value to consider applications for which $\phi_{0,1}$, as in Definition 1.6, may be of interest. The correlation involves the same node pairs as in (5), but weighted so that equal emphasis is placed on each node regardless of degree. Ties are, in a sense, weaker (or diluted) for nodes of large degree.
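The difference in weighting between the edge-based correlation $\rho_{0,1}$ and the walk-based correlation $\phi_{0,1}$ can be seen on a small example. The sketch below is illustrative only: the graph and the helpers `degrees` and `pearson` are assumptions, not part of the paper. Both correlations run over the same ordered neighbor pairs $(u,v)$; the first gives each pair weight $1/N_1$, the second gives it weight $1/(n\,\mathrm{d}(u))$.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical example graph (not from the paper): triangle 0-1-2 with a tail 2-3-4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N_NODES = 5


def degrees(n, edges):
    d = [0] * n
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d


def pearson(triples):
    """Pearson correlation of (x, y) given (weight, x, y) triples whose weights sum to 1."""
    ex = sum(w * x for w, x, y in triples)
    ey = sum(w * y for w, x, y in triples)
    cov = sum(w * x * y for w, x, y in triples) - ex * ey
    vx = sum(w * x * x for w, x, y in triples) - ex * ex
    vy = sum(w * y * y for w, x, y in triples) - ey * ey
    return float(cov) / sqrt(float(vx) * float(vy))


d = degrees(N_NODES, EDGES)
ordered = EDGES + [(v, u) for u, v in EDGES]  # each edge in both orientations

# rho_{0,1}: every ordered edge is equally likely.
rho_01 = pearson([(Fraction(1, len(ordered)), d[u], d[v]) for u, v in ordered])
# phi_{0,1}: uniform start u, followed by a uniformly chosen neighbor v of u.
phi_01 = pearson([(Fraction(1, N_NODES * d[u]), d[u], d[v]) for u, v in ordered])
print(rho_01, phi_01)
```

On this graph the edge-weighted value is $-1/9$ while the walk-based value is $0$, so the two notions can disagree even in sign class (negative versus zero) on the same set of node pairs.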
Now, for the random walk $\mathbf{X}=(X_0,X_1,\dots)$, let $X_\infty$ be a node selected according to the stationary distribution $\pi=(\pi_1, \dots, \pi_n)$ of the random walk $\mathbf{X}$, that is, $\pi_i=\mathrm{d}(v_i)/\sum_j \mathrm{d}(v_j)$. A main quantity of interest will be what we term the proportional residual benefit at time $k$ for the walk, which measures the degree-wise (remaining) benefit of additional steps of a random walk.
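Definition 1.7. The proportional residual benefit for the walk $\mathbf{X}$ at time $k$, $\tau_k=\tau_k(\mathbf{X})$, is given by
$$\tau_k=\frac{\mathbb{E}(\mathrm{d}(X_\infty))-\mathbb{E}(\mathrm{d}(X_k))}{\mathbb{E}(\mathrm{d}(X_k))}.$$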
We now have the following quantitative relationship between the walk-based long-range degree correlation and the proportional residual benefit. The proof is provided in Section 2. Throughout, for a random variable $U$ with finite second moment and nonzero mean, we denote by $\mathrm{c_v}(U)$ the coefficient of variation of $U$, that is,
$$\mathrm{c_v}(U)=\frac{\sqrt{\mathrm{Var}(U)}}{\mathbb{E}(U)}.$$
Theorem 1.8. Suppose $\mathbf{X}=(X_0,X_1,\ldots)$ is a random walk on the graph $G$. The proportional residual benefit $\tau_k$ can be written as the product of the $k$th-order degree correlation $\phi_{0,k}$ and the two coefficients of variation, $\mathrm{c_v}(\mathrm{d}(X_0))$ and $\mathrm{c_v}(\mathrm{d}(X_k))$, that is,
$$\tau_k=\phi_{0,k}\,\mathrm{c_v}(\mathrm{d}(X_0))\,\mathrm{c_v}(\mathrm{d}(X_k)).$$
Now, as in Definition 1.5, for $k \geq 0$, let $Y_k^-$ and $Y_k^+$ be the beginning and terminal nodes for a path selected uniformly at random from the set of all length-k paths. We have the following definition and result.
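Definition 1.9. Suppose $k\geq 0$. The proportional one-step benefit at length $k$, $\gamma_k$, is given by
$$\gamma_k=\frac{\mathbb{E}(\mathrm{d}(Y_{k+1}^+))-\mathbb{E}(\mathrm{d}(Y_k^+))}{\mathbb{E}(\mathrm{d}(Y_k^+))}.$$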
Theorem 1.10. The proportional one-step benefit, $\gamma_k$, at length $k$ can be written as the product of the $k$th-order degree correlation $\rho_{0,k}$ and the two coefficients of variation, $\mathrm{c_v}(\mathrm{d}(Y_{k}^-))$ and $\mathrm{c_v}(\mathrm{d}(Y_{k}^+))$, that is,
$$\gamma_k=\rho_{0,k}\,\mathrm{c_v}(\mathrm{d}(Y_k^-))\,\mathrm{c_v}(\mathrm{d}(Y_k^+)).$$
In terms of residual benefit, in the case of random paths, we will also prove the following.
Theorem 1.11. Suppose $G$ is a non-bipartite (and connected) graph. If $\mathbf{(a)}$ $k\geq 0$ is even or $\mathbf{(b)}$ $k\geq 1$ is odd and the corresponding $k$th-order path-based degree correlation $\rho_{0,k}$ is nonnegative, then the limiting expected degree of $Y_i^+$ is no less than that of $Y_k^+$, that is,
$$\lim_{i\to\infty}\mathbb{E}(\mathrm{d}(Y_i^+))\geq \mathbb{E}(\mathrm{d}(Y_k^+)).$$
Note. (Disparity persistence and core-periphery structure) Theorems 1.8, 1.10, and 1.11 provide insight into when it may be beneficial, in acquaintance sampling and elsewhere, to continue on to neighbors of neighbors in an iterated fashion. In fact, the benefit can be high in networks wherein the degree correlation is positive out to a longer range and, globally, the degree variability is high. We refer to such a phenomenon as disparity-persistence, since disparity in degree persists over extended neighborhoods. A prime example wherein such behavior can occur is social networks exhibiting strong core-periphery structure, with a large core of high-degree nodes and a periphery of loosely connected, low-degree nodes.
The remainder of the paper proceeds as follows. In Section 2, we prove Theorem 1.8, while in Section 3, we prove Theorems 1.10 and 1.11.
2. Proof of Theorem 1.8 (The random walk case)
In this section, we will prove Theorem 1.8.
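Before turning to the proof, the identity can be checked exactly on a small example. The sketch below is an illustration under stated assumptions: the graph and the helper `walk_check` are hypothetical, and $\tau_k$ is taken to be $(\mathbb{E}(\mathrm{d}(X_\infty))-\mathbb{E}(\mathrm{d}(X_k)))/\mathbb{E}(\mathrm{d}(X_k))$. Because $\phi_{0,k}\,\mathrm{c_v}(\mathrm{d}(X_0))\,\mathrm{c_v}(\mathrm{d}(X_k))$ equals the covariance divided by the product of the two means (the standard deviations cancel), both sides can be compared in exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical example graph (not from the paper): triangle 0-1-2 with a tail 2-3-4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N_NODES = 5


def walk_check(n, edges, k):
    """Return (tau_k, phi_{0,k} * c_v(d(X_0)) * c_v(d(X_k))) as exact fractions."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    d = [sum(row) for row in A]
    # joint[i][j] = P(X_0 = v_i, X_k = v_j); start from (1/n) * identity.
    joint = [[Fraction(int(i == j), n) for j in range(n)] for i in range(n)]
    for _ in range(k):
        joint = [[sum(row[m] * A[m][j] / d[m] for m in range(n)) for j in range(n)]
                 for row in joint]
    pairs = [(i, j) for i in range(n) for j in range(n)]
    e0 = sum(joint[i][j] * d[i] for i, j in pairs)
    ek = sum(joint[i][j] * d[j] for i, j in pairs)
    e0k = sum(joint[i][j] * d[i] * d[j] for i, j in pairs)
    e_inf = Fraction(sum(x * x for x in d), sum(d))  # sum_i pi_i d_i
    tau = (e_inf - ek) / ek
    product = (e0k - e0 * ek) / (e0 * ek)  # Cov / (E d(X_0) * E d(X_k))
    return tau, product


results = [walk_check(N_NODES, EDGES, k) for k in range(6)]
for k, (tau, product) in enumerate(results):
    print(k, tau, product)
```

For this graph $\tau_0=1/10$, $\tau_1=0$, and $\tau_2=1/65$, and the two returned values agree for every $k$ tried.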
Proof of Theorem 1.8
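Set $d_i=\mathrm{d}(v_i)$, $\mathbf{d}=(d_1,d_2,\ldots,d_n)^{\prime}$, and $\mathbf{1}=(1,1,\ldots,1)^{\prime}$. Let $\mathbf{P}$ be the transition matrix for the random walk, $\mathbf{X}$, and note that $\mathbf{d}=\mathbf{A} \mathbf{1}$. Now, note that
$$\mathbb{E}(\mathrm{d}(X_0)\mathrm{d}(X_k))=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{n}\,d_i\,(P^k)_{i,j}\,d_j=\frac{1}{n}\,\mathbf{d}^{\prime}\mathbf{P}^k\mathbf{d}.$$
Hence, letting $\mathbf{D}$ be the diagonal matrix with diagonal $\mathbf{d}$ and noting that $\mathbf{P}=\mathbf{D}^{-1}\mathbf{A}$ gives
$$\mathbf{d}^{\prime}\mathbf{P}=\mathbf{d}^{\prime}\mathbf{D}^{-1}\mathbf{A}=\mathbf{1}^{\prime}\mathbf{A}=\mathbf{d}^{\prime}.$$
Thus, $\mathbf{d}^{\prime}\mathbf{P}^k=\mathbf{d}^{\prime}$ and
$$\mathbb{E}(\mathrm{d}(X_0)\mathrm{d}(X_k))=\frac{1}{n}\,\mathbf{d}^{\prime}\mathbf{d}.$$
Now,
$$\mathbb{E}(\mathrm{d}(X_\infty))=\sum_{i=1}^{n}\pi_i d_i=\frac{\mathbf{d}^{\prime}\mathbf{d}}{\mathbf{1}^{\prime}\mathbf{d}}=\frac{\mathbf{d}^{\prime}\mathbf{d}}{n\,\mathbb{E}(\mathrm{d}(X_0))},$$
and hence
$$\mathrm{Cov}(\mathrm{d}(X_0),\mathrm{d}(X_k))=\frac{1}{n}\,\mathbf{d}^{\prime}\mathbf{d}-\mathbb{E}(\mathrm{d}(X_0))\,\mathbb{E}(\mathrm{d}(X_k))=\mathbb{E}(\mathrm{d}(X_0))\left(\mathbb{E}(\mathrm{d}(X_\infty))-\mathbb{E}(\mathrm{d}(X_k))\right).$$
Thus,
$$\tau_k=\frac{\mathbb{E}(\mathrm{d}(X_\infty))-\mathbb{E}(\mathrm{d}(X_k))}{\mathbb{E}(\mathrm{d}(X_k))}=\frac{\mathrm{Cov}(\mathrm{d}(X_0),\mathrm{d}(X_k))}{\mathbb{E}(\mathrm{d}(X_0))\,\mathbb{E}(\mathrm{d}(X_k))}=\phi_{0,k}\,\mathrm{c_v}(\mathrm{d}(X_0))\,\mathrm{c_v}(\mathrm{d}(X_k)).$$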
We easily derive the following corollary.
Corollary 2.1. Suppose $\mathbf{X}=(X_0,X_1,\ldots)$ is a random walk on the graph $G=(\mathcal{V},\mathcal{E})$. For $k\geq 1$, the expected degree of $X_\infty$ is no less than that of $X_k$ if and only if the $k$th-order degree correlation $\phi_{0,k}$ is nonnegative, that is,
$$\mathbb{E}(\mathrm{d}(X_\infty))\geq \mathbb{E}(\mathrm{d}(X_k))\quad\text{if and only if}\quad \phi_{0,k}\geq 0.$$
3. Proof of Theorems 1.10 and 1.11 (The random path case)
In this section, we will prove Theorems 1.10 and 1.11 and further discuss the relationship between the proportional one-step benefit $\gamma_k$ and the path-based degree correlation $\rho_{0,k}$ in parallel with the random-walk case, $(X_k)_{k\geq 0}$. As per convention, throughout, we will take the zeroth power of the adjacency matrix, $\mathbf{A}$, of a graph $G$ (i.e., $\mathbf{A}^0$) to be the $n\times n$ identity matrix, and hence for $k\geq 0$, the entry $(A^k)_{i,j}$ counts the number of distinct paths of length $k$ connecting nodes $v_i$ and $v_j$.
As before, suppose for $i \geq 0$, $Y_{i}^-$ and $Y_{i}^+$ are the beginning and terminal nodes of a path selected uniformly at random from the set of all length-i paths on G. We have the following lemma (see also Lemma 4 in [Reference Kramer, Cutler and Radcliffe24]).
Lemma 3.1. Suppose $k\geq 0$. The expected degree of $Y_k^{+}$ can be written in terms of successive entries in the sequence $(N_0, N_1, \ldots)$, where $N_j$ is the number of length-$j$ paths on $G$, that is,
$$\mathbb{E}(\mathrm{d}(Y_k^{+}))=\frac{N_{k+1}}{N_k}.$$
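Proof. With $\mathbf{d}$ and $\mathbf{1}$ as in Section 2, note that $N_k=\mathbf{1}^{\prime}\mathbf{A}^k\mathbf{1}$ and $\mathbb{P}(Y_k^{+}=v_i)=(\mathbf{A}^k\mathbf{1})_i/N_k$. We have
$$\mathbb{E}(\mathrm{d}(Y_k^{+}))=\sum_{i=1}^{n} d_i\,\mathbb{P}(Y_k^{+}=v_i)=\frac{\mathbf{d}^{\prime}\mathbf{A}^k\mathbf{1}}{\mathbf{1}^{\prime}\mathbf{A}^k\mathbf{1}}=\frac{\mathbf{1}^{\prime}\mathbf{A}^{k+1}\mathbf{1}}{\mathbf{1}^{\prime}\mathbf{A}^k\mathbf{1}}=\frac{N_{k+1}}{N_k},$$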
where the third equality follows from the symmetry of the adjacency matrix, A.
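Lemma 3.1 can also be checked by brute force on a small graph. The sketch below is illustrative only; the graph and the helpers `all_paths` and `mean_terminal_degree` are assumptions made for the example. In keeping with the counting by $(A^k)_{i,j}$ above, a length-$k$ path here is any sequence of $k+1$ nodes in which consecutive nodes are adjacent, so nodes may repeat.

```python
from fractions import Fraction

# Hypothetical example graph (not from the paper): triangle 0-1-2 with a tail 2-3-4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N_NODES = 5


def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def all_paths(A, k):
    """Enumerate every length-k path as a list of k + 1 nodes (repeats allowed)."""
    paths = [[v] for v in range(len(A))]
    for _ in range(k):
        paths = [p + [w] for p in paths for w in range(len(A)) if A[p[-1]][w]]
    return paths


def mean_terminal_degree(A, k):
    """Exact E d(Y_k^+): average degree of the last node over all length-k paths."""
    d = [sum(row) for row in A]
    paths = all_paths(A, k)
    return Fraction(sum(d[p[-1]] for p in paths), len(paths))


A = adjacency(N_NODES, EDGES)
counts = [len(all_paths(A, k)) for k in range(7)]  # N_0, ..., N_6
for k in range(6):
    print(k, mean_terminal_degree(A, k), Fraction(counts[k + 1], counts[k]))
```

Here $N_0=5$, $N_1=10$, and $N_2=22$, and the enumerated mean terminal degree matches $N_{k+1}/N_k$ for each $k$ tried. Enumeration grows quickly with $k$, so for larger graphs the counts would be better obtained from repeated matrix-vector products.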
Now we turn to a proof of Theorem 1.10.
Proof of Theorem 1.10
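Suppose $k\geq 0$ and $Y_k^{-}$ and $Y_k^{+}$ are the two nodes connected by a randomly chosen path of length $k$. With $\mathbf{d}$ and $\mathbf{1}$ as in Section 2, we have
$$\mathbb{E}(\mathrm{d}(Y_k^{-})\mathrm{d}(Y_k^{+}))=\frac{1}{N_k}\sum_{i=1}^{n}\sum_{j=1}^{n} d_i\,(A^k)_{i,j}\,d_j=\frac{\mathbf{d}^{\prime}\mathbf{A}^k\mathbf{d}}{N_k}=\frac{\mathbf{1}^{\prime}\mathbf{A}^{k+2}\mathbf{1}}{N_k}=\frac{N_{k+2}}{N_k}. \tag{12}$$
Note that by Lemma 3.1 and symmetry,
$$\mathbb{E}(\mathrm{d}(Y_k^{-}))=\mathbb{E}(\mathrm{d}(Y_k^{+}))=\frac{N_{k+1}}{N_k}. \tag{13}$$
Hence, using Eqs. (12) and (13),
$$\mathrm{Cov}(\mathrm{d}(Y_k^{-}),\mathrm{d}(Y_k^{+}))=\frac{N_{k+2}}{N_k}-\left(\frac{N_{k+1}}{N_k}\right)^{2}, \tag{14}$$
and hence by the definition of $\gamma_k$, and Eqs. (13) and (14),
$$\gamma_k=\frac{N_{k+2}/N_{k+1}-N_{k+1}/N_k}{N_{k+1}/N_k}=\frac{N_{k+2}N_k-N_{k+1}^{2}}{N_{k+1}^{2}}=\frac{\mathrm{Cov}(\mathrm{d}(Y_k^{-}),\mathrm{d}(Y_k^{+}))}{\mathbb{E}(\mathrm{d}(Y_k^{-}))\,\mathbb{E}(\mathrm{d}(Y_k^{+}))}=\rho_{0,k}\,\mathrm{c_v}(\mathrm{d}(Y_k^{-}))\,\mathrm{c_v}(\mathrm{d}(Y_k^{+})).$$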
Therefore, the proportional increase in relative expected degree for an increase of one in the length of our random path can be determined by the corresponding path-based degree correlation, along with the two coefficients of variation. We have the following corollary.
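Corollary 3.2. Suppose $k\geq 0$. The expected degree of $Y_{k+1}^{+}$ is no less than that of $Y_k^{+}$ if and only if the $k$th-order degree correlation $\rho_{0,k}$ is nonnegative, that is,
$$\mathbb{E}(\mathrm{d}(Y_{k+1}^{+}))\geq \mathbb{E}(\mathrm{d}(Y_k^{+}))\quad\text{if and only if}\quad \rho_{0,k}\geq 0.$$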
Note that employing an inequality on the number of paths, developed by [Reference Täubig, Weihmann, Kosub, Hemmecke and Mayr35] (with $c=1$ and $b=0$), we have
$$N_{2a+1}^{2}\leq N_{2a}\,N_{2a+2}$$
for $a\geq 0$, and hence by Lemma 3.1, when $k$ is even, $\mathbb{E}(\mathrm{d}(Y_{k+1}^{+}))\geq \mathbb{E}(\mathrm{d}(Y_k^{+}))$. In the case $k=0$, this is simply the result of Feld in Theorem 1.1. In addition, by Theorem 1.10, for any network $G$, $G$ is assortative (i.e., $\rho_{0,1} > 0$) if and only if $\mathbb{E}(\mathrm{d}(Y_{2}^+)) > \mathbb{E}(\mathrm{d}(Y_1^+))$. For consideration of degree comparison for $X_1$ and $X_2$ under a random graph model, see [Reference Berenhaut, Jiang, McNab and Krizay4].
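The path-based identity of Theorem 1.10, and the sign behavior just described, can be illustrated numerically. The sketch below rests on assumptions: the graph and the helpers `mat_mul` and `path_benefit` are hypothetical, and $\gamma_k$ is taken to be $(\mathbb{E}(\mathrm{d}(Y_{k+1}^+))-\mathbb{E}(\mathrm{d}(Y_k^+)))/\mathbb{E}(\mathrm{d}(Y_k^+))$. As in the walk case, the product $\rho_{0,k}\,\mathrm{c_v}(\mathrm{d}(Y_k^-))\,\mathrm{c_v}(\mathrm{d}(Y_k^+))$ reduces to the covariance over the product of means, so exact fractions suffice.

```python
from fractions import Fraction

# Hypothetical example graph (not from the paper): triangle 0-1-2 with a tail 2-3-4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N_NODES = 5


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def path_benefit(n, edges, k):
    """Return (gamma_k, rho_{0,k} * c_v(d(Y_k^-)) * c_v(d(Y_k^+))) as exact fractions."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    d = [sum(row) for row in A]
    Ak = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
    for _ in range(k):
        Ak = mat_mul(Ak, A)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    Nk = sum(Ak[i][j] for i, j in pairs)  # number of length-k paths
    e_minus = Fraction(sum(Ak[i][j] * d[i] for i, j in pairs), Nk)
    e_plus = Fraction(sum(Ak[i][j] * d[j] for i, j in pairs), Nk)
    e_prod = Fraction(sum(Ak[i][j] * d[i] * d[j] for i, j in pairs), Nk)
    product = (e_prod - e_minus * e_plus) / (e_minus * e_plus)
    # gamma_k needs E d(Y_{k+1}^+), computed directly from A^(k+1).
    Ak1 = mat_mul(Ak, A)
    e_next = Fraction(sum(Ak1[i][j] * d[j] for i, j in pairs),
                      sum(Ak1[i][j] for i, j in pairs))
    gamma = (e_next - e_plus) / e_plus
    return gamma, product


benefits = [path_benefit(N_NODES, EDGES, k) for k in range(6)]
for k, (gamma, product) in enumerate(benefits):
    print(k, gamma, product)
```

For this graph $\gamma_0=1/10$ and $\gamma_1=-1/121$: the one-step benefit is nonnegative at even $k$, while at $k=1$ it is negative, in line with the graph being slightly disassortative ($\rho_{0,1}<0$).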
The following lemma follows directly from the eigendecomposition of the matrix $\mathbf{A}$ and Lemma 3.1 (see for instance [Reference Cvetkovic, Cvetković, Rowlinson and Simic10]).
Lemma 3.3. ([Reference Cvetkovic, Cvetković, Rowlinson and Simic10]) Suppose $k\geq 0$ and the graph $G$ is non-bipartite. The expected degree of $Y_k^{+}$ tends to $\lambda_1$, the largest eigenvalue of the adjacency matrix, $\mathbf{A}$, as $k$ tends to infinity, that is,
$$\lim_{k\to\infty}\mathbb{E}(\mathrm{d}(Y_k^{+}))=\lambda_1.$$
We now turn to a proof of Theorem 1.11.
Proof of Theorem 1.11
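Suppose $m\geq 0$. From Rayleigh’s inequality, we have
$$\mathbb{E}(\mathrm{d}(Y_{2m}^{+}))=\frac{N_{2m+1}}{N_{2m}}=\frac{(\mathbf{A}^m\mathbf{1})^{\prime}\mathbf{A}(\mathbf{A}^m\mathbf{1})}{(\mathbf{A}^m\mathbf{1})^{\prime}(\mathbf{A}^m\mathbf{1})}\leq \lambda_1. \tag{19}$$
The result for $k$ even follows by Lemma 3.3. Now, suppose $k$ is odd and $\rho_{0,k}\geq 0$. Since $k+1$ is even, employing Corollary 3.2 and Eq. (19) then gives
$$\mathbb{E}(\mathrm{d}(Y_k^{+}))\leq \mathbb{E}(\mathrm{d}(Y_{k+1}^{+}))\leq \lambda_1=\lim_{i\to\infty}\mathbb{E}(\mathrm{d}(Y_i^{+})),$$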
and the result follows.
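The limiting behavior in Lemma 3.3 and the even-$k$ bound used in the proof above can be observed numerically. The sketch below is an illustration, not part of the argument: the graph and the helpers `path_counts` and `largest_eigenvalue` are assumptions, and $\lambda_1$ is approximated by plain power iteration (a swapped-in numerical technique, which converges here because the example graph is connected and non-bipartite).

```python
from math import sqrt

# Hypothetical example graph (not from the paper): triangle 0-1-2 with a tail 2-3-4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N_NODES = 5


def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def path_counts(A, k_max):
    """Exact integers N_0, ..., N_{k_max}, where N_k = 1' A^k 1."""
    x = [1] * len(A)
    counts = []
    for _ in range(k_max + 1):
        counts.append(sum(x))
        x = mat_vec(A, x)
    return counts


def largest_eigenvalue(A, iters=500):
    """Approximate lambda_1 by power iteration, returning the final Rayleigh quotient."""
    x = [1.0] * len(A)
    for _ in range(iters):
        y = mat_vec(A, x)
        norm = sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sum(a * b for a, b in zip(x, mat_vec(A, x)))  # x has unit norm


A = adjacency(N_NODES, EDGES)
counts = path_counts(A, 81)
ratios = [counts[k + 1] / counts[k] for k in range(81)]  # E d(Y_k^+) = N_{k+1} / N_k
lam1 = largest_eigenvalue(A)
print(ratios[:5], ratios[80], lam1)
```

On this graph the ratios $N_{k+1}/N_k$ settle near $\lambda_1\approx 2.214$, and the even-indexed ratios never exceed it, as Eq. (19) requires.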
Competing interests
The authors declare no conflict of interest.