Book contents
- Frontmatter
- Contents
- List of Algorithms
- List of Symbols and Notation
- Preface
- 1 Preliminaries and Notation
- 2 Similarity/Proximity Measures between Nodes
- 3 Families of Dissimilarity between Nodes
- 4 Centrality Measures on Nodes and Edges
- 5 Identifying Prestigious Nodes
- 6 Labeling Nodes: Within-Network Classification
- 7 Clustering Nodes
- 8 Finding Dense Regions
- 9 Bipartite Graph Analysis
- 10 Graph Embedding
- Bibliography
- Index
3 - Families of Dissimilarity between Nodes
Published online by Cambridge University Press: 05 July 2016
Summary
Introduction
This chapter follows up on the previous one. It presents more advanced material on recent attempts to define useful distances and similarities between the nodes of a graph. Although the shortest-path distance is popular and meaningful in many contexts, it conveys no information about the degree of connectivity between the nodes. On some occasions, we would like a distance that also captures their connection rate, with high connectivity taken as an indication that the two nodes are close in some sense (e.g., they can easily exchange information). In other words, the presence of many indirect paths (as opposed to direct links) between nodes also suggests some kind of proximity between them.
As seen in the previous chapter, the resistance distance and the commute-time distance capture this property. However, we also saw that these quantities suffer from the fact that, when the graph becomes larger, they converge to a meaningless limit function (see [790, 792] or very recently [370], and the previous chapter, Section 2.5.3). This effect was called “being lost in space” in [790] and is related to the fact that a simple random walk mixes before hitting its target [370].
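The "lost in space" effect can be checked numerically. The sketch below is an illustration under stated assumptions, not the chapter's own procedure: it assumes that the limit in question depends only on the degrees of the two nodes, with the resistance distance between i and j approaching 1/d_i + 1/d_j. That form is the one usually reported in the literature on this effect and is not stated in this summary. The random graph model, its size, the seed, and all variable names are arbitrary choices made for the example.

```python
import numpy as np

# Assumption: a dense Erdos-Renyi random graph stands in for a "large graph".
rng = np.random.default_rng(0)
n, p = 200, 0.3
upper = np.triu(rng.random((n, n)) < p, k=1)   # sample each undirected edge once
A = (upper | upper.T).astype(float)            # symmetric 0/1 adjacency matrix

d = A.sum(axis=1)                              # node degrees
L = np.diag(d) - A                             # graph Laplacian
L_plus = np.linalg.pinv(L)                     # Moore-Penrose pseudoinverse

# Resistance distance: R_ij = l+_ii + l+_jj - 2 l+_ij
diag = np.diag(L_plus)
R = diag[:, None] + diag[None, :] - 2.0 * L_plus

# Assumed limit: a function of the two degrees only.
approx = 1.0 / d[:, None] + 1.0 / d[None, :]

off_diag = ~np.eye(n, dtype=bool)
rel_err = np.abs(R[off_diag] - approx[off_diag]) / R[off_diag]
mean_rel_err = rel_err.mean()
print(f"mean relative gap between R_ij and 1/d_i + 1/d_j: {mean_rel_err:.3f}")
```

If the gap is small, the resistance distance (and hence the commute time, which is the resistance distance multiplied by the graph volume) carries almost no information about where the two nodes sit in the graph, only about their degrees.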
This means that both the shortest-path distance and the (Euclidean) commute-time distance have inconvenient flaws, at least for large graphs and depending on the application. In some sense, they can be considered as the two extremes of a continuum: one considers only path length, while the other considers only connectivity, without regard to length.
In this context, several researchers recently proposed to work with parametric dissimilarities or distances interpolating between the shortest-path distance and the commute-time distance [20, 155, 157, 292, 459, 833]. They all depend on a continuous parameter and therefore define “families of distances.” At one limit of the parameter, these quantities converge to the shortest-path distance, while at the other they converge to the commute-time distance. They therefore “interpolate” between the two distances. The idea is that when the parametric, interpolated distance is not too far from the shortest-path distance, it integrates the degree of connectivity between the nodes into the distance while not being too sensitive to the effect of “being lost in space” [790].
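The contrast between the two extremes can be seen on a toy graph. The following minimal sketch uses a hypothetical 7-node unweighted graph and helper names chosen only for this illustration. Nodes 0 and 1 are joined by three parallel two-hop paths, while nodes 1 and 5 are joined by a single two-hop path. It uses the standard identity that the average commute time equals the graph volume times the resistance distance, and it does not take the square root that gives the Euclidean variant.

```python
import numpy as np
from collections import deque

def shortest_path_lengths(A, source):
    """Breadth-first search hop counts from `source` in an unweighted graph."""
    n = A.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            if dist[v] == np.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def commute_time_matrix(A):
    """Average commute times: vol(G) * (l+_ii + l+_jj - 2 l+_ij)."""
    d = A.sum(axis=1)
    L_plus = np.linalg.pinv(np.diag(d) - A)   # pseudoinverse of the Laplacian
    diag = np.diag(L_plus)
    return d.sum() * (diag[:, None] + diag[None, :] - 2.0 * L_plus)

# Hypothetical toy graph: 0 and 1 share three intermediate nodes (2, 3, 4);
# 1 and 5 are linked only through node 6.
edges = [(0, 2), (2, 1), (0, 3), (3, 1), (0, 4), (4, 1), (1, 6), (6, 5)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

sp_01 = shortest_path_lengths(A, 0)[1]
sp_15 = shortest_path_lengths(A, 1)[5]
C = commute_time_matrix(A)

print(sp_01, sp_15)        # both pairs are 2 hops apart
print(C[0, 1], C[1, 5])    # about 10.67 versus 32
```

The shortest-path distance cannot tell the two pairs apart, whereas the commute time is three times smaller for the well-connected pair.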
- Type: Chapter
- Information: Algorithms and Models for Network Data and Link Analysis, pp. 102-142. Publisher: Cambridge University Press. Print publication year: 2016