Book contents
- Frontmatter
- Contents
- List of Algorithms
- List of Symbols and Notation
- Preface
- 1 Preliminaries and Notation
- 2 Similarity/Proximity Measures between Nodes
- 3 Families of Dissimilarity between Nodes
- 4 Centrality Measures on Nodes and Edges
- 5 Identifying Prestigious Nodes
- 6 Labeling Nodes: Within-Network Classification
- 7 Clustering Nodes
- 8 Finding Dense Regions
- 9 Bipartite Graph Analysis
- 10 Graph Embedding
- Bibliography
- Index
7 - Clustering Nodes
Published online by Cambridge University Press: 05 July 2016
Introduction
This chapter introduces several methods of clustering the nodes of a graph into a partition. In multivariate statistics and data analysis [413, 429, 560], pattern recognition [418, 761, 807], data mining [361, 372], or machine learning [23, 91], clustering means grouping a set of objects into subsets, or clusters, such that those belonging to the same cluster are more “related” than those belonging to different clusters.1 In other words, a clustering provides a partition of the set of objects into disjoint clusters such that members of a cluster are highly “similar” while objects belonging to different clusters are dissimilar [264, 303, 418, 821, 824]. Of course, this supposes three different ingredients:
▸ a measure of similarity or dissimilarity between the objects
▸ a criterion, also called cost, loss, or objective function, measuring the quality of a partition
▸ an optimization technique, or procedure, for computing a high-quality partition, according to the criterion being considered
The similarity measure could, for instance, be the similarity provided by a kernel on a graph, or simply whether or not the nodes are connected. The criterion could then be the total within-cluster inertia induced by the kernel in the embedding space, as in a simple k-means clustering.
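To make these three ingredients concrete, here is a minimal sketch (not the book's code) on a toy graph: a spectral embedding of the adjacency matrix stands in for the embedding induced by a kernel on the graph, k-means is the optimization procedure, and the total within-cluster inertia is the criterion. The graph, the choice of embedding, and the `kmeans` helper are all illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first initialization.

    Returns cluster labels and the total within-cluster inertia
    (the criterion being minimized)."""
    centers = [X[0]]
    for _ in range(k - 1):  # next center: the point farthest from current centers
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assignment step: each node joins its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute centers (keep old center if a cluster empties)
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    inertia = ((X - centers[labels]) ** 2).sum()
    return labels, inertia

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Two leading eigenvectors of A as node coordinates -- a simple stand-in
# for the embedding induced by a kernel on the graph.
_, V = np.linalg.eigh(A)   # eigenvalues in ascending order
X = V[:, -2:]
labels, inertia = kmeans(X, k=2)
```

On this toy graph, the two recovered clusters coincide with the two triangles; any graph kernel providing node similarities could be substituted for the adjacency-based embedding.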
However, most clustering algorithms, such as k-means, assume that the user provides the number of clusters a priori, which is not very realistic because this number is generally not known in advance. There exist, however, a number of heuristic procedures for suggesting a “natural” number of clusters (see, for instance, [576]). Some clustering algorithms do not need this assumption and are therefore able to detect the number of clusters as well. In the context of node clustering, these are often called community detection algorithms. One popular example of a community detection algorithm is modularity optimization, which is described in this chapter.
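As an illustration of the criterion such methods optimize, the following sketch (not the book's code) computes the standard Newman–Girvan modularity of a given partition; the toy graph and the `modularity` helper are illustrative assumptions.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity of a partition of an undirected graph:
    Q = (1/2m) * sum_ij (A_ij - k_i * k_j / 2m) * delta(c_i, c_j)."""
    k = A.sum(axis=1)                           # node degrees
    two_m = A.sum()                             # 2m: twice the number of edges
    same = labels[:, None] == labels[None, :]   # same-cluster indicator delta
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Q_split = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # one cluster per triangle
Q_single = modularity(A, np.zeros(6, dtype=int))       # everything in one cluster
```

Putting each triangle in its own cluster yields Q = 5/14 ≈ 0.357, while the trivial one-cluster partition yields Q = 0; a modularity-optimizing algorithm searches over partitions, of any number of clusters, for a high value of Q.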
There exist several different types of clustering algorithms [6, 264, 303, 418, 761, 821, 824], the most prominent ones being the following:
▸ Top-down, or divisive, techniques, also called partitioning or splitting methods. These methods start from an initial situation in which all the nodes of the graph are contained in a single cluster.
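As one illustration of a single divisive step (an illustrative choice, not the book's specific method), a common option is spectral bisection: split the initial all-node cluster in two according to the sign of the Fiedler vector of the graph Laplacian.

```python
import numpy as np

def divisive_step(A):
    """One top-down splitting step: bisect the node set by the sign of the
    Fiedler vector, i.e., the eigenvector associated with the second-smallest
    eigenvalue of the graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = V[:, 1]
    return (fiedler >= 0).astype(int)

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = divisive_step(A)
```

A full divisive algorithm would apply this step recursively to each resulting subgraph until a stopping criterion is met.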
Algorithms and Models for Network Data and Link Analysis, pp. 276–348. Publisher: Cambridge University Press. Print publication year: 2016.