Book contents
- Frontmatter
- Contents
- List of Algorithms
- List of Symbols and Notation
- Preface
- 1 Preliminaries and Notation
- 2 Similarity/Proximity Measures between Nodes
- 3 Families of Dissimilarity between Nodes
- 4 Centrality Measures on Nodes and Edges
- 5 Identifying Prestigious Nodes
- 6 Labeling Nodes: Within-Network Classification
- 7 Clustering Nodes
- 8 Finding Dense Regions
- 9 Bipartite Graph Analysis
- 10 Graph Embedding
- Bibliography
- Index
7 - Clustering Nodes
Published online by Cambridge University Press: 05 July 2016
Introduction
This chapter introduces several methods of clustering the nodes of a graph into a partition. In multivariate statistics and data analysis [413, 429, 560], pattern recognition [418, 761, 807], data mining [361, 372], or machine learning [23, 91], clustering means grouping a set of objects into subsets, or clusters, such that those belonging to the same cluster are more “related” than those belonging to different clusters.1 In other words, a clustering provides a partition of the set of objects into disjoint clusters such that members of a cluster are highly “similar” while objects belonging to different clusters are dissimilar [264, 303, 418, 821, 824]. Of course, this supposes three different ingredients:
▸ a measure of similarity or dissimilarity between the objects
▸ a criterion, also called cost, loss, or objective function, measuring the quality of a partition
▸ an optimization technique, or procedure, for computing a high-quality partition, according to the criterion being considered
The similarity measure could, for instance, be the similarity provided by a kernel on a graph, or simply whether or not the nodes are connected. The criterion could then be the total within-cluster inertia induced by the kernel in the embedding space, as in a simple k-means clustering.
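To make these three ingredients concrete, here is a minimal sketch (not the book's code) on a toy graph: a spectral embedding of the adjacency matrix stands in for the embedding induced by a kernel on the graph, k-means is the optimization procedure, and the total within-cluster inertia is the criterion. The graph, the choice of embedding, and the `kmeans` helper are all illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first initialization.

    Returns cluster labels and the total within-cluster inertia
    (the criterion being minimized)."""
    centers = [X[0]]
    for _ in range(k - 1):  # next center: the point farthest from current centers
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assignment step: each node joins its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute centers (keep old center if a cluster empties)
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    inertia = ((X - centers[labels]) ** 2).sum()
    return labels, inertia

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Two leading eigenvectors of A as node coordinates -- a simple stand-in
# for the embedding induced by a kernel on the graph.
_, V = np.linalg.eigh(A)   # eigenvalues in ascending order
X = V[:, -2:]
labels, inertia = kmeans(X, k=2)
```

On this toy graph, the two recovered clusters coincide with the two triangles; any graph kernel providing node similarities could be substituted for the adjacency-based embedding.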
However, most clustering algorithms, such as k-means, assume that the user provides the number of clusters a priori, which is not very realistic because this number is generally not known in advance. There exist, however, a number of heuristic procedures for suggesting a “natural” number of clusters (see, for instance, [576]). Some clustering algorithms do not need this assumption and are therefore able to detect the number of clusters as well. In the context of node clustering, these are often called community detection algorithms. One popular example of a community detection algorithm is modularity optimization, which is described in this chapter.
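As an illustration of the criterion such methods optimize, the following sketch (not the book's code) computes the standard Newman–Girvan modularity of a given partition; the toy graph and the `modularity` helper are illustrative assumptions.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity of a partition of an undirected graph:
    Q = (1/2m) * sum_ij (A_ij - k_i * k_j / 2m) * delta(c_i, c_j)."""
    k = A.sum(axis=1)                           # node degrees
    two_m = A.sum()                             # 2m: twice the number of edges
    same = labels[:, None] == labels[None, :]   # same-cluster indicator delta
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Q_split = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # one cluster per triangle
Q_single = modularity(A, np.zeros(6, dtype=int))       # everything in one cluster
```

Putting each triangle in its own cluster yields Q = 5/14 ≈ 0.357, while the trivial one-cluster partition yields Q = 0; a modularity-optimizing algorithm searches over partitions, of any number of clusters, for a high value of Q.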
There exist several different types of clustering algorithms [6, 264, 303, 418, 761, 821, 824], the most prominent ones being the following:
▸ Top-down, or divisive, techniques, also called partitioning or splitting methods. These methods start from an initial situation in which all the nodes of the graph are contained in a single cluster.
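As one illustration of a single divisive step (an illustrative choice, not the book's specific method), a common option is spectral bisection: split the initial all-node cluster in two according to the sign of the Fiedler vector of the graph Laplacian.

```python
import numpy as np

def divisive_step(A):
    """One top-down splitting step: bisect the node set by the sign of the
    Fiedler vector, i.e., the eigenvector associated with the second-smallest
    eigenvalue of the graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = V[:, 1]
    return (fiedler >= 0).astype(int)

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = divisive_step(A)
```

A full divisive algorithm would apply this step recursively to each resulting subgraph until a stopping criterion is met.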
Algorithms and Models for Network Data and Link Analysis, pp. 276–348. Publisher: Cambridge University Press. Print publication year: 2016.