89
Views
0
CrossRef citations to date
0
Altmetric
Original Articles

Efficient computation of the Weighted Clustering Coefficient

&
Pages 381-401 | Published online: 26 Jul 2016
 

Abstract

The clustering coefficient of an unweighted network has been extensively used to quantify how tightly connected is the neighbor around a node and it has been widely adopted for assessing the quality of nodes in a social network. The computation of the clustering coefficient is challenging since it requires to count the number of triangles in the graph. Several recent works proposed efficient sampling, streaming and MapReduce algorithms that allow to overcome this computational bottleneck.

As a matter of fact, the intensity of the interaction between nodes, that is usually represented with weights on the edges of the graph, is also an important measure of the statistical cohesiveness of a network. Recently various notions of weighted clustering coefficient have been proposed but all those techniques are hard to implement on large-scale graphs.

In this work we show how standard sampling techniques can be used to obtain efficient estimators for the most commonly used measures of weighted clustering coefficient. Furthermore we also propose a novel graph-theoretic notion of clustering coefficient in weighted networks.

Acknowledgments

We thank Corinna Cortes for suggesting the problem.

Notes

1 We note that our definition of weighted random random graph is different from the definition of [Citation10] and it is more in line with the standard definition used in data mining and biology [Citation12].

2 The problem of computing a core decomposition in an uncertain graph with different probabilities on edges has been considered in [Citation5]. The authors show how to compute the expected degree of an uncertain graph in polynomial time. This method cannot however be applied to speed up the computation of the the novel notion of weighted clustering coefficient we propose.

3 This setting is particularly relevant when graphs are generated using inference models [Citation13].

4 Note that enumerating all the triangles in the graph would not work in this setting because of the dependency induced by the number of wedges in the realization of the random graph.

5 Unfortunately to the best of our knowledge, there is no analytic technique to estimate this quantity correctly or with a close approximation without using a dynamic programming.

6 Note that for this to work it is fundamental that the weight on the edges have been normalized and so are in [0, 1].

7 Note that a naive implementation of the sampling procedure would have running time quadratic in the size of the adjacency list. Fortunately this is not necessary in fact it is possible to select a random pair of neighbors in linear time. In particular, it is enough to assign to each neighbor a random number and then select the two neighbors with the smallest assigned values. In this way each pair of nodes has the same probability of being selected and so we can obtain a random sample.

8 Experimentally we observed that a nonlinear mapping perform better than the linear mapping proposed by Onnela et al.

9 Note that all the discussed notion of clustering coefficient can be extended to capture the directionality of the edges.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access
  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart
* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.