80
Views
11
CrossRef citations to date
0
Altmetric
Original Articles

Tree decompositions and social graphs

, &
Pages 315-361 | Published online: 13 Jul 2016
 

Abstract

Recent work has established that large informatics graphs such as social and information networks have non-trivial tree-like structure when viewed at moderate size scales. Here, we present results from the first detailed empirical evaluation of the use of tree decomposition (TD) heuristics for structure identification and extraction in social graphs. Although TDs have historically been used in structural graph theory and scientific computing, we show that—even with existing TD heuristics developed for those very different areas—TD methods can identify interesting structure in a wide range of realistic informatics graphs. Our main contributions are the following: we show that TD methods can identify structures that correlate strongly with the core-periphery structure of realistic networks, even when using simple greedy heuristics; we show that the peripheral bags of these TDs correlate well with low-conductance communities (when they exist) found using local spectral computations; and we show that several types of large-scale “ground-truth” communities, defined by demographic metadata on the nodes of the network, are well-localized in the large-scale and/or peripheral structures of the TDs. Our other main contributions are the following: we provide detailed empirical results for TD heuristics on toy and synthetic networks to establish a baseline to understand better the behavior of the heuristics on more complex real-world networks; and we prove a theorem providing formal justification for the intuition that the only two impediments to low-distortion hyperbolic embedding are high tree-width and long geodesic cycles. Our results suggest future directions for improved TD heuristics that are more appropriate for realistic social graphs.

Acknowledgements

We would like to thank Felix Reidl for considerable help in simplifying the proof of Theorem 3. We would also like to thank Mason Porter for helpful discussions and for providing several of the networks that we considered as well as Dima Krioukov and his collaborators for providing us access to their code for generating networks based on their hyperbolic model. In addition, we would like to acknowledge financial support from the Air Force Office of Scientific Research, the Army Research Office, the Defense Advanced Research Projects Agency, the National Consortium for Data Science, and the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of any of the above funding agencies.

Notes

1 Alternatively, the bags and edges of the TD form separators (cuts) in the graph. The set of vertices contained in any bag, or intersection of two adjacent bags, form a separator in G. This structural property is important as it allows TDs to be thought of as a method of organizing cuts in a network. This is also related to how the treewidth of a network is used to measure how tree-like a network is. Intuitively, a tree has a treewidth of 1 because the graph can be separated by the removal of a single edge (or vertex) in the network, whereas a cycle requires two edges to be cut and thus has a treewidth of 2. TDs with large widths require larger numbers of vertices to separate a network into two disconnected pieces.

2 A related aspect of the definition of a TD is the overlapping nature of the bags of a TD. Vertices in the graph will appear in many bags in the TD. This is particularly true of high degree or high k-core nodes [Citation3].

3 In particular, in the so-called Grid Minor Theorem, Robertson and Seymour showed that every graph of treewidth at least k contains a f(k) × f(k) grid as a graph minor, for some integer-valued function f. The original estimate of the function f gave an exponential relationship between the treewidth and the grid size, and although several results greatly improved the relationship, the question of whether or not it held for any polynomial function f remained open for over 25 years. Recently, Chekuri and Chuzhoy proved that there is a universal constant δ > 0 so that all graphs of treewidth at least k have a grid-minor of size Ω(kδ) × Ω(kδ) [Citation20], resolving this conjecture.

4 In the context of understanding the intermediate-scale structure of real networks and improving inference (e.g., link prediction, overlapping community detection, etc.), there are likely more appropriate objective functions, although their general identification and development is left as future work.

5 Our prior work focused on the use of δ-hyperbolicity [Citation3,58,59]. It can be a useful tool for describing and analyzing real networks, even though it is expensive to compute, but aside from our theoretical result in Section 7 relating it to treewidth and treelength, it is not our focus in this paper.

6 The same is true for many other less unrealistic random graph models, assuming their parameters are set to analogously sparse values (which they are often not).

7 For sparse ER graphs, this happens since there is not enough edges for concentration of measure to occur, i.e., for empirical quantities such as the empirical degrees to be very close to their expected value. For PL graphs, an analogous lack of measure concentration occurs due to the exogenously-specified heterogeneity parameter γ.

8 See [Citation2] for how this affects the NCP of these networks.

9 Among the differences caused by the much higher density of Facebook networks is that these networks have a much deeper k-core structure than the other real networks, and they tend to lack deep cuts, e.g., they lack even good very-imbalanced partitions such as those responsible for the upward-sloping NCP [Citation1,2].

10 These and other visualizations were created with the GraphViz command neato [Citation96], with the help of [Citation97,98].

11 Using medians rather than eccentricity can result in different central bags. However in most of the networks that we studied, the results were very similar. In particular, the biggest changes occurred in the FB networks where the median shifted towards the heavier end of the path-like TD. However, these bags were still a part of the thick trunk of the network and thus the results were very similar. In other networks, the median bag was very close the central eccentric bag, and the main difference is that the median bag tended to have more whisker branches (a branch consisting of one or two bags of small width). This does not substantially change any of our analysis.

12 We will see below that most real networks have small median width, with smallest bags dominated by cliques, intermediate bags dominated by cycles, and with large, connected, central bags which resemble bags of SmallER.

13 We should note that we ran these computations with many different TD heuristics. In most of this section, however, we only show results from (the most scalable) amd heuristic. This is simply for brevity. There were some differences from heuristic to heuristic, but we feel this one is representative of the type of behavior found.

14 To understand this threshold, consider the following example: if a community of size n is a tree, e.g., whiskers in ER(1.6), then it will be contained in n bags in the (ideal) TD; if the community is a “clique whisker,” i.e., a clique connected to the rest of the network by only one edge, it will be contained in just one or two bags; and if the community contains deep core nodes which are connected to many nodes outside of the community, the community will be spread across many bags in the network. Other measures of TD locality showed similar results.

15 As we mentioned in Section 2, this is not the main focus of our paper, but there has been recent theoretical and empirical interest in this and related questions; see, e.g., [Citation66–74].

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access
  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart
* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.