A common task in network analysis is to seek a coarse-graining of the network into modules or communities, which describe the large-scale architecture of the network. For instance, we might want to find social groups within a network of friendships, functional modules among gene regulatory interactions, or compartments within food webs. However, different algorithms will return different communities for the same network, and this presents a conundrum for scientific interpretation: which communities are the real ones? In this talk, I'll show how using node attributes or "metadata" can solve this problem by guiding the community detection process toward useful outcomes. The resulting algorithm, which is a generalization of the powerful stochastic block model, is more accurate than any algorithm that uses network structure or node metadata alone, and it can automatically learn the underlying correlation between metadata and structure, if one exists. To illustrate these features, I'll show results from applying the method to both synthetic networks with known structure and real-world networks with unknown structure. I'll close with a few general comments about the recently proved No Free Lunch theorem in community detection and the utility of community detection methods in scientific applications.
This is joint work with Mark Newman.