Algebraic Amplification for Node Classification in Sparsely Labeled Networks
Remote Talk (Past)
Wolfgang Gatterbauer
Associate Professor, Northeastern University
Friday, Oct 29, 2021
2:00 pm
177 Huntington Ave, 11th floor, and online



Node classification is an important problem in graph data. It is commonly solved by label propagation methods (such as belief propagation or linearized belief propagation) that work iteratively, starting from a few labeled seed nodes. For graphs with arbitrary potentials (also called compatibilities or attractions) between classes, i.e., going beyond the assortative mixing assumption and explicitly allowing heterophily, as in general Markov random fields, these methods crucially depend on knowing the compatibilities between neighboring labels. This information must be provided by domain experts or heuristics before any label propagation algorithm can work. Can we instead estimate the compatibilities directly from extremely sparsely labeled graphs, in a way that scales to networks with millions of edges? We answer this question affirmatively and propose a method called distant compatibility estimation that works even on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of the time it later takes to label the remaining nodes. Our approach first creates multiple factorized graph representations (of size independent of the graph) and then performs estimation on these smaller graph summaries. We use the term algebraic amplification for the more general idea of leveraging algebraic properties of an algorithm's update equations to amplify sparse signals. Our experiments show that our estimator is orders of magnitude faster than alternative textbook approaches and that the end-to-end classification accuracy is comparable to using gold-standard compatibilities. This makes it a cheap preprocessing step for any existing label propagation method and removes the current dependence on heuristics. It remains to be seen whether any of the more recent GNN approaches can be extended to work in a similarly sparse regime.
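To make the role of the compatibility matrix concrete, here is a minimal sketch of a linearized belief propagation iteration with an explicit class-compatibility matrix. This is an illustrative toy, not the implementation from [1]: the names (`A` for the adjacency matrix, `E` for seed beliefs, `H` for the centered compatibility matrix), the toy graph, and the scaling constant are assumptions, and the exact centering and normalization in the paper differ.

```python
import numpy as np

def linbp(A, E, H, num_iters=50):
    """Iterate the linearized update F <- E + A @ F @ H,
    starting from the (centered) seed beliefs E."""
    F = E.copy()
    for _ in range(num_iters):
        F = E + A @ F @ H
    return F

# Toy graph: a 4-node path, two classes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Heterophilous compatibilities: neighbors prefer the *other* class.
# The small scaling factor keeps the iteration convergent.
eps = 0.1
H = eps * np.array([[-1.0,  1.0],
                    [ 1.0, -1.0]])

# Only node 0 is labeled (class 0); all other beliefs start at zero.
E = np.zeros((4, 2))
E[0] = [0.5, -0.5]

F = linbp(A, E, H)
labels = F.argmax(axis=1)  # labels alternate along the path: [0, 1, 0, 1]
```

Under homophily one would instead choose `H` with positive diagonal entries; the point of the talk is that `H` need not be guessed at all but can be estimated from the sparse labels themselves.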

[1] VLDB 2015: Linearized and Single-pass Belief Propagation. http://www.vldb.org/pvldb/vol8/p581-gatterbauer.pdf
[2] SIGMOD 2020: Factorized Graph Representations for Semi-Supervised Learning from Sparse Data. https://arxiv.org/pdf/2003.02829

About the speaker
Wolfgang Gatterbauer is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University. A major focus of his research is to extend the capabilities of modern data management systems in generic ways and to allow them to support novel functionalities that seem hard at first. He received an NSF CAREER award and, with his students and collaborators, a best paper award at EDBT 2021, best-of-conference selections for PODS 2021, SIGMOD 2017, WALCOM 2017, and VLDB 2015, and two SIGMOD 2020 reproducibility awards. For details, visit https://gatterbauer.name.