Factorized Network Representations for Node Classification in Sparsely Labeled Graphs
Visiting Speaker
Past Event
Wolfgang Gatterbauer
Associate Professor, Khoury College of Computer Sciences at Northeastern University
Mar 13, 2020
Watch video
3:00 pm
177 Huntington Ave
11th floor

Node classification is an important problem in graph data management, and is commonly solved by various label prop- agation methods that work iteratively starting from a few labeled seed nodes. For graphs with arbitrary compatibilities between classes, these methods crucially depend on a compatibility matrix between classes that is commonly provided either by domain experts or heuristics. Can we instead derive compatibilities from the actual graph we like to label in a principled and scalable way? We answer this question positively and suggest a method (“distant compatibility estimation”) that can estimate the compatibilities on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of time it later takes to label the remaining nodes. This makes it a cheap pre-processing step for any existing label propagation method and removes the current dependence on any domain experts or heuristics. Our approach first creates multiple consistent and compact factorized graph representations (with size independent on the graph) and only then perform estimation on these smaller representations. We show that the classification accuracy of our proposed estimator is comparable to using “ground truth" compatibilities and that our estimator is by orders of magnitude faster than standard approaches based on train-test sets.

About the speaker
Wolfgang Gatterbauer is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University. Prior to joining Northeastern University, he was an Assistant Professor in the Tepper school of Business at Carnegie Mellon University; and before that a postdoc in the Database Group at University of Washington. His current main research interests center around algebraic principles that allow algorithms to scale with increasing data size. He is a recipient of the NSF CAREER award and “best-of-conference” mentions from VLDB 2015, SIGMOD 2017 and WALCOLM 2017.