Ayan Chatterjee
Link prediction, the task of predicting connections between nodes in a graph, is fundamental to graph machine learning, with applications ranging from drug discovery and disease characterization to recommender systems. Despite commendable progress on this problem, state-of-the-art models often perform poorly when predicting links for low-degree or newly introduced nodes, which limits their generalizability in real-world scenarios. In this dissertation, I address these challenges by developing novel, generalizable methods for link prediction, with empirical studies on drug-target and protein-protein interaction networks. First, I propose a solution to the "cold-start" problem in recommender systems, i.e., link prediction for isolated nodes in an inductive setting. Unsupervised Pre-training of Node Attributes (UPNA) uses knowledge sources external to the training graph to learn node representations, and these representations improve link prediction performance on never-before-seen nodes. Second, I introduce a negative sampling method that combines negative samples derived from network hop distance with UPNA to improve generalization in predicting drug-target interactions. Third, I develop another negative sampling method that captures the complementarity mechanisms underlying protein-protein interactions. Combined with UPNA, these topological negatives improve both the generalizability of predictions on protein-protein interaction networks and their transferability to peptides. Finally, I formulate a method that modifies existing temporal link prediction models to improve their generalizability by aligning the node embedding spaces of two disjoint temporal graphs through the structural embedding space, paving the way toward a foundation model for temporal graphs.
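The abstract mentions negative samples "derived from network hop distance." As a rough illustration only, and not the dissertation's actual procedure, the sketch below assumes an undirected graph stored as a plain adjacency dict and treats a (drug, target) pair as a candidate negative when the shortest path between the two nodes is at least `min_hops` hops long, or when no path exists at all. The function names `bfs_distances` and `hop_distance_negatives` and the `min_hops` threshold are hypothetical choices made for this example.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def hop_distance_negatives(adj, sources, targets, min_hops, n_samples, seed=0):
    """Sample (source, target) pairs that are at least `min_hops` apart.

    Hypothetical sketch: unreachable pairs count as infinitely far apart,
    so they are always eligible. `min_hops` must be >= 2 so that known
    positive edges (distance 1) are never returned as negatives.
    """
    if min_hops < 2:
        raise ValueError("min_hops must be >= 2 to exclude observed edges")
    candidates = []
    for s in sources:
        dist = bfs_distances(adj, s)  # one BFS per source node
        for t in targets:
            if t != s and dist.get(t, float("inf")) >= min_hops:
                candidates.append((s, t))
    rng = random.Random(seed)  # seeded for reproducible sampling
    k = min(n_samples, len(candidates))
    return rng.sample(candidates, k)


# Usage: a toy drug-target graph with two connected components.
adj = {
    "d1": ["p1"], "p1": ["d1", "d2"],
    "d2": ["p1", "p2"], "p2": ["d2"],
    "d3": ["p3"], "p3": ["d3"],
}
negatives = hop_distance_negatives(
    adj, ["d1", "d2", "d3"], ["p1", "p2", "p3"], min_hops=3, n_samples=10
)
print(sorted(negatives))
```

Raising `min_hops` pushes the sampled negatives further from observed interactions, which makes them less likely to be unlabeled true positives; whether and how the dissertation tunes such a threshold is not stated in the abstract.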