Ayan Chatterjee
Predicting connections between nodes in a network, a.k.a. link prediction, is a well-studied problem in network science, graph mining, and graph machine learning. This dissertation addresses the following limitations in state-of-the-art graph machine learning approaches to link prediction:
1. Predicting links for isolated nodes in the inductive setting (a.k.a. the cold-start problem).
2. Improving the generalizability of deep neural networks for link prediction through the use of network-derived negative samples and unsupervised pre-training of node attributes.
3. Generating topologically driven negative samples for link prediction that capture the complementarity in networks.
4. Investigating the feasibility of a foundation model for temporal link prediction through structure-inspired memory embeddings.
Link prediction is a critical task in various domains, including drug discovery and recommender systems. However, existing models often struggle to make accurate predictions for low-degree nodes and newly introduced entities, which limits their effectiveness. In my dissertation, I address these challenges through unsupervised pre-training of node attributes and network-informed negative sampling.
First, I explore the limitations of state-of-the-art models in making inductive link predictions and propose a non-end-to-end training approach. This method leverages informative node attributes generated by unsupervised pre-training on large-scale corpora, enhancing model generalizability. Second, I focus on improving the interpretability of link prediction models by incorporating insights from network science into negative sampling techniques. This includes strategic sampling of protein-protein non-interactions (PPNIs) to strengthen prediction generalizability and interpretability. Additionally, I introduce AI-Bind, a pipeline designed to improve drug-target interaction predictions, and ComPPlete, which enhances protein-protein interaction predictions through strategic sampling and unsupervised pre-training. These approaches aim to support drug discovery and advance our understanding of biological processes by providing more accurate and interpretable predictions.
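To make the idea of network-derived negative sampling concrete, one possible reading is to treat node pairs that are connected but far apart in the observed network as likely non-links. The sketch below is an illustrative assumption, not the procedure used in AI-Bind or ComPPlete: the function names, the `min_distance` threshold, and the toy path graph are all hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sample_distant_negatives(edges, min_distance=3, k=2, seed=0):
    """Sample up to k node pairs that lie at least min_distance hops apart.

    Hypothetical helper: distance is used here as a proxy for "unlikely to
    link", which is an assumption of this sketch.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Collect every reachable pair whose shortest-path distance is large enough.
    candidates = []
    for u in sorted(adj):
        for v, d in bfs_distances(adj, u).items():
            if u < v and d >= min_distance:
                candidates.append((u, v, d))

    # Sort before shuffling so the result depends only on the seed.
    candidates.sort()
    random.Random(seed).shuffle(candidates)
    return candidates[:k]


# Toy path graph a - b - c - d - e (illustrative data only).
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
negatives = sample_distant_negatives(toy_edges, min_distance=3, k=2)
```

Each returned tuple holds a node pair and its hop distance. In a real pipeline these pairs would be added to the training set as negative examples alongside the observed links, and the threshold would be chosen by validation rather than fixed in advance.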