Learning Research Areas and Author Research Interests from Bibtex and Citations
Visiting Speaker
Past Talk
Tracy Ke
Assistant Professor of Statistics, Harvard University
Friday
Dec 10, 2021
4:00 pm
177 Huntington Ave
11th floor

Given the scientific publications in a field, we are interested in using bibtex and citation data to estimate (a) the primary research areas in this field, (b) the research interests of individual authors (which may evolve with time), and (c) the citation impacts of different research topics in this field.

We answer questions (a) and (b) by studying the co-citation networks of authors. We model these networks with a dynamic mixed-membership model, in which each primary area is a "community" and each author's research interests are described by time-varying "mixed membership vectors." We propose a spectral algorithm for estimating these membership vectors. We answer question (c) by jointly modeling citations and paper abstracts. We propose the Hofmann-Stigler model, which posits K "topic vectors" for the text abstracts, K "export scores" that model the citation impact of these topics, and a "topic weight vector" for each paper. We propose a spectral algorithm for parameter estimation, whose output can be used to rank topics.

We applied our methods to a data set of publications in statistics, covering over 83K papers in 36 statistics journals and spanning 41 years. We discovered a "Statistics Triangle" that is connected to Bradley Efron's Statistics Philosophy Triangle (Efron's triangle is subjective, whereas ours is derived from data). We also found that quite a few high-profile authors have moved toward the popular sub-area of "High-Dimensional Data Analysis," and that the research topic "Mathematical Statistics" ranks first in citation impact.
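As a rough illustration of the author co-citation network mentioned above, the following Python sketch builds co-citation counts from a toy paper-by-author citation matrix and computes a generic spectral embedding from its leading eigenvectors. It is not the speaker's algorithm: the abstract does not specify the estimator, and the function names, the toy data, and the convention that two authors are co-cited whenever one paper cites both are assumptions made here for illustration only.

```python
import numpy as np

def cocitation_matrix(cites):
    """Hypothetical helper: cites[p, a] > 0 means paper p cites author a."""
    cites = (np.asarray(cites) > 0).astype(int)
    co = cites.T @ cites        # co[a, b] = number of papers citing both a and b
    np.fill_diagonal(co, 0)     # ignore an author's co-citation with themselves
    return co

def spectral_embedding(adj, k):
    """Eigenvectors of a symmetric matrix for the k largest |eigenvalues|."""
    vals, vecs = np.linalg.eigh(adj.astype(float))
    order = np.argsort(-np.abs(vals))[:k]
    return vecs[:, order]

# Toy data (assumed): 4 citing papers (rows), 3 authors (columns).
toy_cites = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
])

co = cocitation_matrix(toy_cites)
embedding = spectral_embedding(co, k=2)  # one 2-dimensional row per author
```

In a mixed-membership setting, rows of such an embedding would be further processed to estimate each author's membership vector; that step is omitted here because the abstract does not describe it.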

This is joint work with Pengsheng Ji, Jiashun Jin, and Wanshan Li. The talk is partially based on the paper "Co-citation and Co-authorship Networks of Statisticians" (Journal of Business & Economic Statistics, to appear).

About the speaker
Tracy Ke is Assistant Professor of Statistics at Harvard University (2018 to present). She received her PhD in Operations Research and Financial Engineering from Princeton University in 2014, where she was advised by Professor Jianqing Fan. She was Assistant Professor of Statistics at the University of Chicago from 2014 to 2018. Her research interests include high-dimensional statistics, social network analysis, statistical text mining, and machine learning. She has received the ASA Noether Young Scholar Award, the IMS Peter Hall Prize, and an NSF CAREER Award.