SPuManTE: Significant Pattern Mining with Unconditional Testing
Visiting speaker
Matteo Riondato
Assistant Professor, Amherst College
Past Talk
Friday
Nov 8, 2019
2:00 pm
Virtual
177 Huntington Ave.
11th floor
Online

We present SPuManTE, an efficient algorithm for mining significant patterns from a transactional dataset. SPuManTE controls the Family-Wise Error Rate (FWER): it ensures that the probability of reporting one or more false discoveries is less than a user-specified threshold. A key ingredient of SPuManTE is UT, our novel unconditional statistical test for evaluating the significance of a pattern. UT requires fewer assumptions on the data-generation process and is more appropriate for a knowledge-discovery setting than classical conditional tests, such as the widely used Fisher’s exact test. Computational requirements have limited the use of unconditional tests in significant pattern discovery, but UT overcomes this issue by obtaining the required probabilities in a novel, efficient way. SPuManTE combines UT with recent results, grounded in statistical learning theory, on the supremum of the deviations of pattern frequencies from their expectations. Joint work with Leonardo Pellegrina (UniPD) and Fabio Vandin (UniPD), presented at KDD’19.
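To make the conditional/unconditional distinction concrete, here is a minimal Python sketch for a single pattern. It assumes, as is common in significant pattern mining, that each transaction carries one of two class labels. It is not UT or SPuManTE. The function names, the one-sided alternative, the difference-of-proportions ordering of outcomes, and the grid search over the unknown frequency are all illustrative assumptions, closer in spirit to a brute-force Barnard-style test. Fisher’s exact test conditions on the pattern’s total support. The unconditional test fixes only the class sizes and takes the worst case over the nuisance parameter.

```python
from math import comb


def fisher_one_sided(x1: int, n1: int, x0: int, n0: int) -> float:
    """One-sided Fisher's exact test, conditional on the pattern's support.

    The pattern appears in x1 of the n1 positive transactions and in x0 of the
    n0 negative ones. Given the support s = x1 + x0, the positive-class count
    is hypergeometric under the null hypothesis.
    """
    s, n = x1 + x0, n1 + n0
    # Sum the hypergeometric probabilities of outcomes k >= x1 (at least as extreme).
    hits = sum(comb(n1, k) * comb(n0, s - k) for k in range(x1, min(n1, s) + 1))
    return hits / comb(n, s)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def unconditional_one_sided(x1: int, n1: int, x0: int, n0: int,
                            grid_size: int = 1000) -> float:
    """Brute-force unconditional exact test (illustrative, not UT).

    Only n1 and n0 are fixed. Under the null, the pattern occurs with the same
    unknown probability pi in both classes, so the counts are independent
    Binomial(n1, pi) and Binomial(n0, pi). The p-value is the largest tail
    probability over pi. Here it is approximated on an evenly spaced grid,
    which can slightly underestimate the true supremum.
    """
    # Difference of proportions scaled by n1 * n0, so comparisons stay exact.
    observed = x1 * n0 - x0 * n1
    extreme = [(k1, k0) for k1 in range(n1 + 1) for k0 in range(n0 + 1)
               if k1 * n0 - k0 * n1 >= observed]
    best = 0.0
    for i in range(grid_size + 1):
        pi = i / grid_size
        tail = sum(binom_pmf(k1, n1, pi) * binom_pmf(k0, n0, pi)
                   for k1, k0 in extreme)
        best = max(best, tail)
    return best


# Pattern in all 3 positive transactions and in none of the 3 negative ones.
print(fisher_one_sided(3, 3, 0, 3))         # 0.05
print(unconditional_one_sided(3, 3, 0, 3))  # 0.015625
```

In this toy case, the unconditional p-value (the maximum of pi^3 (1 - pi)^3, reached at pi = 0.5) is smaller than Fisher’s. The enumeration over all (n1 + 1)(n0 + 1) outcomes at every grid point also shows why naive unconditional tests are expensive.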

About the speaker
Matteo Riondato is an assistant professor of computer science at Amherst College and a visiting faculty member at Brown University. Previously, he was a research scientist at Two Sigma and a postdoc at Stanford and Brown. He obtained his PhD in computer science from Brown. His research focuses on algorithms for knowledge discovery, data mining, and machine learning: he develops methods to analyze rich datasets, including graphs and time series, as fast as possible and in a statistically sound way. His work has received best-of-conference awards at the 2014 SIAM International Conference on Data Mining (SDM), the 2016 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), and the 2018 IEEE International Conference on Data Mining (ICDM). He tweets @teorionda and lives at http://matteo.rionda.to.