|Talks|

From Optimal Monte Carlo Estimation to Optimal Sampling Influence Maximization

Visiting speaker
Past Talk
Hung Nguyen
Virginia Commonwealth University
Mar 20, 2018
1:30 pm
Mar 20, 2018
1:30 pm
In-person
4 Thomas More St
London E1W 1YW, UK
The Roux Institute
Room
100 Fore Street
Portland, ME 04101
Network Science Institute
2nd floor
Network Science Institute
11th floor
177 Huntington Ave
Boston, MA 02115
Network Science Institute
2nd floor
Room
58 St Katharine's Way
London E1W 1LP, UK

Talk recording

Influence Maximization (IM) underlies an important viral marketing problem in economics and has vast applications in outbreak detection, rumor monitoring, network immunization, etc. Given a network and a diffusion model describing how information flows from a node to another, IM seeks a small set of seeding nodes that maximizes the spread of information, termed influence. The IM problem itself is NP-hard. Fortunately, the influence function is monotone and submodular and the greedy algorithm of iteratively selecting nodes with maximum marginal influences is (1-1/e)-optimal. However, computing exact (marginal) influence is #P-hard. The simulation-based Monte Carlo approach requires excessive simulations to estimate the influences. Lazy greedy exploits the submodularity property to significantly reduce the number of simulations; but it is still impractical for large real-world networks. Various heuristic approaches have been studied with better scalability. We investigate the recently proposed Reverse Influence Sampling (RIS) (Borgs et al. SODA’14) and the well-known optimal Monte Carlo estimation (Dagum et al. SICOMP’00) to propose scalable approximation algorithms for IM and its weighted and budgeted variants. Our first proposed algorithm is based on a simple threshold of the seed set coverage to generated samples. It returns an (1 - 1/e - epsilon)-approximate solution with probability (1-delta) for any given 0 < epsilon < 1-1/e, 0 < delta < 1. More importantly, the algorithm can run on very large networks, e.g. Twitter with 42 million nodes and 1.5 billion edges. Our second Stop-and-Stare approach adopts a verifier to re-estimate the influence of a given seed set and alternates between finding a candidate solution with verifying the influence of that solution until the desired approximation ratio (1 - 1/e - epsilon) is met. Theoretically, we show that this algorithm requires optimal number of samples within a constant factor subject to reasonable assumptions. Experimentally, we demonstrate that the proposed algorithm easily scales to the largest available Friendster social network with 66 million nodes and 3.6 billion edges and runs up to 1000x faster than existing algorithms. The new results show further reduction of the time and memory footprints and can support streaming and distributed data models.

About the speaker
Hung Nguyen is a 4th year PhD candidate in the Department of Computer Science at Virginia Commonwealth University. His research focuses on designing scalable approximation algorithms for challenging graph problems under uncertainty, e.g. influence maximization, node centrality maximization, infection source identification, community detection, and cybersecurity. His work has been accepted for publications in various flagship conferences such as ACM SIGMOD, ACM SIGMETRICS, ACM CIKM, IEEE INFOCOM, IEEE ICDM, and journals such as IEEE/ACM Transactions on Networking (ToN) and ACM Transactions on Information Systems (ToIS). He received a Best Paper Award at the International Conference on Computational Social Networks (CSoNet); and his paper at the IEEE ICDM 2017 was selected as among the Best in Conference. He served as reviewer for various conferences: INFOCOM, COCOON, and journals: Theoretical Computer Science (TCS), IEEE Transactions on Network and Service Management (TNSM), and IEEE Access.
Share this page:
Mar 20, 2018