###### Hung Nguyen

Influence Maximization (IM) underlies an important viral marketing problem in economics and has vast applications in outbreak detection, rumor monitoring, network immunization, etc. Given a network and a diffusion model describing how information flows from a node to another, IM seeks a small set of seeding nodes that maximizes the spread of information, termed influence. The IM problem itself is NP-hard. Fortunately, the influence function is monotone and submodular and the greedy algorithm of iteratively selecting nodes with maximum marginal influences is (1-1/e)-optimal. However, computing exact (marginal) influence is #P-hard. The simulation-based Monte Carlo approach requires excessive simulations to estimate the influences. Lazy greedy exploits the submodularity property to significantly reduce the number of simulations; but it is still impractical for large real-world networks. Various heuristic approaches have been studied with better scalability. We investigate the recently proposed Reverse Influence Sampling (RIS) (Borgs et al. SODA’14) and the well-known optimal Monte Carlo estimation (Dagum et al. SICOMP’00) to propose scalable approximation algorithms for IM and its weighted and budgeted variants. Our first proposed algorithm is based on a simple threshold of the seed set coverage to generated samples. It returns an (1 - 1/e - epsilon)-approximate solution with probability (1-delta) for any given 0 < epsilon < 1-1/e, 0 < delta < 1. More importantly, the algorithm can run on very large networks, e.g. Twitter with 42 million nodes and 1.5 billion edges. Our second Stop-and-Stare approach adopts a verifier to re-estimate the influence of a given seed set and alternates between finding a candidate solution with verifying the influence of that solution until the desired approximation ratio (1 - 1/e - epsilon) is met. Theoretically, we show that this algorithm requires optimal number of samples within a constant factor subject to reasonable assumptions. Experimentally, we demonstrate that the proposed algorithm easily scales to the largest available Friendster social network with 66 million nodes and 3.6 billion edges and runs up to 1000x faster than existing algorithms. The new results show further reduction of the time and memory footprints and can support streaming and distributed data models.

## Want to be notified about upcoming NetSI events? Sign up for our email list below!

Thank you! You have been added to our email list.

Oops! Something went wrong while submitting the form

###### Hung Nguyen

****

Influence Maximization (IM) underlies an important viral marketing problem in economics and has vast applications in outbreak detection, rumor monitoring, network immunization, etc. Given a network and a diffusion model describing how information flows from a node to another, IM seeks a small set of seeding nodes that maximizes the spread of information, termed influence. The IM problem itself is NP-hard. Fortunately, the influence function is monotone and submodular and the greedy algorithm of iteratively selecting nodes with maximum marginal influences is (1-1/e)-optimal. However, computing exact (marginal) influence is #P-hard. The simulation-based Monte Carlo approach requires excessive simulations to estimate the influences. Lazy greedy exploits the submodularity property to significantly reduce the number of simulations; but it is still impractical for large real-world networks. Various heuristic approaches have been studied with better scalability. We investigate the recently proposed Reverse Influence Sampling (RIS) (Borgs et al. SODA’14) and the well-known optimal Monte Carlo estimation (Dagum et al. SICOMP’00) to propose scalable approximation algorithms for IM and its weighted and budgeted variants. Our first proposed algorithm is based on a simple threshold of the seed set coverage to generated samples. It returns an (1 - 1/e - epsilon)-approximate solution with probability (1-delta) for any given 0 < epsilon < 1-1/e, 0 < delta < 1. More importantly, the algorithm can run on very large networks, e.g. Twitter with 42 million nodes and 1.5 billion edges. Our second Stop-and-Stare approach adopts a verifier to re-estimate the influence of a given seed set and alternates between finding a candidate solution with verifying the influence of that solution until the desired approximation ratio (1 - 1/e - epsilon) is met. Theoretically, we show that this algorithm requires optimal number of samples within a constant factor subject to reasonable assumptions. Experimentally, we demonstrate that the proposed algorithm easily scales to the largest available Friendster social network with 66 million nodes and 3.6 billion edges and runs up to 1000x faster than existing algorithms. The new results show further reduction of the time and memory footprints and can support streaming and distributed data models.