A genetic epidemiology approach to cyber-security
While much attention has been paid to the vulnerability of computer networks to node and link failure, there is limited systematic understanding of the factors that determine the likelihood that a node (computer) is compromised. We therefore collect threat log data in a university network to study the patterns of threat activity for individual hosts. We relate this information to the properties of each host as observed through network-wide scans, establishing associations between the network services a host is running and the kind sof threats to which it is susceptible. We propose a methodology to associate services to threats inspired by the tools used in genetics to identify statistical associations between mutations and diseases. The proposed approach allows us to determine probabilities of infection directly from observation, offering an automated high-throughput strategy to develop comprehensive metrics for cyber-security.