Using Off-the-Shelf Harmful Content Detection Models: Best Practices for Model Reuse

Angela Schöpke-Gonzalez, Siqi Wu, Sagar Kumar, Libby Hemphill
ACM Digital Library
May 2, 2025
Supervised machine learning is a common approach to automated harmful content detection in support of content moderation. This approach relies on human-annotated data to train models to recognize classes of harmful content. For detection tasks, researchers or content moderation communities typically either design their own annotation tasks to generate training data for new harmful content detection models or reuse off-the-shelf (OTS) pre-trained harmful content detection models. OTS model reuse can enable detection tasks in resource-constrained settings and can help reduce the environmental impact of training new models -- an energy-intensive process. However, given the plethora of OTS models now available, determining which model to reuse for a particular task, and how to use it, can be challenging, especially because many of these models were developed for specific contexts that do not transfer easily to others. This work provides best practices for reusing OTS models for harmful content detection tasks. Using content analysis and statistical methods to evaluate assumptions about OTS model utility and reusability, we show that model reusers cannot assume that a model claimed to detect a particular concept will actually detect that concept. Based on our findings, we offer a decision tree for assessing whether an OTS model is appropriate for reuse in a new harmful content detection task. The decision tree directs model reusers to critically assess concept definitions, annotation task design, and additional features specified in our content analysis codebook to identify expected model output, and consequently to evaluate whether that OTS model is appropriate for reuse for a new detection task.
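As a concrete illustration of the reuse workflow the paper examines, the sketch below loads a publicly available OTS classifier with the Hugging Face transformers library and probes it with examples from a hypothetical new deployment context. The checkpoint name and probe texts are illustrative assumptions, not artifacts from the paper; the point is that a reuser should inspect a model's actual outputs on context-relevant inputs rather than rely on the concept the model is claimed to detect.

```python
# Minimal sketch of OTS model reuse, assuming the Hugging Face
# `transformers` library and the `unitary/toxic-bert` checkpoint as a
# representative OTS classifier (chosen for illustration; not a model
# evaluated in this paper).
from transformers import pipeline

# Load an off-the-shelf harmful content classifier.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

# Probe the model with examples drawn from the *new* deployment context
# before trusting its label names: a model claimed to detect "toxicity"
# may operationalize that concept differently than the reuser expects.
probes = [
    "You are a complete idiot.",      # likely within the model's concept
    "I strongly disagree with you.",  # disagreement, not necessarily toxic
]
for text in probes:
    print(text, "->", classifier(text))
```

Comparing these outputs against the reuser's own definition of the target concept is one lightweight way to apply the paper's recommendation to assess concept definitions and expected model output before committing to reuse.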