Matteo Ottaviani
Large bibliometric databases such as Web of Science, Scopus, and OpenAlex play a crucial role for policymakers and decision-makers in general, serving as direct and indirect sources of information for decisions at national and international levels, in both the public and private sectors. Policymakers (or whichever intermediary figure acts on their behalf) may rely on experts who, in turn, base their advice on information extracted from bibliometric databases. Although these databases facilitate bibliometric analyses, they are also performative: they affect the visibility of scientific outputs and the measured impact of the participating entities. All three databases have incorporated the UN's Sustainable Development Goals (SDGs) into their respective classifications, which have been criticized for diverging from one another. At the same time, the way information is retrieved and processed from publications is shaped by state-of-the-art methodologies, and AI-supported and AI-powered tools have recently entered research practice and society at large, with Large Language Models (LLMs), the branch of generative AI focused on text, underlying their operation. The present work conceptually questions the effects of using AI tools for research and policy purposes, exploring the specific case of the SDGs. For each of the five SDGs analyzed, an open-source LLM with no prior knowledge was trained, in parallel, on the diverging SDG classifications assigned by the three bibliometric databases mentioned above. Our analysis shows that introducing a generic AI tool between the SDG classifications and the policymaker systematically overlooks the most disadvantaged categories of individuals, the poorest countries, under-represented topics, and the inequality metrics that SDG targets explicitly focus on.
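As a rough illustration of the kind of setup described above, and not the authors' actual pipeline, the sketch below fine-tunes a small open-source model as a per-SDG classifier on labels taken from one bibliometric database. The model name, the binary label scheme, and the toy records are assumptions introduced here for illustration only.

```python
# Hypothetical sketch: fine-tune an open-source model to predict whether a
# publication carries a given SDG label in one bibliometric database.
# Repeating this per SDG and per database (Web of Science, Scopus, OpenAlex)
# approximates the "trained in parallel" setup described in the abstract.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy records: abstracts with a binary label for one SDG (e.g. "tagged with
# SDG 1 by database X?"). Real training data would come from the database.
records = [
    {"text": "Cash transfer programmes and extreme poverty reduction ...", "label": 1},
    {"text": "A new catalyst for low-pressure ammonia synthesis ...", "label": 0},
]
dataset = Dataset.from_list(records)

model_name = "distilbert-base-uncased"  # stand-in for the open-source LLM used in the study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Tokenize abstracts to fixed length so the default collator can batch them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sdg1_wos", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # one classifier per (SDG, database) pair in this illustrative setup
```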