📄 Abstract
The use of large language models (LLMs) is becoming common in political
science and digital media research. While LLMs have demonstrated ability in
labelling tasks, their effectiveness in classifying Political Content (PC) from
URLs remains underexplored. This article evaluates whether LLMs can accurately
distinguish PC from non-PC using both the text and the URLs of news articles
across five countries (France, Germany, Spain, the UK, and the US) and their
different languages. Using cutting-edge models, we benchmark their performance
against human-coded data to assess whether URL-level analysis can approximate
full-text analysis. Our findings show that URLs embed relevant information and
can serve as a scalable, cost-effective alternative for discerning PC. However, we
also uncover systematic biases: LLMs seem to overclassify centrist news as
political, leading to false positives that may distort downstream analyses. We
conclude by outlining methodological recommendations on the use of LLMs in
political science research.
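To make the benchmarked task concrete, the sketch below shows how an LLM might be prompted to label a single news URL as political or non-political. The model name, prompt wording, and the `classify_url` helper are illustrative assumptions, not the authors' actual pipeline or prompts.

```python
# Minimal sketch of URL-level political-content classification with an LLM.
# Assumptions: the OpenAI Python SDK is installed, an API key is configured,
# and "gpt-4o-mini" is a stand-in model name; none of this reflects the
# paper's actual models, prompts, or evaluation code.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "You will be given the URL of a news article. "
    "Answer with exactly one word: POLITICAL if the article is likely "
    "political content, or NON-POLITICAL otherwise.\n\nURL: {url}"
)

def classify_url(url: str) -> str:
    """Return 'POLITICAL' or 'NON-POLITICAL' for a single news URL."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # stand-in model name (assumption)
        temperature=0,         # deterministic labels for benchmarking
        messages=[{"role": "user", "content": PROMPT.format(url=url)}],
    )
    return response.choices[0].message.content.strip().upper()

# Example usage: URL-level labels that could be compared against
# human-coded ground truth.
if __name__ == "__main__":
    for url in [
        "https://example.com/politics/2024/election-debate-recap",
        "https://example.com/sports/2024/cup-final-report",
    ]:
        print(url, "->", classify_url(url))
```

In a benchmarking setup such as the one described above, these URL-level labels would be compared against full-text labels and human-coded annotations to measure how much information the URL alone carries.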
Key Contributions
Evaluates the ability of LLMs to classify political content (PC) from URLs versus full text across five countries and their languages. The study demonstrates that URLs can serve as a scalable, cost-effective alternative for PC classification, but it also uncovers systematic biases, such as the overclassification of centrist news as political.
Business Value
Enables more efficient and scalable analysis of political discourse and media trends, aiding researchers, journalists, and policymakers in understanding public opinion and media influence.