> In an analysis of nearly 5,000 summaries produced by popular AI models, including versions of ChatGPT, Claude, LLaMA, and DeepSeek, researchers found that many systems consistently exaggerated the conclusions of the original research papers. The problem wasn't just occasional sloppiness: in some newer models, overgeneralization appeared in up to 73% of summaries. That is a serious concern when people rely on these tools to explain health research, inform policy, or guide classroom learning.
https://royalsocietypublishing.org/doi/10.1098/rsos.241776
#SalamiAI #AISalami #LLMsAndScience