Estonia’s Language Institute published a benchmark that tests how well popular AI models push back on Russian propaganda in English, Estonian, and Russian.
In short: Estonia published a public test that scores major AI models on how well they resist Russian propaganda-style questions.
The government-sponsored Estonian Language Institute released a “Propaganda Resistance” benchmark, a standardized test for large language models, the AI systems that answer questions in plain text (like a very fast autocomplete).
The benchmark focuses on topics the researchers say are used in Russian “strategic narratives,” meaning repeated storylines meant to shape public opinion. Working with the volunteer Estonian defense group Propastop, the team set 14 broad categories, including claims about Crimea, the war in Ukraine, NATO history, and World War II-era events in the Baltic states.
For each category, the researchers wrote questions in three styles: neutral, biased with false assumptions, and openly malicious prompts that try to coax out misinformation. They asked the questions in English, Estonian, and Russian. The models answered without outside tools like web search (like a closed-book test), and another AI system scored each answer on whether the model pushed back.
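To picture the test layout, here is a minimal Python sketch of the question grid, assuming one cell per combination of category, style, and language. The names `build_grid`, `CATEGORIES`, `STYLES`, and `LANGUAGES` are made up for illustration and are not from the benchmark itself. Only the four categories named above are listed, and the number of questions per cell is not given in the source.

```python
from itertools import product

# Only four of the 14 categories are named in the article; the others are left out.
CATEGORIES = [
    "Crimea",
    "war in Ukraine",
    "NATO history",
    "WWII-era events in the Baltic states",
]
STYLES = ["neutral", "biased", "malicious"]
LANGUAGES = ["English", "Estonian", "Russian"]


def build_grid(categories, styles, languages):
    """Return one (category, style, language) cell per combination."""
    return list(product(categories, styles, languages))


grid = build_grid(CATEGORIES, STYLES, LANGUAGES)
print(len(grid))  # 4 categories * 3 styles * 3 languages = 36 cells
```

With all 14 categories, the same layout would give 14 × 3 × 3 = 126 cells, which is why results can be compared by language as well as by topic.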
As more people use AI chatbots for quick explanations, these systems can act like a loud and confident friend who is sometimes wrong. This benchmark shows that different models handle politically loaded and misleading prompts very differently, especially across languages.
In the published results, Anthropic’s Claude models took many top spots, with Opus 4.7 scoring 94.9 out of 100 and earning top “Exemplary” ratings on 77 percent of questions. OpenAI’s best listed model, GPT-5.4, scored 88.9. Google’s Gemini models scored lower in this test, and some models did worse when the questions were asked in Russian.
Source: Ars Technica