A crowdsourced site called Arena ranks AI chat models using head-to-head voting, but researchers warn the rankings can be gamed by training on Arena data.
In short: A public website called Arena has become a key scoreboard for ranking AI chat models, but researchers say its results can be distorted.
Arena, formerly called LM Arena, is now one of the most watched public leaderboards for large language models, the AI systems behind chatbots that write text and answer questions. According to reporting from TechCrunch, its rankings can influence company decisions such as funding, product launch timing, and marketing.
Arena works like a blind taste test. A user types a question, two anonymous AI models answer, and the user votes for the better response. Those votes feed into an Elo rating system, a common way to rank competitors based on wins and losses, as in chess.
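To make the rating idea concrete, here is a minimal sketch of how a single vote could move two models' ratings under the textbook Elo formula. The function names, the starting rating of 1000, the scale of 400, and the K-factor of 32 are illustrative assumptions borrowed from chess conventions; the article does not say which constants or exact statistical method Arena uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model (scale 400, an assumption)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one vote.

    score_a is 1.0 if the user preferred model A, 0.0 if they preferred
    model B, and 0.5 for a tie. k (assumed 32 here) controls how far a
    single vote can move a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the total number of rating points is conserved.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical models start level; the user votes for model A.
model_a, model_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(model_a, model_b)  # 1016.0 984.0
```

An upset counts for more than an expected win: if a lower-rated model beats a higher-rated one, its expected score was below 0.5, so the same vote shifts the ratings by more than half of K.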
By February 2026, Arena had collected more than 5.3 million votes across 316 models. An automated test closely tied to Arena results, called Arena-Hard-Auto, listed top performers including o3-2025-04-16 at 87.0% and several “o4-mini” and “o3-mini” versions behind it. Well-known models like GPT-4.1, Gemini, and Claude also show up near the top on different leaderboards, depending on the test.
Researchers have also raised reliability concerns. One issue is “teaching to the test.” When models were trained on more Arena data, their win rates on an Arena-related benchmark rose sharply, from 23.5% to 49.9%. Another concern is fast turnover at the top, which may mean companies are trying many slightly different versions to find the one that scores best.
If Arena stays this influential, pressure will grow for clearer rules and more transparency about what data models were trained on. For regular users, the main takeaway is simple: a high rank can be useful, but it does not guarantee that a model will be best for your specific needs.
Source: TechCrunch AI