A Financial Times analysis says AI tests often measure different things, like how often AI can succeed at a risky task versus how reliably it can do everyday work.
In short: A Financial Times analysis says AI can look very capable on some tests, but those scores may not show whether it is reliable enough for everyday jobs.
Researchers and companies often talk about “how capable” today’s AI is, but they are not always measuring the same thing. The Financial Times points out that some popular tests were designed to answer a safety question, not a workplace question.
One safety question is whether AI can sometimes succeed at tasks that could enable cyber attacks, like finding ways into computer systems. For this kind of risk, even a 50 percent success rate can be a big problem, because an attacker only needs it to work once in a while. A lock that fails half the time is still a serious security risk.
A workplace question is different. To replace a person at work, AI usually needs to be consistent and dependable, with results closer to 100 percent. Offices also involve messy situations, like unclear instructions, changing goals, and working with other people, which are harder to score in simple tests.
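The gap between the two questions can be shown with simple arithmetic. The sketch below is illustrative only and does not come from the Financial Times analysis: it assumes every attempt is independent and has the same fixed success rate, which real tasks rarely satisfy, and the function names are made up for this example.

```python
def chance_of_at_least_one_success(success_rate: float, attempts: int) -> float:
    """Safety view: an attacker only needs one of several tries to work.

    Assumes independent attempts with a fixed success rate (a simplification).
    """
    return 1 - (1 - success_rate) ** attempts


def chance_every_task_succeeds(success_rate: float, tasks: int) -> float:
    """Workplace view: every task in a run of work has to come out right.

    Assumes independent tasks with a fixed success rate (a simplification).
    """
    return success_rate ** tasks


# Hypothetical rates chosen for illustration, not measured results.
for rate in (0.5, 0.9, 0.99):
    once = chance_of_at_least_one_success(rate, attempts=5)
    always = chance_every_task_succeeds(rate, tasks=5)
    print(f"rate={rate:.2f}  at least one of 5: {once:.3f}  all 5 of 5: {always:.3f}")
```

Under these assumptions, a 50 percent success rate gives about a 97 percent chance that at least one of five attempts works, but only about a 3 percent chance that five tasks in a row all succeed. The same score looks alarming from a safety view and unusable from a workplace view.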
The article compares two approaches. METR, an AI research group, tracks how long and complex a coding task an AI can finish with at least a 50 percent success rate. A separate approach from Princeton University researchers is closer to the safety standards used in fields such as aviation: it focuses on how confident we can be that AI will almost always succeed, and it finds slower progress.
The Financial Times suggests the next focus may be reliability, not just higher scores on “sometimes succeeds” tests. Businesses have to decide where AI is safe to use, and where it needs strong checks, especially in cyber security.
Source: Financial Times