AI chatbots can still be tricked into unsafe answers, three years on

In short: Three years after ChatGPT launched, many popular AI chatbots can still be easily tricked into breaking their own safety rules.

What's going on

“Jailbreaking” is the common name for prompts that push an AI chatbot to ignore its guardrails, meaning the built-in rules meant to stop harmful or illegal advice. Think of it like finding a side door around a locked entrance. According to The New York Times, fooling these systems into bad behavior is still close to trivial.

People do this in a few common ways. One is “prompt injection,” where a user adds text like “ignore previous instructions” or hides the real request in a coded format. Another is roleplay, where the user asks the bot to pretend it is an “uncensored” character and then asks for something the bot normally refuses. Others use long back-and-forth chats to slowly steer the bot, or use indirect requests like “in a video game” scenarios.

Researchers and critics say the problem has not gone away even as the chatbots have gotten better at many tasks. Tests and “red team” reports, which are organized attempts to break safety systems, suggest top models can block many basic tricks but still fail often against smarter, customized jailbreak prompts. The risk is not only illegal instructions, it also includes confident sounding falsehoods that could be used for scams, propaganda, or bad medical or financial advice.

What to watch

Expect a continued cat-and-mouse cycle. AI companies will patch some loopholes, and users will share new jailbreak scripts online. For everyday users, the practical step is to double-check important claims with reliable sources before acting on them.

Source: NYTimes

AI chatbots can still be tricked into unsafe answers, three years on

Jack Harrison

What's going on

What to watch

Similar News

Ontario audit finds AI medical scribes often made incorrect notes

Wired excerpt says Gen Z judges truth through social media and AI

Meta employees report low morale ahead of planned layoffs and AI tracking

Ishmael Reed takes aim at Silicon Valley billionaires in new play

Campbell Brown’s Forum AI tests chatbots on high-stakes questions

Explore AI Directory