355
Audio & Video Production344
Automation & Workflow224
Software Development250
Marketing & Growth192
AI Infrastructure & MLOps173
Writing & Content Creation203
Data & Analytics140
Design & Creative169
Customer Support130
Photography & Imaging156
Sales & Outreach125
Voice & Speech135
Operations & Admin87
Education & Learning131
New reporting says “jailbreaking” remains easy, letting people bypass chatbot safety rules with simple prompts and repeated follow-ups.
In short: Three years after ChatGPT launched, many popular AI chatbots can still be easily tricked into breaking their own safety rules.
“Jailbreaking” is the common name for prompts that push an AI chatbot to ignore its guardrails, meaning the built-in rules meant to stop harmful or illegal advice. Think of it like finding a side door around a locked entrance. According to The New York Times, fooling these systems into bad behavior is still close to trivial.
People do this in a few common ways. One is “prompt injection,” where a user adds text like “ignore previous instructions” or hides the real request in a coded format. Another is roleplay, where the user asks the bot to pretend it is an “uncensored” character and then asks for something the bot normally refuses. Others use long back-and-forth chats to slowly steer the bot, or use indirect requests like “in a video game” scenarios.
Researchers and critics say the problem has not gone away even as the chatbots have gotten better at many tasks. Tests and “red team” reports, which are organized attempts to break safety systems, suggest top models can block many basic tricks but still fail often against smarter, customized jailbreak prompts. The risk is not only illegal instructions, it also includes confident sounding falsehoods that could be used for scams, propaganda, or bad medical or financial advice.
Expect a continued cat-and-mouse cycle. AI companies will patch some loopholes, and users will share new jailbreak scripts online. For everyday users, the practical step is to double-check important claims with reliable sources before acting on them.
Source: NYTimes