Hackers are using simple tricks that play on how chatbots act to get them to ignore safety rules and produce harmful information.
In short: Hackers are getting better at making AI chatbots ignore their safety rules by taking advantage of the chatbots’ friendly, human-like “personalities.”
Early attacks on chatbots were often very simple. People could sometimes make a chatbot break its own rules just by typing something like “ignore all previous instructions.”
These attacks are often called “jailbreaks,” which means getting a system to do things it was told not to do. It is like convincing a strict babysitter that bedtime rules do not apply tonight.
According to a new column from The Verge, hackers are now learning to be more strategic. Instead of relying only on blunt commands, they exploit the conversational traits chatbots are designed to have, such as sounding helpful, polite, and emotionally aware, even though AI does not have feelings.
The goal is the same as before: to push chatbots into providing harmful content, including instructions for illegal drugs, malware (software made to harm or break into computers), or weapons.
This puts more pressure on AI companies to strengthen their safety systems without making chatbots useless for ordinary users. It also means everyday users should be cautious about treating chatbots like people, because a friendly tone does not equal good judgment. Over time, expect a continuing back-and-forth, with safety teams adding new guardrails and attackers looking for new ways around them.
Source: The Verge AI