Technology

Writing backwards can trick an AI into providing a bomb recipe

AI models have safeguards in place to prevent them from creating dangerous or illegal output, but a range of jailbreaks has been found to evade them. Now researchers show that writing backwards can trick AI models into revealing bomb-making instructions.

By Matthew Sparkes

18 October 2024

ChatGPT can be tricked with the right prompt (Image: trickyaamir/Shutterstock)

State-of-the-art generative AI models like ChatGPT can be tricked into giving instructions on how to make a bomb by simply writing the request in reverse, warn researchers.

Large language models (LLMs) like ChatGPT are trained on vast swathes of data from the internet and can create a range of outputs – some of which their makers would prefer didn’t spill out again. Unshackled, they are as likely to provide a decent cake recipe as instructions for making explosives from household chemicals.
