Technology

AIs are more likely to mislead people if trained on human feedback

If artificial intelligence chatbots are fine-tuned to improve their responses using human feedback, they can become more likely to give deceptive answers that seem right but aren’t

By Edd Gent

2 October 2024

Illustration of a chatbot icon on a digital blue wavy background — Striving to come up with answers that please humans may make chatbots more likely to pull the wool over our eyes
JuSun/Getty Images

Giving AI chatbots human feedback on their responses seems to make them better at giving convincing, but wrong, answers.

The raw output of large language models (LLMs), which power chatbots like ChatGPT, can contain biased, harmful or irrelevant information, and their style of interaction can seem unnatural to humans. To get around this, developers often get people to evaluate a model’s responses and then fine-tune it based on this feedback.

Using an AI chatbot…

Sign up to our weekly newsletter

Receive a weekly dose of discovery in your inbox! We'll also keep you up to date with New Scientist events and special offers.

Unlock this article

No commitment, cancel anytime*

Offer ends 15 January 2025.

*Cancel anytime within 14 days of payment to receive a refund on unserved issues.

Inclusive of applicable taxes (VAT)

Existing subscribers

Technology

AIs are more likely to mislead people if trained on human feedback

Sign up to our weekly newsletter

More from New Scientist

Technology

Using an AI chatbot or voice assistant makes it harder to spot errors

Technology

AIs get worse at answering simple questions as they get bigger

Technology

AIs get better at maths if you tell them to pretend to be in Star Trek

Technology

Why the T in ChatGPT is AI's biggest breakthrough - and greatest risk

Popular articles

1

2

3

4

5

6

7

8

9

10