AI systems drop safety filters as conversations continue, increasing the risk of harmful or offensive replies. A new report revealed that users can easily make AI tools disclose restricted or dangerous information with simple tactics.
Simple Prompts Defeat Safety Barriers
Cisco studied large language models (LLMs) from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. Researchers wanted to measure how readily each system would reveal unsafe or illegal information. They conducted 499 “multi-turn attacks”, asking several questions per session to wear down the AI’s restrictions; each chat contained between five and ten exchanges.
The team compared responses from early and later questions to measure compliance with harmful requests. The prompts included attempts to access private company data and spread false information. On average, chatbots gave unsafe answers in 64 per cent of extended conversations, compared with only 13 per cent when users asked a single question. Mistral’s Large Instruct model produced risky information 93 per cent of the time, while Google’s Gemma reached only 26 per cent.
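The pattern Cisco measured can be illustrated with a toy harness: feed a model a sequence of prompts in one chat session and record how often replies comply with a harmful request, for a single-question chat versus an extended one. This is only a sketch of the idea; `mock_model`, `run_session`, and the refusal logic are hypothetical stand-ins, not Cisco's actual test code or any real LLM API.

```python
# Illustrative sketch of a multi-turn attack measurement, assuming a
# toy model whose safety filter degrades as the chat history grows.

def mock_model(history):
    """Hypothetical stand-in for an LLM: refuses harmful requests early
    in a conversation but complies once the history grows long."""
    if len(history) < 4:   # early turns: safety filter holds
        return "REFUSE"
    return "COMPLY"        # later turns: filter has worn down

def run_session(prompts):
    """Send prompts one by one, keeping the full chat history, and
    return the fraction of replies that were unsafe (complied)."""
    history, unsafe = [], 0
    for prompt in prompts:
        history.append(prompt)
        reply = mock_model(history)
        history.append(reply)
        if reply == "COMPLY":
            unsafe += 1
    return unsafe / len(prompts)

harmful = ["step 1", "step 2", "step 3", "step 4", "step 5"]
single_turn_rate = run_session(harmful[:1])  # one-question chat
multi_turn_rate = run_session(harmful)       # five-exchange chat
print(single_turn_rate, multi_turn_rate)     # → 0.0 0.6
```

In this toy setup the single question is refused outright, while the five-exchange session ends up complying on later turns, mirroring the gap Cisco reports between single-prompt and extended conversations.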
Cisco warned that multi-turn attacks could let bad actors spread harmful content or breach company systems. The study found that AI tools often fail to apply safety protocols in long conversations, allowing users to gradually evade protections.
Open Models Shift Safety Burden
Mistral, Meta, Google, OpenAI, and Microsoft all offer open-weight models, whose trained parameters are publicly available. Cisco said these systems ship with lighter built-in safety controls so that users can modify and adapt them, a setup that transfers safety responsibility to whoever customizes the model.
Cisco also noted that Google, OpenAI, Meta, and Microsoft claim to have reduced the risk of malicious fine-tuning. Still, AI companies continue to face criticism for weak protections that make their tools vulnerable to misuse.
In August, Anthropic reported that criminals had exploited its Claude model to steal large amounts of personal data and demand ransoms of more than $500,000 (€433,000).
