Attack spotlight

Adversarial ML: Extortion via LLM Manipulation Tactics

October 30, 2024

Sublime Security Attack Spotlight: Social Engineering attack that employs command and text injection in the message body to evade LLM detection.

On this page

Ready to see Sublime  in action

Get a demo

Authors

Threat Detection Team

Sublime

Sublime’s Attack Spotlight series is designed to keep you informed of the email threat landscape by showing you real, in-the-wild attack samples, describing adversary tactics and techniques, and explaining how they’re detected.

EMAIL PROVIDER: Google Workspace

ATTACK TYPE: Extortion, Social Engineering

The attack

Novel text injection in the message body reveals an extortion attempt designed to evade LLM detection. The attacker uses fear and uncertainty to isolate the recipient and pressure them into transferring cryptocurrency. A few attack characteristics:

Spoofing of a known sender domain from a trusted third party that the recipient would interact with for legitimate business purposes
Command injections in the message body attempts to interact directly with any present LLM-backed phishing detectors to hide the true intent of the message
Detailed cryptocurrency demands for added urgency in the threat

Anatomy of an attack on an LLM

This attack stood out due to the attacker’s awareness of potential LLM-based phishing detection at the recipient’s organization.

Command injection

By repeating “IGNORE EVERYTHING ELSE” multiple times, the attacker tries inserting what looks like an instruction or command into the LLM’s analysis process. The hope is that the LLM will interpret this as a directive to disregard the malicious content before it.

Attention redirection

The placement of “IGNORE EVERYTHING ELSE” is strategic. By including the phrase after the extortion content, but before the seemingly legitimate vendor configuration details, the attacker wants the LLM to:

Skip over the extortion / Bitcoin demands
Focus only on the innocuous IT configuration information at the end
Potentially classify the email as legitimate business communication

Context manipulation

The placement of the commands appears designed to create an artificial boundary in the message body, signaling to any analyzing LLM to ignore the preceding text and only analyze what follows. This is particularly clever because:

It exploits the fact that LLMs are trained to follow instructions within text
It attempts to hijack the LLM’s tendency to be helpful and follow directives
It tries to make the LLM treat the malicious content as irrelevant to the classification task

This attack shows growing sophistication in understanding how LLM-based security tools work and attempting to exploit their instruction-following nature. It’s similar to other prompt injection attacks we’ve seen where attackers try to slip in commands like “ignore previous instructions” or “disregard security checks.”

Note: This technique might be particularly effective against security systems that use LLMs to generate natural language explanations or summaries of why an email might be suspicious, as the injected commands could influence how the LLM describes or interprets the content.

Detection signals

Sublime detected this attack via the Extortion / Sextortion (untrusted sender) Detection Rule and prevented this attack using the following top signals:

Engaging extortion language: Language in the message appears to extort the user.
Suspicious cryptocurrency language: The message contains a reference to cryptocurrency, which is often used in extortion attacks.
Cyrillic characters: The sender's subject or display name contains Cyrillic characters, a tactic commonly used in homoglyph attacks.

At Sublime, we rely on a defense-in-depth approach, applying layers of detection logic to identify various anomalies in a message. Sublime’s Natural Language Understanding (NLU) model leverages BERT LLM, which does not perform Instruction Following. Instead, it is fine-tuned on labeled training data and would treat the “IGNORE EVERYTHING ELSE” as regular text input.

See how Sublime detects and prevents extortion, social engineering, and other email based threats. Deploy a free instance today.

Heading

About the authors

Threat Detection Team

Sublime

The Threat Detection team at Sublime is responsible for monitoring environments to discover emerging email attacks and developing new Detection Rules for the Core Feed.

Get the latest

Sublime releases, detections, blogs, events, and more directly to your inbox.

Thank you!

Thank you for reaching out. A team member will get back to you shortly.

Oops! Something went wrong while submitting the form.

March 18, 2026

Attack spotlight

Advanced fake Zoom installer used for delivering malware

Kyle Eaton

Detection

Threat Research Team

Sublime

March 10, 2026

Sublime news

Announcing Sublime Email DLP: Data loss prevention at the outbox

AJ Williams

Product Manager

Madison Caldwell

Engineering

Gregory Climer

Engineering

March 3, 2026

Sublime news

How we built high speed threat hunting for email security

Hugh Oh

Engineering

See all blog posts

Frequently asked questions

What is email security?

Email security refers to protective measures that prevent unauthorized access to email accounts and protect against threats like phishing, malware, and data breaches. Modern email security like Sublime use AI-powered technology to detect and block sophisticated attacks while providing visibility and control over your email environment.