Adversarial ML: Extortion via LLM Manipulation Tactics

Authors

Threat Detection Team

Sublime

Sublime’s Attack Spotlight series is designed to keep you informed of the email threat landscape by showing you real, in-the-wild attack samples, describing adversary tactics and techniques, and explaining how they’re detected.

EMAIL PROVIDER: Google Workspace

ATTACK TYPE: Extortion, Social Engineering

The attack

Novel text injection in the message body reveals an extortion attempt designed to evade LLM detection. The attacker uses fear and uncertainty to isolate the recipient and pressure them into transferring cryptocurrency. A few attack characteristics:

Spoofing of a known sender domain from a trusted third party that the recipient would interact with for legitimate business purposes
Command injections in the message body attempts to interact directly with any present LLM-backed phishing detectors to hide the true intent of the message
Detailed cryptocurrency demands for added urgency in the threat

Anatomy of an attack on an LLM

This attack stood out due to the attacker’s awareness of potential LLM-based phishing detection at the recipient’s organization.

Command injection

By repeating “IGNORE EVERYTHING ELSE” multiple times, the attacker tries inserting what looks like an instruction or command into the LLM’s analysis process. The hope is that the LLM will interpret this as a directive to disregard the malicious content before it.

Attention redirection

The placement of “IGNORE EVERYTHING ELSE” is strategic. By including the phrase after the extortion content, but before the seemingly legitimate vendor configuration details, the attacker wants the LLM to:

Skip over the extortion / Bitcoin demands
Focus only on the innocuous IT configuration information at the end
Potentially classify the email as legitimate business communication

Context manipulation

The placement of the commands appears designed to create an artificial boundary in the message body, signaling to any analyzing LLM to ignore the preceding text and only analyze what follows. This is particularly clever because:

It exploits the fact that LLMs are trained to follow instructions within text
It attempts to hijack the LLM’s tendency to be helpful and follow directives
It tries to make the LLM treat the malicious content as irrelevant to the classification task

This attack shows growing sophistication in understanding how LLM-based security tools work and attempting to exploit their instruction-following nature. It’s similar to other prompt injection attacks we’ve seen where attackers try to slip in commands like “ignore previous instructions” or “disregard security checks.”

Note: This technique might be particularly effective against security systems that use LLMs to generate natural language explanations or summaries of why an email might be suspicious, as the injected commands could influence how the LLM describes or interprets the content.

Detection signals

Sublime detected this attack via the Extortion / Sextortion (untrusted sender) Detection Rule and prevented this attack using the following top signals:

Engaging extortion language: Language in the message appears to extort the user.
Suspicious cryptocurrency language: The message contains a reference to cryptocurrency, which is often used in extortion attacks.
Cyrillic characters: The sender's subject or display name contains Cyrillic characters, a tactic commonly used in homoglyph attacks.

At Sublime, we rely on a defense-in-depth approach, applying layers of detection logic to identify various anomalies in a message. Sublime’s Natural Language Understanding (NLU) model leverages BERT LLM, which does not perform Instruction Following. Instead, it is fine-tuned on labeled training data and would treat the “IGNORE EVERYTHING ELSE” as regular text input.

See how Sublime detects and prevents extortion, social engineering, and other email based threats. Deploy a free instance today.

Get the latest

Sublime releases, detections, blogs, events, and more directly to your inbox.

Thank you!

Thank you for reaching out. A team member will get back to you shortly.

Oops! Something went wrong while submitting the form.