Machine Learning
October 21, 2024
Learn how a fine-tuned BERT Large Language Model (LLM) enhances Sublime's Natural Language Understanding (NLU), bringing improved contextual awareness, deeper language comprehension, and better performance for identifying GenAI-enabled phishing attacks.
The increased use of Generative AI in email attacks enables a higher level of sophistication while lowering the barrier to entry for attackers, allowing them to create highly personalized phishing emails with minimal investment. We've observed a continued surge in these AI-enabled phishing attempts, especially in Business Email Compromise (BEC) and vendor impersonation campaigns.
Through a comprehensive detection approach, we've identified likely AI-generated emails actively circulating in the wild, and they're becoming more convincing with each iteration.
Why investigate if an email was penned by a human or conjured up by AI? It's a bit like trying to guess if a novel was written on a typewriter or a laptop—it doesn't change the story. Yes, there are techniques, like Perplexity Analysis, that attempt to measure how "surprised" a language model is by certain word choices and sequences, aiming to flag AI-generated text. The catch is that attackers can easily outsmart these methods. Simple tweaks like rephrasing sentences, shuffling word order, or tossing in a few typos can throw off the detection entirely. This focus on identifying an email as GenAI written is a red herring and not the right goal to chase.
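To make that concrete, here is a minimal sketch of perplexity analysis using an off-the-shelf GPT-2 checkpoint from Hugging Face; this is an illustrative stand-in, not part of Sublime's pipeline:

```python
# A minimal perplexity sketch: score how "surprised" a language model is
# by a piece of text. Low perplexity is often read as a weak signal of
# AI-generated prose. GPT-2 is an illustrative stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input as labels returns the mean cross-entropy loss;
        # perplexity is its exponential.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

print(perplexity("Please process the attached invoice at your earliest convenience."))
```

A light paraphrase or a few injected typos will shift this score enough to defeat any fixed threshold, underscoring the fragility described above.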
Indicators, contextual clues, abused infrastructure, and other Tactics, Techniques, and Procedures (TTPs) reveal malicious intent. Our mission is to detect these signals, no matter who—or what—wrote the message.
At Sublime, we use a defense-in-depth strategy with Message Query Language (MQL) at the core of our detection engine, augmented by machine learning-backed enrichment functions like Natural Language Understanding (NLU) that surface anomalous behavior. This proactive approach flags attacks before they cause any damage. NLU lets us dive deep into the language, analyzing tone, intent, and context to catch the subtler signs of malicious activity.
In a previous post, we showed how we use NLU to perform two critical tasks: Intent Classification and Named Entity Recognition. Leveling up Intent Classification, or deciphering the underlying objective of an email message, is key to identifying a GenAI attack.
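For intuition, both tasks can be approximated with generic open-source checkpoints; the models, labels, and sample text below are illustrative stand-ins rather than Sublime's NLU:

```python
# Illustrative stand-ins for the two NLU tasks: intent classification
# (approximated here with zero-shot NLI) and named entity recognition.
from transformers import pipeline

text = "Hi, please wire the overdue invoice balance to our updated account today."

# Intent classification: which candidate objective best fits the message?
intent = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(intent(text, candidate_labels=["payment request", "meeting request", "newsletter"]))

# Named entity recognition: who and what is the message talking about?
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Carol Smith from Acme Corp requested an urgent wire transfer."))
```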
With Generative AI at their fingertips, attackers can effortlessly generate thousands of email templates paraphrasing the same malicious intent, all designed to slip past traditional detectors. Naive language models that rely on spotting common word patterns often miss subtle nuances, leaving the door open for sophisticated attacks.
Rather than settle for the status quo, we're actively enhancing our NLU capabilities. We’re now better equipped to catch these shape-shifting threats by focusing on understanding tone, intent, and context—not just the words themselves. Our models are not just reading emails; they’re genuinely comprehending them.
We’ve released a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) LLM as part of our Natural Language Understanding engine. The BERT architecture strengthens our ability to give users the deeper contextual understanding needed to detect sophisticated, polymorphic, language-centric email attacks.
BERT models employ:
- A transformer encoder with bidirectional self-attention, so every token is interpreted in light of the words on both sides of it, not just those that came before (see the sketch after this list)
- Large-scale pre-training via masked language modeling, which builds deep, general-purpose representations of language
- Task-specific fine-tuning, which adapts those representations to problems like email intent classification
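As a concrete illustration of that bidirectionality, here is a small sketch using an off-the-shelf bert-base-uncased checkpoint (again, a stand-in, not our fine-tuned model): the prediction for the masked token draws on context from both sides of the mask.

```python
# Bidirectional context in action: BERT predicts the masked token using
# words on BOTH sides of it. bert-base-uncased is an illustrative stand-in.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The right-hand clause ("before your account is suspended") is visible to
# BERT here; a left-to-right model at the mask position could not see it.
for pred in fill("Please [MASK] the invoice before your account is suspended.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```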
Why let attackers have all the fun? We're bolstering our defenses by utilizing the same GenAI techniques that adversaries employ.
We feed anonymized versions of real attack emails—stripped of any personal or sensitive information—into secure, confidential, local AI models to ensure data remains private and isn't stored. These models generate variations of these malicious messages, simulating the diverse tactics that attackers might use in the wild—even sometimes anticipating new attack patterns yet to emerge.
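Conceptually, the generation step looks something like the sketch below; the model choice, prompt, and placeholder text are assumptions for illustration, not our actual pipeline:

```python
# A hedged sketch of synthetic-variant generation with a locally hosted,
# open-weights model. The checkpoint, prompt, and sample are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

anonymized_attack = (
    "Hi, our payment records show invoice #[REDACTED] is overdue. "
    "Please wire the balance today to the updated account below."
)

prompt = (
    "Paraphrase the following email five times, varying tone, wording, and "
    f"structure while preserving its intent:\n\n{anonymized_attack}"
)

# Each variant becomes a candidate test case for our detection rules and,
# after review, potential fine-tuning data.
variants = generator(prompt, max_new_tokens=400, do_sample=True, temperature=0.9)
print(variants[0]["generated_text"])
```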
This process allows us to build a rich and varied dataset of potential threats that reflects the latest tricks used by cybercriminals, including experimental GenAI techniques. We use these examples to rigorously test our models and detection rules, identifying weaknesses or blind spots. Essentially, this serves as a form of AI-assisted red-teaming for our models.
We’ve taken this a step further by incorporating select samples into our training data, using the findings to fine-tune BERT and teach our models to recognize these threats. An AI-generated attack encountered in the wild becomes much easier to handle because our models have already been exposed to similar malicious examples, bolstering NLU’s robustness. By teaching our models new and unseen patterns, we improve their ability to generalize to novel attack vectors and detect even the most subtle signs of malicious intent.
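A minimal sketch of that fine-tuning loop, assuming a toy label set and a handful of samples (our actual labels, data volumes, and hyperparameters differ):

```python
# Fine-tuning DistilBERT for intent classification on a mix of benign and
# synthetic attack samples. Labels and data here are toy illustrations.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["benign", "bec", "vendor_impersonation"]  # assumed label set

data = Dataset.from_dict({
    "text": [
        "Lunch on Friday?",
        "Urgent: wire the outstanding balance today.",
        "Our remittance account changed; update it before the next payment.",
    ],
    "label": [0, 1, 2],
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS))

tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-intent",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
)
trainer.train()
```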
Fast and accurate are two words that rarely go together. You might think you must choose one or the other, but BERT is different. This model delivers on both speed and accuracy, and we ran a test to illustrate this.
First, we generated thousands of synthetic email samples using the methodology described above. Then, with the help of our security and detection teams, we hand-picked the hardest ones to create a challenging benchmark dataset for our candidate models to battle it out.
We threw DistilBERT (a compact variant of the BERT family) into the ring with heavyweight language models like Mistral and Llama2 for an email content classification showdown, testing a fine-tuned DistilBERT model against out-of-the-box Mistral and Llama2 used as zero-shot classifiers. While the complete analysis deserves its own post, the headline result was clear: the fine-tuned DistilBERT delivered both faster and more accurate classifications than its much larger zero-shot competitors.
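For context, the zero-shot baseline amounts to prompting a general-purpose LLM for a label with no task-specific training, roughly like this sketch (the checkpoint, prompt, and label set are illustrative assumptions):

```python
# A zero-shot classification baseline: prompt an instruction-tuned LLM for
# an intent label without any fine-tuning. Everything here is illustrative.
from transformers import pipeline

llm = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

email = "Our records show an overdue invoice. Please wire the balance today."
prompt = (
    "Classify the intent of the email below as exactly one of: benign, bec, "
    f"vendor_impersonation. Answer with the label only.\n\nEmail: {email}\nLabel:"
)

# The generated continuation is parsed as the predicted label.
print(llm(prompt, max_new_tokens=5)[0]["generated_text"])
```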
Sublime’s NLU engine needs to operate at scale, protecting thousands of mailboxes. DistilBERT’s ability to produce both fast and accurate classifications in its compact form factor makes it an ideal candidate for this task.
Consider a representative vendor impersonation attempt:

Subject: Outstanding Payments
Sender: Carol Smith <carol.smith@acmecorp.com>
We use a multi-layered strategy to detect attacks like the one above, including a combination of dynamic NLU and behavioral signals defined within MQL. We designed our NLU intent classification model to analyze the email body and identify the core elements that produce a narrative of an urgent financial request disguised as a routine administrative task.
With the intent clearly identified, we layer additional detection signals to zero in on suspicious patterns and ensure comprehensive coverage against Vendor Impersonation attacks.
Using MQL, we can detect Vendor Impersonation attempts with a simple rule:
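The sketch below is written in the spirit of Sublime's public detection rules rather than copied from one; the intent label, confidence threshold, and sender checks are illustrative assumptions:

```
type.inbound
// NLU intent: the body reads as a payment-focused request
and any(ml.nlu_classifier(body.current_thread.text).intents,
        .name == "bec" and .confidence == "high")
// Behavioral context: the sender is outside the org and has no
// established, solicited history with the recipient
and sender.email.domain.root_domain not in $org_domains
and not profile.by_sender().solicited
```

Each clause contributes an independent signal: the NLU intent captures what the language is asking for, while the sender and profile checks capture the behavioral context around it.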
Coalescing these indicators brings us back to the attack's core primitives: behavioral indicators, contextual clues, vernacular, call-to-action (CTA) markers, and pattern analysis. Each signal provides a unique vantage point, letting us piece together subtle yet critical indicators that are easily missed in isolation.
This holistic + heuristic approach makes our detection strategy resilient, adaptable, and highly effective, ensuring that even the most convincing impersonation attempts are surfaced.
As attackers evolve their tactics and leverage tools like GenAI to lower their costs, it's clear that our defenses must adapt alongside them. By incorporating synthetic data into our defenses, we're transforming GenAI from an adversary tool into a powerful asset for security. Combining this robust dataset with our updated, more powerful NLU model helps us catch more threats, reduce false positives, and establish a strong foundation for future detections.
See how NLU detects sophisticated attacks by scheduling a platform demo.