Introduction to Message Query Language (MQL)

Ross Wolf, Engineering

March 24, 2023

Message Query Language enables defenders to share protections against email attacks and powers all rules, insights, and hunts for Sublime.

Sublime is the world’s first open email security platform that lets anyone write, run, and share rules in a universal domain-specific language (DSL) to block email-borne attacks, hunt for threats, and more. In our previous post, we shared how Sublime provides protection against email attacks and enables defenders to share detection rules with others.

In this post, we’re introducing Message Query Language (MQL), the language that drives all rules, insights, and hunts for Sublime. MQL is the same language used by our Detection and ML teams to stop emerging threats like Business Email Compromise (BEC), HTML smuggling, credential phishing, and other email attacks before they cause damage.

With MQL, defenders can also write their own tailored rules for attacks they’re seeing, modify any existing rule written by the Sublime team, use rules written by peers in the community, and transparently understand why a message was flagged in the first place.

If you’re familiar with other languages for detection like YARA, Sigma, Snort/Suricata, or Event Query Language (EQL), then you’ll feel right at home with MQL. If not, don’t worry! We designed MQL to be intuitive, flexible, and easy to use.

Message Data Model

When the Sublime Platform processes an email message, it first sees the message in EML, an archaic plain-text format. EML is the standard for email, but plain text is hard to write detection logic against, and not all email conforms to standards such as RFC 5322.

Instead of dealing with raw text, Sublime parses the format into a highly structured schema, the Message Data Model (MDM), specifically with detection in mind. There’s no need to wrangle complex regular expressions just to search headers or the body.

Instead, it’s easy to find and use the relevant fields. The MDM organizes attachments, body, headers, recipients, and various other fields into a single document that is easily represented as JSON.

For example, the MDM enables you to check whether hyperlinks have mismatched display vs target URLs or to retrieve a specific hyperlinked top-level domain (TLD):

Parsed message body on the MDM
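To make the mismatched-link idea concrete, here’s a rough Python sketch that works on raw HTML using only the standard library. It is not how Sublime builds the MDM. The class and function names (`LinkCollector`, `hostname`, `mismatched_links`) are made up for this illustration, and the "looks like a URL" test and last-label TLD extraction are deliberately naive.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, display_text) pairs for every <a> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while we are inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def hostname(url):
    """Return the lower-cased hostname of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def mismatched_links(html):
    """Return links whose visible text looks like a URL on a different host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    found = []
    for href, text in parser.links:
        looks_like_url = " " not in text and "." in text  # naive heuristic
        if looks_like_url and hostname(text) != hostname(href):
            found.append({
                "display": text,
                "target": href,
                # Naive TLD: the last dot-separated label of the target host.
                "target_tld": hostname(href).rsplit(".", 1)[-1],
            })
    return found


BODY = (
    '<p><a href="https://evil.example/login">https://bank.example</a> '
    '<a href="https://ok.example/">Click here</a></p>'
)
suspicious = mismatched_links(BODY)
```

With the MDM, this parsing has already been done for you, so a rule only needs to compare the fields.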

Similarly, the MDM’s parsed headers let you easily describe SPF, DKIM, or DMARC failures to detect spoofs, or a header From address that doesn’t match the envelope sender (MAIL FROM):

Parsed headers on the MDM
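For comparison, here’s roughly what the same header checks look like against a raw EML with Python’s standard `email` package. This is a sketch under assumptions, not Sublime’s parser: the sample message is invented, the helper names are hypothetical, the Return-Path header stands in for the envelope sender, and the substring checks against Authentication-Results are intentionally simplistic.

```python
from email import message_from_string
from email.utils import parseaddr

# Invented sample message for illustration only.
RAW_EML = """\
Return-Path: <bounce@mailer.example>
From: "Support" <support@bank.example>
To: user@corp.example
Subject: Action required
Authentication-Results: mx.corp.example; spf=fail smtp.mailfrom=mailer.example; dkim=none; dmarc=fail header.from=bank.example

Please verify your account.
"""


def domain_of(header_value):
    """Extract the lower-cased domain from an address header value."""
    address = parseaddr(header_value or "")[1]
    return address.rpartition("@")[2].lower()


def summarize_headers(raw_eml):
    """Pull out a few spoofing signals from the raw headers."""
    msg = message_from_string(raw_eml)
    auth = (msg.get("Authentication-Results") or "").lower()
    header_from = domain_of(msg.get("From"))
    return_path = domain_of(msg.get("Return-Path"))
    return {
        "header_from_domain": header_from,
        "return_path_domain": return_path,
        # Header From vs envelope sender (approximated by Return-Path).
        "from_mismatch": bool(return_path) and header_from != return_path,
        # Naive substring checks; real parsing of this header is more involved.
        "spf_fail": "spf=fail" in auth,
        "dkim_fail": "dkim=fail" in auth,
        "dmarc_fail": "dmarc=fail" in auth,
    }


summary = summarize_headers(RAW_EML)
```

Even this small example needs address parsing, case normalization, and header-specific string handling, which is the kind of work the MDM does up front.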

This schema is used by MQL when writing email detections. For example, typing the MQL snippet type.inbound uses the MDM’s type object and .inbound boolean field to describe inbound email messages. More on syntax in the next section.

Type object on the MDM, which tracks inbound vs outbound vs internal messages
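One way to picture how a dotted MQL path maps onto the MDM is as a lookup through nested JSON objects. The Python sketch below uses a heavily simplified, invented document that contains only fields named in this post; the real MDM has many more fields, and `get_field` is a hypothetical helper, not part of Sublime.

```python
from functools import reduce

# A tiny, invented stand-in for an MDM document (values are made up).
message = {
    "type": {"inbound": True},
    "sender": {
        "display_name": "Alice Example",
        "email": {
            "email": "alice@example.com",
            "domain": {"domain": "example.com"},
        },
    },
    "attachments": [
        {"file_type": "pdf", "file_extension": "pdf", "size": 12 * 1024 * 1024},
    ],
}


def get_field(document, path):
    """Resolve a dotted path such as 'type.inbound' against nested dicts.

    Raises KeyError if any segment of the path is missing.
    """
    return reduce(lambda node, key: node[key], path.split("."), document)


is_inbound = get_field(message, "type.inbound")
sender_domain = get_field(message, "sender.email.domain.domain")
```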

Syntax overview

Inbound messages that contain at least one PDF attachment larger than 10 MiB:

type.inbound
and any(attachments,
        .file_type == "pdf"
        and .size > 10 * 1024 * 1024
)

We designed MQL to be simple to read and write. Let’s dissect the above query to get a feel for the syntax:

type.inbound

Retrieve the inbound boolean field from the MDM’s type object. It is true only for messages arriving at a mailbox.

and

Boolean AND between two terms. MQL uses plain English words like and instead of symbols like &&.

any(attachments, ...)

Check if at least one attachment on the MDM matches some criteria. In MQL, there are several functions to check arrays, such as any, all, and distinct. In an array function, fields on a nested item are referenced with a preceding dot (.).

. (dot)

Access a nested item. The leading . indicates that a field is relative to a nested item, not root fields on the MDM.

.file_type == "pdf"

Has a PDF file type

.size > 10*1024*1024

Has a file size greater than 10 MiB. We can use arithmetic operations to perform calculations on the fly with MQL.

The remaining core syntax, such as strings, literals, comments, and lists are designed to be intuitive. See the MQL syntax docs for a deeper dive.
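If you’re more at home in a general-purpose language, the query dissected above corresponds roughly to the Python below. This is only an analogy for reading MQL, not how rules are evaluated; the message dictionaries are invented and use just the fields from the query.

```python
TEN_MIB = 10 * 1024 * 1024


def large_inbound_pdf(message):
    """Python analogue of: type.inbound and any(attachments, ...)."""
    return message["type"]["inbound"] and any(
        # Inside any(), each attachment plays the role of the leading dot.
        attachment["file_type"] == "pdf" and attachment["size"] > TEN_MIB
        for attachment in message["attachments"]
    )


big_pdf = {
    "type": {"inbound": True},
    "attachments": [
        {"file_type": "png", "size": 4096},
        {"file_type": "pdf", "size": 11 * 1024 * 1024},
    ],
}
small_pdf = {
    "type": {"inbound": True},
    "attachments": [{"file_type": "pdf", "size": 2048}],
}
outbound = {
    "type": {"inbound": False},
    "attachments": [{"file_type": "pdf", "size": 11 * 1024 * 1024}],
}
```

The generator expression variable `attachment` is what MQL abbreviates to a bare leading dot.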

Functions

All of Sublime’s novel detection capabilities are exposed via MQL in the same way: functions. Want to search for a substring or evaluate a regular expression? There’s a function for that. Check domain age via WHOIS? Function for that. Grab a screenshot from a URL and check if it looks like credential phishing? There’s a function for that, too.

There are a handful of top-level functions for the most common operations. The remaining functions are grouped in modules, which keeps them organized and easier to find. To do something with strings, type strings. and autocomplete will list what’s available (more on the rule editor later!). As of writing, these are the functions available:

Array functions:

  • all
  • any
  • distinct
  • filter
  • map

Top-level functions:

  • coalesce
  • length

File analysis functions, starting with file.:

  • file.explode
  • file.oletools

Regular expression functions, starting with regex.:

  • regex.contains
  • regex.icontains
  • regex.match
  • regex.imatch

String functions, starting with strings.:

  • strings.concat
  • strings.contains
  • strings.icontains
  • strings.ends_with
  • strings.iends_with
  • strings.levenshtein
  • strings.ilevenshtein
  • strings.like
  • strings.ilike
  • strings.starts_with
  • strings.istarts_with

Machine learning functions, starting with ml.:

  • ml.macro_classifier
  • ml.nlu_classifier

And finally, we saved a few of our favorite new functions for last. These are currently in beta and start with beta.:

  • beta.linkanalysis
  • beta.whois

Here’s a modified snippet of MQL from a Callback phishing rule that searches a ZIP file for images or PDFs, which are scanned for text with OCR. On the scanned text, this rule performs NLU to check if it contains text resembling a callback scam with high confidence.

It might sound complicated, but it’s actually just a few lines of MQL!

type.inbound
and any(attachments,
        .file_extension == "zip"
        and any(file.explode(.),
                .file_extension in~ ("pdf", "jpg", "jpeg", "png")
                and any(ml.nlu_classifier(.scan.ocr.raw).intents,
                        .name == "callback_scam"
                        and .confidence == "high"
                )
        )
)
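The nesting is the part that takes a moment to read: each any introduces a new scope, so the leading dot refers to an attachment, then to a file inside the archive, then to a classifier intent. Here’s a hedged Python analogy of that structure. It does not unpack archives, run OCR, or call a classifier; the `exploded` and `intents` keys are hypothetical stand-ins for precomputed results, and the lower-casing assumes in~ is the case-insensitive form of in.

```python
IMAGE_OR_PDF = {"pdf", "jpg", "jpeg", "png"}


def is_callback_zip(message):
    """Three nested scopes: attachment -> file inside the ZIP -> NLU intent."""
    return message["type"]["inbound"] and any(
        attachment["file_extension"] == "zip"
        and any(
            # Assumed case-insensitive membership, mirroring in~.
            inner["file_extension"].lower() in IMAGE_OR_PDF
            and any(
                intent["name"] == "callback_scam"
                and intent["confidence"] == "high"
                for intent in inner["intents"]
            )
            for inner in attachment["exploded"]
        )
        for attachment in message["attachments"]
    )


def sample(confidence):
    """Build an invented message whose ZIP holds one scanned PNG."""
    return {
        "type": {"inbound": True},
        "attachments": [
            {"file_extension": "docx"},
            {
                "file_extension": "zip",
                "exploded": [
                    {
                        "file_extension": "PNG",
                        "intents": [
                            {"name": "callback_scam", "confidence": confidence}
                        ],
                    }
                ],
            },
        ],
    }
```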

Lists

The Sublime Platform also maintains Lists, which are collections of strings or other items that can be accessed from any rule. Built-in lists are automatically maintained by the Sublime Platform, providing immediate global or historical context for your environment. For anything else, you can create and manage custom lists in your Dashboard or via the API.

To reference a list in MQL, include it with in or an array function, such as any.

Check that a sender’s domain is in the Tranco 1 Million:

sender.email.domain.domain in $tranco_1m

Check that a sender has never sent emails to your organization before:

sender.email.email not in $sender_emails

Check for a sender domain that’s highly similar to a domain that belongs to your organization (modified from our Lookalike sender domain rule):

type.inbound
and any($org_domains,
    strings.levenshtein(sender.email.domain.domain, .) == 1
)
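For intuition about what a distance of exactly 1 means here, the Python sketch below implements the classic Levenshtein edit distance and applies the same check. It is a textbook dynamic-programming version written for this post, not the code behind strings.levenshtein, and `is_lookalike` is a hypothetical helper name.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute if different
            ))
        previous = current
    return previous[-1]


def is_lookalike(sender_domain, org_domains):
    """True if the sender domain is exactly one edit away from any org domain."""
    return any(levenshtein(sender_domain, domain) == 1 for domain in org_domains)


org_domains = ["example.com"]
```

Requiring a distance of exactly 1 means an exact match (distance 0) does not trigger, so mail from your own domain is not treated as a lookalike.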

Lists that are automatically synced with sublime-security/static-files on GitHub:

  • $alexa_1m
  • $disposable_email_providers
  • $file_extensions_common_archives
  • $file_extensions_macros
  • $free_email_providers
  • $free_file_hosts
  • $free_subdomain_hosts
  • $majestic_million
  • $suspicious_tlds
  • $tranco_1m
  • $umbrella_1m
  • $umbrella_1m_tld
  • $url_shorteners

Dynamically maintained lists built from historical messages, used to track patterns of communication:

  • $sender_domains
  • $sender_emails
  • $recipient_emails
  • $recipient_domains

Dynamically maintained lists, which are synced with your upstream email provider:

  • $org_display_names
  • $org_domains
  • $org_slds

In addition to strings, lists can also contain more complex objects, like users in a group from a cloud email provider. For example, $org_vips is automatically created and is easily configured to point to any Azure AD group or Google Group.

Here’s a snippet of MQL from a VIP Impersonation Rule that looks for sender display names matching someone in the VIP list, with an urgent tone, from a new sender:

type.inbound
and sender.email.email not in $sender_emails
and any($org_vips, .display_name == sender.display_name)
and any(ml.nlu_classifier(body.html.inner_text).entities,
       .name == "urgency"
)
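Read as plain logic, the rule combines a list lookup, a match against list objects, and a classifier result. The Python analogy below spells that out. The lists are passed in as ordinary collections, the `entities` argument is a hypothetical stand-in for precomputed classifier output, and all sample values are invented.

```python
def vip_impersonation(message, sender_emails, org_vips, entities):
    """Inbound + never-seen sender + VIP display name + urgent tone."""
    return (
        message["type"]["inbound"]
        and message["sender"]["email"]["email"] not in sender_emails
        and any(
            vip["display_name"] == message["sender"]["display_name"]
            for vip in org_vips
        )
        and any(entity["name"] == "urgency" for entity in entities)
    )


message = {
    "type": {"inbound": True},
    "sender": {
        "display_name": "Pat CEO",
        "email": {"email": "pat.ceo@freemail.example"},
    },
}
sender_emails = {"colleague@corp.example"}          # stand-in for $sender_emails
org_vips = [{"display_name": "Pat CEO"}]            # stand-in for $org_vips
entities = [{"name": "urgency"}, {"name": "request"}]  # stubbed NLU entities
```

Every condition is joined with and, so removing any one signal (for example, a sender already present in the historical list) stops the rule from matching.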

Interactive Editor

A language is only as good as its tools, which is why we’ve deliberately designed the MQL editor for all phases of detection engineering. The MQL editor uses the same core as Visual Studio Code, which makes it familiar to users, and enables features that are crucial to development and testing.

When writing rules in Sublime, you’ll quickly find all the features you expect from a mature IDE:

  • autocompletion
  • debugger to evaluate functions
  • diagnostics to recognize possible logical errors
  • errors, hints, and warnings
  • function signature support
  • syntax highlighting

The editor puts Detection Engineering front and center. On the Rule creation page, attach or generate an EML to validate that your MQL detects what it’s supposed to. It’s easy to iterate quickly with Test Rule and see the editor highlight the clauses that matched. If the rule resulted in a complete match, you’ll see that oh-so-satisfying Message flagged ✅, confirming that the rule flags the intended email.

Rule highlighted with matching clauses after running Test Rule

To ensure that your Rule doesn’t mistakenly flag the wrong message, simply pop open the Backtest tab to run the rule over the last 24 hours of messages to see any matching results. With Test Rule and Backtest, you can quickly get a sense of the efficacy of a rule without ever needing to enable it live in production.
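Conceptually, a backtest is just a rule applied to recent history. The Python sketch below shows the idea under stated assumptions: the `received_at` field, the `backtest` function, and the sample messages are all hypothetical, and this says nothing about how the Backtest tab is actually implemented.

```python
from datetime import datetime, timedelta, timezone


def backtest(rule, messages, now, window=timedelta(hours=24)):
    """Return the messages received within the window that the rule matches."""
    cutoff = now - window
    return [m for m in messages if m["received_at"] >= cutoff and rule(m)]


now = datetime(2023, 3, 24, 12, 0, tzinfo=timezone.utc)
history = [
    {"id": "recent-inbound", "received_at": now - timedelta(hours=1),
     "type": {"inbound": True}},
    {"id": "recent-outbound", "received_at": now - timedelta(hours=2),
     "type": {"inbound": False}},
    {"id": "old-inbound", "received_at": now - timedelta(hours=48),
     "type": {"inbound": True}},
]

matches = backtest(lambda m: m["type"]["inbound"], history, now)
matched_ids = [m["id"] for m in matches]
```

A short list of matches that you recognize as malicious is a good sign; a long list of everyday mail suggests the rule needs tightening before it goes live.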

That just scratches the surface of what the MQL editor can do.

Wrapping up

That’s a peek at some of the capabilities that set Message Query Language apart and how it was designed specifically to detect behavior in an email environment. With a low barrier to entry and a simple syntax, MQL gives defenders the control and the tools they need to secure their email environments.

Stay tuned for more blog posts where we’ll demonstrate how to use MQL to prevent real, trending threats.

Try out Message Query Language now using the free online EML analyzer.
