Introduction to Message Query Language (MQL)

Message Query Language enables defenders to share protections against email attacks and powers all rules, insights, and hunts for Sublime.

Sublime is the world’s first open email security platform that lets anyone write, run, and share rules in a universal domain-specific language (DSL) to block email-borne attacks, hunt for threats, and more. In our previous post, we shared how Sublime provides protection against email attacks and enables defenders to share detection rules with others.


In this post, we’re introducing Message Query Language (MQL), the language that drives all rules, insights, and hunts for Sublime. MQL is the same language used by our Detection and ML teams to stop emerging threats like Business Email Compromise (BEC), HTML smuggling, credential phishing, and other email attacks before they cause damage.


With MQL, defenders can also write their own tailored rules for attacks they’re seeing, modify any existing rule written by the Sublime team, use rules written by peers in the community, and transparently understand why a message was flagged in the first place.


If you’re familiar with other languages for detection like YARA, Sigma, Snort/Suricata, or Event Query Language (EQL), then you’ll feel right at home with MQL. If not, don’t worry! We designed MQL to be intuitive, flexible, and easy to use.


Message Data Model

When the Sublime Platform processes an email message, it first sees the message in the archaic, text-based EML format. EML is the standard for email, but as a plain-text format it is challenging to work with. Even with standards such as RFC 5322, not all email conforms, and writing detection logic against raw text is difficult.


Instead of dealing with raw text, Sublime parses the format into a highly structured schema, the Message Data Model (MDM), specifically with detection in mind. There’s no need to wrangle complex regular expressions just to search headers or the body.


The MDM makes it easy to find and use the relevant fields. It organizes attachments, body, headers, recipients, and various other fields into a single document that is easily represented as JSON.
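
To make the contrast with raw text concrete, here is a minimal Python sketch that parses a small EML string with the standard library and turns it into a nested, JSON-friendly document. The field names (sender, recipients, subject, body) are illustrative assumptions and are not the actual MDM schema.

```python
from email import message_from_string, policy
from email.utils import parseaddr

RAW_EML = """\
From: "Alice Example" <alice@example.com>
To: bob@example.org
Subject: Quarterly report
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hi Bob, the report is attached.
"""


def to_structured(raw: str) -> dict:
    """Parse raw EML text into a small nested document (hypothetical field names)."""
    msg = message_from_string(raw, policy=policy.default)
    display_name, address = parseaddr(str(msg["From"]))
    body = msg.get_body(preferencelist=("plain",))
    return {
        "sender": {"display_name": display_name, "email": address},
        "recipients": {"to": [parseaddr(str(msg["To"]))[1]]},
        "subject": str(msg["Subject"]),
        "body": {"plain": body.get_content().strip() if body else ""},
    }


doc = to_structured(RAW_EML)
print(doc["sender"]["email"])
```

Once the message is a document like this, a check becomes a field lookup rather than a regular expression over headers.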


For example, the MDM enables you to check whether a hyperlink’s display URL differs from its target URL, or to retrieve the top-level domain (TLD) of a specific hyperlink.
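
As a rough illustration of the kind of link check described here, the following Python sketch compares the host a reader sees with the host a link really points to, and extracts a naive TLD. The dictionary keys display_url and href_url and the helper names are assumptions made for this example; the MDM exposes this information as parsed fields, so no such code is needed in MQL.

```python
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Return the lowercase host of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_mismatched(display_url: str, href_url: str) -> bool:
    """True when the visible URL and the real target point at different hosts."""
    shown, target = hostname(display_url), hostname(href_url)
    return bool(shown) and bool(target) and shown != target


def naive_tld(url: str) -> str:
    """Last DNS label of the host; ignores multi-label suffixes such as co.uk."""
    return hostname(url).rsplit(".", 1)[-1]


# Hypothetical link: the text shows one site, the target is another.
link = {
    "display_url": "https://login.example.com/",
    "href_url": "https://example-login.xyz/auth",
}

print(is_mismatched(link["display_url"], link["href_url"]))
print(naive_tld(link["href_url"]))
```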


Similarly, the MDM’s parsed headers let you easily describe SPF, DKIM, or DMARC failures to detect spoofs, or a header From address that does not match the envelope sender (MAIL FROM).
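
For comparison, this is roughly what the same checks look like when done by hand in Python against raw headers. It is a sketch under simplifying assumptions: the sample headers are invented, the Authentication-Results value is parsed with a small regular expression rather than a full parser, and Return-Path stands in for the envelope sender.

```python
import re
from email import message_from_string, policy
from email.utils import parseaddr

RAW_HEADERS = """\
Return-Path: <bounce@mailer.example.net>
Authentication-Results: mx.example.org; spf=pass smtp.mailfrom=mailer.example.net; dkim=fail header.d=example.com; dmarc=fail header.from=example.com
From: "Example Billing" <billing@example.com>
Subject: Invoice

"""


def domain_of(address_header: str) -> str:
    """Extract the lowercase domain from an address header value."""
    return parseaddr(address_header)[1].rsplit("@", 1)[-1].lower()


msg = message_from_string(RAW_HEADERS, policy=policy.default)

# Collect the verdict for each authentication method, e.g. {"spf": "pass", ...}.
auth = dict(re.findall(r"\b(spf|dkim|dmarc)=(\w+)", str(msg["Authentication-Results"])))
failed = sorted(method for method, verdict in auth.items() if verdict != "pass")

# Compare the visible sender domain with the envelope sender domain.
from_domain = domain_of(str(msg["From"]))
envelope_domain = domain_of(str(msg["Return-Path"]))
sender_mismatch = from_domain != envelope_domain

print(failed, sender_mismatch)
```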


This schema is used by MQL when writing email detections. For example, the MQL snippet type.inbound uses the MDM’s type object and its inbound boolean field to describe inbound email messages. More on syntax in the next section.


Type object on the MDM, which tracks inbound vs outbound vs internal messages


Syntax overview

Inbound messages that contain at least one PDF attachment over 10MiB:

type.inbound and any(attachments,
    .file_type == "pdf" and .size > 10 * 1024 * 1024)

We designed MQL to be simple to read and write. Let’s dissect the above query to get a feel for the syntax:


type.inbound

Retrieve the field from the MDM, type -> inbound. This is only true on incoming messages to a mailbox.

and

Boolean AND between two terms. MQL uses plain English words like and instead of symbols like &&.

any(attachments, ...)

Check if at least one attachment on the MDM matches some criteria. In MQL, there are several functions to check arrays, such as any, all, and distinct. In an array function, fields on a nested item are referenced with a preceding dot (.).

. (dot)

Access a nested item. The leading . indicates that a field is relative to a nested item, not root fields on the MDM.

.file_type == "pdf"

Has a PDF file type

.size > 10*1024*1024

Has a file size greater than 10 MiB. We can use arithmetic operations to perform calculations on the fly with MQL.
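
For readers who think in a general-purpose language, here is a rough Python analogue of the query dissected above. The dictionary layout is an assumption for illustration only. The point is that both conditions inside any must hold for the same attachment, and that 10 * 1024 * 1024 evaluates to 10,485,760 bytes.

```python
TEN_MIB = 10 * 1024 * 1024  # 10,485,760 bytes

# Hypothetical message with one small image and one large PDF.
message = {
    "type": {"inbound": True},
    "attachments": [
        {"file_type": "png", "size": 48_213},
        {"file_type": "pdf", "size": 12_582_912},  # 12 MiB
    ],
}


def matches(msg: dict) -> bool:
    """Inbound, and at least one attachment is a PDF larger than 10 MiB."""
    return msg["type"]["inbound"] and any(
        a["file_type"] == "pdf" and a["size"] > TEN_MIB
        for a in msg["attachments"]
    )


print(matches(message))
```

A message with a small PDF and a large image would not match, because no single attachment satisfies both conditions.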


The remaining core syntax, such as strings, literals, comments, and lists, is designed to be intuitive. See the MQL syntax docs for a deeper dive.

Functions

All of Sublime’s novel detection capabilities are exposed via MQL in the same way: functions. Want to search for a substring or evaluate a regular expression? There’s a function for that. Check domain age via WHOIS? Function for that. Grab a screenshot from a URL and check if it looks like credential phishing? There’s a function for that, too.


There are a handful of top-level functions for the most common operations. The remaining functions are grouped in modules, which keeps them organized and easier to find. To do something with strings, type strings. and autocomplete will list what’s available (more on the rule editor later!). As of writing, these are the functions available:


Array functions:

all
any
distinct
filter
map

Top-level functions:

coalesce
length

File analysis functions, starting with file.:

file.explode
file.oletools

Regular expressions, starting with regex.:

regex.contains
regex.icontains
regex.match
regex.imatch

Strings functions, starting with strings.:

strings.concat
strings.contains
strings.icontains
strings.ends_with
strings.iends_with
strings.levenshtein
strings.ilevenshtein
strings.like
strings.ilike
strings.starts_with
strings.istarts_with

Machine learning functions, starting with ml.:

ml.macro_classifier
ml.nlu_classifier

And finally, we saved a few of our favorite new functions for last, currently under beta.:

beta.linkanalysis
beta.whois

Here’s a modified snippet of MQL from a Callback phishing rule that searches a ZIP file for images or PDFs, which are scanned for text with OCR. On the scanned text, this rule performs NLU to check if it contains text resembling a callback scam with high confidence.


It might sound complicated, but it’s actually just a few lines of MQL!


type.inbound
and any(attachments, .file_extension == "zip"
    and any(file.explode(.), .file_extension in~ ("pdf", "jpg", "jpeg", "png")
        and any(ml.nlu_classifier(.scan.ocr.raw).intents,
            .name == "callback_scam" and .confidence == "high"
        )
    )
)
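
The nesting in this rule can be easier to follow as three generator expressions. The Python sketch below mirrors that structure over a hand-built message. Everything in it is hypothetical: the exploded and ocr_intents keys stand in for the results of file.explode and ml.nlu_classifier, and the lowercase comparison assumes that in~ is a case-insensitive membership test.

```python
IMAGE_OR_PDF = {"pdf", "jpg", "jpeg", "png"}


def has_callback_intent(intents: list) -> bool:
    """Innermost any: a high-confidence callback_scam intent is present."""
    return any(
        i["name"] == "callback_scam" and i["confidence"] == "high"
        for i in intents
    )


def rule(msg: dict) -> bool:
    """Inbound, with a ZIP that contains an image or PDF whose text looks like a callback scam."""
    return msg["inbound"] and any(
        att["file_extension"] == "zip"
        and any(
            child["file_extension"].lower() in IMAGE_OR_PDF
            and has_callback_intent(child["ocr_intents"])
            for child in att["exploded"]
        )
        for att in msg["attachments"]
    )


# Hypothetical, pre-analyzed message: one ZIP holding a text file and an image.
message = {
    "inbound": True,
    "attachments": [
        {
            "file_extension": "zip",
            "exploded": [
                {"file_extension": "txt", "ocr_intents": []},
                {
                    "file_extension": "PNG",
                    "ocr_intents": [{"name": "callback_scam", "confidence": "high"}],
                },
            ],
        }
    ],
}

print(rule(message))
```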


Lists

The Sublime Platform also maintains Lists, which are collections of strings or other items that can be accessed from any rule. Built-in lists are automatically maintained by the Sublime Platform, providing immediate global or historical context for your environment. For anything else, you can create and manage custom lists in your Dashboard or via the API.

To reference a list in MQL, use it with in or an array function, such as any.

Check that a sender’s domain is in the Tranco 1 Million:

sender.email.domain.domain in $tranco_1m

Check that a sender has never sent emails to your organization before:

sender.email.email not in $sender_emails
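
Conceptually, both checks are set membership tests. The Python sketch below shows the idea with tiny stand-in sets. The contents of the sets and the helper name are assumptions for illustration; in Sublime the lists are maintained for you.

```python
# Tiny stand-ins for $tranco_1m and $sender_emails.
tranco_1m = {"google.com", "microsoft.com", "example.com"}
sender_emails = {"alice@example.com"}


def sender_context(email_address: str) -> dict:
    """Answer the two list questions for one sender address."""
    address = email_address.lower()
    domain = address.rsplit("@", 1)[-1]
    return {
        "popular_domain": domain in tranco_1m,
        "first_time_sender": address not in sender_emails,
    }


print(sender_context("alice@example.com"))
print(sender_context("carol@unknown-domain.test"))
```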


Check for a sender domain that’s highly similar to a domain that belongs to your organization (modified from our Lookalike sender domain rule):

type.inbound and any($org_domains,
    strings.levenshtein(sender.email.domain.domain, .) == 1
)
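
For readers unfamiliar with Levenshtein distance, it counts the single-character insertions, deletions, and substitutions needed to turn one string into another. Here is a minimal Python sketch of the textbook dynamic-programming algorithm and the lookalike check built on it. It is not Sublime’s implementation, and the org_domains values are placeholders.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


# Placeholder for $org_domains.
org_domains = ["sublimesecurity.com", "example.com"]


def is_lookalike(sender_domain: str) -> bool:
    """True when the domain is exactly one edit away from an org domain."""
    return any(levenshtein(sender_domain, d) == 1 for d in org_domains)


print(is_lookalike("examp1e.com"))
print(is_lookalike("example.com"))
```

Comparing with == 1 rather than <= 1 matters: a distance of 0 means the sender domain is one of your own domains, which is not a lookalike.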


Lists that are automatically synced with sublime-security/static-files on GitHub:

$alexa_1m
$disposable_email_providers
$file_extensions_common_archives
$file_extensions_macros
$free_email_providers
$free_file_hosts
$free_subdomain_hosts
$majestic_million
$suspicious_tlds
$tranco_1m
$umbrella_1m
$umbrella_1m_tld
$url_shorteners

Dynamically maintained lists from historical messages, used to maintain patterns of communication:

$sender_domains
$sender_emails
$recipient_emails
$recipient_domains

Dynamically maintained lists, which are synced with your upstream email provider:

$org_display_names
$org_domains
$org_slds

In addition to strings, lists can also contain more complex objects, like users in a group from a cloud email provider. For example, $org_vips is automatically created and is easily configured to point to any Azure AD group or Google Group.

Here’s a snippet of MQL from a VIP Impersonation Rule that looks for sender display names matching someone in the VIP list, with an urgent tone, from a new sender:


type.inbound
and sender.email.email not in $sender_emails
and any($org_vips, .display_name == sender.display_name)
and any(ml.nlu_classifier(body.html.inner_text).entities,
    .name == "urgency"
)
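
This rule combines a string list, a list of objects, and a classifier result. As a rough Python analogue, assuming a hypothetical message layout in which the NLU entities have already been computed and stored under nlu_entities:

```python
def vip_impersonation(msg: dict, org_vips: list, sender_emails: set) -> bool:
    """New sender, display name equal to a VIP's, and urgent language detected."""
    return (
        msg["inbound"]
        and msg["sender"]["email"] not in sender_emails
        and any(v["display_name"] == msg["sender"]["display_name"] for v in org_vips)
        and any(e["name"] == "urgency" for e in msg["nlu_entities"])
    )


# Placeholders for $org_vips (objects) and $sender_emails (strings).
org_vips = [{"display_name": "Dana Chief", "email": "dana@example.com"}]
sender_emails = {"dana@example.com"}

message = {
    "inbound": True,
    "sender": {"display_name": "Dana Chief", "email": "dana.chief@freemail.test"},
    "nlu_entities": [{"name": "urgency", "text": "right away"}],
}

print(vip_impersonation(message, org_vips, sender_emails))
```

The same message sent from dana@example.com would not match, because that address is already a known sender.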


Interactive Editor

A language is only as good as its tools, which is why we’ve deliberately designed the MQL editor for all phases of detection engineering. The MQL editor uses the same core as Visual Studio Code, which makes it familiar to users, and enables features that are crucial to development and testing.


When writing rules in Sublime, you’ll quickly find all the features you expect from a mature IDE:

autocompletion
debugger to evaluate functions
diagnostics to recognize possible logical errors
errors, hints, and warnings
function signature support
syntax highlighting


The editor puts Detection Engineering front and center. On the Rule creation page, attach or generate an EML to validate that your MQL detects what it’s supposed to. It’s easy to iterate quickly with Test Rule and see the editor highlight the clauses that matched. If the rule resulted in a complete match, you’ll see that oh-so-satisfying Message flagged ✅, confirming that the rule flags the intended email.

Rule highlighted with matching clauses after running Test Rule


To ensure that your Rule doesn’t mistakenly flag the wrong message, simply pop open the Backtest tab to run the rule over the last 24 hours of messages to see any matching results. With Test Rule and Backtest, you can quickly get a sense of the efficacy of a rule without ever needing to enable it live in production.
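
The idea behind a backtest is simple: replay a rule over recent messages and inspect what it would have flagged. The Python sketch below illustrates only that idea. The backtest function, the message layout, and the sample rule are hypothetical and do not reflect how Sublime implements the feature.

```python
from datetime import datetime, timedelta, timezone


def backtest(rule, messages, now, window=timedelta(hours=24)):
    """Return the messages received within the window that the rule would flag."""
    cutoff = now - window
    return [m for m in messages if m["received_at"] >= cutoff and rule(m)]


def has_attachment(msg: dict) -> bool:
    """Stand-in rule: flag any message that carries an attachment."""
    return len(msg["attachments"]) > 0


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
messages = [
    {"id": "a", "received_at": now - timedelta(hours=1), "attachments": ["invoice.pdf"]},
    {"id": "b", "received_at": now - timedelta(hours=30), "attachments": ["old.pdf"]},
    {"id": "c", "received_at": now - timedelta(hours=2), "attachments": []},
]

hits = backtest(has_attachment, messages, now)
print([m["id"] for m in hits])
```

Message b is skipped because it falls outside the 24-hour window, and message c is skipped because the rule does not match it.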


That just scratches the surface of what the MQL editor can do.


Wrapping up

That’s a peek at some of the capabilities that set Message Query Language apart and how it was designed specifically to detect behavior in an email environment. With a low barrier to entry and a simple syntax, MQL puts defenders in control with the tools they need to secure their email environments.

Stay tuned for more blog posts where we’ll demonstrate how to use MQL to prevent real, trending threats.


Try out Message Query Language now using the free online EML analyzer.
