This blog provides some background on the format of two common email file types, and introduces Go modules for parsing both. We’ve used the modules to build a tool for inspecting suspicious email files.
Inspecting Suspicious Emails
We have various customers who come to us with a range of questions and problems. A fairly common one is “I’ve got this email and I’m not sure about it”. We previously wrote a short guide on manually inspecting emails, in an effort to help people help themselves. But in some cases we carry out our own checks; so we built a tool to inspect email files.
We’re not releasing that tool (yet), but we have released the Go modules for parsing both .eml and .msg files, which are the two common formats for saved emails. Both include all the data we need for inspection, including the email headers, body and attachments.
Our inspection process boils down to:
- Check the headers for disparities in the sender’s address, and for the authentication details.
- Inspect the body for suspicious URLs.
- Inspect the attachments.
Having a tool to do it makes our lives much easier, and it’s obviously safer than some manual inspection techniques!
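As a rough illustration of the first check in that list, here’s a minimal Go sketch that compares the domain in the From header with the domain in the Return-Path header. It is not the logic from our tool: the helper names domainOf and senderMismatch are made up for this post, the addresses are placeholders, and comparing these two headers is just one assumed heuristic.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// domainOf returns the lower-cased domain of an RFC 5322 address such as
// "Alice <alice@example.com>". Hypothetical helper for illustration only.
func domainOf(addr string) (string, error) {
	a, err := mail.ParseAddress(addr)
	if err != nil {
		return "", err
	}
	at := strings.LastIndex(a.Address, "@")
	if at < 0 {
		return "", fmt.Errorf("no domain in %q", a.Address)
	}
	return strings.ToLower(a.Address[at+1:]), nil
}

// senderMismatch reports whether the From and Return-Path domains differ.
func senderMismatch(from, returnPath string) (bool, error) {
	fromDomain, err := domainOf(from)
	if err != nil {
		return false, err
	}
	returnDomain, err := domainOf(returnPath)
	if err != nil {
		return false, err
	}
	return fromDomain != returnDomain, nil
}

func main() {
	// Placeholder addresses, not taken from a real email.
	mismatch, err := senderMismatch(
		"IT Support <helpdesk@example.com>",
		"<bounce@mailer.example.net>",
	)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("sender domains differ:", mismatch)
}
```

A mismatch is only a prompt to look closer, since plenty of legitimate bulk mail is sent through a third-party domain.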
The Go module for parsing both .eml and .msg files is available on GitHub. Its README explains usage; here we’ll go over some of the details of the two file formats.
EML File Format
EML files are the default format for Outlook on macOS, and for Outlook online. They are text-based, multi-part MIME files. They start with a set of header fields, each written as a field name, a colon and a value, e.g.:
From: Microsoft 365 Defender <[email protected]>
To: Red Maple Cyber Team <blah>
Subject: New vulnerabilities notification from Microsoft Defender for Endpoint
Thread-Topic: New vulnerabilities notification from Microsoft Defender for
Endpoint
Thread-Index: AQHZ2/IgwPxuK4ZI90CFadxRddgg2w==
X-MS-Exchange-MessageSentRepresentingType: 1
Date: Thu, 31 Aug 2023 10:01:28 +0000
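Multi-part means we have boundaries between different parts of content. The boundary string is declared in the boundary parameter of a Content-Type header, and each delimiter line in the body is -- followed by that string. In this example the boundary string starts with _000_, so you can find the other parts by looking for --_. E.g., here’s the boundary between two Base64 parts: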
SGFoYSBuaWNlIHRyeSBpdCdzIG5vdCB0aGUgKnJlYWwqIGVtYWlsIEJhc2U2NA==
--_000_07c29e7ac729484fbadedf5e915c0792aznorthcentralusmicroso_ <--- Boundary
Content-Type: text/html; charset="utf-8"
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64
SGFoYSBuaWNlIHRyeSBpdCdzIG5vdCB0aGUgKnJlYWwqIGVtYWlsIEJhc2U2NCBQdCBJSQ==
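Rather than searching for a prefix, the boundary string can be read from the Content-Type header that declares it. Here’s a small sketch using mime.ParseMediaType from the Go standard library; boundaryOf is a hypothetical helper and the header value is made up.

```go
package main

import (
	"fmt"
	"mime"
)

// boundaryOf extracts the boundary parameter from a Content-Type header
// value. It returns an empty string if no boundary parameter is present.
func boundaryOf(contentType string) (string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", err
	}
	return params["boundary"], nil
}

func main() {
	// Illustrative header value; the boundary string is invented.
	ct := `multipart/mixed; boundary="_000_example_boundary_"`
	b, err := boundaryOf(ct)
	if err != nil {
		fmt.Println("bad Content-Type:", err)
		return
	}
	// Each part in the body is introduced by "--" followed by the boundary.
	fmt.Println("delimiter line:", "--"+b)
}
```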
To parse them, we can use ReadMessage from the net/mail package in the Go standard library:
// A Message represents a parsed mail message.
type Message struct {
	Header Header
	Body   io.Reader
}

// ReadMessage reads a message from r.
// The headers are parsed, and the body of the message will be available
// for reading from msg.Body.
func ReadMessage(r io.Reader) (msg *Message, err error) {
	tp := textproto.NewReader(bufio.NewReader(r))
	hdr, err := tp.ReadMIMEHeader()
	if err != nil {
		return nil, err
	}
	return &Message{
		Header: Header(hdr),
		Body:   tp.R,
	}, nil
}
This returns a Message struct consisting of a header and a body. The body is a Reader, from which we need to fully read the bytes before closing the file. We do this to parse out any attachments, which are Base64 encoded (see attachments.go). Each attachment sits within its own boundary:
--=-XNI3F2P8aCdwwxXQDLdRmw== <--- Boundary
Content-Type: application/octet-stream; name=G026730897.pdf
Content-Disposition: attachment; filename=G026730897.pdf
<optional extra fields>
Content-Transfer-Encoding: base64
JVBERi0xLjcKJeLjz9MKMTEgMCBvYmoKPDwvQSAxMiAwIFIvQm9yZGVyWzAgMCAwXS9GIDQvUCA0 <--- Base64 blob
IDAgUi9SZWN0WzM4OC43NSA1NzEuMTYgNDY1Ljg2IDU3OS4xNF0vU3VidHlwZS9MaW5rPj4KZW5k
...
ZWYKMTI1MzgwCiUlRU9GCg==
--=-XNI3F2P8aCdwwxXQDLdRmw==-- <--- Boundary
That’s the nice format, as we can parse text and Base64 blobs pretty easily. In contrast, here’s some explanation of the .msg format…
MSG File Format
MSG files are the default format for saved emails in Outlook on Windows. Like old Office documents, they use the Compound File Binary Format, also known as OLE. They are effectively containers around a bunch of content. This isn’t supposed to be a blog about the OLE file format, but it’s worth digging in a little and showing some useful tools.
First off, if we open an example in 010 Editor and apply the OLE template, it parses the structure:

The file starts with an OLE header, a File Allocation Table (FAT) and a second, smaller table called the MiniFAT. The rest of the file is largely made up of the entries themselves. Here’s an example, entry __substg1.0_0065001F:

You can poke through OLE files and their entries using olebrowse, which is part of the Python oletools module:

You can see this same structure if you dump an OLE file. You can do that with oledump and the plugin_msg plugin, and you’ll see all the entries from a given file:
python3 oledump.py -p plugin_msg Defender.msg
1: 80 '__nameid_version1.0/__substg1.0_00020102'
Plugin: MSG plugin
"0002 0102: BIN ? b'\\x08 \\x06\\x00\\x00\\x00\\x00\\x00\\xc0\\x00\\"
2: 360 '__nameid_version1.0/__substg1.0_00030102'
Plugin: MSG plugin
"0003 0102: BIN ?
...
54: 126 '__substg1.0_00510102'
Plugin: MSG plugin
"0051 0102: BIN ? b'EX:/O=EXCHANGELABS/OU=EXCHANGE ADMINIS"
55: 126 '__substg1.0_00520102'
Plugin: MSG plugin
"0052 0102: BIN ? b'EX:/O=EXCHANGELABS/OU=EXCHANGE ADMINIS"
56: 8 '__substg1.0_0064001F'
Plugin: MSG plugin
0064 001F: UNI Sent repr adrtype SMTP
57: 60 '__substg1.0_0065001F'
Plugin: MSG plugin
0065 001F: UNI Sent repr email [email protected]
58: 138 '__substg1.0_0070001F'
Plugin: MSG plugin
0070 001F: UNI Topic New vulnerabilities notification from Mi
59: 22 '__substg1.0_00710102'
Plugin: MSG plugin
So to process a .msg file ourselves we need to run through all the entries and pull out the ones we recognise. In this module we use a Go reader for Compound File Binary files to extract each entry from the file.
We look for two things: entries that have the fixed prefix used for all attachment entries (__substg1.0_37), which allows us to dump out attachments; and more generically all entries with the property stream prefix (__substg1.0_).
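That filtering step can be sketched with two prefix checks. This is an illustration rather than the module’s code: classifyEntry is a hypothetical function, and it assumes entry names may carry a parent storage separated by a slash, as in the oledump output above. The attachment entry name in main is illustrative.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	propertyPrefix   = "__substg1.0_"   // all property streams
	attachmentPrefix = "__substg1.0_37" // attachment-related property streams
)

// classifyEntry returns "attachment", "property" or "other" for an entry
// name. Only the final path element is tested, so a name such as
// "__recip_version1.0_#00000000/__substg1.0_3001001F" still matches.
func classifyEntry(name string) string {
	base := path.Base(name)
	switch {
	case strings.HasPrefix(base, attachmentPrefix):
		return "attachment"
	case strings.HasPrefix(base, propertyPrefix):
		return "property"
	default:
		return "other"
	}
}

func main() {
	names := []string{
		"__nameid_version1.0/__substg1.0_00020102",
		"__substg1.0_0065001F",
		"__attach_version1.0_#00000000/__substg1.0_37010102", // illustrative
		"__properties_version1.0",
	}
	for _, n := range names {
		fmt.Printf("%-55s %s\n", n, classifyEntry(n))
	}
}
```

The attachment case has to come first, because every attachment entry also matches the more general property prefix.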
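The property stream name encodes both an identifier and an encoding in its eight hex digits, e.g. in the entry __substg1.0_00020102 the first four digits (0002) are the property identifier and the last four (0102) are the encoding, in this case binary. We also have ASCII and Unicode encodings.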
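Splitting a stream name into those two halves is a short piece of string handling. Here’s a sketch, with parsePropertyName and typeLabel as hypothetical helpers. The 0x0102 and 0x001F codes appear in this post; 0x001E for ASCII strings is the value from the MAPI property type definitions and is an addition here. Anything other than exactly eight hex digits after the prefix is rejected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const propertyPrefix = "__substg1.0_"

// Encoding codes found in the last four hex digits of a stream name.
const (
	typeASCII   = 0x001E // from the MAPI property types, not shown in this post
	typeUnicode = 0x001F
	typeBinary  = 0x0102
)

// parsePropertyName splits "__substg1.0_IIIITTTT" into the property
// identifier (IIII) and the encoding (TTTT).
func parsePropertyName(name string) (id, typ uint16, err error) {
	if !strings.HasPrefix(name, propertyPrefix) {
		return 0, 0, fmt.Errorf("%q is not a property stream", name)
	}
	hex := strings.TrimPrefix(name, propertyPrefix)
	if len(hex) != 8 {
		return 0, 0, fmt.Errorf("%q: expected 8 hex digits", name)
	}
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, 0, err
	}
	return uint16(v >> 16), uint16(v & 0xFFFF), nil
}

// typeLabel gives a readable label for the encodings this sketch knows.
func typeLabel(typ uint16) string {
	switch typ {
	case typeASCII:
		return "ASCII"
	case typeUnicode:
		return "Unicode"
	case typeBinary:
		return "binary"
	default:
		return "unrecognised"
	}
}

func main() {
	for _, n := range []string{"__substg1.0_00020102", "__substg1.0_3001001F"} {
		id, typ, err := parsePropertyName(n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> id 0x%04X, encoding 0x%04X (%s)\n", n, id, typ, typeLabel(typ))
	}
}
```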
Here’s a full entry example, dumped from a msg file:
64: 24 '__recip_version1.0_#00000000/__substg1.0_3001001F'
Plugin: MSG plugin
3001 001F: UNI Display name Scott Lester
Its property identifier is 0x3001 (aka “Display name”), and its encoding is 0x001F (aka Unicode). That’s an easy one.
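Reading the value of a Unicode entry like that one means converting its raw bytes to a Go string. The sketch below assumes the bytes are UTF-16 little-endian, which is consistent with the sizes in the dump (24 bytes for the 12 characters of that display name). decodeUnicode is a hypothetical helper, not a function from our module.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUnicode converts the raw bytes of a 0x001F stream, assumed to be
// UTF-16 little-endian, into a Go string.
func decodeUnicode(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("odd number of bytes in UTF-16 data")
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	// Drop any trailing NUL terminators.
	for len(units) > 0 && units[len(units)-1] == 0 {
		units = units[:len(units)-1]
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	// The 8 bytes of an address-type entry holding the text SMTP.
	raw := []byte{'S', 0, 'M', 0, 'T', 0, 'P', 0}
	s, err := decodeUnicode(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```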
Scraping property name identifiers from oledump and a bunch of other places, we built a lookup map in GetPropertyName in message.go.
Some seem to be well-known defaults, others took some research, and the list isn’t complete. Still, it’s great to know we can encode emails that include the spouse’s name (0x3A48) and ISDN (0x3A2D), yet can’t reliably extract a date.
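The general shape of such a lookup is a map from identifier to name with a fallback for anything unknown. The sketch below is not the real GetPropertyName: lookupPropertyName is a hypothetical stand-in, and the map only contains the handful of identifiers that appear in this post.

```go
package main

import "fmt"

// propertyNames maps a few property identifiers to readable names.
// Only identifiers mentioned in this post are included.
var propertyNames = map[uint16]string{
	0x0064: "Sent repr adrtype",
	0x0065: "Sent repr email",
	0x0070: "Topic",
	0x3001: "Display name",
	0x3A2D: "ISDN",
	0x3A48: "Spouse's name",
}

// lookupPropertyName returns the readable name for an identifier, or a
// hex placeholder when the identifier is not in the map.
func lookupPropertyName(id uint16) string {
	if name, ok := propertyNames[id]; ok {
		return name
	}
	return fmt.Sprintf("unknown (0x%04X)", id)
}

func main() {
	for _, id := range []uint16{0x3001, 0x0070, 0x9999} {
		fmt.Printf("0x%04X: %s\n", id, lookupPropertyName(id))
	}
}
```

Returning the hex value for unknown identifiers keeps unrecognised properties visible in the output, which helps when extending the list.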
Property Names
Here are the most useful sources, many of which are understandably trying to map these names to MAPI properties:
- https://isc.sans.edu/diary/Nested+MSGs+Turtles+All+The+Way+Down/26668
- https://www.fileformat.info/format/outlookmsg/
- https://github.com/libyal/libfmapi/blob/main/documentation/MAPI%20definitions.asciidoc
- https://github.com/DidierStevens/DidierStevensSuite/blob/98c7aa67d1ac92a5ea79b37fa7734b183c16bd64/plugin_msg.py#L28
- https://github.com/echo-devim/pyjacktrick/blob/main/mapi_constants.py
- https://github.com/shaniacht1/content/blob/master/automation-ParseEmailFiles.yml
- https://learn.microsoft.com/en-us/office/client-developer/outlook/mapi/mapi-constants#mapi-mime-conversion-api