
Parsing email files in Go

Scott Lester
26 January 2026

This blog provides some background on the format of two common email file types, and introduces Go modules for parsing both. We’ve used the modules to build a tool for inspecting suspicious email files.

Inspecting Suspicious Emails

We have various customers who come to us with a range of questions and problems. A fairly common one is “I’ve got this email and I’m not sure about it”. We previously wrote a short guide on manually inspecting emails, in an effort to help people help themselves. In some cases, though, we carry out the checks ourselves, so we built a tool to inspect email files.

We’re not releasing that tool (yet), but we have released the Go modules for parsing both .eml and .msg files, which are the two common formats for saved emails. Both include all the data we need for inspection, including the email headers, body and attachments.

Our inspection process boils down to:

  • Check the headers for disparities in the sender’s address, and for the authentication details.
  • Inspect the body for suspicious URLs.
  • Inspect the attachments.

Having a tool to do it makes our lives much easier, and it’s obviously safer than some manual inspection techniques!

The Go module for parsing both .eml and .msg files is available on GitHub. Its README explains usage; here we’ll go over some of the details of the two file formats.

EML File Format

EML files are the default format for Outlook on macOS and for Outlook online. They are text-based, multipart MIME files. They start with a set of headers, each made up of a field name, a colon and a value, e.g.:

From: Microsoft 365 Defender <[email protected]>
To: Red Maple Cyber Team <blah>
Subject: New vulnerabilities notification from Microsoft Defender for Endpoint
Thread-Topic: New vulnerabilities notification from Microsoft Defender for
 Endpoint
Thread-Index: AQHZ2/IgwPxuK4ZI90CFadxRddgg2w==
X-MS-Exchange-MessageSentRepresentingType: 1
Date: Thu, 31 Aug 2023 10:01:28 +0000

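Multipart means the content is split into parts separated by boundary lines. Each boundary line is two hyphens followed by the boundary string declared in the boundary parameter of the Content-Type header. In this message the boundary strings start with an underscore, so you can find the parts by searching for --_. E.g., here’s the boundary between two Base64 parts: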

SGFoYSBuaWNlIHRyeSBpdCdzIG5vdCB0aGUgKnJlYWwqIGVtYWlsIEJhc2U2NA==

--_000_07c29e7ac729484fbadedf5e915c0792aznorthcentralusmicroso_       <--- Boundary
Content-Type: text/html; charset="utf-8"
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64

SGFoYSBuaWNlIHRyeSBpdCdzIG5vdCB0aGUgKnJlYWwqIGVtYWlsIEJhc2U2NCBQdCBJSQ==

To parse EML files, we can use ReadMessage from Go’s standard net/mail package, which is short enough to show in full:

// A Message represents a parsed mail message.
type Message struct {
	Header Header
	Body   io.Reader
}

// ReadMessage reads a message from r.
// The headers are parsed, and the body of the message will be available
// for reading from msg.Body.
func ReadMessage(r io.Reader) (msg *Message, err error) {
	tp := textproto.NewReader(bufio.NewReader(r))

	hdr, err := tp.ReadMIMEHeader()
	if err != nil {
		return nil, err
	}

	return &Message{
		Header: Header(hdr),
		Body:   tp.R,
	}, nil
}

This returns a Message struct consisting of a header and a body. The body is an io.Reader over the rest of the file, so we need to read it fully before closing the file. We do this to parse out any attachments, which are typically Base64 encoded (see attachments.go). Each attachment sits within its own boundary:

--=-XNI3F2P8aCdwwxXQDLdRmw==   <--- Boundary
Content-Type: application/octet-stream; name=G026730897.pdf
Content-Disposition: attachment; filename=G026730897.pdf
<optional extra fields>
Content-Transfer-Encoding: base64

JVBERi0xLjcKJeLjz9MKMTEgMCBvYmoKPDwvQSAxMiAwIFIvQm9yZGVyWzAgMCAwXS9GIDQvUCA0 <--- Base64 blob
IDAgUi9SZWN0WzM4OC43NSA1NzEuMTYgNDY1Ljg2IDU3OS4xNF0vU3VidHlwZS9MaW5rPj4KZW5k
...
ZWYKMTI1MzgwCiUlRU9GCg==

--=-XNI3F2P8aCdwwxXQDLdRmw==--   <--- Closing boundary (note the trailing --)

That’s the nice format, as we can parse text and Base64 blobs pretty easily. In contrast, here’s some explanation of the .msg format…

MSG File Format

MSG files are the default format for saved emails in Outlook on Windows. Like legacy Office documents (.doc, .xls), they use the Compound File Binary Format (CFB), also known as OLE. They are effectively containers around a bunch of content: a small file system of storages (folders) and streams (files). This isn’t supposed to be a blog about the OLE file format, but it’s worth digging a little bit and showing some useful tools.

First off, if we open an example in 010 Editor and apply the OLE template, it parses the structure:

010 Editor parsing an MSG file structure with the OLE template

The file starts with an OLE header, a File Allocation Table (FAT) and a second, smaller allocation table, the MiniFAT, used for small streams. The rest of the file is largely made up of the entries themselves. Here’s an example, entry __substg1.0_0065001F:

010 Editor hex view of MSG entry __substg1.0_0065001F

You can poke through OLE files and their entries using olebrowse, which is part of the Python oletools package:

Olebrowse listing MSG storage entries and properties

You can see the same structure by dumping an OLE file with oledump and its plugin_msg plugin, which lists all the entries in a given file:

python3 oledump.py -p plugin_msg Defender.msg
  1:        80 '__nameid_version1.0/__substg1.0_00020102'
               Plugin: MSG plugin
                 "0002 0102: BIN ?                         b'\\x08 \\x06\\x00\\x00\\x00\\x00\\x00\\xc0\\x00\\"
  2:       360 '__nameid_version1.0/__substg1.0_00030102'
               Plugin: MSG plugin
                 "0003 0102: BIN ?
...
 54:       126 '__substg1.0_00510102'
               Plugin: MSG plugin
                 "0051 0102: BIN ?                         b'EX:/O=EXCHANGELABS/OU=EXCHANGE ADMINIS"
 55:       126 '__substg1.0_00520102'
               Plugin: MSG plugin
                 "0052 0102: BIN ?                         b'EX:/O=EXCHANGELABS/OU=EXCHANGE ADMINIS"
 56:         8 '__substg1.0_0064001F'
               Plugin: MSG plugin
                 0064 001F: UNI Sent repr adrtype         SMTP
 57:        60 '__substg1.0_0065001F'
               Plugin: MSG plugin
                 0065 001F: UNI Sent repr email           [email protected]
 58:       138 '__substg1.0_0070001F'
               Plugin: MSG plugin
                 0070 001F: UNI Topic                     New vulnerabilities notification from Mi
 59:        22 '__substg1.0_00710102'
               Plugin: MSG plugin

So to process a msg file ourselves we need to run through all the entries and pull out the ones we recognise. In this module we use the Golang reader for Compound File Binary Files to extract each entry from the file.

We look for two things: entries that have the fixed prefix used for all attachment entries (__substg1.0_37), which allows us to dump out attachments; and more generically all entries with the property stream prefix (__substg1.0_).

The property stream name encodes a property name and an encoding, e.g. the entry __substg1.0_00020102 has property name 0x0002 and encoding 0x0102 (binary; we also see 0x001E for ASCII and 0x001F for Unicode).
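Splitting a stream name into those two halves is a matter of parsing the last eight hex digits. A minimal sketch, with parsePropStream and encodingNames as hypothetical names of our own:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const propPrefix = "__substg1.0_"

// encodingNames covers the three encodings mentioned above.
var encodingNames = map[uint16]string{
	0x0102: "binary",
	0x001E: "ASCII (8-bit string)",
	0x001F: "Unicode (UTF-16LE)",
}

// parsePropStream splits a name like "__substg1.0_3001001F" into the property
// name (first four hex digits) and the encoding (last four).
func parsePropStream(name string) (prop, enc uint16, err error) {
	if !strings.HasPrefix(name, propPrefix) || len(name) != len(propPrefix)+8 {
		return 0, 0, fmt.Errorf("not a property stream: %q", name)
	}
	v, err := strconv.ParseUint(name[len(propPrefix):], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	return uint16(v >> 16), uint16(v), nil // high and low 16 bits
}

func main() {
	for _, n := range []string{"__substg1.0_00020102", "__substg1.0_3001001F"} {
		prop, enc, err := parsePropStream(n)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: property 0x%04X, encoding 0x%04X (%s)\n", n, prop, enc, encodingNames[enc])
	}
}
```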

Here’s a full entry example, dumped from a msg file:

64:        24 '__recip_version1.0_#00000000/__substg1.0_3001001F'
               Plugin: MSG plugin
                 3001 001F: UNI Display name              Scott Lester

Its property name is 0x3001 (aka “Display name”), and encoding 0x001F (aka Unicode). That’s an easy one.

Scraping property name identifiers from oledump and a bunch of other places, we built a lookup map in GetPropertyName in message.go.

Some of these identifiers are well-known defaults; others took some research, and the list still isn’t complete. It’s great to know that an MSG file can encode a spouse’s name (0x3A48) and an ISDN number (0x3A2D), yet we can’t reliably extract a date.
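The shape of such a lookup is simple. Here’s a minimal, hypothetical sketch (not the GetPropertyName implementation in message.go) containing only the IDs that appear in this post, with a hex fallback for anything unknown:

```go
package main

import "fmt"

// propertyNames is a small, hypothetical lookup keyed by property name ID,
// using only IDs mentioned in this post. A real table is much longer.
var propertyNames = map[uint16]string{
	0x0064: "Sent repr adrtype",
	0x0065: "Sent repr email",
	0x0070: "Topic",
	0x3001: "Display name",
	0x3A2D: "ISDN",
	0x3A48: "Spouse name",
}

// propertyName returns a readable label for id, falling back to its hex value.
func propertyName(id uint16) string {
	if name, ok := propertyNames[id]; ok {
		return name
	}
	return fmt.Sprintf("unknown (0x%04X)", id)
}

func main() {
	for _, id := range []uint16{0x3001, 0x0070, 0x1234} {
		fmt.Printf("0x%04X: %s\n", id, propertyName(id))
	}
}
```

Falling back to the raw hex value keeps unrecognised properties visible in the output, which matters when the table is known to be incomplete.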



About Scott Lester
Scott is a technical Cyber Security professional with over fifteen years' experience across a broad range of roles within the public and private sectors. With a deep understanding of cyber security, he has focussed during his career on applied cryptography, network technologies, digital forensics and security research. At Hexiosec he leads the delivery of all of our cyber security services.