thoughts/data/converting-pst.md
Tommy Skaug 805a34f937
initial migration
2024-08-05 20:24:56 +02:00

Some time ago I gave an introduction to converting Microsoft MSG files [1] to a readable RFC 2822 [2] format on Linux. Sometimes you will get an even kinkier format to work with: the Outlook Data File (PST) [3]. PST is a proprietary format used by Microsoft Outlook, and is roughly the equivalent of mbox on Linux.

Edit August 29th: Also have a look at the more up-to-date libpff [4].

Even though PST files are a bit harder to read than single EML files, there is hope if you only have a Linux client: libpst, and more specifically its readpst tool. To get readpst you need to install three packages:

  • libgsf (i/o library that can read and write common file types and handle structured formats that provide file-system-in-a-file semantics)
  • boost (portable C++ source libraries)
  • libpst

On OS X you can install them with Homebrew:

brew install libgsf
brew install boost
brew install libpst
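Before moving on, it can be worth confirming that the install actually put readpst on your PATH. The following is a minimal sketch; `have_cmd` is a hypothetical helper name, not part of libpst. On Debian-based Linux systems the tool is usually shipped in a package called pst-utils, but treat that as an assumption and check your own package manager.

```shell
# Hypothetical helper: succeed if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd readpst; then
    echo "readpst found at: $(command -v readpst)"
else
    echo "readpst not found; install libpst first" >&2
fi
```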

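Now if you have a PST archive, like [5] for instance, you can convert it with readpst. Here -M writes each message to its own file, -e does the same but adds an extension to the file names, -b tells readpst not to save the RTF body attachments, and -o sets the output directory: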

mkdir export
readpst -M -b -e -o export "Personal Folders.pst"
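If you have more than one archive, the same command can be wrapped in a small loop. This is only a sketch: `convert_pst` is a hypothetical helper, and giving each archive its own subdirectory named after the file is an assumption, not something readpst requires.

```shell
# Hypothetical helper: convert one PST file into its own export directory.
# $1 = path to the PST file, $2 = export root (defaults to "export").
convert_pst() {
    pst=$1
    out=${2:-export}/$(basename "$pst" .pst)
    # Same readpst flags as in the single-file command above.
    mkdir -p "$out" && readpst -M -b -e -o "$out" "$pst"
}

# Convert every PST file in the current directory.
for f in ./*.pst; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
    convert_pst "$f" export
done
```

With the example archive this would write to export/Personal Folders/ instead of directly to export/.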

This should give output like the following:

Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "Inbox"
Processing Folder "latest"
[...]
Processing Folder "Reports"
	"Reports" - 11 items done, 1 items skipped.
Processing Folder "Quotes"
	"Quotes" - 1 items done, 1 items skipped.
Processing Folder "Printer"
	"Printer" - 1 items done, 1 items skipped.
Processing Folder "Passwords"
	"Passwords" - 6 items done, 1 items skipped.
[...]
Processing Folder "Kum Team"
	"Kum Team" - 37 items done, 0 items skipped.
	"9NT1425(India 11.0)" - 228 items done, 1 items skipped.
Processing Folder "Jimmi"
	"Jimmi" - 31 items done, 0 items skipped.
	"Inbox" - 27 items done, 11 items skipped.
Processing Folder "Outbox"
Processing Folder "Sent Items"
	"Sent Items" - 0 items done, 1 items skipped.
Processing Folder "Calendar"
	"Calendar" - 0 items done, 6 items skipped.
Processing Folder "Contacts"
	"Contacts" - 0 items done, 1 items skipped.
[...]
Processing Folder "Drafts"
Processing Folder "RSS Feeds"
Processing Folder "Junk E-mail"
Processing Folder "quarantine"
	"My Personal Folder" - 13 items done, 0 items skipped.
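The "items skipped" counts are easy to overlook in a long log. As a rough sketch, assuming you saved the output above to a file (called readpst.log here, a hypothetical name), the per-folder lines can be totalled with awk. `summarize_readpst_log` is a helper made up for this example, and it relies only on the "N items done, M items skipped." wording shown above.

```shell
# Hypothetical helper: total the "items done" / "items skipped" counts
# from a readpst log given as file arguments or on stdin.
summarize_readpst_log() {
    awk '
        / items done, / {
            # The count is the word right before "items done," or "items skipped."
            for (i = 2; i < NF; i++) {
                if ($i == "items" && $(i + 1) == "done,")    ok   += $(i - 1)
                if ($i == "items" && $(i + 1) == "skipped.") skip += $(i - 1)
            }
        }
        END { printf "done=%d skipped=%d\n", ok, skip }
    ' "$@"
}
```

Usage would be `summarize_readpst_log readpst.log`, or piping the log into the function.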

This creates a directory structure, as shown by ls -l 'export/My Personal Folder':

drwxr-xr-x   2 -  staff   68 Aug 28 21:34 Calendar
drwxr-xr-x   2 -  staff   68 Aug 28 21:34 Contacts
drwxr-xr-x  29 -  staff  986 Aug 28 21:34 Inbox
drwxr-xr-x   2 -  staff   68 Aug 28 21:34 Journal
drwxr-xr-x   2 -  staff   68 Aug 28 21:34 Sent Items
drwxr-xr-x   2 -  staff   68 Aug 28 21:34 Tasks
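To see where the messages actually ended up, you can count the exported .eml files per directory. A minimal sketch, with `count_eml_per_folder` as a hypothetical helper name; it assumes the messages carry an .eml extension (as in this export) and that no path contains a newline.

```shell
# Hypothetical helper: print "<directory><TAB><number of .eml files>" for
# every directory below $1 that contains at least one .eml file.
count_eml_per_folder() {
    root=$1
    find "$root" -type f -name '*.eml' | awk '
        { sub("/[^/]*$", ""); count[$0]++ }   # strip the file name, keep the directory
        END { for (d in count) printf "%s\t%d\n", d, count[d] }
    ' | sort
}
```

For the export above you would call it as `count_eml_per_folder 'export/My Personal Folder'`.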

If you list Inbox/Mails/, you will find the individual messages:

1.eml	10.eml	11.eml	12.eml	13.eml	14.eml	15.eml	16.eml	17.eml	2.eml	3.eml	4.eml	5.eml	6.eml	7.eml	8.eml	9.eml
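Note that the listing is in lexical order, so 10.eml comes right after 1.eml. For a quick overview in numeric order, the sketch below prints each file name with its Subject header. `list_subjects` is a hypothetical helper; it only looks at the header block (everything before the first blank line) and does not unfold subjects that continue on a second line.

```shell
# Hypothetical helper: print "<file><TAB><Subject header>" for each .eml
# file in directory $1, sorted by the number in the file name.
list_subjects() {
    dir=$1
    for f in "$dir"/*.eml; do
        [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
        subject=$(awk '
            { sub(/\r$/, "") }                         # tolerate CRLF line endings
            /^$/ { exit }                              # blank line ends the headers
            tolower($0) ~ /^subject:/ { print; exit }
        ' "$f")
        printf '%s\t%s\n' "$(basename "$f")" "$subject"
    done | sort -n
}
```

For example: `list_subjects 'export/My Personal Folder/Inbox/Mails'`.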

You can now continue with our previous post [6]. I also encourage you to have a look at the documentation of the Outlook PST format [7].

[1] Converting Microsoft MSG files: /2013-10-08-msg-eml.html
[2] RFC 2822: http://tools.ietf.org/html/rfc2822
[3] The Outlook Data File (PST): http://office.microsoft.com/en-001/outlook-help/introduction-to-outlook-data-files-pst-and-ost-HA010354876.aspx
[4] libpff: /converting-pst-archives-in-os-xlinux-with-libpff
[5] Example PST file: http://sourceforge.net/projects/pstfileup/files/Personal%20Folders.pst/download
[6] Reading MSG and EML Files on OSX/Linux Command Line: :4443/forensics/reading-msg-files-in-linux-command-line/
[7] The outlook.pst format: http://www.five-ten-sg.com/libpst/rn01re05.html