Saturday 24 March 2012

Parsing mailboxes using Python

Google Apps domain administrators can use the Email Audit API to download mailbox accounts for audit purposes in accordance with the Customer Agreement. To improve the security of the data retrieved, the service creates a PGP-encrypted copy of the mailbox, which can only be decrypted with the corresponding RSA key.

When decrypted, the exported mailbox is in mbox format, a standard file format used to represent collections of email messages. The mbox format is supported by many email clients, including Mozilla Thunderbird and Eudora. If you don’t want to install a specific email client to check the content of exported mailboxes, or if you are interested in automating this process and integrating it with your business logic, you can also access mbox files programmatically.

You could fairly easily write a parser for the simple, text-based mbox format. However, some programming languages have native mbox support or libraries that provide a higher-level interface. For example, Python has a module called mailbox that exposes such functionality, and parsing a mailbox with it only takes a few lines of code:
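import mailbox

def print_payload(message):
  # If the message is multipart, its payload is a list of messages.
  if message.is_multipart():
    for part in message.get_payload():
      print_payload(part)
  else:
    # decode=True undoes base64/quoted-printable and returns bytes.
    payload = message.get_payload(decode=True)
    if payload is not None:
      # Fall back to UTF-8 when the part declares no charset.
      charset = message.get_content_charset() or 'utf-8'
      print(payload.decode(charset, errors='replace'))

mbox = mailbox.mbox('export.mbox')
for message in mbox:
  print(message['subject'])
  print_payload(message)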
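
To illustrate why the format is easy to parse by hand, here is a minimal sketch of a splitter that cuts an mbox-formatted string at its "From " separator lines and hands each chunk to the standard email module. The sample data and the split_mbox helper are made up for this example. The sketch assumes that every line starting with "From " is a separator and does not undo the ">From " escaping that some mbox variants apply to message bodies, so the mailbox module remains the safer choice for real exports.

```python
import email

# Hypothetical sample data: two messages in mbox format.
SAMPLE_MBOX = (
    "From alice@example.com Sat Mar 24 10:00:00 2012\n"
    "From: alice@example.com\n"
    "Subject: First message\n"
    "\n"
    "Hello from the first message.\n"
    "\n"
    "From bob@example.com Sat Mar 24 11:00:00 2012\n"
    "From: bob@example.com\n"
    "Subject: Second message\n"
    "\n"
    "Hello from the second message.\n"
)

def split_mbox(text):
    """Yield the raw text of each message in an mbox-formatted string."""
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # A "From " separator line opens a new message.
            if current is not None:
                yield "".join(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        yield "".join(current)

subjects = [email.message_from_string(raw)["subject"]
            for raw in split_mbox(SAMPLE_MBOX)]
print(subjects)  # ['First message', 'Second message']
```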
