Sunday 29 May 2011

Google Apps to Snail Mail



The days of the paperless office are not yet upon us: physical mail remains popular for many types of communication, including marketing information, financial documents and greetings cards. Over 5000 companies in the UK now use CFH Docmail to print, stuff, and deliver their mail, sending over 6 million documents every month. By creating Docmail Connect for Google Apps, we at Appogee, a UK systems integrator focused on Google, have made it simple for Google Apps users to personalize, print and mail their Google documents.

As well as allowing users to send snail mail from the Google Apps menu, we made the code library for Docmail Connect for Google Apps available in project hosting. Other Google Apps Marketplace providers, such as CRM and ERP applications, can now benefit from being able to easily integrate their applications with Docmail services.

The Challenge

To implement the solution we needed a Python interface to the Docmail web services. That meant finding a simple, reliable Python-to-web-services layer to bridge Google App Engine and Docmail’s API, and then making that bridge carry documents from Google Docs to the CFH Docmail service. The final challenge was giving the user a seamless experience, which meant focusing on performance at each step of the way.

The Solution (in 3 easy steps)

Step 1: Invoke a SOAP Web Service from App Engine using Python

Our development language of choice is Python. To invoke a SOAP web service we chose to use the SUDS library rather than create a web service interface from scratch. We tested it locally and it seemed to work fine. When we uploaded it to App Engine with some sample code, however, we hit errors caused by App Engine’s restrictions on using sockets and accessing the file system. To get around this, we extended the SUDS library so that it didn’t use the socket module, and used the App Engine memcache API in place of the file system. This worked very well, and we were able to send SOAP requests to the Docmail API and receive the responses on App Engine. Developers may be interested in additional details:
  1. The HttpTransport class has a u2open method which uses Python’s socket module to set the timeout. This is not allowed on App Engine, where the timeout can instead be set later using App Engine’s urlfetch module. We removed the timeout line in a new method and patched it over the original:
    from suds import transport
    import suds.transport.http  # make sure transport.http is loaded

    def u2open_appengine(self, u2request):
      tm = self.options.timeout
      url = self.u2opener()
      # socket.setdefaulttimeout(tm) is omitted: it can't be called on App Engine
      if self.u2ver() < 2.6:
        return url.open(u2request)
      else:
        return url.open(u2request, timeout=tm)

    # Replace the stock method on the SUDS transport class
    transport.http.HttpTransport.u2open = u2open_appengine
  2. The SUDS library has an extensible cache class and provides various implementations of it; however, we wanted to use the App Engine memcache for performance. To do this, we implemented a new cache class backed by memcache, as shown below:
    from google.appengine.api import memcache
    from suds import cache

    class MemCache(cache.Cache):
      def __init__(self, duration=3600):
        self.duration = duration  # cache lifetime in seconds
        self.client = memcache.Client()

      def get(self, id):
        return self.client.get(str(id))
    
      def getf(self, id):
        return self.get(id)
    
      def put(self, id, object):
        self.client.set(str(id), object, self.duration)
    
      def putf(self, id, fp):
        self.put(id, fp)
    
      def purge(self, id):
        self.client.delete(str(id))
    
      def clear(self):
        self.client.flush_all()
    
    This was then plugged into the SUDS client class by overriding the __init__ method.
  3. We can now use the modified SUDS client class to make SOAP calls on App Engine! The full source code for this is available in project hosting.
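
One way to exercise a cache adapter like the one in item 2 away from App Engine is to pass the memcache client in rather than construct it inside __init__. The sketch below is a variation on that class, not the library's code: `FakeMemcacheClient` is a hypothetical in-memory stand-in that mimics only the four `memcache.Client` methods used here (and ignores expiry), and `InjectableMemCache` deliberately does not inherit from the SUDS `cache.Cache` base class so that it runs without SUDS installed.

```python
class FakeMemcacheClient(object):
    """Hypothetical in-memory stand-in for App Engine's memcache.Client.

    Only get/set/delete/flush_all are mimicked, and expiry is ignored.
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, time=0):
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)

    def flush_all(self):
        self._data.clear()


class InjectableMemCache(object):
    """Same methods as MemCache, but the client is supplied by the caller."""

    def __init__(self, client, duration=3600):
        self.client = client
        self.duration = duration

    def get(self, id):
        return self.client.get(str(id))

    def getf(self, id):
        return self.get(id)

    def put(self, id, obj):
        self.client.set(str(id), obj, self.duration)

    def putf(self, id, fp):
        self.put(id, fp)

    def purge(self, id):
        self.client.delete(str(id))

    def clear(self):
        self.client.flush_all()


# Local round trip: store an entry, read it back, then purge it.
local_cache = InjectableMemCache(FakeMemcacheClient())
local_cache.put("wsdl-1", "<definitions/>")
cached = local_cache.get("wsdl-1")
local_cache.purge("wsdl-1")
after_purge = local_cache.get("wsdl-1")
```

On App Engine the same class would be constructed with a real `memcache.Client()` instead of the stand-in.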

Step 2: Google Docs to Docmail

CFH publish a full API that lets developers integrate their applications with Docmail. We needed only a subset of it to build the interface for creating mailings from Google Docs, so not every Docmail function is enabled in the wrapper. The principles used can easily be extended to any other function, though; the key is getting a satisfactory link from Google Docs to Docmail.

The Docmail API can accept documents in a variety of formats including doc, docx, PDF and RTF. To use the API, we needed to extract Google Docs content in one of these formats. The Google Docs API supports exporting in doc, RTF and PDF. We experimented with each before settling on RTF, which, although large in file size, did work. However, the Google App Engine urlfetch library has a 1 MB request limit, so we were not able to send files larger than 1 MB. We evaluated a number of workarounds, such as cutting the files into chunks and sending them separately or bouncing the files off another platform, but in the end our only option was simply to prevent files larger than 1 MB from being uploaded. To achieve this we used Ajax calls to check the size of each file in real time as it is selected and give the user appropriate feedback. The file size cut-off is a configurable parameter, so if Google increase the limit we can adjust the app without redeploying the code.
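
The server side of such a size check might look something like the sketch below. All names here are hypothetical and only the 1 MB figure comes from the text; the sketch also assumes that 1 MB means 2**20 bytes and that a file exactly at the limit is allowed, neither of which the article confirms.

```python
# Hypothetical default; in the real app the cut-off is a configurable parameter.
DEFAULT_MAX_UPLOAD_BYTES = 1024 * 1024  # assumes 1 MB == 2**20 bytes


def check_upload_size(size_bytes, max_bytes=DEFAULT_MAX_UPLOAD_BYTES):
    """Return (ok, message) describing whether a file of this size may be sent."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if size_bytes <= max_bytes:
        return True, "OK"
    over = size_bytes - max_bytes
    return False, "File is %d bytes over the %d byte limit" % (over, max_bytes)


# Usage: one file at the limit, one file a single byte over it.
at_limit = check_upload_size(1024 * 1024)
too_big = check_upload_size(1024 * 1024 + 1)
```

Passing `max_bytes` explicitly is what would let the limit be raised from configuration without touching the code.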

Breaking this integration down, we used the Google Docs API together with the Docmail SOAP API, via our modified implementation of the SUDS library as described in Step 1.

The Docmail API must be used in a logical order, to coincide with the wizard pages on the Docmail website. We can illustrate these steps in order:
  1. Create an instance of the Docmail client, which is an extension to the SUDS client class. The client contains the methods we need for further communication with the Docmail API:
    docmail_client = client.Client(USERNAME, PASSWORD, SOURCE)
  2. Create / Retrieve a mailing. To create a new mailing, we create an instance of the Mailing class, and pass it into the create_mailing method:
    mailing = client.Mailing(name='test mailing')
    mailing = docmail_client.create_mailing(mailing)
    To retrieve an existing mailing, we need to know the mailing guid:
    mailing = docmail_client.get_mailing('enter-your-mailing-guid')
  3. Upload a template document. Since we are retrieving documents from Google Docs to be used as the template, we need to download it first. We do this using the Google Docs API:
    docs_client = gdata.docs.client.DocsClient()
    # authenticate client using oauth (see google docs documentation for example code)
    We now need to extract the document. There are various formats you can do this in, but we found by experimenting that RTF worked best, despite being the largest in file size.
    file_content = docs_client.GetFileContent(uri=doc_url + '&exportFormat=rtf')
    And finally upload the template file to Docmail:
    docmail_client.add_template_file(mailing.guid, file_content)
  4. Upload a Mailing List Spreadsheet. This is similar code to uploading a template:
    docs_client = gdata.docs.client.DocsClient()
    file_content = docs_client.GetFileContent(uri=doc_url + '&exportFormat=csv')
    docmail_client.add_mailing_list_file(mailing.guid, file_content)
  5. Submit the mailing for processing. Once this has been done, a proof is available for download.
    docmail_client.process_mailing(mailing.guid, False, True)
  6. Finally, we approve and pay for the mailing:
    docmail_client.process_mailing(mailing.guid, True, False)
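
The export calls in steps 3 and 4 build the URL with `doc_url + '&exportFormat=...'`, which works when `doc_url` already carries a query string. A slightly more defensive way to set the parameter is sketched below using only the standard library. `with_export_format` is a hypothetical helper, not part of the gdata or Docmail libraries, the example URLs are placeholders, and the code is written for Python 3.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_export_format(doc_url, export_format):
    """Return doc_url with its exportFormat query parameter set to export_format.

    Any existing exportFormat value is replaced; other parameters are kept.
    """
    parts = urlsplit(doc_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "exportFormat"
    ]
    query.append(("exportFormat", export_format))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Usage with placeholder URLs.
rtf_url = with_export_format("https://docs.example.com/export?id=abc", "rtf")
csv_url = with_export_format("https://docs.example.com/export?id=abc&exportFormat=pdf", "csv")
bare_url = with_export_format("https://docs.example.com/export", "rtf")
```

The result could then be passed as the `uri` argument in place of the concatenated string.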

Step 3: Performance

When creating a system like Docmail Connect for Google Apps, overall acceptability of the system will be driven as much by performance as by functionality, so we paid particular attention to the key components driving this.

We were conscious that some Google Apps users may have a lot of Google documents, so presenting all of them at once for selection wasn’t an option. Instead, we load 100 at a time (the default) and send them to the browser using XHR calls, so it all happens quickly and without the user having to do anything: the user can select from thousands of documents within a few seconds.
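
The batching described above can be sketched as a generator that keeps requesting pages until the listing is exhausted. This is a minimal illustration, not the application's code: `fetch_page` is a hypothetical callable standing in for a Google Docs listing request, and only the batch size of 100 comes from the text.

```python
def iter_document_batches(fetch_page, batch_size=100):
    """Yield lists of documents, batch_size at a time.

    fetch_page(start, limit) is a hypothetical callable that returns up to
    `limit` documents beginning at offset `start`.
    """
    start = 0
    while True:
        batch = fetch_page(start, batch_size)
        if not batch:
            break
        yield batch
        if len(batch) < batch_size:
            break  # a short page means the listing is exhausted
        start += len(batch)


# Usage with an in-memory list standing in for a user's document listing.
all_docs = ["doc-%d" % i for i in range(250)]
batches = list(
    iter_document_batches(lambda start, limit: all_docs[start:start + limit])
)
batch_sizes = [len(batch) for batch in batches]
```

In the browser, each batch would arrive as a separate XHR response, so the first documents can be shown while later batches are still loading.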

Next we had to address the connection to the Docmail API, where care must be taken to achieve acceptable throughput. Queries to the API had to be minimized, and we didn't want page submits to be slow: the user should be able to go from one page to the next as quickly as possible. To achieve this, we used the App Engine Task Queue. When a user submits a page which requires communication with the Docmail API, we enqueue a task to do the work for us and immediately navigate the user to the next page in the process. This means that the server is working while the user progresses through the workflow. It also makes timeout errors easier to handle, as the task can catch the error and queue another task in its place. It does require some extra planning, though: some tasks cannot complete until others have finished, so process checking needs to ensure that new tasks are fired off only when the appropriate time has come.
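
The two ideas in this paragraph, re-queuing work that times out and holding tasks back until their prerequisites have finished, can be illustrated with a small local simulation. The sketch below does not use the App Engine Task Queue API at all: it is a single-process stand-in built on a deque, and `StepTimeout`, `run_workflow` and the step names are all hypothetical.

```python
from collections import deque


class StepTimeout(Exception):
    """Stand-in for a timeout raised while calling a remote API."""


def run_workflow(steps, depends_on, max_attempts=3):
    """Run named steps in dependency order, re-queuing any that time out.

    steps:      dict of step name -> zero-argument callable
    depends_on: dict of step name -> list of prerequisite step names
    Returns step names in the order they completed.
    """
    queue = deque(steps)
    attempts = dict.fromkeys(steps, 0)
    done = []
    deferred_in_a_row = 0
    while queue:
        name = queue.popleft()
        if any(dep not in done for dep in depends_on.get(name, [])):
            # Prerequisites are not finished yet: put the step back and move on.
            queue.append(name)
            deferred_in_a_row += 1
            if deferred_in_a_row > len(queue):
                raise RuntimeError("unsatisfiable dependencies: %s" % list(queue))
            continue
        deferred_in_a_row = 0
        attempts[name] += 1
        try:
            steps[name]()
        except StepTimeout:
            if attempts[name] >= max_attempts:
                raise
            queue.append(name)  # re-queue the step for another attempt
            continue
        done.append(name)
    return done


# Usage: the upload step times out once before succeeding.
upload_calls = []


def flaky_upload():
    upload_calls.append(1)
    if len(upload_calls) == 1:
        raise StepTimeout()


order = run_workflow(
    {"process": lambda: None, "upload": flaky_upload, "create": lambda: None},
    {"upload": ["create"], "process": ["upload"]},
)
```

Even though the steps are queued in the wrong order, they complete as create, upload, process, and the upload is attempted twice.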

We hope the wrapper, together with the key learnings written up here, will encourage others to have a go at wiring their applications to the Docmail delivery services.
