Analysis
A parallel grid-based mail archive in the cloud
posted on 24 July 2008 13:48
You don't need to have Google's billions to take a Google-like approach to building an extreme scaleout storage infrastructure. E-mail archiver Mimecast has done and is doing it, so well that it is adding 50TB - 70TB of customer data to its storage grid, its cloud-based Unified Email Management archive service, every month.
It stores customers' e-mails in the cloud on a processing and storage grid. There are nine data centres: three in the UK; two in North America; two in South Africa; and two in Jersey, a small island and offshore financial centre midway between France and the UK. This geographical spread allows data to be domiciled in the territories Mimecast operates in.
Each data centre has about 600 servers. As an example, a UK data centre has sets of nine storage elements across which data - customers' e-mails - is striped. Each storage element is basically a vanilla server with a multi-core CPU, 64GB of RAM (yes, 64GB) and around 9TB of disk storage. That capacity should increase as 750GB SATA drives give way to 1TB ones.
So the nine elements have more than 100 drives and form a stripe unit, with the boxes connected together by an interconnect running a Mimecast protocol. There are 63 stripe units in the centre; that's almost 7,000 drives. Scaling is simple: add another storage element with its processing, memory, I/O bandwidth and capacity. Mimecast can also scale up instead if it wishes.
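The drive counts quoted above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope check: the number of drives per storage element is not stated in the article and is inferred here from "around 9TB" of 750GB drives, and the raw capacity totals ignore any redundancy or file system overhead.

```python
# Figures quoted in the article
DRIVE_TB = 0.75              # 750GB SATA drives
ELEMENT_TB = 9               # "around 9TB" per storage element
ELEMENTS_PER_STRIPE = 9      # nine storage elements per stripe unit
STRIPE_UNITS = 63            # stripe units in the example data centre

# Assumption: drives per element is inferred, not stated in the article
drives_per_element = round(ELEMENT_TB / DRIVE_TB)

drives_per_stripe = drives_per_element * ELEMENTS_PER_STRIPE
storage_elements = ELEMENTS_PER_STRIPE * STRIPE_UNITS
total_drives = drives_per_stripe * STRIPE_UNITS

# Raw capacity only; redundancy and formatting overhead are ignored
raw_tb_750gb = total_drives * DRIVE_TB
raw_tb_1tb = total_drives * 1.0

print(drives_per_element, drives_per_stripe, storage_elements, total_drives)
print(raw_tb_750gb, raw_tb_1tb)
```

On these assumptions each stripe unit has 108 drives ("more than 100"), the centre has 567 storage elements (in line with "about 600 servers") and 6,804 drives ("almost 7,000").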
The storage grid, front-ended by an active:active clustered controller set-up, is seen as a logical storage black box by the processing grid. The processing grid is a set of processing elements, also vanilla HP servers, clustered together using dual 1GbE links, again running a proprietary Mimecast protocol. The processing grid talks to the storage grid via a SOAP-like, HTTP-ish message protocol; the storage grid is seen as neither a block-structured SAN nor a file-structured NAS. However, the whole thing has a distributed file system operating across it.
Dr. James Blake, Mimecast's evangelist, says: "It's a RAIL system, a redundant array of inexpensive Linux boxes."
Mimecast's e-mail service concentrates on message cleanliness, security and the reliability of its storage. The software is all Mimecast's, Blake saying: "We only buy in signatures for anti-virus checking. We don't OEM anything."
Blake says Mimecast thought about using the Panasas parallel file system but decided to go its own way. It also thought about using Amazon's S3 service but judged that service's reliability to be unacceptable. Its infrastructure is now, he says, "incredibly low cost."
A received e-mail is passed to three separate data centres, and passed independently, not sequentially replicated from one to another. Customers' own internal e-mails are captured by the software. Incoming data is encrypted and then encrypted a second time using a customer identifier. It is then striped across elements. If thieves did steal a machine they would be faced with sorting out what data on it belonged to what customer, and that information needs decrypting, to arrive at data that needs decrypting again. Essentially an individual storage element's data contents are meaningless; there is simply no context.
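To illustrate why a single stolen storage element holds no usable context, here is a minimal sketch of striping a blob across nine elements. It is not Mimecast's method: the round-robin layout, the 4-byte chunk size and the function names stripe and unstripe are all assumptions made for illustration, and the blob simply stands in for already-encrypted data (no encryption is implemented here).

```python
from typing import List


def stripe(data: bytes, elements: int = 9, chunk: int = 4) -> List[List[bytes]]:
    """Deal fixed-size chunks round-robin across `elements` stores."""
    stores: List[List[bytes]] = [[] for _ in range(elements)]
    for offset in range(0, len(data), chunk):
        index = (offset // chunk) % elements
        stores[index].append(data[offset:offset + chunk])
    return stores


def unstripe(stores: List[List[bytes]]) -> bytes:
    """Reassemble the blob by reading one chunk from each store in turn."""
    out = []
    depth = max(len(store) for store in stores)
    for row in range(depth):
        for store in stores:
            if row < len(store):
                out.append(store[row])
    return b"".join(out)


# Stand-in for doubly encrypted message data (hypothetical content)
blob = bytes(range(100))
stores = stripe(blob)

# Any one element holds only scattered fragments of the blob
fragment = b"".join(stores[0])
restored = unstripe(stores)
```

Here stores[0] ends up with only every ninth chunk, so on its own it cannot reproduce the blob; all nine stores are needed for unstripe to return the original bytes.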
The performance is excellent. Blake says: "Users can search their mailboxes in the cloud faster than searching a local .PST file. It's a search engine-like response."
From the search point of view: "We store all e-mail metadata; to, from, subject fields, etc. ... and we single-instance e-mail components such as attachments and standard disclaimer text."
In effect individual e-mails get distilled down to their unique elements and a set of pointers to redundant components, using XML. A cryptographic hash is retained of the original e-mail. When a mail message is retrieved it is reconstituted from the saved XML elements and then hashed again with the new and original hashes compared to make sure that the retrieved e-mail is an exact representation of the original.
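The single-instancing and hash check described above can be sketched as follows. This is an illustration under stated assumptions, not Mimecast's implementation: the article names neither the hash algorithm nor the XML schema, so SHA-256, the message and part element names, the in-memory component_store and the model of a message as a list of byte strings are all hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical shared store: content hash -> component bytes
component_store: Dict[str, bytes] = {}


def _put(part: bytes) -> str:
    """Store a component once, however many messages contain it."""
    digest = hashlib.sha256(part).hexdigest()
    component_store.setdefault(digest, part)
    return digest


def archive(parts: List[bytes]) -> str:
    """Return an XML manifest: pointers to parts plus a hash of the whole message."""
    whole = hashlib.sha256(b"".join(parts)).hexdigest()
    root = ET.Element("message", sha256=whole)
    for part in parts:
        ET.SubElement(root, "part", ref=_put(part))
    return ET.tostring(root, encoding="unicode")


def retrieve(manifest: str) -> bytes:
    """Rebuild the message from its parts and verify it against the stored hash."""
    root = ET.fromstring(manifest)
    rebuilt = b"".join(component_store[p.get("ref")] for p in root.findall("part"))
    if hashlib.sha256(rebuilt).hexdigest() != root.get("sha256"):
        raise ValueError("retrieved message does not match the original hash")
    return rebuilt


disclaimer = b"This e-mail is confidential."
first = archive([b"Hi Bob, figures attached. ", disclaimer])
second = archive([b"Hi Alice, see you Monday. ", disclaimer])
```

After archiving the two messages the store holds three components rather than four, because the shared disclaimer is kept once; retrieve(first) returns the original bytes only if the recomputed hash matches the one saved in the manifest.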
Blake says that, because of the use of XML as the underlying description language, the stored data can be represented in other ways should the original e-mail application not be present. Also: "Other data types could be dealt with in the same way."
We can look at the Mimecast offering as a base platform composed of the processor and storage grids, with an e-mail archiving application using the platform services. Setting aside the e-mail-specific aspects of this, Mimecast has a hardware and software platform with huge scalability that can store information extremely securely, and index it, search it and retrieve it. This is a generic capability, one that speaks to a compliance officer's concerns, and clearly capable of being extended to other applications needing unstructured and semi-structured information archiving in the cloud.
Blake said: "I couldn't possibly comment," but admitted that we could expect a Mimecast SharePoint product probably by the end of the year. Mimecast already supports Notes and GroupWise e-mails. This covers external emails, not those sent among users on the same server. The company is not looking to replace Exchange, SharePoint or other products, with Blake saying: "We believe in tight integration with on-premises software and not replacing it. We add servers (and our software) behind it in the cloud. Anything else is cloud cuckoo land."
What about offering raw cloud storage services? Blake replies: "We would never be an online raw storage service provider. We add value in storing data for governance and compliance."
He also said that historic e-mail records can have a great deal of latent value and mentioned oil and gas customers. In the past these would have investigated and then rejected potential oil fields as their development cost was not worth the return. With today's higher oil prices one company is transferring old e-mails from stored tapes to the Mimecast archive and then running so-called fuzzy searches on them to find out what it once knew and has effectively forgotten.
Mimecast has about 100 employees, having doubled in size in twelve months. The parallel grid structure represents a double digit millions investment. The company has gone from zero to 1,600 customers since the business was founded in 2006. These are spread between hundreds of large customers and hundreds of small customers. One of the largest business consultancies in the UK is a customer and stores all its e-mails in Mimecast's archive.
This company has, on the face of it, a cloud-based storage infrastructure that appears to be a deadly serious and reliable offering. Because it is integrated with Mimecast's own software, which looks extensible to other Microsoft Office applications dealing with non-database information, it represents a plausible future all-in-one cloud-based archive for semi-structured and unstructured data with strong governance and compliance attributes.
There is a Nicholas Carr 'Big Switch' idea here too. By sharing its infrastructure costs across many clients, as a utility does, Mimecast hopes that its service costs are lower than product costs from competitors such as Mimosa and Waterford Technology while offering equivalent or better functionality.
[Chris Mellor.]



