
Analysis

Archive Layer Cake

posted on 29 July 2008 13:36


Consolidating and layering archives: a single archive store requires layers of function above it.

It's apparent that archive silos are consolidating and the process is exposing a need for functionally different layers to provide access from multiple applications and systems to a single consolidated archive.

Now that archiving is recognised as an IT activity separate from backup, one concerned with long-term data preservation rather than primarily with data protection, the way we go about archiving is evolving.

As archiving has separated from tape and the backup process, many different archives have sprung up: multiple archive islands or silos, such as an e-mail archive, a Notes archive, a mortgage archive, a reference information archive, a medical image archive, and so on.

Backup had one great advantage for archiving: whatever the data was, it was fed through the backup software mill. Here was a single store, however painful to use.

Users of different applications have found, at different rates, that a backup-mediated archive loses the metadata and structure that made the original data accessible. Tracking e-mail threads across a tape set was quite impractical. So businesses like Mimosa and Waterford Technologies produced e-mail archives that preserved the e-mail structure in the archive while reducing space requirements through single instancing of e-mail objects and similar techniques.

They also supported different tiers of archival storage, using stubbing to leave a pointer in the primary store for mail messages archived off it.
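To make stubbing concrete, here is a minimal Python sketch under simplifying assumptions: an in-memory, two-tier store whose names (`TwoTierMailStore`, `Stub` and the `arc-` identifier scheme) are hypothetical and do not describe any vendor's product. When a message is archived, its body moves to the archive tier and a small stub pointing at it stays behind in the primary store; reads follow the stub transparently, so the user still sees the message.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Placeholder left in the primary mail store; points at the archived copy."""
    archive_id: str
    subject: str  # a little metadata stays behind so the item remains visible


@dataclass
class TwoTierMailStore:
    primary: dict = field(default_factory=dict)  # message_id -> body (str) or Stub
    archive: dict = field(default_factory=dict)  # archive_id -> body (str)

    def archive_message(self, message_id: str, subject: str) -> Stub:
        """Move a message body off the primary store, leaving a stub behind."""
        item = self.primary[message_id]
        if isinstance(item, Stub):
            return item  # already archived; nothing to move
        archive_id = f"arc-{message_id}"  # hypothetical archive naming scheme
        self.archive[archive_id] = item
        stub = Stub(archive_id=archive_id, subject=subject)
        self.primary[message_id] = stub
        return stub

    def read(self, message_id: str) -> str:
        """Return the message body, following a stub to the archive tier if needed."""
        item = self.primary[message_id]
        if isinstance(item, Stub):
            return self.archive[item.archive_id]
        return item
```

For example, after `store = TwoTierMailStore(primary={"m1": "Quarterly figures attached"})` and `store.archive_message("m1", "Q2 figures")`, the primary store holds only a `Stub`, yet `store.read("m1")` still returns the original body.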

Clearpace provides a combined archive backend and archive platform product - NParchive - for structured data.

EMC developed its Centera reference information store, using content-addressing to reduce space requirements for its semi-structured and unstructured contents.
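The space saving from content-addressing comes from deriving an object's address from its bytes, so identical objects collapse to one stored copy; this is also the essence of the single instancing used by the e-mail archives above. The Python sketch below illustrates the idea only: `ContentAddressedStore` is a hypothetical name, and SHA-256 is simply a convenient stand-in, not Centera's own addressing scheme.

```python
import hashlib


class ContentAddressedStore:
    """Toy content-addressed store: an object's address is a hash of its bytes."""

    def __init__(self):
        self._objects = {}  # address (hex digest) -> stored bytes

    def put(self, data: bytes) -> str:
        """Store data and return its content address; duplicates are kept once."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same address, so storing a second
        # copy costs no extra space (single instancing).
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hashing on read lets the store detect silent corruption.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object no longer matches its address")
        return data

    def unique_objects(self) -> int:
        return len(self._objects)
```

An attachment sent to three recipients and archived three times yields one address and one stored object, because `put` returns the same digest each time.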

But an e-mail archive knew nothing about other forms of information and Centera knew nothing about e-mail unless an e-mail archiving application was written to use Centera. The Clearpace product knows nothing about e-mail or unstructured reference information.

What we have here, first, are the original applications, which understand the structure of their information and have metadata to reflect it. Second, we have archiving applications that understand this structure and preserve it as they move the data into an archive. Then there is a third layer in this archive layer cake: the archive hardware, with some software that interfaces with archiving applications. Within the layers are items that don't talk to one another.

Archive Consolidation

What is slowly happening is archive consolidation, to use Plasmon CEO Steve Murphy's phrase. He says that one layer of the cake is an archive virtualisation layer. It sits - lies would be a better term - above the actual hardware and presents it as an archive pool, ideally a single logical pool, to the archiving application. This could be a general archiving application or a specific one, dedicated to e-mail or medical images, for example.

Plasmon has its NetArchive product, using NetApp storage and also a WORM optical storage option, with file movement between the two. The company is not interested in moving up the archive stack, as it were; it simply wants to present an archive storage resource to applications that need to archive data. It is in this context that it is partnering with Mimosa.

Conceptually, Mimecast is working in an equivalent way, except that it combines an archiving application with an archive storage resource, the latter being online and in the cloud. Its archive application is not generic, being e-mail-focussed, but in one important respect it enforces a layering: Mimecast does not want to displace the actual e-mail application but to work with it, providing an archival backend for it that understands e-mail metadata.

Once the data has been captured by this backend, it is stored and made searchable by what are in effect Mimecast's internal archive platform services, which provide indexing, data compression through deduplication, single instancing and other techniques, search, and retrieval. Conceptually, if a multi-tiered archive were supported, then inter-tier migration would happen here, as it does with Plasmon's archive software.
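The indexing, search and retrieval services described here can be pictured as an inverted index that sits alongside the preserved application metadata. The Python sketch below is a simplified illustration under stated assumptions: `ArchiveIndex` and its methods are hypothetical, words are matched naively by lower-casing and splitting on non-alphanumerics, and a query must match every word plus any metadata filters (for instance, a sender a compliance officer is interested in).

```python
import re
from collections import defaultdict


class ArchiveIndex:
    """Toy archive platform service: store items with metadata, index words, search."""

    def __init__(self):
        self._items = {}                   # item_id -> (metadata dict, text)
        self._postings = defaultdict(set)  # word -> set of item_ids containing it

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def ingest(self, item_id, text, metadata):
        """Keep the application metadata alongside the text and index every word."""
        self._items[item_id] = (dict(metadata), text)
        for word in self._words(text):
            self._postings[word].add(item_id)

    def search(self, query, **filters):
        """Return sorted ids of items containing every query word and matching all filters."""
        words = self._words(query)
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(
            item_id for item_id in hits
            if all(self._items[item_id][0].get(k) == v for k, v in filters.items())
        )

    def retrieve(self, item_id):
        """Return (metadata, text) for an archived item."""
        return self._items[item_id]
```

With two archived messages mentioning an "Acme contract", `search("acme contract")` returns both ids, while `search("acme contract", sender="bob")` narrows the result using the preserved metadata.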

Archive Layer Cake

So here is the archive layer cake:

1. End-user application with rich metadata for its users' files - the data generator, such as Exchange.

2. Backend archiving application that understands the end-user application's metadata structure and captures/retrieves information for/from the archive - the data capture.

3. Archive platform services that store data in the archive using some common method, such as XML, index it, preserve the application metadata in some way, compress it and provide basic search facilities for a compliance officer, legal discoverer or other data monitor/retriever. If multiple archive hardware systems are supported, they are virtualised here and represented as a single archive pool.

4. The archive storage hardware which could be RAID drive arrays, MAID arrays - think Nexsan and Copan - and/or optical storage or even tape.
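The four layers can be sketched as narrow interfaces stacked on one another. This Python sketch is illustrative only, and every class name in it is hypothetical: layer 4 is reduced to a `put`/`get` storage backend, layer 3 virtualises several backends into one logical pool (using simple round-robin placement, an arbitrary choice) and preserves application metadata, and layer 2 is a mail capture module that understands the e-mail fields produced by layer 1, represented here as a plain dictionary.

```python
from abc import ABC, abstractmethod


# Layer 4: archive storage hardware, reduced to a minimal put/get interface.
class StorageBackend(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a RAID, MAID, optical or tape device."""

    def __init__(self, name: str):
        self.name = name
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


# Layer 3: platform services present several backends as one logical pool.
class ArchivePool:
    def __init__(self, backends):
        self._backends = list(backends)
        self._placement = {}  # key -> backend holding the data
        self._metadata = {}   # key -> application metadata, preserved as-is
        self._next = 0

    def store(self, key: str, data: bytes, metadata: dict) -> None:
        # Simple round-robin placement; callers never see which device is used.
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        backend.put(key, data)
        self._placement[key] = backend
        self._metadata[key] = dict(metadata)

    def fetch(self, key: str):
        return self._placement[key].get(key), self._metadata[key]

    def where(self, key: str) -> str:
        return self._placement[key].name


# Layer 2: an application-specific capture module that understands layer 1's metadata.
class MailCapture:
    def __init__(self, pool: ArchivePool):
        self._pool = pool

    def capture(self, message: dict) -> str:
        """`message` mimics a layer-1 e-mail item: header fields plus a body."""
        key = "mail/" + message["message_id"]
        meta = {k: message[k] for k in ("from", "to", "subject", "thread_id")}
        self._pool.store(key, message["body"].encode("utf-8"), meta)
        return key
```

The design point is that `MailCapture` knows nothing about devices and `ArchivePool` knows nothing about e-mail, so another capture module (for SharePoint, say) could reuse the same pool unchanged.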

Cross-layer Alliances

What we seem to be seeing is the gradual emergence of products in layer two, backend archiving applications, that add a new rich-metadata application area to existing archives.

Thus, for example, Mimecast is adding SharePoint to its e-mail archive. Mimosa has exposed its SDK to developers so that they can build what will be, in effect, NearPoint extension modules that can use the underlying Mimosa NearPoint archive platform services.

We are also seeing alliances put in place to bridge the layers and try to build an all-in-one archive solution stack.

For example, Permabit provides an Enterprise Archive disk resource to backend archiving applications. Atempo, with its Digital Archive (ADA), provides archive platform services but also provides backend archiving application functionality, with modules such as ADAM - Atempo Digital Archive for Messaging.

Atempo and Permabit are partnering to help provide an all-in-one stack combining three archive layers.

One aspect that we have not touched on so far is that end-user applications run on many platforms: Windows, Linux, other Unix variants and the Mac. A backend archiving application needs to run on the same platforms so that it can capture their data. For a supplier like Permabit, Atempo's support for multiple O/S platforms saves it the burden of crafting its own direct support for each of them. Instead, Atempo software captures the data and then sends it to the Permabit storage resource.

George Crump, president and senior analyst at Storage Switzerland, said: “... heterogeneous platform support is becoming a critical buying criterion for file system archiving solutions. ... There is a ... need for joint vendor solutions that support multi-platform data archiving.”

The ADA application can store a variety of data types in a single and very scalable Permabit Enterprise Archive. This Permabit archive storage hardware layer ensures long-term integrity of the stored information and reduces operating expenses by compressing the data - combining sub-file in-line deduplication with traditional compression - and allows for growth through a grid architecture.
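Sub-file in-line deduplication combined with compression can be sketched as follows. This is a minimal Python illustration, not Permabit's design: `DedupStore` is a hypothetical name, and the fixed 4 KiB chunk size is a simplifying assumption (many deduplicating systems use variable, content-defined chunk boundaries instead). Each file is split into chunks, each chunk is hashed before it is written (the "in-line" part), only unseen chunks are stored, and those are zlib-compressed. A file is then just an ordered list of chunk hashes.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # fixed-size chunks: an illustrative simplification


class DedupStore:
    def __init__(self):
        self.chunks = {}  # chunk hash -> zlib-compressed chunk bytes
        self.files = {}   # file name -> ordered list of chunk hashes ("recipe")

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # In-line dedupe: check for the chunk before storing anything,
            # then compress only chunks not seen before.
            if digest not in self.chunks:
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

If a second version of a 16 KiB file differs only in its last 4 KiB, writing both versions stores five unique chunks rather than eight, and each of those is compressed as well.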

Atempo has recently partnered with Nirvanix and Nexsan as it works to strengthen the appeal of its overall archiving offer.

Summary

By taking a layered approach to archiving we can, hopefully, return to having a single logical archive pool, with common platform storage and archival functions combined with specific archiving functions that understand an end-user application's rich metadata structure and can preserve it.

These layers form an archive stack, and no one supplier has a complete, all-in-one generic archive solution able to cope with the different types of end-user applications needing archive services.

What we might see is a consolidation of suppliers in the archive space as cross-layer alliances are put in place to combine technologies and features in different layers. The allying companies may come from enterprise search (Autonomy, Recommind), enterprise content management (Documentum) and content storage (Centera, Caringo), as well as from the ranks of existing archive players such as those above.

The archival industry is starting to re-organise itself, and it's likely that in three or four years' time it will look rather different.

[Chris Mellor.]



