Opinion

Size matters

posted on 27 August 2008 15:01


Particularly when it comes to databases.

A simple but clear observation is that we can’t stop generating data.

Analysts estimate that storage consumption is growing at an unsustainable 50% per annum, and even though hardware costs are falling, McKinsey estimates that the proportion of IT budgets allocated to storage is increasing at 20% per year. This cost is not storage media alone; it includes, and is driven by, the other factors that make up the total cost of ownership (TCO) of each terabyte of data retained, specifically the people, power and physical space in the data centre. These factors are increasingly dominant in determining the real costs of storage, often exceeding $45 per year for each gigabyte of Tier-1 storage.


By John Bantleman, chief executive at Clearpace


A recent study by Forrester Research of 150 billion-dollar-plus companies showed that on average over 45% of the storage within their data estates, which are often measured in petabytes, was allocated to databases. These figures are higher for companies in transaction-intensive industries such as financial services and telecoms, where the proportion is well over 50%. So these companies, having acquired or developed hundreds of business applications that generate data within the enterprise, are consuming petabytes of storage and spending millions of dollars to keep it.

In the middle of a credit crunch, when cash and profits are not flowing at anywhere near the rate they were just a year ago, these costs are a pressing board-level issue. Throwing money at the problem is no longer a viable option. A coherent strategy must therefore be adopted to address the storage crisis.

So, if I have petabytes of data, and let's assume the majority of that data (55%, as stated in the survey) is unstructured in emails, files and documents, the logical place to start would be there, right? On the surface this makes sense. However, if we dig into the details, we find that although unstructured data may be larger in volume, it generally sits comfortably on lower-cost tier-2/3 storage. This lower-tier storage is not a place where you would ever put mission-critical databases. So although databases might account for less than 50% of storage consumption, they are almost certainly consuming more expensive tier-1 storage and hence the majority of the storage budget.

The good news, however, is that there has been some real innovation in areas such as deduplication that offers hard savings in managing the storage of historical data. The bad news is that much of the file-level and block-level deduplication technology available to storage managers is focused on compressing unstructured data.

The deduplication of databases is more complex, and to get the real prize in storage optimization the storage and database teams need to co-operate. A few years back a concept called Information Lifecycle Management was introduced, which really addressed the question “what the hell should I be doing with all this data?” The question is simple enough, but it is one that most enterprises have given little real thought, especially when it comes to structured data.

The most common thing to do is bury your head in the sand and leave data where it is. This kind of works, but with databases growing at between 40% and 100% per year it is very expensive and inefficient. As databases exceed a terabyte they begin to slow down, service levels creak, and a team of DBAs and more tin are required to deliver against SLAs. There is another costly side to this approach: storage consumption. Most databases are mirrored, copied over fibre to a DR site and mirrored again. They are probably also copied to development and test sites. Hence a single terabyte of database storage may cost upwards of $200k per year on a fully loaded basis. And as I mentioned before, in real terms these costs aren’t falling.
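The arithmetic behind a fully loaded figure of that order can be sketched roughly as follows. This is only an illustration: the number of copies, and the assumption that every copy is charged at the $45 per gigabyte Tier-1 rate quoted earlier, are simplifications chosen for the example rather than a breakdown taken from any study.

```python
# Illustrative arithmetic only. The copy counts are assumptions made for this
# sketch; only the $45 per GB per year figure comes from the article.
TIER1_COST_PER_GB_YEAR = 45  # dollars per gigabyte per year
GB_PER_TB = 1000


def fully_loaded_cost_per_tb(copies: int,
                             cost_per_gb_year: float = TIER1_COST_PER_GB_YEAR) -> float:
    """Annual cost of one terabyte of database when every copy costs the same rate."""
    return copies * GB_PER_TB * cost_per_gb_year


# Assumed copies: production, local mirror, DR copy, DR mirror.
dr_only = fully_loaded_cost_per_tb(copies=4)
# Assumed extra copies: one development and one test environment.
with_dev_test = fully_loaded_cost_per_tb(copies=6)

print(f"4 copies: ${dr_only:,.0f} per year")
print(f"6 copies: ${with_dev_test:,.0f} per year")
```

Under these assumptions four copies come to $180k per year and six copies to $270k, which brackets the $200k mark. In practice development and test copies often sit on cheaper storage, so the real figure depends on how each copy is provisioned.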

Assuming it’s too expensive to store the hundreds, and in some cases thousands, of terabytes of data by leaving them in the database, we can revert to the second most common practice: back up to tape and delete from production. Tape allows you to restore data, but the fundamental assumption is that restoration happens in a recovery situation, so this option works only if you never actually need or wish to access the historical data. If you do need access, retrieving it from tape will be massively expensive, and the older the data, the more costly the recovery. Sometimes a museum of historical application and database versions, together with obsolete hardware platforms, needs to be maintained, or acquired, to access data from tape.

Well, if it’s too expensive to keep in the database, and tape is massively inefficient as an archiving scheme, what is the alternative? Delete it? With over 35,000 distinct business regulations worldwide that govern data retention, and most businesses seeing real value in the historical data they have accumulated, deletion is not a viable option.

Re-enter Information Lifecycle Management, which proposes that organizations classify their data as hot (transactionally active) or cold (transactionally inactive), and determine the most appropriate and cost-effective infrastructure for the data according to its value. A number of companies provide specialized tools to address this need by looking at the behaviour of data within major ERP and CRM applications and identifying, classifying and relocating cold data to lower storage tiers. However, unless they provide a destination that addresses the storage footprint of the data without compromising accessibility, they are missing a critical component of the overall solution.
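As a rough illustration of the hot/cold split, the sketch below classifies records by how long ago they were last modified. The `Record` type, the `classify` function and the one-year threshold are all hypothetical, chosen only for this example; real ILM tools look at application-level behaviour and use far richer rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    """Hypothetical record: just an identifier and a last-modified date."""
    record_id: int
    last_modified: date


def classify(record: Record, today: date, cold_after_days: int = 365) -> str:
    """Return "cold" if the record has been inactive longer than the threshold.

    The 365-day default is an assumption for illustration, not a rule from ILM.
    """
    age = today - record.last_modified
    return "cold" if age > timedelta(days=cold_after_days) else "hot"


today = date(2008, 8, 27)
records = [
    Record(1, date(2008, 8, 1)),   # touched this month
    Record(2, date(2006, 1, 15)),  # untouched for over two years
]
for r in records:
    print(r.record_id, classify(r, today))
```

Records that come back as cold are the candidates for relocation to a lower-cost destination, provided that destination still allows them to be queried.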

Obviously I will point to our own archive store as an answer; after all, NParchive is designed specifically for the long-term retention of structured data. It reduces the storage footprint by over 95% through data compression techniques, effectively de-duping the database from the inside. It also keeps all data accessible through standard SQL and BI tools, providing accessibility, auditability and, if necessary, immutability of the data for regulatory reasons.

Whichever storage is selected, the fact is that for the economical operation of databases, size does matter. Businesses need to plan now to ensure the long-term manageability of their data. With more than 80% of the data stored in databases today projected to be cold data, the potential benefits are massive.
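To put rough numbers on that, the sketch below combines the two figures quoted above: 80% of database data being cold, and a 95% footprint reduction on whatever is archived. It assumes, purely for illustration, that hot data stays in production at its original size and that the quoted reduction applies uniformly to all cold data; the function name is hypothetical.

```python
def remaining_footprint(total_tb: float,
                        cold_fraction: float = 0.80,
                        reduction: float = 0.95) -> float:
    """Terabytes left after cold data is archived with the given reduction.

    Assumes hot data is left untouched and the reduction applies evenly
    to all cold data; both are simplifications for this sketch.
    """
    hot_tb = total_tb * (1 - cold_fraction)
    archived_cold_tb = total_tb * cold_fraction * (1 - reduction)
    return hot_tb + archived_cold_tb


total = 100.0
left = remaining_footprint(total)
print(f"{total:.0f} TB shrinks to roughly {left:.0f} TB")
```

On these assumptions a 100 TB estate would shrink to about 24 TB: 20 TB of hot data plus 4 TB of archived cold data. Actual results would depend on how compressible the data is and how much of it really is cold.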
 


 

 


tags:  Archive