For hundreds of years, any organisation that needed to store information relied on one tried and tested technology: paper. But since the advent of computing and digital data storage, more and more data has been captured and stored electronically in digital archives.
Now, organisations need to retain archived data for longer, for business and regulatory reasons. Can storage technology keep up?
With careful management, paper archives last for decades, if not centuries. No computer system is older than 80 years, but there are industries that face the prospect of archiving data for 100 years or more.
And, with the operating lifespan of a standard hard drive at just three to five years, IT departments need to know how to store data for future generations: so-called indefinite storage.
There is no industry standard for indefinite storage, as it very much depends on the use case. Strictly speaking, “indefinite” need not mean “forever”; rather, it means holding data without a specified retention period.
In practice, however, most chief information officers would interpret this as beyond the lifespan of standard storage technologies. In some industries, critical data need only be kept for a few years, but in others it will mean the expected lifespan of an individual, or the predicted working life of a piece of equipment, with a few years’ margin on top.
The challenge is that few electronic storage media are designed to keep data safe and accessible for very long periods.
Expected working life of components
Manufacturers specify the expected working life of components such as hard drives or SSDs. A typical “consumer” hard drive should last for three to five years. Enterprise-grade drives might last a little longer, perhaps seven years. SSDs are theoretically more durable, with a design life of up to 20 years.
However, much will depend on how storage media is used. SSDs will wear more quickly if the application makes a lot of writes, for example.
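The relationship between write volume and SSD life can be sketched with back-of-the-envelope arithmetic. Manufacturers quote an endurance rating as total bytes written (TBW); the figures below are hypothetical, chosen only to illustrate how the same drive lasts decades under light use but only a year or two under write-heavy workloads.

```python
# Back-of-the-envelope sketch of SSD endurance (hypothetical figures).
# Manufacturers rate drives in total terabytes written (TBW); working
# life depends on how hard the application writes.

def ssd_life_years(tbw_terabytes: float, writes_tb_per_day: float) -> float:
    """Years until the endurance rating would be exhausted."""
    return tbw_terabytes / (writes_tb_per_day * 365)

# A hypothetical 2 TB drive rated for 1,200 TBW:
light_use = ssd_life_years(1200, 0.1)  # ~33 years at 100 GB written/day
heavy_use = ssd_life_years(1200, 2.0)  # under 2 years at 2 TB written/day
```

In practice, wear-levelling, write amplification and spare capacity all complicate the picture, but the ratio holds: write-intensive applications consume endurance far faster.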
And, as Freeform Dynamics’ Tony Lock explains, storage arrays can theoretically carry on working forever. As data is stored for longer, it becomes a question of hardware management, monitoring for faults and swapping out components as they age.
“There are lifetimes on equipment,” he says. “As the kit gets older, you have to accept there will be more chance of failure. How important is that information to you and what sort of data protection do you add?”
On-premises RAID systems are designed to add exactly that protection. And the “hyperscaler” cloud providers, which use large quantities of low-cost hardware, will swap out whole aisles, or even whole datacentres, as hardware nears the end of its service life.
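The protection RAID adds rests on a simple idea, sketched below for illustration: in a RAID 4/5-style scheme, a parity block computed as the XOR of the data blocks lets the array rebuild any single failed block from the survivors. Real arrays do this in controllers across striped drives; this is a minimal toy version.

```python
# Minimal sketch of RAID-style parity protection (illustration only).
# One parity block, the XOR of all data blocks, allows any single
# lost block to be reconstructed from the remaining blocks.

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Three data blocks striped across three drives.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block, stored on a fourth drive.
parity = xor_blocks(data_blocks)

# Simulate losing drive 1: XOR the surviving data blocks with the
# parity block to rebuild the missing data.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert rebuilt == b"BBBB"
```

The same principle, scaled up and combined with fault monitoring, is what lets an array keep running while aged components are swapped out.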
Increasingly, this allows customers and cloud service providers to move away from traditional, but less flexible, long-term media such as optical discs or magnetic tape. Tape, in particular, needs careful physical management if used for long-term storage.
Why do we need indefinite storage?
As organisations look to extract more value from their data, and storage costs fall, there is a clear trend towards keeping more data, for longer. Firms might want to use data for advanced analytics, or to train artificial intelligence systems.
There are also regulatory demands to keep data for longer. Healthcare and financial services are just two areas where organisations can be required to keep records for the lifetime of the customer or patient, and a number of years after that.
In the UK, for example, a patient’s record must be kept for 10 years after death. Organisations that need a 360° view of the customer, under fraud prevention laws, will also need to keep data for longer.
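Retention rules of this kind reduce to a simple calculation: a disposal date offset from a triggering event. A minimal sketch, using the 10-years-after-death rule mentioned above (the function and field names are hypothetical):

```python
# Minimal sketch (illustration only) of turning a retention rule such
# as "keep the record for 10 years after death" into a disposal date.
# The 10-year figure follows the UK patient-record rule cited above.

from datetime import date

def disposal_date(event: date, retention_years: int) -> date:
    """Earliest date after which the record may be destroyed."""
    try:
        return event.replace(year=event.year + retention_years)
    except ValueError:  # the event fell on 29 February
        return event.replace(year=event.year + retention_years, day=28)

assert disposal_date(date(2020, 3, 15), 10) == date(2030, 3, 15)
```

The hard part in practice is not the arithmetic but guaranteeing the record is still readable when the disposal date finally arrives.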
Even education sector data, such as degree transcripts, need long-term retention. Manchester University, for example, holds electronic records for its students from 2007, and has paper records going back to before 1978.
More broadly, manufacturers, distributors and retailers need to keep product origin and safety information for longer, for environmental and safety reasons.
A design life of 40 to 50 years is not unusual in industrial equipment or transport. Operators need to access maintenance data for servicing, or in case of unexpected failures.
The IT systems used to maintain equipment in the 1980s are very different from those in use today, and those we will use 40 years from now will be different again.
“If you look back in history to 80 years ago, we didn’t have this problem. It was a paper problem,” says Patrick Smith, field chief technology officer for EMEA at supplier Pure Storage.
“Fast-forward another 80 years, and you will expect to see several paradigm shifts in that time.”
The need to store data for longer is coupled with growing datasets. As Smith describes it, each subset of data (component, manufacturer, location, materials, manufacturing process and dates) adds to the exponential increase. The challenge is to create ways to store data that can cope with that growth, as well as the typical hardware refresh cycle, without the need to move data wholesale every three to five years.
“If you look at the healthcare world, the aim is to store the data in a format that is not tied to any particular software package so we can go and retrieve it in the future,” says Smith.
This is likely to mean a further level of abstraction between hardware and data, as well as new data storage technologies.
Options for indefinite storage
Options to store data beyond the design life of current IT equipment range from the simple – good hardware management and ensuring redundancy – to cutting-edge science.
Among the more extreme options are data etched into glass by laser, developed by Microsoft as Project Silica, and DNA-based storage. The latter, if it can scale, promises very high-capacity, durable storage.
But in the near term, the emphasis is on improving the durability of storage media such as flash, and ensuring future applications can read data from current storage media. Even if IT teams can copy – and keep copying – data to newer media, this is of little use if the data cannot be read.
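Repeated copying to newer media carries its own risk: silent corruption in transit. Archives guard against this with fixity checking, recording a checksum when data is first stored and verifying it after every copy. A minimal sketch of the idea (the function names are illustrative, not from any particular product):

```python
# Minimal sketch (illustration only) of fixity checking during media
# migration: record a checksum at archive time and verify it after
# every copy, so corruption is caught before the old medium is retired.

import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives do not exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the bytes survived intact."""
    original = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != original:
        raise IOError(f"checksum mismatch migrating {src} -> {dst}")
    return original
```

Verified bytes are only half the problem, though: the copy is of little use unless the format itself can still be interpreted, which is where the standard formats below come in.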
For this reason, the industry has developed common formats, such as PDF/A (which dates back to 2005) and self-declaratory data, such as the self-contained information retention format, or SIRF.
These data formats are designed to survive software obsolescence, and chief information officers can exploit the fact that storing data for longer is becoming easier.
“If you look at data five, 10, 15, 40 or 100 years ahead, the platform is going to be different, the hardware will be different, the software is going to be different,” says Freeform’s Lock. “That is even if you can physically see the bits and bytes.”