All taped up

Everyone thinks that tape is a dull topic -- until they lose some essential data and everyone's screaming for the backups. What are the current options for tape storage, and what changes will we see in the future?

Published in Technology & Business,
July 2002

While PC storage options have expanded rapidly in recent years with the emergence of optical formats, tape remains by far the dominant medium for serious corporate computing.

Why? Portability, high capacity, speed, and reliability are all important factors in tape's continued dominance, while new technological developments are positioning tape for a new range of applications.

Because they are robust enough to transport from site to site while still storing several servers' worth of data, tapes remain a flexible choice for companies that want to ensure essential business information isn't put at risk. The low cost and high capacity of hard drives have made them an appealing "cheap and cheerful" backup solution for individual systems, but conventional hard drive technology lacks the reliability offered by even decades-old tape systems. Just try popping a hard drive in your jacket pocket and racing to the lift.

Tape's appeal as a portable medium has been threatened in recent years by the emergence of optical media such as CDs and DVDs, which offer an equally robust format and can, in many cases, be read by a wider variety of hardware. However, optical formats are yet to match the capacities offered by most tape systems, and -- in the case of DVD -- have been bogged down in industry debates over standards.

The earliest tape drives supported capacities of 1.4MB -- a massive amount of data half a century ago, but equal to a floppy disk today. Capacities, however, have grown rapidly since that time. Most modern formats typically offer a base size of between 40GB and 100GB, and this capacity continues to expand. IBM recently demonstrated tape technology capable of storing a terabyte of data on a single cartridge, which it hopes to develop commercially over the next few years, while Sony's plans for a Super-AIT format (S-AIT) extend to a four-terabyte capacity by 2007.

A less often discussed but equally important aspect of tape usage is speed. When dealing with large volumes of data that need to be copied in a relatively short time frame, transfer speed can prove a more important aspect than capacity (especially since auto-changers make it relatively simple to use multiple cartridges). Transfer speeds can be further enhanced with the judicious use of compression. Conversely, excessive speed may lead to unreliable data storage, especially on older equipment.
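A rough calculation shows why transfer speed can matter more than capacity. The Python sketch below estimates how long a backup would take at a given sustained transfer rate. The function name and the sample figures (200GB of data, a 10MB/s drive) are illustrative assumptions, not measurements from any particular product, and real drives rarely sustain their rated speed for a whole job.

```python
def backup_hours(data_gb: float, rate_mb_per_s: float, drives: int = 1) -> float:
    """Estimate the hours needed to write data_gb at rate_mb_per_s per drive.

    Assumes 1GB = 1000MB and that all drives stream in parallel at their
    full sustained rate (an optimistic simplification).
    """
    if rate_mb_per_s <= 0 or drives < 1:
        raise ValueError("rate must be positive and drives must be at least 1")
    seconds = (data_gb * 1000) / (rate_mb_per_s * drives)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 200GB of data, one drive versus two.
    for drive_count in (1, 2):
        hours = backup_hours(200, 10, drives=drive_count)
        print(f"{drive_count} drive(s): {hours:.1f} hours")
```

If the result is longer than the time available overnight, a faster drive or a second drive is likely to help more than a larger cartridge.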

And older equipment abounds. While capacities and speed continue to improve, much tape technology remains unchanged from its origins as an audio storage format more than half a century ago. One study by tape industry research firm Freeman Reports found that nine million quarter-inch linear tape systems are in use around the world. Because reliability is a more important consideration in most backup applications than technological innovation, this lack of development is viewed positively by many tape users.

Although tape-based backup has a long history, many businesses have yet to realise the critical importance of data recovery. "Many companies do not have complete backup and recovery systems and business executives have been asking their CIOs and data centre managers what they have to do to enhance them," said Philip Belcher, managing director for storage reseller StorageTek Australia. A study conducted by StorageTek found that 75 percent of organisations believed their disaster recovery capabilities could be improved, while more than half wanted speedier restoration of backups.

Those figures demonstrate why, despite its boring reputation, storage on tape remains big business. Research firm Ovum estimates that mid-range storage systems alone will be a US$30 billion market by 2006. Estimating the overall size of the market is harder, especially given the wide range of technologies already deployed.

Despite predictions of its demise, the role of tape may well expand rather than contract in future years. "Contrary to popular belief, the storage pyramid will reflect a larger role for tape rather than less," the Aberdeen Group points out in a recent white paper. "For financial reasons (relative cost) and fiduciary reasons (safety and security through use of alternative -- and removable -- media), tape's role in backup will not decline significantly. Tape will continue to perform its essential backup function for mission-critical applications -- those database-oriented applications that make up the heart of a typical IT applications portfolio. But tape will do more. Tape matches up very well with emerging demands for the new content that is driving storage demands."

Planning your data storage

A key distinction which is often neglected when planning tape storage systems is between backups and archives. A backup is designed to contain up-to-the-minute data which can be quickly restored in the event of equipment failure; an archive generally contains older data which needs to be retained for legal or other purposes, but which isn't necessarily in active use.

These distinct uses call for different scheduling, software, and (in many cases) media. Backup systems must be able to copy large amounts of data quickly, and restore that information rapidly in the event of a system going down irrecoverably. Commonly, companies will keep multiple backups of each system; the absolute minimum recommended by most backup experts is three, while five (one for each working weekday) is a more common number.

To achieve a balance between speed and reliability, incremental backups, which copy only data which has changed since the last backup activity, are often used in conjunction with full backups, which copy the entire contents of a system. It's often convenient to schedule full backups for weekends, when network usage is lower, although this isn't always practical in companies which run continuous shifts or need to regularly connect with international subsidiaries.
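The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: it uses file modification times to decide what has changed, which is one common approach but not necessarily how any given backup package works, and the function names are hypothetical.

```python
import os
from typing import List, Optional


def files_changed_since(root: str, last_backup_time: float) -> List[str]:
    """Return paths under root modified after last_backup_time (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A file is included only if it changed after the last backup.
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)


def select_files(root: str, last_backup_time: Optional[float]) -> List[str]:
    """Full backup when no previous backup exists, otherwise incremental."""
    if last_backup_time is None:
        return files_changed_since(root, float("-inf"))  # everything
    return files_changed_since(root, last_backup_time)
```

Passing None as the last backup time selects every file (a full backup); passing the time of the previous run selects only what has changed since then.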

For backup systems, retrieval and writing speed are critical factors, as backups will need to be conducted on a regular basis. Capacity is less critical, especially if an incremental approach is used; each incremental backup may contain only a small amount of data. Because of the cyclical nature of such backups, media can often be recycled.
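As a simple illustration of media recycling, the Python sketch below maps each working day to one of five cartridges, so that each cartridge is overwritten a week later. The labels and the five-tape scheme are assumptions chosen to match the weekday example given earlier; real rotation schemes are often more elaborate.

```python
import datetime
from typing import Optional

# Hypothetical cartridge labels, one per working weekday.
TAPE_LABELS = ["MON", "TUE", "WED", "THU", "FRI"]


def tape_for(day: datetime.date) -> Optional[str]:
    """Return the cartridge label to use on a weekday, or None at weekends."""
    index = day.weekday()  # Monday is 0, Sunday is 6
    return TAPE_LABELS[index] if index < 5 else None
```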

Conversely, speed is not such a problem for archives, since these rarely need to be accessed with the same urgency as backups. Media reliability is, however, critical, especially if an archive tape contains the only copy of important business documents. Particularly sensitive information should ideally be stored in more than one archive, but this may not be practical because of cost considerations.

Archives tend to demand higher capacities than backups, since they contain complete sets of information. As with backup tapes, it is important to keep at least one archive copy off-site to allow information to be retrieved in the event of a major physical disaster such as flooding or earthquakes. Because archives don't need to be readily accessible on short notice and often contain critical and confidential business information, secured environments such as safe deposit boxes are often used for off-site storage.

It's important to consider likely network growth when planning a backup system. As well as being able to handle backup of current systems, your plans also need to take into account the volume of archiving that might be required, and the possible influence of company expansion and new technologies. For instance, the emergence of e-mail as a key business communications tool has significantly expanded the volume of data which needs to be archived by businesses. Such archives can take on great legal importance, as Microsoft has discovered during its long-running antitrust battle with the US Department of Justice. Other applications that may increase the volume of data to be stored include voice-over-IP systems and networked e-learning systems.

In both fields, capacity can be increased by the use of compression systems, which reduce the amount of data that needs to be stored and transferred. (When comparing transfer speeds, you need to be sure that the rate you're assessing is the raw transfer speed, not the one which incorporates compression.) Typically, tape storage systems offer a maximum compression ratio of around 2:1, but this may be difficult to achieve in practice, especially if the data you are storing is already subject to routine software compression.
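The effect of compression on both capacity and quoted speeds is simple arithmetic, sketched below in Python. The function names and the sample figures (a 100GB cartridge, 500GB of data, a drive quoted at 20MB/s) are assumptions for illustration only, and the 1.3:1 ratio is just a stand-in for data that compresses poorly.

```python
import math


def effective_capacity_gb(native_gb: float, ratio: float) -> float:
    """Capacity after compression, where ratio is 2.0 for 2:1."""
    return native_gb * ratio


def raw_rate_mb_s(quoted_mb_s: float, assumed_ratio: float) -> float:
    """Recover the raw transfer rate from a figure quoted with compression."""
    return quoted_mb_s / assumed_ratio


def cartridges_needed(data_gb: float, native_gb: float, ratio: float = 1.0) -> int:
    """Whole cartridges required to hold data_gb at the given compression ratio."""
    return math.ceil(data_gb / effective_capacity_gb(native_gb, ratio))


if __name__ == "__main__":
    # Hypothetical comparison: the same job at different achieved ratios.
    for ratio in (2.0, 1.3, 1.0):
        print(f"{ratio}:1 -> {cartridges_needed(500, 100, ratio)} cartridges")
```

Planning around the maximum ratio is risky: if the data turns out to be barely compressible, the same job can need close to twice as many cartridges.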

As tape speeds continue to increase, a third category of tape applications is also emerging: near-real-time access for large repositories of data which users need to be able to access on a regular basis, but which aren't utilised frequently enough to justify the expense involved in high-speed disk storage. This trend has been particularly pronounced in multimedia applications, which have very high-volume storage requirements. In such applications, you may want to minimise the use of compression unless your users have access to systems powerful enough to handle this on the fly.

Whatever applications you see tape being used for in your organisation, it's important to make an accurate assessment of costs before installing or updating a system. This needs to include ongoing media costs. A system which has relatively cheap drives may cost you a small fortune in media, especially if the basic capacity of each cartridge is low. Conversely, a high-capacity system that isn't fully utilised may mean you're wasting money. You also need to assess the amount of staff time which will be required to keep the system up and running.
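The drive-versus-media trade-off can be checked with a quick calculation. The Python sketch below adds the price of a drive to the price of the cartridges needed to hold a given number of full copies of your data. Every price and capacity in the example is invented for illustration, and the model deliberately ignores staff time, media replacement, and compression.

```python
import math


def media_count(data_gb: float, cartridge_gb: float, retained_sets: int) -> int:
    """Cartridges needed to hold retained_sets full copies of data_gb."""
    per_set = math.ceil(data_gb / cartridge_gb)
    return per_set * retained_sets


def system_cost(drive_cost: float, cartridge_cost: float, data_gb: float,
                cartridge_gb: float, retained_sets: int) -> float:
    """Drive price plus media price, in whatever currency the inputs use."""
    cartridges = media_count(data_gb, cartridge_gb, retained_sets)
    return drive_cost + cartridge_cost * cartridges


if __name__ == "__main__":
    # Hypothetical figures: 600GB of data, five retained sets.
    cheap_drive = system_cost(1000, 50, 600, 20, 5)   # low-capacity media
    big_drive = system_cost(4000, 100, 600, 100, 5)   # high-capacity media
    print(f"cheap drive: {cheap_drive}, high-capacity drive: {big_drive}")
```

With these made-up numbers the cheaper drive ends up as the more expensive system once media is included, which is the trap described above.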

Working with software

A relatively slow innovation cycle for hardware means that software is the area in which the greatest advances have been made for tape storage in recent years. Some of these software systems are also being tooled to take advantage of non-tape media, but for the near future tape is expected to remain dominant in enterprise applications.

The market for storage software solutions continues to expand at a rapid rate. IDC estimates that the storage resource management (SRM) market was worth US$1.8 billion in 2001, and will rise to US$4.6 billion by 2006. In 2001, the overall storage software market was worth US$4.9 billion, according to Gartner Dataquest, up three percent on the preceding year.

"The change in the world economic environment has put more pressure on vendors to provide a clear and compelling market vision that effectively positions the solution against competitive offerings and provides customer references confirming the advantage," says Gartner Dataquest analyst Carolyn DiCenzo. "The opportunity for software vendors remains strong, but customers will be more careful with expenditures [so] vendors must bring more value to the table."

As well as ensuring that data is backed up safely, efficient software can cut business costs by reducing the number of staff required to manage and monitor storage systems.

Those staff may require quite specialised knowledge. Getting to grips with enterprise storage will probably mean brushing up on your Unix skills. While most popular storage management packages have been ported to Windows 2000 in recent years, the various corporate flavours of Unix remain the dominant area for software development.

The most widely discussed software innovation in recent years has been the development of storage area networks (SANs). Definitions of the concept differ somewhat, but it's generally agreed that a storage area network allows communications between dedicated storage devices and existing servers, allowing any-to-any connections rather than restricting backups or archives to one single path. While a SAN will require both hardware and software components, it's the software element that is critical to effective operations.

To add to the confusion, systems incorporating a SAN also generally include network-attached storage (NAS). NAS is a more straightforward concept: it simply means a storage device with its own network address, rather than one which forms part of a separate application server.

The demand for SANs has emerged in part because it's becoming increasingly difficult to create effective backup systems using conventional attached storage approaches. "Networked storage of some kind is often the only feasible option since the extremely high concentrations of data can make direct attached deployments impossible," backup software vendor Legato pointed out in a recent study. "Simply put, the servers just don't have enough I/O slots into which to wire all the storage."

The increasing popularity of IP networks has also made it more feasible to create multiple server-to-storage or system-to-storage connections, rather than relying on single-point backup solutions. SAN approaches are also useful for systems which are frequently offline but which still require regular backups, such as notebooks.

One key advantage of networking storage systems is that it allows for much faster data recovery in the event of a problem, since machines are already networked to the system where data is stored. "Companies are now realising that if they lose information that is backed up on a daily or weekly basis, it represents significant commercial loss," says Ovum senior analyst Graham Titterington. "Networked storage technology allows a much greater chance of business continuity by storing all information right up to the incident."

One problem for Australian companies seeking to utilise wide-area network storage is the absence of low-cost, widely dispersed fibre optic networks to transmit the large volumes of data required by storage applications. Of course, companies which have built their own wide-area networks are better placed to meet this requirement, but even then capacity issues may be a concern.

Most storage vendors believe that network-attached storage will become the dominant approach over the next few years. "This shift is being accelerated by advanced software that takes responsibility for automating and managing the expanding volumes of information living on various types of networked storage," EMC CEO Joe Tucci comments. "The resulting cost and productivity advantages customers can realise from automated network storage will continue to marginalise the role of traditional direct-attached storage and the major server vendors who cling to that approach."

The push towards networked storage is also increasing demand for other storage technologies, such as tape backup. "The increasing adoption of SAN and NAS network storage solutions continues to be the dominant driving force in the growth of automated tape libraries," says Robert Abraham, an analyst for Freeman Reports.

Another emerging software application for tape backup is the use of disk-based tape emulators. These allow companies to create additional backups to disk, quickly and at low cost, using the same systems they already use for tape backup. Disk-based data can be restored much more quickly than tape, minimising downtime in the event of problems. However, industry watchers warn that excessive reliance on disk-based systems as a substitute for tape may cause problems. "Disk replication technologies such as mirroring are not replacements for tape backup," notes the Aberdeen Group. "Mirroring protects against a component failure, such as a hard drive crash. It does not protect against data corruption (for example, due to viruses, user errors, hacking, and latent software defects)."

A further concept to consider is hierarchical storage management (HSM). An HSM system will automatically migrate data from live systems onto tape backups or other media according to preset policies such as the date a file was created, the volume stored on a particular server or the file type (.EXE files might be excluded, for instance).
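A policy of this kind might look something like the Python sketch below. It is a simplified illustration, not the behaviour of any real HSM product: the class names, the 180-day threshold, and the high-water mark rule are all assumptions. Files are migrated oldest first, excluded file types are skipped, and migration stops once the server's usage falls to the chosen limit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileInfo:
    path: str
    age_days: int
    size_mb: float


@dataclass
class Policy:
    min_age_days: int = 180                         # hypothetical threshold
    excluded_suffixes: Tuple[str, ...] = (".exe",)  # file types never migrated


def should_migrate(f: FileInfo, policy: Policy) -> bool:
    """Apply the file type and age rules to a single file."""
    if f.path.lower().endswith(policy.excluded_suffixes):
        return False
    return f.age_days >= policy.min_age_days


def select_for_migration(files: List[FileInfo], policy: Policy,
                         used_mb: float, high_water_mb: float) -> List[FileInfo]:
    """Pick the oldest eligible files until usage drops to the high-water mark."""
    selected = []
    for f in sorted(files, key=lambda item: item.age_days, reverse=True):
        if used_mb <= high_water_mb:
            break  # enough space has been freed
        if should_migrate(f, policy):
            selected.append(f)
            used_mb -= f.size_mb
    return selected
```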

Tape has an important role to play in HSM systems, as it is normally used as the final archival medium. A typical HSM system will also utilise other technologies such as RAID to provide a complete storage solution, and will work in conjunction with existing backup and archive systems. These remain essential for general disaster recovery purposes; HSM gives greater flexibility and can help reduce storage costs by better utilising existing system capacities. Effective HSM systems will usually involve some form of storage area network as well, to enable effective communication between different devices.

A final cautionary note: whatever software solutions you use, they should be audited on a regular basis. You don't want to discover your restoration software is flaky at the same time you need to restore vital data. It's also important to make sure you have backup copies of the backup software itself, in the event you need to reinstall it.

Outsourcing

If all this is starting to sound like a lot of hassle, outsourcing may be an option. Indeed, because storage technologies are relatively stable and their basic application well understood, storage is often a prime candidate when businesses consider outsourcing all or part of their IT systems. Outsourcing also simplifies the problem of ensuring backups are stored in a separate location, since the outsourcer will automatically be maintaining a copy of relevant data in an offsite environment.

Before committing to an outsourced storage provider, however, it's important to check the company's background thoroughly. Make sure that the company has an established history in this area, and ask to inspect the facilities where backups are kept. How quickly can backups be restored in the event of a problem? Are there plans in place if one of the outsourcer's facilities can't be accessed? In some cases, companies may choose to keep one backup system on site and another externally, even though both are managed by an outsourced provider.

Ovum is predicting that outsourcing of storage will continue to grow. "This represents a change in management thinking as they realise storage is a non-core business activity with better efficiencies provided by external suppliers," the company commented in a recent white paper.

Choosing your tape format

Tape drive formats fall into two broad categories: linear and helical scan. Linear systems work in a similar fashion to conventional reel-to-reel audio tape recorders, where tape is sent past the tape head in a straight line using a simple transport mechanism. Helical scan systems are more akin to video recorders, and use an angled head and more complex tape transports.

Proponents of helical scan point to its higher density and lower failure rates, but linear systems continue to be deployed in many businesses. In part, this is due to the conservative nature of many backup systems; once companies have a system in place which works, they are often reluctant to change it, and linear systems have been available for much longer.

Linear systems
DLT: Digital Linear Tape (DLT) supports capacities of up to 70GB by using a special compression algorithm, DLZ1. Its successor, SuperDLT, offers capacities of up to 100GB, and transfer rates of up to 10MB/s.

LTO: Linear Tape-Open (LTO) was developed as an open standard for tape storage by Hewlett-Packard, IBM, and Seagate. This allows equipment from different manufacturers to interoperate. It incorporates two sub-formats, Accelis (which offers higher retrieval speeds and is suitable for backup applications) and Ultrium (which is slower but has higher capacities, and is designed for archiving). The format includes hardware-based compression. Accelis capacities start at 25GB, Ultrium at 100GB.

Travan: Travan drives are noted for their simple mechanics (just two moving parts in the drive), and are generally used for entry-level backup applications. They offer a maximum capacity of around 20GB and transfer rates of 2MB/s, although 2003 may see the release of higher capacity models.

Helical systems
AIT: Advanced Intelligent Tape (AIT) has evolved through three generations, from 50GB in AIT-1 to 100GB in AIT-3, on 8mm tapes. Developed by an industry consortium including Compaq, Sony, and Veritas, it uses a compression algorithm known as adaptive lossless data compression (ALDC), and supports raw transfer speeds of up to 12MB/s. Future plans stretch to AIT-6, which will offer capacities of 800GB, while 500GB S-AIT drives are expected by the end of this year.

DDS: Digital Data Storage (DDS) was developed by Sony as a means of storing archives on the DAT tape format, originally designed for audio and video applications. There are four main flavours, offering storage capacities ranging from 2GB (DDS-1) to 40GB (DDS-4). Sony has stopped active development of the format in favour of AIT, but continues to ship DDS tapes and drives.

Mammoth: A direct competitor to AIT, Mammoth uses 8mm tape to store up to 40GB of data at a rate of up to 6MB/s, if the format's Improved Data Recording Capability (IDRC) compression algorithm is used. Its successor, Mammoth2, offers up to 60GB and transfer speeds of 12MB/s without compression.

VXA: Developed by Ecrix, VXA uses a unique method, known as the Discrete Packet Format (DPF), which breaks up data into individual chunks before storage, rather than placing it contiguously. VXA cartridges can store up to 66GB of data.

Autoloaders and tape libraries

Whatever tape format you choose, drives are available in a variety of hardware configurations. Here's a brief overview of the most common options.

Single drives: For simple backup or archiving applications, a single tape drive can be attached internally or externally to an existing server, or to a machine dedicated to running backup software. If multiple drives are required, rack-mounted systems are useful for saving space.

Autoloaders: Autoloaders generally comprise a single tape drive, which can automatically switch between multiple tape cartridges. These are useful when your backup program requires the use of more than one cartridge in each backup cycle, but don't offer the same flexibility as libraries.

Libraries: Tape libraries include multiple tape drives, all of which can access any tape cartridge currently being used by the system. This allows for faster data storage and restoration. However, libraries are an expensive solution if your backup needs aren't extremely high volume, and may suffer from mechanical failure.
