For the U.S. military, the answers to such questions are of more than passing interest due to advances in ISR. ISR sensors and technology in such systems as Gorgon Stare and Blue Devil generate a deluge of data every day. The video streaming into the military’s DCGS, or Distributed Common Ground System, totals over 7 terabytes daily.
“That works out to about 1,600 hours of video exploitation that we would collect on a given day. And we look at that. It’s a huge effort,” said Colonel Michael Shields, chief of the capabilities division of the Air Force ISR Agency.
That data has to be stored somewhere. The Air Force ISR Agency does so for about 30 days. After that, the information may end up in various national three-letter agencies, such as the NSA, NGA or DIA. This approach requires that bridges to these deep data stores be in place. Such bridges help ensure that the data ends up in the right place, without storage-devouring extra copies being produced. They also mean that an analyst who wants access to older information can get it, as each bridge is a two-way conduit.
In this storage scheme, multiple systems operate at different archival levels and depend upon different technologies. Because they’re built using semiconductor memory, solid-state drives offer the fastest access of the three broad storage categories. However, they’re the most expensive and provide the smallest storage capacity. Magnetic tapes are the slowest class of storage, but are the least expensive and offer the greatest capacity. Spinning disks, familiar to any personal computer user, fall between the other two groups in terms of cost, speed and capacity.
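The tradeoff among the three tiers can be sketched as a simple selection rule: pick the cheapest technology that still meets the access-latency budget. The latency and cost figures below are rough, illustrative orders of magnitude assumed for the sketch, not vendor specifications.

```python
# Illustrative model of the three storage tiers described above.
TIERS = [
    # (name, typical access latency in seconds, relative $ per TB)
    ("solid-state drive", 1e-4, 10.0),
    ("spinning disk",     1e-2,  1.0),
    ("magnetic tape",     60.0,  0.2),
]

def cheapest_tier(max_latency_s):
    """Return the least expensive tier that still meets a latency budget."""
    candidates = [t for t in TIERS if t[1] <= max_latency_s]
    return min(candidates, key=lambda t: t[2])[0] if candidates else None

print(cheapest_tier(1.0))   # spinning disk
print(cheapest_tier(1e-3))  # solid-state drive
```

Under this rule, data needed for an immediate, troops-in-contact style decision lands on the fast tier, while archival material migrates toward tape.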
The different storage technologies all play a part in achieving the desired result. Indeed, this is true even if the effective data retrieval rate from a technology is far slower than the means by which that information is distributed to analysts. “There are different layers or echelons of storage that could, at times, be slower than your transport mechanism. But that, hopefully, would be a conscious decision. The stuff that you need to make an immediate call, like a troops-in-contact type of decision, would be available immediately. You wouldn’t have to pull it up from archive,” Shields said.
As for the future, one clear trend is that more of all categories of ISR data storage will be needed. Argus, a next-generation ISR system, will produce the equivalent of 85 years of high-definition video in a 24-hour period. That is a lot of imagery to archive and analyze. Thus, there will be an even greater demand for storage, requiring advances in technology, architecture and methods.
Vendors are helping the military meet the challenge of handling ISR-generated data. For DataDirect Networks (DDN) of Chatsworth, Calif., the issues raised have a familiar look. The 15-year-old company began its existence by creating products for another area that demands real-time ingest and handling of high-value but short-lived events.
“We started out building systems that people could do live, on the air broadcast off of. Losing so much as a bit or missing a beat was not an option. This is the same in ISR,” said Laura Shepard, director of marketing for high-performance computing at DDN. Other areas with similar characteristics include the oil and gas industry, high-energy physics and radio astronomy. In all of these, as in ISR, an event occurs and that event cannot realistically be reproduced. Therefore, all of the data has to be ingested as it occurs, without the dropping of a single bit.
The company’s technology is found in the Navy’s new ISR aircraft. The data captured by sensors aboard the Boeing 737 derivative is held in ruggedized units that each provide more than 80 terabytes of storage with a 5.0-5.5 gigabyte per second sustained throughput. Upon landing, the mountain of collected data is moved off the aircraft by relocating drive canisters from a mobile to a land-based system for readout. This approach removes any network bottlenecks and so helps ensure a more rapid turnaround of the plane, according to David Bunker, senior director of government programs at DDN.
Speaking of avoiding bottlenecks, DDN’s success is due, in part, to technical choices that the company made in meeting its customers’ needs. For instance, in solving the original live broadcast problem the company’s engineers came up with an architecture that beefed up the capabilities on the disk side. They therefore put any bottleneck on the host side of the system. This allowed a guaranteed quality of service and provided the company’s products with the processing power to do such things as fix bad data on the fly as it is read from or written to a disk.
Thanks to this choice of architecture, the company can protect less expensive disks from the silent data corruption that makes them otherwise unusable in mission critical situations. Other features of the technology enable more efficient caching of data, boosting performance.
Bunker noted that at one time DDN used specialized chips in storage systems for processing of data. This task is now being handled by general purpose processors. That brings two benefits. The first is that it will be easier in the future for the storage company to keep up with, and take advantage of, processor improvements. The second advantage is that the use of general purpose chips enables the assigning of up to 50 percent of processor power to apps running on the storage platform itself.
“Importantly, it eliminates network latency. Now we’re able to run analytics directly inside the storage platform, almost in a real-time format. That is a very interesting area,” Bunker said.
Another technology from the company, Web Object Scaler, improves capabilities in another way. An object storage platform, it efficiently stockpiles large amounts of unstructured data. It allows local access to information without requiring that all of it be copied across a network. The connection between remote and local objects is secure yet allows needed data access, according to Bunker.
Assessing, and Addressing, Cyber
Dave Denson, big data strategist at Sunnyvale, Calif.-based NetApp, noted that the ISR data storage challenge is actually much bigger than it initially seems. Traditionally, intelligence, surveillance and reconnaissance involved sensor and other data rooted in the physical world of air, sea, land and space. Recently, the cyber realm was added to that list. The result is another large increase in the data load.
A 14-hour Gorgon Stare mission generates 70 or so terabytes of data, Denson said. “A single 10-gigabit link for 24 hours at half speed, which is a practical, real number, is over 100 terabytes of data. And that link goes 24/7/365, and it never stops.” Denson added that packet capture of network traffic looks like video from a workload perspective. So NetApp adapted its full motion video solution to the problem.
In tackling such issues, the company exploits modern network connectivity, along with distributed computing and storage. The combination can create some remarkable outcomes. For instance, NetApp did a study comparing these cloud computing techniques to traditional solutions. The study showed an 80 percent power savings of the new as compared to old methods. Those power savings ripple through the system, since this also means that less energy is needed for cooling. What’s more, the systems that are used for data processing during the day can be used for information fusion at night, Denson said.
The infrastructure in such an approach is dynamic, with systems and storage assembled as needed to satisfy a mission. Storage demand is kept in check because a large shared file system sits at the center of the infrastructure. This minimizes storage needs by eliminating extra copies of data.
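One common way a shared store can eliminate extra copies is content addressing: a blob is keyed by a hash of its bytes, so a second ingest of identical data stores nothing new. The article does not say NetApp uses this exact mechanism; the toy class below is only a sketch of the general idea.

```python
import hashlib

class SharedStore:
    """Toy content-addressed store: identical blobs are kept once."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # second copy is a no-op
        return key

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self._blobs.values())

store = SharedStore()
frame = b"\x00" * 1024
store.put(frame)  # analyst A ingests the clip
store.put(frame)  # analyst B "copies" it; no new bytes are stored
print(store.bytes_stored())  # 1024
```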
With regard to storage, there have been recent advances, such as the move from 2- to 4-terabyte drives. But that isn’t enough, Denson said, because over that same span the resolution of sensors has jumped at least twofold and the bit width of the data has also grown. The result is that storage is losing the race to sensors.
Thus, more intelligence will have to be applied to the problem of storage. Eliminating extra copies of data is one way to do this. Another is to make sure that full data is preserved close to where it is consumed but that longer-term data is increasingly abstracted. That is, what starts out as raw full motion video may be processed so that only those frames of greatest importance are ultimately preserved in an archive. The difficulty, of course, lies in determining just which images out of millions should be kept. For that, the right algorithms, storage and processing might be the solution.
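The frame-abstraction idea above can be sketched as a ranking problem: score each frame by some importance function, then archive only the top few. Both the scoring function and the frame identifiers below are hypothetical stand-ins; real importance might come from motion analysis or object detection.

```python
def keep_keyframes(frames, score, budget):
    """Keep only the `budget` highest-scoring frames for the archive.

    `frames` is any sequence of frame identifiers; `score` is a
    hypothetical importance function supplied by the caller.
    """
    ranked = sorted(frames, key=score, reverse=True)
    return sorted(ranked[:budget])  # archive in original frame order

# Toy example: pretend frame importance is a precomputed score.
scores = {0: 0.1, 1: 0.9, 2: 0.3, 3: 0.8, 4: 0.05}
archive = keep_keyframes(scores.keys(), scores.get, budget=2)
print(archive)  # [1, 3]
```

The hard part, as the text notes, is the scoring function itself, not the bookkeeping.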
The infrastructure shouldn’t be static, Denson said. “Let that composition be dynamic. Now, suddenly, you open the world up.” He noted, for instance, that processing might be done using idle virtual desktop infrastructure. Such systems have access to nearby local storage and this approach keeps the data near the processing site.
Disks that Don’t Spin
Known for its processors, chip maker Intel also offers a line of solid-state drives. The current products from the Santa Clara, Calif.-based company range in size up to 800 gigabytes and provide data transfer rates of up to 6 gigabits per second. One of Intel’s differentiators in the storage arena is the consistent performance of its products, said James Myers, applications engineering and solutions marketing manager for the non-volatile memory solutions group at Intel.
He added that the latency in fetching data out of storage and any variations in it are important. To understand why, consider the analysis of ISR sensor data or any other similar big data problem. Typically, these analytics involve a cluster of server nodes that chew through the data, process the information and then produce a result.
Data is accessed, processed, and then the node works on the next bit of information. Necessarily, there is a slight gap between a processor finishing up one task and starting another. The length of that short pause is determined, at least in part, by how long it takes the storage system to respond to a data request.
“Those waits accumulate and add up quickly. And if you have a variation in those waits, you may have other processes in this clustered node waiting. The waiting usually is limited by the slowest element. So performance consistency in itself can speed up throughput and efficiency,” Myers said.
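Myers’ point about consistency can be seen in a small simulation: when every processing step waits on the slowest of many nodes, it is the tail of the latency distribution, not the mean, that governs throughput. The two latency profiles below are invented for illustration and deliberately share the same 275-millisecond mean.

```python
import random

random.seed(0)

def batch_time(n_nodes, latency):
    """One processing step waits for the slowest node's storage response."""
    return max(latency() for _ in range(n_nodes))

# Two hypothetical drives with the SAME mean latency:
steady = lambda: 0.275                          # always 275 ms
jittery = lambda: random.choice([0.050, 0.500]) # 50 or 500 ms, mean 275 ms

steps = 1000
t_steady = sum(batch_time(16, steady) for _ in range(steps))
t_jittery = sum(batch_time(16, jittery) for _ in range(steps))
print(t_steady < t_jittery)  # True: consistency wins even at equal mean
```

With 16 nodes, almost every jittery step includes at least one 500 ms response, so the cluster effectively runs at the worst-case latency.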
This need to constantly feed data into a processor can actually make seemingly expensive solid-state drives cheaper than the alternative. Spinning-media hard disk drives are the traditional solution to many storage problems. But hitting the needed performance requires that many hard drives be used. For example, 480 hard drives might be needed to satisfy the demands of an application spanning 500 web servers. The cost of the drives would be $150,000. In contrast, a solid-state approach would run only $12,000 because only 12 solid-state drives would be needed, Myers said.
Thus, while an individual solid-state drive would cost three times as much as a traditional hard disk, the total solution cost of the solid-state approach would be dramatically lower. Less space and power would also be needed.
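A back-of-envelope check of the figures quoted above bears this out: the per-drive price of the solid-state option is roughly three times higher, but the total solution costs about a twelfth as much.

```python
# Unit and total costs from the example quoted in the text.
hdd_count, hdd_total = 480, 150_000
ssd_count, ssd_total = 12, 12_000

hdd_unit = hdd_total / hdd_count  # $312.50 per hard drive
ssd_unit = ssd_total / ssd_count  # $1,000 per solid-state drive

print(round(ssd_unit / hdd_unit, 1))    # 3.2  (SSD ~3x the unit price)
print(round(hdd_total / ssd_total, 1))  # 12.5 (but ~12x cheaper overall)
```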
In November 2012, Intel introduced a solid-state drive aimed at data centers, such as those facilities that would host ISR analytics. The company’s products are not themselves ruggedized to the point that they can meet military specifications with regard to temperature and shock. However, solid-state drives are inherently more robust when it comes to vibration than hard disks, Myers said.
To be sure, solid-state drives are not the answer to all ISR storage needs. For instance, when storing vast amounts of data for archival purposes, other solutions, such as hard disks or magnetic tape, would be more cost effective.
However, Myers noted that the cost of solid-state storage is plummeting. When Intel offered its first solid-state drive in 2008, it was a 64-gigabyte disk. Five years later, an 800 GB version can be had for about the same price. Thus, a half decade saw a more than tenfold increase in capacity. Look for that trajectory to continue, Myers said.
A Hybrid Approach
“The solution to ISR storage can be helped by recognizing that there are different dimensions to the problem,” said Geoffrey Noer, senior director of product marketing at Panasas. The Sunnyvale, Calif.-based company tackles big data storage problems, with one of its first customers being the Department of Energy’s Los Alamos National Laboratory.
One aspect of storage for ISR and similar tasks involves large data files. An example might be the video captured by an aircraft. Such data might be accessed largely from beginning to end, with various starts, stops and side trips along the way. However, there’s another storage need, and this centers on small files and metadata, or data about data. While the individual files in this category are comparatively tiny, there may be a lot of them. For this group, there’s a need to randomly handle many different files and parts of files at a relatively high speed.
Panasas’ products use a patented parallel file system in an approach that scales processing, memory, networking and storage together. One way to meet the dual needs for sequential and random access can be seen in the company’s latest offering, the ActiveStor 14. Released in September 2012, it consists of a mix of relatively expensive solid-state and significantly less costly hard disk drives. Metadata and small files are automatically routed to solid-state drives while everything else goes to hard disks. This arrangement boosts performance by matching the strength of each technology to the task at hand. In this hybrid, careful attention is paid to the ratio between the two technologies.
“We’ve typically found that having 1 percent capacity being solid-state is sufficient. That basically doesn’t throw the economics of the solution way out of the window,” Noer said.
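The routing policy behind such a hybrid can be sketched in a few lines: metadata and small files go to the solid-state tier, bulk data to hard disks. The 64 KB cutoff below is a hypothetical threshold chosen for illustration, not a Panasas figure.

```python
SMALL_FILE_THRESHOLD = 64 * 1024  # hypothetical 64 KB cutoff

def route(size_bytes, is_metadata=False):
    """Send metadata and small files to SSD, bulk data to hard disk."""
    if is_metadata or size_bytes < SMALL_FILE_THRESHOLD:
        return "ssd"
    return "hdd"

print(route(2_048, is_metadata=True))  # ssd: metadata
print(route(500 * 1024**3))            # hdd: a large video file
print(route(4_096))                    # ssd: a small file
```

Because small files and metadata account for most random-access operations but little total capacity, a modest SSD fraction, on the order of the 1 percent Noer cites, can absorb them without upsetting the economics.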
Over time, the optimum division between hard disks and solid-state drives will shift. This will happen due to changes in the relative price and evolving capabilities of the two technologies.
As illustrated in ActiveStor 14, greater use of tiered storage could be a future trend. It’s one way to put off the day of reckoning that is approaching due to data volume growth outpacing storage capacity. Another way to postpone this deadline might be to adopt technologies and practices that maximize storage utilization. Panasas’ scale-out architecture and parallel processing represent one way to achieve this, Noer said.
In addition to sheer capacity, another critical consideration now and in the future has to do with performance. After all, for ISR and other applications, what’s important is what can be derived from the data. That implies speedier storage is more valuable if it enables processing to be done more quickly.
As Noer said, “If you can shorten your time to results and get value from the data faster, obviously that improves your chance for succeeding at whatever mission you’re in.” ♦
- Issue: 4
- Volume: 3