With strong roots in diverse fields such as physical security, retail marketing and transportation, video analytics technology holds out promise for military and intelligence ISR programs struggling to cope with enormous amounts of video data.
A broad term for a variety of imagery techniques, including motion detection and facial recognition, video analytics seeks to solve the problem of a virtually unlimited amount of visual information and a limited number of human eyes by automatically indexing, characterizing and drawing conclusions from images, and then alerting an operator to their possible significance.
The driving force behind video analytics is simply the volume of video data from surveillance cameras, UAVs and other platforms, which is far more than can feasibly be observed, and so taxing to carefully scrutinize for hours that errors or oversights can easily occur. For hard-pressed operators, video analytics holds out the promise of automatic screening for potentially significant events, thus enabling them to focus on the most important times and locations.
“We don’t want to cut out the human, but we can make his or her job a lot easier by, for example, making sure that they are focused on reviewing things that are truly significant,” said Joe Santucci, president and CEO of piXlogic, an image and video search company. “The software can add an extra set of eyes. It indiscriminately examines everything in the video, even if not strictly relevant to the mission at hand, since it could be important in some mission or to another analyst and would otherwise have been missed.
“A lot of information is captured in these videos, and today there are not enough people to go through it all, so they are largely unexploited. The government has devoted a lot of resources to making sure they can store and transmit videos, but they haven’t really started to get into the analysis of the videos. But we can do that,” Santucci added.
The factor that distinguishes video analytics from other video capabilities, advocates say, is the ability to glean information out of the video, rather than just recording and storing the video and counting on the user to find the information.
“The objective of video analytics is to drive actionable intelligence out of the video itself, and automate that to the extent possible,” said Larry Bowe, president and CEO of PureTech Systems, a provider of geospatial video management software. “The idea is to automate the detection and tracking, and then present that in a way to users that they can make decisions in real time, and also aid in investigation after the fact.”
“There is too much now that we are giving to the analyst. It’s got to be metadata, video ingest and much more complex analytics that can look at all this stuff in milliseconds and make a decision and say that this is what you need to look at. The underlying analytic system needs to present to the operator what they need to see. The operator shouldn’t be figuring out what they need to see; it should be right in front of them,” said Frank McCarthy, director of solutions development, video surveillance, for EMC.
The recent surge in interest in video analytics represents the revival of a field that drew attention about a decade ago, but subsided when its technological challenges became apparent. Since then, however, software algorithms have become far more sophisticated, and cameras and computer hardware more powerful, enabling much more robust capabilities.
“There was a recognition of a need, especially after 9/11, because it’s so impractical for people to watch cameras constantly,” Bowe said. “But there was a lot of overhype in the market in terms of the capabilities. The market was attempting to meet demand, but the technology wasn’t quite there. Some companies threw a lot of money at it, but they weren’t able to deliver on their promises.
“Some of that was due to the software algorithms that were needed, but also because of the lack of commercial hardware that was affordable and able to run sophisticated algorithms. If you need a huge computer to run the algorithms, it’s not very practical. But as computers have advanced, we’re seeing more power at a lower price, which enables running more powerful algorithms,” he added.
As a result, backers of video analytics see a major opportunity to convert technology developed to track shoppers through a store, for example, to support military and intelligence missions. But they also warn, as is true with a lot of such consumer-focused technology, that the benefits of these programs need to be balanced with uneasiness over privacy and civil liberties.
“There is a crossroads of non-military technology and military needs. The kinds of things that are being done for retail, automotive and other purposes can provide a huge number of use cases in the military space,” said Mike Flannagan, vice president and general manager of Cisco’s Data and Analytics Business Group. “There is so much potential to do good things using these advanced analytics and machine learning techniques. But those technologies also come with the responsibility to respect people’s privacy and concern over intrusiveness. So it’s important for those using these technologies, whether in retail or defense, to find a balance between using the technology for good things without being unnecessarily intrusive.”
PureTech recently released the latest version of its PureActiv geospatial video management and video analytics software, which provides new detection capability through the addition of advanced video analytics, map-based user features, advances in metadata collection and playback, and a wide range of security sensor integrations.
The company’s focus on geospatial information sets it apart from others in the field, Bowe said, as does its ability to analyze long-range video over water or land. “We focus on a high probability of detection with a low false alarm rate, which is key to success. The geospatial understanding aids in that tremendously, and gives us a significant advantage,” he noted.
“We tie the pixels in the image space to the terrain so we know where the pixels are hitting the surface of the earth. That way, we can give a range to the target detected and give an indication to the end-user of where that target is so that they can plan an intercept. The base of what we’re doing is analyzing the content of the video and deciding what rules have been violated that are of concern,” Bowe explained.
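The pixel-to-terrain idea Bowe describes can be illustrated with a simplified model: assuming a pinhole camera at a known height looking over locally flat ground (a hypothetical simplification; PureActiv uses actual terrain data), the ground range to the point under a pixel follows from that pixel's depression angle below the horizontal. All function and parameter names here are illustrative, not PureTech's API.

```python
import math

def pixel_to_ground_range(pixel_row, image_height, vfov_deg,
                          camera_height_m, tilt_deg):
    """Estimate the ground range (metres) to the terrain point under a
    pixel, assuming a pinhole camera over flat ground.  tilt_deg is the
    depression of the optical axis below horizontal."""
    # How far this pixel sits below the image centre, as a fraction
    # of the vertical field of view.
    frac = (pixel_row - image_height / 2) / image_height
    pixel_angle = frac * vfov_deg
    # Total depression angle of the ray through this pixel.
    depression = tilt_deg + pixel_angle
    if depression <= 0:
        return None  # ray is at or above the horizon; never hits ground
    return camera_height_m / math.tan(math.radians(depression))
```

With the range and the camera's known position and bearing, the detected target can be placed on a map for an operator to plan an intercept, as Bowe notes.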
PureTech concentrates on three classes of applications: shipping ports, international borders and perimeter protection, including military bases. While the issues involved in perimeter protection are similar to those in retail surveillance, for example, the challenges are greater when studying long-distance video of ports and borders.
“For one thing, the accuracy of pointing is different at 100 meters than at five miles. Also, you have to have cameras able to reach out far. The capabilities and cost of a standard surveillance camera don’t compare with those of a camera doing border patrol,” Bowe said. “You also have challenges in the image processing, including being able to stabilize the image at long range and deal with atmospheric interference. Your analytics have to be capable of consuming that information and making meaning out of it.
“With more computing horsepower comes more capability, so accuracy and capability to identify target types will continue to grow,” he continued. “Now, you can identify a person, for example. But down the road, we will be able to distinguish between two different people and be able to track one in particular.”
Another company that emphasizes the role of geospatial data is Agent Vi, an Israel-based video analytics provider. Its technology works with stationary surveillance cameras, whose fixed fields of view allow the system to determine the exact location of any point within them.
“The core of what we do is to take the video stream and extract meaningful metadata in a fully automated way. It is a description of every frame in a video stream telling us the list of objects in that field of view, and different types of attributes for each object. The size, shape, speed, position and direction of movement are extracted automatically in real time,” explained Zvika Ashani, the company’s chief technology officer.
“The second stage is to analyze the metadata, such as rules-based real-time analysis, which is intended to detect events,” he said. “For each camera, a user would configure one or more rules, such as to provide an alert when a person approaches a fence, a vehicle parks in a no-loading zone or a crowd starts to form. We have rules that the user can configure. We then analyze the data in real time, and if we discover that any object has violated a rule, we can send out an event to the user.”
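The two-stage workflow Ashani describes, per-object metadata extraction followed by rule checking, might be sketched as follows. The object schema and the fence rule are invented for illustration and are not Agent Vi's actual data model.

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    """One object extracted from a video frame (illustrative schema:
    size, position and speed in the attributes Ashani lists)."""
    obj_id: int
    obj_class: str      # e.g. "person", "vehicle"
    x: float            # position in metres on a site map
    y: float
    speed: float        # metres per second

def near_fence_rule(obj, fence_y=0.0, threshold_m=5.0):
    """Fire when a person comes within threshold_m of the fence line."""
    return obj.obj_class == "person" and abs(obj.y - fence_y) < threshold_m

def check_frame(objects, rules):
    """Return (rule_name, object) pairs for every violation in a frame."""
    return [(rule.__name__, obj)
            for obj in objects for rule in rules if rule(obj)]
```

In a real deployment the rule set would be configured per camera, and each violation would be pushed to the operator as an event.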
The system also provides forensic search capabilities, Ashani said. “We can take the metadata, store it in a database, and enable an investigator to perform queries—rather than the normal method of reviewing days of video from multiple cameras. The user can specify all video clips with a large white van, for example. Within seconds, we can scan the metadata and find any objects that meet that criterion. When used for investigation, it’s a great time-saving tool.”
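The forensic search Ashani describes, such as the "large white van" query, reduces to attribute filtering over stored metadata rather than replaying hours of footage. A minimal sketch, assuming per-object records as flat dictionaries (all field names are hypothetical):

```python
def search_metadata(records, obj_class=None, min_width=None, color=None):
    """Return the clip IDs of all stored metadata records that match
    every supplied criterion, e.g. a large white van."""
    hits = []
    for rec in records:
        if obj_class and rec["class"] != obj_class:
            continue
        if min_width and rec["width_m"] < min_width:
            continue
        if color and rec["color"] != color:
            continue
        hits.append(rec["clip_id"])
    return sorted(set(hits))  # deduplicate: an object may span many frames
```

A production system would back this with an indexed database, but the principle is the same: the scan touches only compact metadata, which is why it runs in seconds.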
PiXlogic’s software is able to automatically process a still or video image without knowing anything about what the image contains, said Santucci, whose company has received funding from In-Q-Tel and works with the intelligence community.
“Our system can segment the contents of the images in a way that makes logical sense. If you imagine an image of someone sitting at a desk, the software will identify all the areas where we have enough contrast difference to be able to pull out the outlines of the things in the scene. It can segment out your shirt, hair, face, coffee cup or painting in the background,” he said.
The software creates descriptions on the fly, characterizing the location of the item and other properties. “We are generating a lot of metadata automatically, and as it does that, the software is also reasoning about what it sees in the image,” he said. “If it understands that it is seeing something that belongs to a set of items, it will automatically tag the item. If the sky is in a photo, it will identify the sky and tag it with a keyword.
“We call these things ‘notions’ because they are broad categories,” Santucci continued. “A car, for example, can be in many different shapes and colors. But we can understand the idea of what a car is, and recognize and tag it. It can also recognize specific items of interest to the user, such as a particular make, model and year.”
The result is a very rich set of metadata, he said. The software can also recognize faces in general as well as specific individuals, and even text in different languages. The metadata can then be exploited—for example, by searching through a large amount of video.
“Another value point is that people are storing lots of material, some of which is not so interesting. You could use the software to decide which parts of the total video are really important and interesting, saving them and getting rid of the other stuff, thus reducing storage costs,” Santucci said.
For Flannagan of Cisco, which offers a product suite called Video Surveillance Manager, some of the greatest value of video analytics comes when it is used in combination with machine learning technology.
“There are different ways that people can use video and analytics on the video frames,” he noted. “Depending on what you are trying to accomplish, there may be a variety of different things that you would do. There are some basic functions, such as the number of objects moving through a frame and whether something is present or not present, such as a high-value item on a retail shelf, for example, or the presence of a person in a place where no one is supposed to be.
“Those are what I would consider basic video analytics, which is using movement through a frame of video. But there are also more advanced analytics that are being done with video that involve things like anomaly detection. That’s not just video analytics, but also combining it with machine learning,” Flannagan said.
To illustrate the potential benefits, Flannagan used the example of video surveillance cameras on a freeway, where video analytics can answer questions such as average speed or breakdown time. But what it can’t do is tell you whether these factors are normal, or unusual enough to merit closer attention.
“With machine learning, I can tell you when, and over time, how fast traffic should be moving at this time of day, and how fast is it actually moving. From that, I can tell you if we are seeing an abnormal traffic pattern,” he said.
“The ability to detect anomalies is where video analytics starts becoming really interesting in public safety,” Flannagan noted. “Is someone normally in this space at this time of day? How many people do we normally count standing in front of a bank in the middle of the night? How many people do we normally see with huge backpacks at the finish line of a race? Those things might enable you to detect something anomalous.”
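The learned-baseline idea behind Flannagan's examples can be sketched with a toy model that records observations per hour of day and flags large deviations from the learned mean. The z-score threshold and class design are illustrative stand-ins for the machine learning he describes, not Cisco's implementation.

```python
from collections import defaultdict
import statistics

class SpeedBaseline:
    """Learn expected traffic speed per hour of day, then flag readings
    that deviate strongly from the history for that hour."""
    def __init__(self, z_threshold=3.0):
        self.samples = defaultdict(list)   # hour -> list of speeds (km/h)
        self.z_threshold = z_threshold

    def observe(self, hour, speed_kmh):
        """Fold one measurement into the baseline for that hour."""
        self.samples[hour].append(speed_kmh)

    def is_anomalous(self, hour, speed_kmh):
        """True if the reading sits far outside the learned pattern."""
        history = self.samples[hour]
        if len(history) < 2:
            return False  # not enough data to judge normality yet
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            return speed_kmh != mean
        return abs(speed_kmh - mean) / stdev > self.z_threshold
```

The same pattern, baseline per context then deviation test, applies to Flannagan's other examples: counts of people outside a bank at night, or backpacks at a race finish line.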
As a leading data storage and virtualization company, EMC focuses on the infrastructure needed to support video analytics.
McCarthy noted, for example, that the company’s “edge to core” architecture can help systems manage information from large arrays of sensors. “What we’re seeing is that as those sensors become more complex and capable of doing more at the edge sites, including cameras, then it’s important for the edge sites to have some local storage as well as pre-processing capability, so that you can massage the data at its source, put it in a format that is less network-intensive, and get it back to a central location.
“You’re using the edge sites to prepackage the data so that it’s more efficient when you get to the core,” he said. “The sizing of the infrastructure, whether a hypervisor for virtualization or a regular server or computer, is really important because you have to maintain the existing production workload, as well as handle all that preprocessing that is going on in real time or near-real time.
“When it gets back to the core, we have a highly scalable, flexible platform that offers hyper consolidation with our converged infrastructure offerings, as well as scale out storage capability. Embedded technologies that lend themselves to analytic processing help you move to the next thing. They meet today’s requirements, but they also allow you to build and add on,” McCarthy said.
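The edge-prepackaging pattern McCarthy describes, condensing raw detections into a compact metadata message before transmission to the core, might look like the following sketch. All field names are invented for illustration.

```python
import json

def summarize_at_edge(detections, camera_id, min_confidence=0.5):
    """Condense raw per-frame detections into one compact JSON message,
    so the edge site ships kilobytes of metadata instead of raw video."""
    msg = {
        "camera": camera_id,
        "frames_processed": len({d["frame"] for d in detections}),
        "objects": [
            {"class": d["class"], "frame": d["frame"], "bbox": d["bbox"]}
            # Drop low-confidence hits locally rather than sending them.
            for d in detections if d["confidence"] >= min_confidence
        ],
    }
    return json.dumps(msg)
```

The core site can then normalize and ingest these messages into an analytic engine without ever touching the raw video, which only needs to be fetched when an event warrants review.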
EMC is also working to get other providers in the surveillance space to become more descriptive in their data sets from a metadata perspective.
“Some camera providers today can do what they call video content descriptions, where as the camera is looking at a field of view, they can run analytics in it,” he said. “Some of the higher-end cameras can run the analytics in real time, and most of them work pretty well. Some are even getting into demographics. They can be pretty comprehensive, looking at size and color, and describe what’s going on in the field of view.
“But what we’re asking is that instead of putting that out as video snapshots, they create a metadata stream that is much easier to handle as a data set. Then we can take that and massage the data so we can input it into an analytic engine. It will be important for companies like us in the future to have a platform that can run the applications that support the normalization of the data that is coming in, whether video or metadata,” McCarthy predicted.
“There are some standards out there today in the video surveillance industry, but they are pretty loose, and there is nothing about metadata creation and context,” he added. “That’s something we hope we will see in the future, and our company is pushing for those kinds of standards. That will help us simplify the whole equation of video and metadata analytics in the video surveillance environment.”
Another major player in this field is Raytheon, which has combined analytics from specialized tools usable only by select image scientists into a suite of analytics called Intersect, which is designed to be fully accessible to a much wider set of analysts.
Elements of the suite include Intersect Reveal, which automates basic full-motion video (FMV) analytic functions and fuses the resulting data using a multi-INT context accumulation engine. Reveal automates the registration, tracking, classification and indexing of video, delivering increased content with fewer analyst work hours. In parallel, advanced analytic algorithms rapidly sift through massive amounts of data to provide important context about the source video.
In addition, Intersect Dimension automates the creation of high-resolution 3-D imagery from low-cost commercial 2-D imagery, and has recently been upgraded to support 3-D video as well as still images. High-resolution 3-D then forms a geo-precise foundation upon which additional content can be added.
The capabilities increase video analyst productivity by automating common tasks. “While some analytics may not replace the human eye, they can certainly replace valuable hours the video analyst spends on mundane tasks. Oftentimes, they must manually correlate video to other data sources by looking at the time and location of the video, then searching dozens of other databases to find information associated with that time and place,” a company spokesman explained.
A key technology underlying these capabilities is precision geolocation. Raytheon worked from early in development to enable precise relative geolocation and registration of frame data. Once the data are well-registered, software extracts 3-D information from multiple images or video frames, producing 3-D data sets that can serve as a foundational layer for fusion of data from other collectors, as well as a value-added data layer enabling volumetric and line-of-sight analysis.
Unique algorithms in the system correlate video to other data sources. A feature called Intersect Relay, for example, produces location information icons directly on moving video, like a pushpin on a map.
The spokesman cited two examples of advantages provided by the system. The first addresses geopositioning errors common to FMV data: Raytheon developed the Video Image Photogrammetric Registration algorithm, which corrects FMV geolocations in order to make the data useful for targeting and multi-INT fusion.
“Second, we have increased the productivity of multi-INT analysts through the use of Intersect Reveal,” the spokesman added. “Reveal uses a complex ensemble of analytics, including moving vehicle track extraction, signal-to-track correlation, relevance analytics, which determine the most relevant information related to a video, and visual analytics, to improve analyst productivity—both time to decision and decision quality.” ♦
- Issue: 2/3
- Volume: 13