GIF 2014 Volume 12 Issue 5 (July/August)

Open Source Goes Multi-Int

The unrest that swept the Middle East in the Arab Spring of 2011 not only caught the U.S. national security community by surprise, but also brought home in a dramatic way to intelligence analysts that they needed to expand their traditional sources of information by incorporating social media and other open source material into the overall intelligence picture.

Decision makers would have been better informed had intelligence analysts paid attention to what was happening on Twitter, Facebook and Instagram, while relying less on traditional human, signal, imagery and video intelligence feeds, critics suggested. Since that time, however, the intelligence community and industry have been making strides in incorporating data from social media and other open source platforms into the broader multi-INT picture.

Social media information and open source intelligence (OSint)—or “data in the wild,” as some call them—can be thought of as an additional set of sensors on the ground that complement traditional intelligence sources. Intelligence analysts can take advantage of the comments and images posted by thousands of individuals to learn of the occurrence and understand the significance of events on the ground, as well as the sentiments of strategically located populations that likely would not be picked up by traditional intelligence means.

The theory behind fusing and correlating data from multiple sources is that richer and more diverse data sets improve the analytic possibilities. The same goes for non-traditional intelligence streams such as open source.

This becomes all the more important in an era in which U.S. adversaries are as likely to include non-state actors as other nation-states. In a national security environment that includes concerns over terrorism, piracy, human trafficking and money laundering, many of the most important data feeds that contribute to analysis and targeting are not under the control of governments.

From a multi-INT perspective, social and other open source data represent one more source that can be used to complete an intelligence picture. They can be used to cue other traditional intelligence assets to focus on phenomena revealed by the analysis of social data. In some cases, such as analyzing where the focus of humanitarian assistance needs to go, social media data could very well provide the primary source of relevant information.

Beginning with operations in Southwest Asia, the U.S. military set up human terrain teams to add color to other intelligence reports. These analysts used material from social media sites, blogs and websites to supplement data from traditional intelligence sources.

Predictive Model

The ultimate value of OSint may be in contributing to a predictive model of intelligence. If analysts can anticipate the actions of potential adversaries on the ground, decision making can become that much more effective and efficient.

“In the last 15 years we have moved from a Cold War intelligence perspective, when the focus was on orders of battle, to one focused on asymmetric threats and non-conventional actors,” said Gary Raven, director of research and engineering for Textron Systems. “In such an environment, social and cultural factors need to be included in any operational picture.”

“The data generated in open sources, all of the pictures and comments that get posted, can be thought of as sensors and can act as significant force multipliers,” said Rob Smith, vice president for C4ISR at Lockheed Martin. “The world is constantly changing with new threats and risks coming about.

“Having a human sensor that you don’t have to pay is becoming more and more important. People’s habits are also changing. The first thing they do is grab their smartphones to take a picture or post a comment. There is a growing opportunity to leverage that information for the sake of making better decisions,” Smith added.

To be sure, adding vast new volumes of data to an already crowded intelligence field only exacerbates the perennial big data problems with which intelligence systems and analysts already have to deal. “You are trying to find the one or two posts or Tweets that are important to your mission,” said Mark Bowersox, strategic product manager at Exelis Visual Information Solutions. “This is naturally going to require automation of the processes involved.”

There is also the question of validating information gleaned from social media, since open sources can also be the perfect tool for disinformation. “There are a number of techniques that can be used to validate OSint,” said Peder Jungck, vice president and chief technology officer at BAE Systems Intelligence and Security. “Keeping track of the history of open source contributors is one way. Validating information with other forms of intelligence is another, especially when a new actor comes online. The language of posts can also be analyzed for evidence of deception.”

One important aspect of OSint is that most content is accompanied by temporal and/or geospatial metadata, which offer an opportunity to verify accuracy because they allow for an analysis of whether the actor in question was in a position to make his claimed observations.
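
As a rough illustration of that kind of check, the sketch below tests whether a post's metadata places its author close enough, in space and time, to a reported event. The `Observation` record, the function names and the default thresholds are all assumptions made for this example; none of them comes from the vendors quoted here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Observation:
    """Hypothetical record: where and when a post or event was logged."""
    lat: float
    lon: float
    time: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def could_have_observed(post, event, max_km=5.0, max_gap=timedelta(hours=1)):
    """True if the post's metadata is near the event in both space and time.

    The thresholds are illustrative defaults, not operational values.
    """
    close = haversine_km(post.lat, post.lon, event.lat, event.lon) <= max_km
    timely = abs(post.time - event.time) <= max_gap
    return close and timely
```

A post that fails the check is not necessarily false, since metadata can be missing or spoofed, but it is a candidate for closer scrutiny.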

“The way we are dealing with OSint is by treating it as another form of intelligence,” said Mike Manzo, director for GEOINT mission processing and exploitation at General Dynamics Advanced Information Systems. “When detectives interview 10 witnesses, they typically get 10 different viewpoints of a crime, and therein lies the danger of relying on social media. That’s why we roll it up in the processing, exploitation and dissemination cycle so that it becomes a part of an evolving intelligence picture. It can help make decisions on how best to reposition other collection assets to hot spots.”

But there are certain aspects of intelligence where OSint has the potential to stand out, notably in sentiment analysis of important populations. Analyzing open source material enables the identification of how key groups feel about certain issues.

“This is something you can’t get by taking overhead images,” said Bowersox. “The density and frequency of social media contributions are indicators of what a population is seeing in a particular geographic area and how they feel about it.”

“Civil affairs and human terrain teams have as part of their mission the development of a deep understanding of the social and cultural environments in which they operate,” said Raven.

Real-Time Reports

Another attribute of social media sources is that they provide real-time reports of activities. “This can contribute to a commander’s situational awareness before we put warfighters in harm’s way,” said Jungck. “The same kind of feeds can be an indicator whether a certain operation is proceeding successfully, also in real time.”

“Every transaction on the Internet is related to one or more persons as well as multiple other transactions,” said Frank Purdy, vice president of corporate engagement at Logos Technologies. “Networks are the way organizations operate, including those not friendly to the United States. The value of these data sources to solving problems is to get to an anticipatory state of understanding. It is about harvesting knowledge at the pace of the mission.”

The challenges associated with incorporating OSint into multi-INT platforms are similar to those involved in fusing other forms of intelligence. “It is very similar to running three disparate sensors on an aircraft that are looking at different things and have different requirements,” said Sean Love, director of business development for integrated intelligence systems at Northrop Grumman. “It’s the same here on the ground. You may have Twitter data combined with standard Web crawler searches, combined with open source geospatial data. They all need a way to talk to each other.”

The data from various OSint sources need to be conditioned in order to make them purposeful for analysts. “It is a question of tying all the information together,” said Love, “not just displaying them in the same bucket or showing a series of dots on a map.”

“If data from different sources don’t correlate, they won’t provide value,” added Jungck. “Data needs to be cross-referenced. If we have a Tweet coming from a specific location, we need to know what other intelligence sources are observing at that same location. That is also important in validating open source intelligence.”

Data from different data sources, including social media platforms, must be normalized so that they can all be used together, noted Bowersox. “They need to be time stamped and geographically co-registered so that data from the same time and place are presented to the analyst,” he said. “Much of this involves standardizing metadata elements.”
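
A minimal sketch of what that normalization might look like follows. It assumes two hypothetical feeds with different field names and timestamp conventions, maps both onto one schema with UTC times, and then groups records that fall in the same grid cell and time window. The field names (`ts`, `coords`, `observed_at`), the cell size and the bucket length are all invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from math import floor


def normalize_tweet(raw):
    """Hypothetical social feed: epoch seconds and [lon, lat] coordinates."""
    lon, lat = raw["coords"]
    return {
        "source": "tweet",
        "time": datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        "lat": lat,
        "lon": lon,
        "text": raw["text"],
    }


def normalize_report(raw):
    """Hypothetical report feed: ISO 8601 time with a UTC offset."""
    local = datetime.fromisoformat(raw["observed_at"])
    return {
        "source": "report",
        "time": local.astimezone(timezone.utc),
        "lat": raw["lat"],
        "lon": raw["lon"],
        "text": raw["summary"],
    }


def bucket_key(rec, cell_deg=0.1, bucket_s=3600):
    """Grid cell plus time window that a normalized record falls into."""
    return (
        floor(rec["lat"] / cell_deg),
        floor(rec["lon"] / cell_deg),
        int(rec["time"].timestamp()) // bucket_s,
    )


def co_register(records, cell_deg=0.1, bucket_s=3600):
    """Group normalized records that share a grid cell and time window."""
    groups = defaultdict(list)
    for rec in records:
        groups[bucket_key(rec, cell_deg, bucket_s)].append(rec)
    return dict(groups)
```

Fixed grid cells are the simplest choice, but records on either side of a cell boundary are split apart. A production system would more likely use a proper spatial index.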

Incorporating OSint into the multi-INT picture has the effect of benefiting commanders, analysts and warfighters alike, according to Billy Sokol, manager of global C5ISR at MarkLogic.

“Open source data can be used to provide feedback and commander’s intent, and to tighten up targeting,” he said. “In other words, how was an operation carried out and did it have its desired effects? For analysts, adding OSint to the common intelligence picture is key to providing an intelligence product that is more meaningful. If you want to reduce the amount of kinetic warfare, you are going to need a better picture of what is happening on the ground. For the warfighter, because the battlefield is not well delineated, understanding the human terrain is key to interpreting and executing command orders.”

Five Innovations

Five technology innovations in recent years are making OSint integration more and more plausible, according to Smith. These include the inexpensive storage of large amounts of data that has been deployed to tackle other big-data problems, as well as the related development of cloud computing infrastructures that accommodate the data analytics.

“Natural language processing is another important innovation,” said Smith. “These systems are becoming more and more accurate, and they get better by learning about the language they are analyzing as they go along.”

Visualization tools have also made strides in recent times. “Data is useless unless you can turn it into information,” said Smith. “Visualization software is making that happen by being able to understand and characterize large data sets.”

Finally, predictive analytics tools are able to make probabilistic determinations about what will happen in the future based on the analysis of the incomplete data sets they have access to.

The goal is to get systems to identify open source content that is relevant to policies, decisions and missions against a very noisy background. “At the most basic level, it comes down to trying to plot the information on a map and correlate it geospatially against other data layers, whether imagery or video,” said Bowersox. “Beyond that, there are tools that index and catalog search terms and keywords that might be indicative of certain situations.”

Northrop Grumman is repurposing some existing tools for use in analyzing open source material. So, for example, a tool used to analyze video streamed from a Global Hawk has been adapted to analyze YouTube videos.

“It is a robust tool that we have tweaked for use somewhere else,” said Love. “We are also working on integrating best-of-breed tools such as those for sentiment analysis and link analysis, and getting them to talk to each other and share data effectively.”

Open source intelligence requires that analysts deal with human language, which entails an inherent level of ambiguity. “Instead of trying to derive precise intelligence from these sources, it is smarter to look at the information in a probabilistic manner,” said Raven. “This can involve correlating actions reported in open sources with known activities. Another method is to infer broadly about moods, sentiments, and the likelihood of certain activities occurring by aggregating open source content geospatially.”
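
The geospatial aggregation Raven describes could, in its simplest form, look something like the sketch below. It assumes each post has already been given a sentiment score between -1 and 1 by some upstream tool, which is not shown, and it averages those scores per grid cell. Cells with too few posts are dropped, because the aim is a broad inference rather than a reading of any single message. The function name, cell size and minimum count are illustrative assumptions.

```python
from collections import defaultdict
from math import floor


def aggregate_sentiment(posts, cell_deg=0.5, min_posts=3):
    """Average pre-scored sentiment per grid cell.

    posts: iterable of (lat, lon, score) tuples, score in [-1, 1].
    Returns {cell: (mean_score, post_count)} for cells with enough posts.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [score sum, count]
    for lat, lon, score in posts:
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        totals[cell][0] += score
        totals[cell][1] += 1
    return {
        cell: (total / count, count)
        for cell, (total, count) in totals.items()
        if count >= min_posts
    }
```

Reporting the post count alongside the mean lets an analyst see how much evidence stands behind each cell's score.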

This type of information, together with data from traditional intelligence sources, can contribute to a new real-time paradigm known as activity-based intelligence. Applying cloud-based infrastructures on the backend, Manzo suggested, will yield two results: Analysts will be able to absorb intelligence in real time, and intelligence users will have a greater opportunity to gain federated access across all forms of intelligence.

Such an intelligence model also transforms the jobs of analysts and makes them much more efficient and productive. “This allows them to do more analyzing because they will be spending less time finding relevant data,” said Manzo.

Geospatial Context

The technology required to incorporate social media data and other OSint into a multi-INT picture largely exists and is already being used in the commercial world. Retailers, for example, analyze social media feeds in order to understand the needs and wants of their customers at the aggregate level as well as at the individual level. But there are policy considerations that prevent military and intelligence organizations from applying the same technologies in the same manner.

“We are scraping tens of millions of sources in real-time and near real-time, including Twitter, YouTube, Facebook and many others,” said Brent Bursey, chief executive officer of Great-Circle Technologies, which specializes in the design, development and deployment of multi-INT enabled GEOINT solutions. “We have automated natural language processing and text analytics that are applied to each source in the native language. We support 30 different languages today, but have 190 different languages and dialects on our roadmap.

“We are extracting people, places and things associated with activities characterized as topics of interest. We link common concepts across multiple languages to aggregate relevant sources and conversations to describe situations or answer business-centric questions,” Bursey added.

Great-Circle Technologies can exploit the geospatial context semantically from social media sources and conflate those with more simplistic geospatial references, such as GPS-derived latitude and longitude, Bursey explained.

The company is currently exploiting a universe of structured, unstructured, and semi-structured sources that is approaching ten petabytes, Bursey added. It also has plans to integrate a wider variety of big data sensor streams that will increase this volume by an order of magnitude.

Great-Circle has also proposed establishing a global multi-lingual and multi-source commercial service that exploits data but delivers it via a fee-for-service model that answers specific customer questions. The continued incorporation of OSint into multi-INT platforms could stimulate the growth of intelligence as a service in the military and national security realms, much as it is starting to take off in the commercial sector, because the data itself is inherently unclassified.

“That is where I think we are headed,” said BAE Systems’ Jungck. “Military and intelligence organizations are going to want to pay for intelligence, rather than data and systems. Our commercial customers want to be provided threat reports. They don’t need to take possession of the data or the analysis.

“Intelligence organizations, some of them from the U.S. government, are interested in exploring this model as well,” Jungck added. “The beauty of open source is that you are not using government systems.” ♦

Last modified on Thursday, 04 September 2014 10:32
