Dark Data Explained

Ahmed Banafa 01/04/2021

According to Gartner, dark data is the information assets that organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).

Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.

Dark data is a type of unstructured, untagged and untapped data that is found in data repositories and has not been analyzed or processed. It is similar to big data, which is large and complex unstructured data (images posted on Facebook, email, text messages, GPS signals from mobile phones, tweets, TikTok videos, Snaps, Instagram pictures, and other social media updates) that cannot be processed by traditional database tools. Dark data differs in that its value is mostly neglected by business and IT administrators.

Dark data is also known as dusty data.

Dark data is found in log files and data archives stored within large enterprise-class data storage locations. It includes all data objects and types that have yet to be analyzed for business or competitive intelligence, or to aid business decision making. Typically, dark data is complex to analyze and stored in locations where analysis is difficult, so the overall process can be costly. It can also include data objects that have not been captured by the enterprise, or data that is external to the organization, such as data stored by partners or customers.

By some estimates, up to 90 percent of big data is dark data.

With the growing accumulation of structured, unstructured and semi-structured data in organizations, increasingly through the adoption of big data applications, dark data has come to denote specifically operational data that is left unanalyzed. Such data is seen as an economic opportunity for companies if they can take advantage of it to drive new revenues or reduce internal costs. Some examples of data that is often left dark include server log files that can give clues to website visitor behavior, customer call detail records that can indicate consumer sentiment, and mobile geolocation data that can reveal traffic patterns to aid in business planning.
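As an illustration of the server log example, here is a minimal Python sketch that counts successful page requests per URL path. It assumes the logs are in Common Log Format; the names `LOG_LINE`, `page_hits` and `SAMPLE_LOG`, and the sample lines themselves, are made up for this sketch and do not come from any particular product.

```python
import re
from collections import Counter

# Assumption: lines follow Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def page_hits(lines):
    """Count successful (2xx) GET requests per path; unparseable lines are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["method"] == "GET" and match["status"].startswith("2"):
            hits[match["path"]] += 1
    return hits


# Hypothetical sample lines, invented for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [04/Jan/2021:10:00:01 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '203.0.113.5 - - [04/Jan/2021:10:00:09 +0000] "GET /signup HTTP/1.1" 200 2048',
    '198.51.100.7 - - [04/Jan/2021:10:01:12 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '198.51.100.7 - - [04/Jan/2021:10:01:30 +0000] "GET /missing HTTP/1.1" 404 312',
    'not a log line',
]

if __name__ == "__main__":
    # Print paths from most to least requested.
    for path, count in page_hits(SAMPLE_LOG).most_common():
        print(path, count)
```

Even a count this simple turns an archived log into a rough picture of which pages visitors actually reach, which is the kind of clue the paragraph above refers to.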

Dark data may also be used to describe data that can no longer be accessed because it has been stored on devices that have become obsolete.

Types of Dark Data

1) Data that is not currently being collected.

2) Data that is being collected, but that is difficult to access at the right time and place.

3) Data that is collected and available, but that has not yet been productized, or fully applied.

Dark matter is a form of matter thought to account for approximately 85% of the matter in the universe. It is composed of particles that do not absorb, reflect, or emit light, so it cannot be detected by observing electromagnetic radiation. Dark data, unlike dark matter, can be brought to light, and so can its potential ROI. What’s more, a simple way of thinking about what to do with the data, namely a cost-benefit analysis, can remove the complexity surrounding the previously mysterious dark data.
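A cost-benefit analysis of this kind can be sketched in a few lines of Python. This is only one possible framing, assuming the decision is reduced to an expected benefit weighed against a one-time analysis cost; the class `DarkDataSet`, the functions `net_expected_value` and `recommend`, and every figure below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DarkDataSet:
    name: str
    analysis_cost: int       # one-time cost to sort, structure and analyze the data
    estimated_benefit: int   # new revenue or cost savings if the analysis pays off
    chance_of_payoff: float  # rough guess between 0 and 1


def net_expected_value(ds: DarkDataSet) -> float:
    """Expected benefit minus the one-time analysis cost."""
    return ds.estimated_benefit * ds.chance_of_payoff - ds.analysis_cost


def recommend(ds: DarkDataSet) -> str:
    """Suggest analysis only when the expected value is positive."""
    return "analyze" if net_expected_value(ds) > 0 else "leave dark for now"


# Hypothetical figures, invented for illustration only.
CANDIDATES = [
    DarkDataSet("server logs", 20_000, 100_000, 0.25),
    DarkDataSet("legacy tape archive", 50_000, 60_000, 0.5),
]

if __name__ == "__main__":
    for ds in CANDIDATES:
        print(ds.name, net_expected_value(ds), recommend(ds))
```

With these made-up numbers the server logs come out ahead (100,000 × 0.25 − 20,000 = 5,000), while the tape archive does not (60,000 × 0.5 − 50,000 = −20,000). A real assessment would also need to weigh storage cost, compliance obligations and risk.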

Value of Dark Data

The primary challenge presented by dark data is not just storing it, but determining its real value, if any at all. In fact, much dark data remains unilluminated because organizations simply don’t know what it contains. Destroying it might be too risky, but analyzing it can be costly, and it’s hard to justify that expense if the potential value of the data is unknown. To determine whether their dark data is even worth further analysis, organizations need a means of quickly and cost-effectively sorting, structuring, and visualizing it. An important point in getting a handle on dark data is understanding that it isn’t a one-time event.

The first step in understanding the value of dark data is identifying what information it includes, where it resides, and its current status in terms of accuracy, age, and so on. Getting to this state will require you to:

1) Analyze the data to understand the basics, such as how much you have, where it resides, and how many types (structured, unstructured, semi-structured) are present.
2) Categorize the data to begin understanding how much of what types you have, and the general nature of information included in those types, such as format, age, etc.
3) Classify your information according to what will happen to it next. Will it be archived? Destroyed? Studied further? Once those decisions have been made, you can send your data groups to their various homes to isolate the information that you want to explore further.
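The analyze, categorize and classify steps above can be sketched roughly as follows in Python. This is a simplified illustration working on an in-memory inventory and not a real discovery tool. The extension-to-type mapping, the age thresholds in `classify`, and the sample records are all assumptions made for the example; an actual retention policy would come from your own compliance and governance rules.

```python
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath

# Assumed mapping from file extension to data type; adjust to your own estate.
TYPE_BY_EXTENSION = {
    ".csv": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".log": "unstructured",
    ".txt": "unstructured",
    ".jpg": "unstructured",
}


def analyze(records):
    """Step 1: the basics, i.e. how many items there are and how many bytes."""
    return {"items": len(records), "bytes": sum(size for _, size, _ in records)}


def categorize(records, today):
    """Step 2: attach a data type and an age in whole years to each record."""
    items = []
    for path, size, modified in records:
        suffix = PurePosixPath(path).suffix.lower()
        items.append({
            "path": path,
            "size": size,
            "type": TYPE_BY_EXTENSION.get(suffix, "unknown"),
            "age_years": (today - modified).days // 365,
        })
    return items


def classify(item):
    """Step 3: decide what happens next, using a hypothetical policy."""
    if item["type"] == "unknown":
        return "study further"   # cannot judge what we cannot identify
    if item["age_years"] >= 7:
        return "destroy"         # assumed retention limit
    if item["age_years"] >= 2:
        return "archive"
    return "study further"


def build_plan(records, today):
    """Group paths by the action chosen for them."""
    plan = defaultdict(list)
    for item in categorize(records, today):
        plan[classify(item)].append(item["path"])
    return dict(plan)


# Hypothetical inventory: (path, size in bytes, last modified date).
SAMPLE = [
    ("/srv/logs/web-2020-12.log", 1_200_000, date(2020, 12, 31)),
    ("/archive/orders-2017.csv", 350_000, date(2017, 6, 30)),
    ("/archive/scans/contract-2012.jpg", 4_000_000, date(2012, 3, 1)),
    ("/shared/export.dat", 10_000, date(2019, 5, 1)),
]

if __name__ == "__main__":
    print(analyze(SAMPLE))
    print(build_plan(SAMPLE, date(2021, 1, 4)))
```

Keeping the three steps as separate functions mirrors the list above: the totals from `analyze` and the per-item details from `categorize` can be reviewed before any classification policy is applied.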

Once you’ve identified the relative context for your data groups, you can focus on the data you think might provide insights. You’ll also have a clearer picture of your organization’s full data landscape, so that you can set information governance policies that will alleviate the burden of dark data while also putting it to work.

Future of Dark Data

Startups going after dark data problems are usually not playing in existing markets where customers are already aware of their problems. They are creating new markets by surfacing new kinds of data and creating previously unimagined applications with that data. But when they succeed, they become big companies, ironically, with big data problems.

The question many people are asking is: What should be done with dark data? Some say data should never be thrown away, as storage is so cheap, and that data may have a purpose in the future.