The Interdependence of Data Governance and eDiscovery

Two forces at work in corporations have a significant bearing on eDiscovery: data volumes are increasing at an astonishing rate, and emerging eDiscovery technologies are making it possible to glean more intelligence from that data. But how can the two be reconciled in a way that is cost effective, beneficial to the organization overall, manageable, and ultimately reduces the amount of data the organization holds? By developing or improving corporate data governance strategies, recognizing eDiscovery and data governance as symbiotic, and applying the concept of precedent to eDiscovery, corporations will not only cut costs by reducing the quantity of information that moves into the review process, but also save money by decreasing the overall amount of data stored within their organizations.

Overwhelmed by Information

Companies are drowning in information, whether it’s on laptops, smart phones, desktops, servers or the cloud, and it’s only getting worse. According to a GIA report, information is currently multiplying at a rate of 65 percent each year and total data generated worldwide is projected to reach more than three million petabytes by the year 2020.[1]

Much of that data is email, generated by the corporate world. In 2012, the number of business emails sent and received per day totaled 89 billion. This figure is expected to grow at an average annual rate of 13 percent over the next four years, reaching more than 143 billion by year-end 2016.[2] The average business person sends and receives 110 emails per day, and only 25 percent are business-related. That means 75 percent of emails have no business value, and only a fraction would be relevant in a legal matter.[3]

Overwhelmed companies are either taking the path of least resistance and saving everything, or they are cobbling together a business intelligence system and assuming it’s doing its job. They’re typically not sure what they have, where it came from, and if and when it should be disposed of. In addition, there is a lack of consistency or accountability for managing the data.

The Breakdown

This growing mass of data not only affects the organization overall, but poor data management has a significant influence on eDiscovery. As noted by Gartner in their analysis of the eDiscovery software market, eDiscovery is a by-product of a poor information management process.[4] But investing in tools and technologies to fix the problem without addressing the underlying processes and implementation is a recipe for failure. Too often systems are put in place with good intentions, but a "set it and forget it" mentality doesn't build in evaluations or audits to measure and improve effectiveness.

The "save everything" strategy is actually detrimental to lowering the costs and increasing the accuracy of the eDiscovery process. Critical time is wasted wading through ever-bigger piles of useless information that potentially obscures critical information. Inevitably, more information moves to review, which only increases the cost of the discovery process.[5]

Not only that, but companies are failing to take advantage of valuable information they uncover during the eDiscovery process. The vetting and classification that’s done within an individual case is not being recorded or applied back to the larger data pool and valuable information is being left on the table. Companies are focusing only on finding the relevant documents that need to be produced instead of looking at discovery as a business intelligence exercise.

A New View

Tackle the Big Picture

To make sense of this unwieldy mass of data for eDiscovery, or any other area of business intelligence, companies need to look at their overall information governance (IG) strategy. In a recent IBM study[6] conducted by Forrester Consulting, Forrester defines IG as “a holistic approach to managing and leveraging information for business benefits, encompassing information quality, standardization, security and privacy, and information life‐cycle management.”

Instituting a comprehensive, actively managed, and intelligent IG process means having less, but more organized information in the company overall, which translates into less data that has to be culled during preservation – and ultimately, less data moving to processing and review.

A subset of this that’s often overlooked and has a significant impact on discovery is the capture and preservation of data and information that “lives” with individual employees. Having an IG process in place should help alleviate the issues of knowledge transfer if an employee leaves the organization.

View Discovery and Data Governance as Symbiotic

At the same time that eDiscovery benefits from a sound IG process, it is also an asset for one. A collection of data for eDiscovery is a sampling of data from across various business units within the enterprise, and as such, there’s a significant amount of value and intelligence that can be gleaned either as the basis for starting an IG process or improving overall IG within an organization.

The work product and analytics from an individual case not only apply to the case at hand, but also have tremendous value across the enterprise and the overall IG approach. ARMA (with its Maturity Model), the CGOC, and EDRM's IGRM are leading the charge in providing guidance, models, and a framework for successful IG. Gartner suggests that by 2015, eDiscovery should be a built-in part of an overall information assurance program.[7]

Apply Precedent

Companies also need to shift how they view and approach eDiscovery. In most companies, the goal of eDiscovery is to respond to individual cases, not to solve for why eDiscovery is a challenge in the first place. Cases should not be treated as isolated events, separate from each other and from the overall business. Though relevant information is generally unique to each case, the data that is not relevant (junk), or that is confidential to the company (sensitive), is often the same from case to case. Yet with each case, this information is evaluated and reviewed over and over again.

Much like precedent is used to inform case strategy and new rulings are recorded for use in future cases, information gleaned in discovery in one case should be used to inform future cases, or ported back behind the company's firewall into the overall data management process (if one exists). Whether a company deals with discovery routinely or only occasionally, applying the findings from each case back to the overall data pool will make sourcing and reviewing data for the next case that much more efficient and effective.

Viewing each case as integral to the overall picture helps companies not only locate data for their next cases, but also budget for future discovery, infrastructure, and IT management costs, as well as quickly locate information for compliance or HR inquiries and investigations.

Discovering the Details

Classifying Data

Beginning an IG strategy, or improving upon an existing one, starts with data classification. Data classification isn’t a new concept. Companies have had some form of it – whether user-driven or basic file structures – behind the firewall for years. But most companies don’t have effective classifications.

For organizations that have yet to implement technology and analytics for records management, there are two starting points. The first is to start with past discoveries – utilize the classification and work product from past discoveries to assist with laying the groundwork. The second is to start with a small classification project, and build from there.

Classification doesn’t have to start with a laundry list of 150 classifications. It’s actually a better idea to start with a broad stroke, see what turns up, and then classify further from there. Initially, data can be broken down into four super types: relevant, not relevant, sensitive and junk. Classifying sensitive and junk data is the most vital for building a sound IG process.
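A first pass over the four super types can be as simple as a rule-based filter. As a minimal sketch (the category names follow the super types above; the keyword lists are invented placeholders, not a recommended taxonomy):

```python
# First-pass classifier using the four super types: sensitive, relevant,
# junk, and not relevant. Keyword lists are illustrative placeholders only.
JUNK_TERMS = {"unsubscribe", "newsletter", "lunch", "pizza"}
SENSITIVE_TERMS = {"confidential", "ssn", "salary", "merger"}
RELEVANT_TERMS = {"contract", "invoice", "deposition"}

def classify(text: str) -> str:
    words = set(text.lower().split())
    if words & SENSITIVE_TERMS:
        return "sensitive"      # check sensitive first; it takes priority
    if words & RELEVANT_TERMS:
        return "relevant"
    if words & JUNK_TERMS:
        return "junk"
    return "not relevant"
```

The broad-stroke approach the text recommends maps naturally to code like this: a handful of coarse buckets first, finer classifications only once the initial results have been reviewed.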

Classifying the same document over and over again for different cases is redundant and costly, and eliminating data classified as junk can save significant time and money across an organization. As an example, one company was so overwhelmed with data that it was on the verge of purchasing a fourth data storage server at nearly $1 million. Instead, it embarked on a small-scale data remediation project that removed its junk and duplicative data, and was able to repurpose one of its three existing storage servers and avoid spending the $1 million on a new one.
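The duplicative-data half of a remediation project like this is typically tackled by content hashing: two documents with identical bytes hash to the same digest and can be collapsed to one copy. A sketch (the document names are invented):

```python
import hashlib

def find_duplicates(docs: dict) -> dict:
    """Group document names by SHA-256 of their content.
    Any group with more than one name is a set of exact duplicates."""
    by_hash = {}
    for name, content in docs.items():
        digest = hashlib.sha256(content).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return {h: names for h, names in by_hash.items() if len(names) > 1}
```

Exact-match hashing only catches byte-identical copies; near-duplicate detection (e.g. the same email with different footers) requires fuzzier techniques, but even the exact pass often frees substantial storage.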

Training the System

It takes time in the beginning to train the system and develop the classifiers, but once the system is trained and has matured, it is automated. Any data that meets those criteria automatically gets classified, eliminating the need for manual classification. When handled as a systematic process that doesn’t require the end user to decide, the validation and testing can be evaluated and adjusted more effectively.
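The train-then-automate loop described here can be illustrated with a tiny naive Bayes text classifier: labeled examples from past reviews train the model, after which new documents are classified without manual intervention. This is a sketch, not any vendor's implementation, and the training examples are invented:

```python
import math
from collections import Counter, defaultdict

class TinyClassifier:
    """Multinomial naive Bayes over whitespace tokens - just enough to show
    how labeled examples from past reviews train an automatic classifier."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of docs
        self.vocab = set()

    def train(self, text, label):
        tokens = text.lower().split()
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log prior plus log likelihood with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / total)
            if score > best_score:
                best, best_score = label, score
        return best
```

The validation and auditing the text calls for would sit on top of this loop: hold out a sample of labeled documents, measure how often the trained model agrees with human reviewers, and retrain when accuracy drifts.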

Once a team has classified data that falls into the sensitive and junk buckets, “find more like these” taxonomies can be used to automatically locate and classify any data flowing in. Junk filters can also be used to find out how much data is being stored that has no business relevance, and sensitive filters can help determine the whereabouts of sensitive business information.
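A "find more like these" step can be approximated by scoring incoming documents against already-classified seed documents. The sketch below uses Jaccard similarity over token sets; production systems use much richer features, and the threshold here is an arbitrary illustration:

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two documents, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def find_more_like(seeds, candidates, threshold=0.3):
    """Return candidates whose best similarity to any seed meets the threshold."""
    return [doc for doc in candidates
            if max(jaccard(doc, seed) for seed in seeds) >= threshold]
```

Seeded with documents a team has already marked junk, a filter like this surfaces similar material flowing in; seeded with sensitive documents, it flags where sensitive business information is accumulating.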

A trained system will dramatically reduce the spend on eDiscovery by allowing for targeted preservation and collection of electronically stored information (ESI), which in turn will limit the flow of data moving into the review phase. In theory, information that’s classified as junk within an organization that has a sound IG practice won’t make it into discovery, making it simpler to target relevant information. By isolating junk data within the corporation, lower-end reviewers can be used to cull the junk, so that the best reviewers are reserved for the more important sensitive or relevant information.

Applying Analytics

With a business intelligence system established, the next step is to use the findings from discovery to refine the overall IG process. Data classified as junk or sensitive should be used to inform and hone the greater data pool by verifying that the documents that came up as junk in eDiscovery are also classified as junk behind the firewall. Companies can also audit prior classifications to ensure both that the filters are accurate and that earlier classification work still holds.
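The audit step described above amounts to reconciling two sets of labels for the same documents: what eDiscovery concluded versus what the system behind the firewall recorded. A sketch (document IDs and labels are invented):

```python
def audit(discovery_labels: dict, archive_labels: dict) -> dict:
    """Return documents whose eDiscovery label disagrees with the label
    recorded behind the firewall - candidates for reclassification."""
    return {doc: (discovery_labels[doc], archive_labels[doc])
            for doc in discovery_labels.keys() & archive_labels.keys()
            if discovery_labels[doc] != archive_labels[doc]}
```

Each disagreement either exposes a stale firewall classification to correct or a discovery filter to retune, so the audit feeds improvements in both directions.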


Organizations that effectively manage their information in advance of discovery pave the way for future discoveries and, ultimately, lower costs. For each document that can be defensibly eliminated from the corporate environment, there’s a significant savings. In addition, there’s a tremendous financial benefit to decreasing the need for more and more expensive storage, year after year.

Gartner Group might have summed up the future of IG and eDiscovery best when they said, “The problem of determining what is relevant from a mass of information will not be solved quickly, but with a clear business driver (e-discovery) and an undeniable return on investment (deleting data that is no longer required for legal or business purposes can save millions of dollars in storage costs) there is hope for the future.”[8]



1 Global Industry Analysts, Inc., Network Attached Storage (NAS) Devices: A Global Strategic Business Report, (April 2012)

3 ibid

4 Gartner Group, 2012 Magic Quadrant for E-Discovery Software, (2012)

5 Rand Corporation, Where the Money Goes, (2012)

6 Forrester Consulting, Information Governance: Turning Data Into Business Value, (October 2011)

7 Gartner Group, 2012 Magic Quadrant for E-Discovery Software, (2012)

8 Gartner Group, 2012 Magic Quadrant for E-Discovery Software, (2012)