Get Power Over Your Data By Managing Unstructured Data

  • Enterprise-class data governance platform
  • Purpose-built, high-speed indexing for global data centers
  • Scalable platform that supports petabytes of unstructured files and email
  • Only solution to support both primary and backup data sources
  • Find, manage and govern data based on policies

Case Studies

Large scale network data indexing, ROT cleanup (EY/Qualcomm)

Index Engines and a global advisory firm recently completed an exploratory project with a global technology company based in California. The client wanted to assess 2.5PB of their network data before kicking off a project covering all of their data, which currently exceeds 60PB. The infrastructure setup and indexing were completed in 5 weeks by 6 virtual servers, each running multiple indexing jobs at the light metadata level, averaging 100TB of indexing per day.
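
Assuming the 5-week window translates to roughly 25 working days (an assumption, not a stated figure), the quoted daily average checks out; a quick sketch of the arithmetic:

```python
# Sanity check on the throughput figures above. The 25 working days
# and the even per-server split are assumptions, not stated figures.
total_tb = 2.5 * 1000            # 2.5PB of network data, in TB
working_days = 5 * 5             # 5 weeks of weekdays
tb_per_day = total_tb / working_days
tb_per_server = tb_per_day / 6   # spread across the 6 virtual servers
print(tb_per_day, round(tb_per_server, 1))
```

On these assumptions each virtual server carried roughly 16–17TB of indexing per day.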

Prior to the onset of the project the client estimated they had approximately 500 million files on the targeted shares; in reality, over 11.5 billion files were discovered and indexed during the project.

The resulting index enabled the partner to review and analyze the metadata and make several determinations regarding ROT (Redundant, Obsolete and Trivial) data.  Specifically, on one server it was determined that 60% of the data consisted of temp files, each approximately 1GB in size, which were created by an application and never purged.  From this exercise the client was able to purge all ROT data and reclaim close to 1PB of storage.

The client has since referred to this project as the single most successful IT innovation of the CIO’s tenure in the organization.

Finding data of value on the network (EY/Monster)

Index Engines and a global advisory firm successfully completed an initial project with a subsidiary of a market-leading global beverage company. It consisted of approximately 50TB of uncompressed data, indexed at the deep metadata level, and was completed with 4 virtual servers in 3 countries in approximately 2 weeks.

The resulting metadata index allowed the partner to work with the client to determine that, out of a starting point of 54 million files, approximately 20% of the data was redundant and could be purged, reclaiming 10TB of capacity. In addition, they were able to segregate an additional 14% of data deemed critical for the Board of Directors, Sales and Finance departments.

The client and advisory are working toward the next phase, indexing and analysis of 750TB, and a full Information Governance initiative in 2017.

Intermixed data on backup tapes (KPMG)

Index Engines worked with a Big 4 audit firm and a multinational IT services provider to process over 1,900 LTO tapes created with Backup Exec, along with 100TB of network data.

The IT services provider was managing servers and backups for multiple clients, and a change in their business model required them to separate and return data to each client.  The IE software was used to segregate data that was intermixed on the tapes and pull complete datasets for each entity.

IE provided access to each type of data stored on the tapes in the appropriate manner: unlocking full-text content on all email, indexing metadata only on files, and pulling SQL backups without indexing.  The clients now have their data back and the ability to manage it going forward.

Consolidation and retirement of multiple legacy backup applications and NDMP servers (Citi)

Index Engines, with a technology partner, is providing access to and management of multiple legacy backup systems for a leading multinational investment banking and financial services corporation.

The client has data centers around the globe and requires long-term access to legacy backup data written by BackupExec, HPDP, ArcServe, TSM and Networker.  They also used NDMP for some of their backups and were forced to maintain older NetApp servers for the sole purpose of serving as a restore target.  They found the cost of maintaining support for out-of-warranty hardware was growing out of control and could not be justified by the single function it served.

The client used IE software and services to ingest catalog information from a portion of their backup servers, and now uses IE to access and extract any data they need from those legacy environments.  For their other backup servers they created regional restore centers worldwide, which enabled them to retire their older backup applications and stop paying annual maintenance contracts.  All restores of their NDMP backup data are now written to standard, low-cost storage.

Steady state archive (ADP)

Index Engines is providing a legal hold archive solution for a US-based provider of human resources management software and services.

The project started because the client realized they had lost control of the over 1PB of email legal hold data they had stored.  The main issues were the complexity of restores and the growing cost of storage to hold backups on legal hold.  The problem was compounded because the legal hold process forced the client to continue saving full Exchange databases whenever a legal hold request was made, resulting in over-preservation of email data.

They initially used IE to capture their existing legal hold data and store it in an indexed, searchable IE archive.  They now employ IE software to continually index backups of email data stored on dedupe appliances, finding and extracting relevant data to the archive and ending the over-preservation process.  As new custodians are added, the client uses IE to gather the relevant data and move it to the archive as well.  The net result was that the client reduced 1PB of legal hold data down to a manageable 20TB.

Large scale management of legacy catalogs (Barclays)

Index Engines, with a technology partner, is providing multi-year access to a massive legacy backup catalog for an industry-leading global banking and financial services company.

The client used TSM for decades and created over 400,000 tapes to which they are required to maintain access.  They are interested in transitioning from TSM to a new DR solution due to the high cost of support renewals from IBM and the pain they endured every time they had to find and retrieve data from their multi-version TSM environment.

Before deployment, the IE software was run through comprehensive penetration tests performed by the client's operations and security team and passed at the highest level the client recognizes.  After testing was completed, catalog ingestion began, and the client now uses IE for all access and restores from their old tapes.  The client is also working through their catalog reports to build a plan to migrate data relevant to their long-term retention policies off tape to disk or cloud storage.

Build archive for E-discovery; custom development (Progressive)

Index Engines processed 3,000 high-capacity tapes for a leading vehicle insurance company based in the US to meet their e-discovery access and retention requirements.

The client’s internal team was not meeting expectations for data retrieval on e-discovery requests, and the time delays were costing extra money, especially when tapes were eventually sent to an external services organization to extract the necessary data.

The client tasked IE with cataloging and indexing the tapes and extracting relevant data for 8,000 custodians to an indexed, searchable archive.  The archive is hosted by IE and is available for any discovery request by the client or their legal partners.  During initial project scoping, IE saw that the client used a special format (DAOS) for their Lotus Notes attachments, and IE developed new functionality to address that data, something the client stated no other company would do for them.

IE was their only option (TIAA)

Index Engines provided a solution to retire a legacy TSM environment for a Fortune 100 financial services organization.

The client needed access to and management of legacy TSM backup data written to over 100,000 tapes.  The key driver was their e-discovery requirements, as they have over 100 active legal holds at any given time.  After reviewing IBM's support contracts to maintain the legacy systems, it became clear the costs would exceed $1M annually.  In the RFP they released to evaluate new data protection solutions, the client added a specific requirement mandating that any response include a comprehensive solution to manage their legacy backups and allow them to disconnect from IBM.  IE was the only solution that allowed them to maintain access after retiring TSM, which made their transition to a new DR solution possible.

As the IE solution ingested the legacy catalogs, the client also engaged IE to preserve and manage data that had already been placed on legal hold.  The next phase of the project will be to analyze the legacy catalogs and build a plan to extract only the data of value off the tapes into an online, indexed repository.

Valuable data insight from a catalog ingestion (Zurich Insurance Group)

Index Engines is providing valuable data for a global insurance company headquartered in Europe by ingesting legacy TSM catalogs.

After an initial ingestion project on a subset of their TSM servers, the client gained a much clearer picture of their data stored on tape.  With TSM alone, they lacked the ability to collect and report on data from multiple servers.  Working through the IE reports, they quickly identified that 70% of the data did not need to be retained, such as backups of hosts that are no longer running and obsolete transaction logs.
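
The catalog-report triage described here can be pictured as a simple keep/purge split over per-host catalog rows; the field names and the active-host list below are hypothetical, purely to illustrate the filtering logic:

```python
# Hypothetical sketch of retention triage over catalog report rows.
# ACTIVE_HOSTS and the row schema are invented for illustration.
ACTIVE_HOSTS = {"mail01", "db02"}

def retention_triage(rows):
    """Split catalog rows into (keep, purge) lists: purge anything
    from a decommissioned host, or any transaction-log backup."""
    keep, purge = [], []
    for row in rows:
        if row["host"] not in ACTIVE_HOSTS or row["type"] == "transaction_log":
            purge.append(row)
        else:
            keep.append(row)
    return keep, purge
```

Applied across all servers at once, a report like this is what let the client mark 70% of the tape data as not needing retention.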

Their focus has now shifted to their retention requirements, specifically for their Lotus Notes email, and they plan to touch only the relevant tapes, based on the IE catalog reports, to migrate data for long-term retention.

Unlocking data trapped in backup formats (Saipem)

An Index Engines solution was purchased by a global oil and gas industry contractor to unlock their backup data and make it more accessible for e-discovery and other governance requests.

The client is using CV backup software in an NDMP environment and determined that the time to find and retrieve data from their current system was not acceptable.  Over the years they created over 1,000 tapes and must go back to them regularly to extract emails and files to fulfill requests from legal and other departments.

IE indexed the data on the tapes, and the client can now search and restore what they need using metadata or full-content (keyword) searching in a fraction of the time the CV system required.  The IE system is used on an ongoing basis to process and archive legal hold data, eliminating the need for long-term tape storage.

You must be designed for scalability (Occidental Petroleum)

Index Engines provided a network indexing solution for a multinational oil and gas exploration and production company.

The client has a key datacenter in Texas and was having trouble getting another indexing product, StoredIQ, to build an index of 150TB of important data.  The client started using StoredIQ and saw performance drop off dramatically once the index reached the 10TB mark.  Due to this scalability issue, the client was looking at installing 15 or more servers just for this project.  To make matters worse, they have another 500TB of data in other datacenters across the globe.

After a thorough evaluation of the IE solution, which performed five times faster than their existing product, the client purchased the software and installed it on one virtual server to index the complete 150TB.  They are using the software to identify and remove ROT data and, because regulations in their industry require more granular data searching and extraction, they also purchased full-text content unlocking for 13,000 users.  They plan to install IE in their other datacenters and manage it as a single federated application.

Phase 1 – Catalog/ Phase 2 – Migrate LTR data to tape or cloud (Citadel)

Index Engines is providing a comprehensive solution for a US-based global financial institution to manage their legacy backup data to support legal and compliance requirements.

The client has been tasked by regulators to produce data on multiple occasions and has incurred significant fines for slow or incomplete responses.  Most of their legacy data is locked in over 20,000 NBU tapes, and they were never able to build a reliable process to find and extract it.

IE installed the software and ingested the NBU catalogs in 2 weeks, then ingested catalog information for an old set of 900 Networker tapes.  The client had maintained their servers and backup software at the most recent versions, which enabled the IE software to work as quickly as it did.  The client is now analyzing the catalog reports to determine which subset of tapes holds data they need to retain; they want to extract the LTR data to object storage and then destroy the remaining tapes.

Hunting for PII (Rabo AgriFinance)

Index Engines provided a network indexing solution for a leading financial services and lending organization to identify and manage PII data.

The client was aware there was PII data on their network that was not being managed properly, but they were unable to find it using operating system tools and the other utilities they had.  The IT and compliance teams were taken to task by their Board and instructed to resolve the issue quickly.

They brought in IE to install the software and index a 10TB file share and SharePoint section of their network.  After the first pass of full-content indexing, the client found over 5,000 instances of PII in the shares, a much higher number than they originally expected.  After getting the word out to their user community regarding proper PII handling, they ran another index and saw the total reduced to under 2,000.  After multiple indexing runs they whittled the number down to single digits, and they continue to strive toward that goal each day.  This use case is being evaluated for their other datacenters across the globe.
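
A minimal illustration of the kind of full-content PII scan described here, using US Social Security number strings as a single example class; the pattern is a simplified assumption, not the product's detection logic:

```python
import re

# Simplified example: one PII pattern class (SSN-like strings).
# A real scan would combine many pattern classes with validation.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def count_pii_hits(text):
    """Return the number of SSN-like strings found in a document's text."""
    return len(SSN_PATTERN.findall(text))
```

Running a counter like this over every indexed document, pass after pass, is how the instance counts (5,000, then under 2,000, then single digits) can be tracked over time.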

The 4.9PB problem (Amerisourcebergen)

Index Engines is providing a solution for a US-based pharmaceutical wholesale company for their overgrown backup data problem.

The client had been using TSM to back up their data, including a 30TB Exchange environment, with a daily full backup process.  That process was chosen because the backups were sent to multiple Data Domain appliances, which reduced the back-end storage, but they still had to keep adding DD boxes to keep up with the data.  The issue became urgent after IBM told them their annual renewal would be based on their back-end data, which had grown to 4.9PB, making their new annual bill well over $1M.

Since they are in the pharma business, which is heavily regulated, they must keep data for many years, so they brought in IE to handle their legacy data differently.  After analyzing the data, IE determined that it could index the weekly full backups, plus data in the delete pool, which reduced the scope of the indexing to 700TB.  The data was processed as a network data indexing job, since the backups were stored on Data Domain appliances, and the unique data of value was migrated off to ECS storage, where it remains indexed and searchable.

First the tapes, then the network (Cincinnati Children’s Hospital)

Index Engines is providing tape and network indexing solutions for a pediatric hospital in the US.

The hospital selected Dell EMC to upgrade all of their IT infrastructure, including their data protection solution.  Before the selection was finalized, the hospital requested a plan for LTR management of, and access to, their soon-to-be-legacy NBU environment and the 17,000 tapes it had created.

IE software was brought in to ingest and manage the NBU catalogs for the next 4 years, allowing the client to disconnect completely from Veritas and NBU.  Over the next year the client will also analyze their backups and implement a plan to extract data of value off the tapes.  Another stakeholder inside the hospital needed to find loose files on their 100TB network for legal and compliance requests; the IT team, already familiar with the IE technology, implemented it to index and manage their network data as well.

Migrating datacenters (Enel)

Index Engines enabled a European multinational manufacturer and distributor of electricity and gas to capture and move legacy backup data in support of a data center consolidation and migration.

The client was set to move their IT assets from an IBM datacenter in one country to their HQ datacenter in another.  As the plan was about to start, they realized they had not accounted for the backups created by TSM and NBU and stored on Data Domain as 30,000 virtual tapes.  They needed a solution to capture the legacy catalog information, which was all held by IBM, so they could continue to access and manage the backups on the Data Domain after the move to their HQ.

The client engaged IE to start by ingesting the TSM catalogs and indexing the backups to find and extract LTR data to disk.  The project was run as an Assurance program, with IE remotely managing the project daily through completion.  The indexing of the NBU data will start once it is taken out of the daily backup rotation.


Index Engines, with a leading global services organization, successfully completed a massive legacy tape remediation project to support an LTR initiative inside one of the top financial services firms in the world.

The client had over 200,000 ArcServe, TSM, NBU and BackupExec tapes and needed emails for 120,000 custodians extracted to an archive.  A tape processing facility was assembled consisting of 30 server/library pairs working in a federation, plus storage and networking capacity to handle the indexing and data movement.  The search query used to find responsive data ran to over 1.6M lines of code.