Dark N’ Dirty Data: How Poor Data Quality Is Silently Sabotaging Supply Chains and AI
- Laura V. Garcia
- Aug 27, 2025
- 6 min read
Updated: Sep 9, 2025
Data: It’s your most valuable asset, and likely, your dirtiest secret.
It’s not Beyoncé, women, or spreadsheets that run the world; it’s data—and most of it is dark and dirty, straining supply chains and eroding the health of our planet.
Simply put, most companies are sitting on piles of unused “dark” data, the data they do use isn’t clean enough, and the bill for that mess is hefty, cutting across industries without prejudice, from cybersecurity to healthcare. Back in 2016, IBM estimated that bad data was costing the U.S. economy $3.1 trillion annually, with dirty and unused (“dark”) data also driving excess energy use and environmental harm.
So where are we now? Keep reading to find out.
The Pervasiveness of Dirty N’ Dark Data
While awareness has grown, dark and dirty data remain a pervasive issue and a key roadblock to advancement. In supply chain management, poor data quality drives up operational costs, skews forecasts, strains customer relationships, and creates compliance risks. On the procurement side, even small errors—such as misspelled vendor names, incomplete product codes, or inconsistent units of measure—can snowball into bad decisions and costly losses.
As companies focus on building resilient supply chains, data quality remains a weak link: according to McKinsey, just over half of supply-chain leaders rate their master data quality as “sufficient” or “high,” suggesting many organizations still have room to improve their data collection and management processes.
Below are some numbers that may serve as a wake-up call or help you structure a business case. But first, for the sake of SEO and those less familiar, let’s cover the basics.
Hold Up: What is Dirty Data and Why Should I Care?
“Dirty data” is long-standing shorthand for data that is incomplete, outdated, inconsistent, duplicated, miscoded, or otherwise inaccurate. In practice, this looks like mismatched customer records, duplicate SKUs or suppliers, or missing product attributes. While the symptoms may seem banal, the ramifications can be brutal.
What Dirty Data Really Triggers
Bad joins: When supplier IDs or names don’t match across systems, spend gets split or duplicated, making category analysis unreliable.
Wrong totals: Duplicate SKUs or miscoded units of measure inflate or undercount inventory, throwing off demand and supply balances.
Spurious alerts: Inaccurate lead times or missing shipment updates can trigger false stock-out or delay warnings, overwhelming teams with noise and hiding the real risks.
Biased models: Forecasting and AI tools trained on incomplete or skewed data learn the wrong patterns, leading to bad predictions, distorted pricing, or unfair supplier scoring.
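To make the first two failure modes concrete, here is a minimal, hypothetical sketch in pandas: the supplier names and amounts are invented, but the mechanics (one vendor spelled three ways, splitting its spend across a group-by) are exactly what a bad join looks like in practice.

```python
import pandas as pd

# Invented procurement records: one real supplier ("Acme") entered three
# different ways, plus one clean vendor.
spend = pd.DataFrame({
    "supplier": ["Acme Corp", "ACME Corp.", "acme corp", "Globex"],
    "amount": [120_000, 80_000, 50_000, 40_000],
})

# Dirty view: Acme's $250k shows up as three smaller vendors, so any
# category analysis ranks it far below its true size.
print(spend.groupby("supplier")["amount"].sum())

# Minimal cleanup: normalize case and strip punctuation before grouping.
spend["supplier_key"] = (
    spend["supplier"]
    .str.lower()
    .str.replace(r"[^\w\s]", "", regex=True)
    .str.strip()
)
print(spend.groupby("supplier_key")["amount"].sum())
```

The same normalize-before-aggregate discipline, applied to SKUs and units of measure, is what keeps the “wrong totals” problem in check.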
Garbage data doesn’t just mispredict or cause minor hiccups; it can trigger real financial fallout and investor panic. Case in point: in Q1 2022, bad data from a major customer corrupted Unity Technologies’ Audience Pinpointer ML tool. Rebuilding, retraining, and launch delays cost the company roughly $110 million, followed by a 37% drop in share price and bad press over lost shareholder confidence.
What About Dark Data?
Dark data is information you collect but rarely use.
Gartner describes dark data as information assets that organizations collect, process, and store during regular business activities, but generally fail to use for other purposes, such as analytics, business relationships, and direct monetizing. Think: notes fields, PDFs, logs, images, call transcripts, or survey text that sits in silos without context or metadata.
The scale of the problem is significant: 55 percent of companies say half or more of their data is dark, and one-third report that 75 percent or more goes unused, according to a global survey by Splunk.
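One crude but useful way to size your own dark data is to scan shared storage for files nobody has touched in a long time. The sketch below is purely illustrative: the path is hypothetical, the 365-day threshold is arbitrary, and access times must be enabled on the filesystem for the result to mean anything.

```python
import time
from pathlib import Path

STALE_DAYS = 365  # arbitrary threshold for "dark": not read in a year

def stale_files(root: str):
    """Yield (path, size) for files not accessed within STALE_DAYS."""
    cutoff = time.time() - STALE_DAYS * 86_400
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            yield path, path.stat().st_size

# Hypothetical network share; point this at your own storage.
found = list(stale_files("/mnt/shared"))
total_gb = sum(size for _, size in found) / 1e9
print(f"{len(found)} files, ~{total_gb:.1f} GB untouched in {STALE_DAYS} days")
```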
The High Costs of Dark N’ Dirty Data
Bad data silently siphons budgets and brainpower, stalling true innovation, while dark data goes underutilized, clogging storage, adding compliance risk, and in some cases costing millions in wasted cloud capacity. Worse, the data that does get used often lacks the surrounding context—leaving AI models, dashboards, and analysts with critical blind spots.
Financial Consequences
According to earlier Experian research, dirty data costs the average business 15 to 25 percent of its annual revenue, underscoring the long-standing nature of the problem (Experian’s 2015 U.S. Data Quality Benchmark Report).
More recently, a 2023 Cleo study found that 26% of companies are losing more than $500,000 annually to poor data integration, which drives inefficiencies and inflated costs (Cleo, 2023).
Unity Technologies lost $110 million when a single bad machine learning input corrupted its Audience Pinpointer tool, triggering delays and a 37 percent share price drop (Monte Carlo Data).
Verizon Wireless agreed to pay $25 million to the U.S. government—along with at least $52.8 million in refunds—to settle complaints over “mystery” data fees charged to customers, the Federal Communications Commission said (Reuters, 2010).
Operational Consequences
AI and compliance exposure: Dirty data can bias AI models, reduce predictive accuracy, and create regulatory or reputational risk, costing organizations an estimated $12.9–$15 million annually (Gartner via Axonius).
Employee time lost: Workers spend up to 27% of their time fixing data errors rather than creating value (Actian, 2024); a back-of-envelope cost sketch follows this list.
Poor integration costs: Misaligned enterprise systems drive productivity loss, frustration, and inflated operating expenses (CEO Review, 2024).
Inventory management inefficiencies: Inaccurate or inconsistent data inflates costs and disrupts supply-demand balance (GeoPostcodes).
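If you want to turn that 27% figure into a line item for a business case, the arithmetic is simple. Every input below is an assumption to be swapped for your own headcount and loaded labor cost; only the rework share comes from the Actian figure above.

```python
# Back-of-envelope cost of data rework. All inputs are assumptions;
# replace them with your own figures.
headcount = 50          # people who regularly touch supply chain data (assumed)
loaded_cost = 85_000    # fully loaded annual cost per person, USD (assumed)
rework_share = 0.27     # share of time spent fixing data errors (Actian, 2024)

annual_rework_cost = headcount * loaded_cost * rework_share
print(f"Estimated annual cost of data rework: ${annual_rework_cost:,.0f}")
# With these assumptions: $1,147,500 per year
```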
Strategic Consequences
AI project failure rates: 85% of AI initiatives fail due to poor data quality or lack of relevant data (Gartner via Forbes, 2024).
Transformation setbacks: Approximately 30% of generative AI projects will be abandoned after proof of concept by the end of 2025, primarily due to poor data quality and other risks (Gartner, 2024).
Competitive drag: Research shows companies with mature data governance outperform peers in agility, resilience, and customer trust (MIT Sloan Management Review, 2020).
Cross-Industry Reality
Dark and dirty data are no respecters of industry. From manufacturing floors to global logistics hubs, from retail supply chains to financial systems, the consequences are the same: wasted effort, hidden costs, flawed decisions, and strategic blind spots. The problem touches every function, process, and decision, undermining AI, cybersecurity, marketing, and operations:
Procurement and Supply Chains: Bad data costs an estimated $600 billion annually across procurement, logistics, and supply chains (TDWI via Rosslyn AI).
Marketing: Organizations spend up to 30% of their time interpreting unclear data, leading to flawed campaigns and lost customer trust (Advertising Week).
AI and Cybersecurity: Only 25% of cybersecurity leaders fully trust the data in their security tools (Axonius Press Release).
Data Centers and Operations: Bad data drives inefficiencies and energy waste, costing the U.S. economy trillions (IBM via HBR, 2016).
What’s less quantifiable but still resilience-decaying? The strategic drag of poor-quality data: teams waste cycles debating “which number is right,” leaders hesitate to act, and collaboration with external partners erodes. Dirty data doesn’t just slow decisions—it fosters second-guessing and defensive postures, the opposite of resilience.
The solution is practical but often overlooked: assign clear ownership, establish realistic standards, continuously monitor, and promote disciplined, organization-wide data practices. As Saul Judah, Research Director at Gartner, explains, “…organizations continue to struggle with securing resources to improve the quality of their critical data. Often this is because they are unable to effectively communicate what it is that is actually broken, to the people who should care and are able to help them.” Without accountable leadership and actionable processes, poor data quality is likely to persist—undermining digital transformation and slowing AI adoption.
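The “continuously monitor” piece doesn’t have to start as a platform purchase. As a sketch of the idea (the file name, key column, and required fields below are all hypothetical), even a small scheduled check that counts duplicates and missing values makes data quality visible to the people Judah says need to care.

```python
import pandas as pd

def quality_snapshot(df: pd.DataFrame, key: str, required: list[str]) -> dict:
    """Minimal data quality metrics: row count, duplicate keys, missing values."""
    return {
        "rows": len(df),
        "duplicate_keys": int(df[key].duplicated().sum()),
        "missing": {col: int(df[col].isna().sum()) for col in required},
    }

# Hypothetical supplier master file; in practice this runs on a schedule
# and raises an alert when any metric crosses an agreed threshold.
suppliers = pd.read_csv("supplier_master.csv")
report = quality_snapshot(
    suppliers,
    key="supplier_id",
    required=["supplier_id", "country", "payment_terms"],
)
print(report)
```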
Where This Leaves Us
The evidence is overwhelming: dark and dirty data carry real costs. Financially, companies are losing billions in wasted spend, inefficiency, and fines. Operationally, they’re drowning in rework, inflated costs, and employee frustration. Strategically, they’re stalled by hesitation and mistrust.
The good news is that the problem is solvable. Organizations that invest in clean, governed, well-structured data consistently report stronger AI adoption, better scenario planning, and greater resilience across their value chains.
Coming Up in Part 2
Next, we’ll move from the problem to the fix, showing real-world examples of what good data management looks like, from a mid-sized manufacturer cutting duplicate records by 60% to a global logistics firm improving email deliverability by 35%. Part 2 will break down exactly how companies:
Test and measure their data quality without drowning in spreadsheets
Build cleanup and governance practices that actually stick
Embed resilience into systems so the next disruption doesn’t grind everything to a halt
Because let’s face it: you can’t build the supply chains of tomorrow on yesterday’s mess.