Supply Chains, AI, and Dirty Data: Protecting Profits with Better Data Quality
- Laura V. Garcia
- Sep 5, 2025
- 7 min read
How to test, govern, and fix dirty data
Poor data quality has evolved into a full-blown crisis.
Let’s be honest, nobody is shocked. Those in it have learned by experience that supply chains are more often than not reactive, with strategies and initiatives defined by the latest and biggest threat. For the majority of companies, addressing data quality remains on a long list of to-dos, deprioritized in favor of seemingly more pressing concerns. That is, until a looming AI rollout or a high-profile initiative exposes hidden flaws and forces data quality, temporarily anyway, back to the top of the list. What ensues is a reactive scramble to clean up data, often relying on manual methods or patchwork fixes that solve the immediate need but not the root cause.
Meanwhile, countless inefficiencies caused by bad data slowly drain millions, quietly pilfering profits while stakeholder heads swivel yet again.
It’s a cycle of neglect and weak governance, and it explains why businesses struggle to maintain consistently clean, reliable data.
Your company, and its initiatives, are only as solid as the foundation they are built on. Many have poured millions into AI pilots that never made it past proof of concept because their foundation, the data, was riddled with errors, silos, and gaps.
Read on for a mini business case for data governance, real-world examples of what data maturity looks like, and practical steps to test and fix your own data.
A Mini Business Case for Data Maturity Governance
Here’s what the numbers say about just how much dirty data is costing businesses.
Revenue depletion: As we mentioned in Part 1, earlier Experian research found dirty data can erode 15 to 25 percent of revenue across industries (2015 U.S. Data Quality Benchmark Report).
The likelihood of bad data: A 2025 Ketch study found that 215 billion “unpermissioned events” (data collected without proper consent) occur each month across 134 major U.S. sites. When data is collected without proper consent, it creates “dirty data” that flows directly into AI systems.
Although the numbers make the financial hit clear, the danger runs deeper: dirty data doesn’t just drain profits, it actively blocks digital transformation and strategic growth, delaying market entry and eroding first-mover advantage. In its 2024 CDAO Agenda Survey, Gartner reported that while 89 percent of respondents said effective data and analytics governance is essential for innovation, only 48 percent had consistent policies across their data, analytics, and AI assets.
What's more, companies trying to diversify suppliers (onshoring, nearshoring, multi-sourcing) rely on accurate vendor master data and consistent specs. Dirty or siloed data makes it impossible to qualify or onboard new partners at speed. When supplier records are fragmented or inconsistent, diversification stalls—the very resilience boards are rightfully demanding in today’s, shall we say, “dynamic” climate.
Why Dirty Data Is So Dangerous
Dirty data blocks digital transformation on multiple fronts: compliance, cost, and competitiveness. One Precisely–LeBow report found that 77 percent of organizations say their analytics are hindered by poor data quality, while 56 percent believe immediate performance gains would follow from better quality. Worse, 70 percent still rely on manual processes to find and fix problems, making scale nearly impossible.
And the gaps aren’t just technical; they’re structural. According to Precisely–LeBow, only 42 percent of organizations have a centralized data-quality function, while 62 percent admit the burden falls ad hoc on analytics teams. This fragmented ownership is why bad data keeps slipping through the cracks.
Where It Hides in Supply Chains
- Vendor master records (duplicates, outdated contacts)
- BOMs and product specs (missing, inconsistent)
- Pricing and lead time data (unsynchronized across systems)
- Inventory snapshots (wrong counts, wrong locations)
- Notes, logs, and PDFs (dark data with no structure)
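Even a lightweight script can surface these problems before they hit a dashboard. Below is a minimal Python sketch using pandas; the table, column names, and rules are illustrative assumptions rather than a real vendor schema, but the pattern (normalize, then check for duplicates, gaps, and implausible values) applies to any vendor master.

```python
import pandas as pd

# Toy vendor master extract. Columns and sample values are
# assumptions for illustration, not a real schema.
vendors = pd.DataFrame({
    "vendor_name": ["Acme Corp", "ACME Corp.", "Globex Ltd", "Initech"],
    "contact_email": ["ap@acme.com", "ap@acme.com", None, "ap@initech.com"],
    "lead_time_days": [14, 14, 30, -5],
})

# Normalize names so "Acme Corp" and "ACME Corp." collide as duplicates.
normalized = (vendors["vendor_name"]
              .str.lower()
              .str.replace(r"[^a-z0-9 ]", "", regex=True)
              .str.strip())

duplicates = vendors[normalized.duplicated(keep=False)]
missing_contacts = vendors[vendors["contact_email"].isna()]
bad_lead_times = vendors[vendors["lead_time_days"] <= 0]

print(f"{len(duplicates)} possible duplicate vendor records")
print(f"{len(missing_contacts)} records missing a contact email")
print(f"{len(bad_lead_times)} records with implausible lead times")
```

The same three checks translate directly to BOMs, pricing tables, and inventory snapshots.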
If I were a betting woman, I would bet your organization has data silos, which means you’ve got dirt. And with 38 percent of organizations reporting data downtime, and each incident carrying that 15 percent median revenue hit we mentioned, the costs pile up fast.
What Data Maturity Looks Like
These case studies show how organizations that clean up data don’t just avoid losses, they unlock growth, agility, and even sustainability:
| Company | Issue | Fix | Results | Timeline/Cost |
|---|---|---|---|---|
| | Duplicate and mismatched CRM/ERP records | Automated de-duplication and CRM–ERP integration | 60% fewer duplicates, 50% better forecasting accuracy | ~18 months; approx. $4M investment |
| | Outdated contact data derailed campaigns and hurt ROI | Cleansing and enrichment, automated validation tools | 35% better email deliverability, 15% more closed deals | ~9 months; approx. $2M investment |
| | Duplicate customer profiles and inconsistent data across in-store, e-commerce, and loyalty systems | Implemented Stibo Systems MDM to consolidate data, automate cleansing, and enforce governance | Customer retention up 20%, delivery errors down 90%, marketing conversion up 25% | Multi-phase enterprise rollout; cost not publicly specified |
| | Mixed progress in operationalizing ESG data, reporting complexity, and data quality challenges | Investments in talent, data-management tools, standardized reporting processes, and ESG assurance adoption | Leaders showed 60–80% higher maturity scores, stronger market positioning, and better risk management | 2025 survey of 1,320 executives/board members; reflects multi-year corporate journeys |
Governance Gaps
Strong data governance is essential for scaling AI, a fact that few dispute but most fail to act on. As noted above, Gartner’s 2024 CDAO Agenda Survey found that while 89 percent of respondents say effective governance is critical for innovation, only 48 percent have consistent policies across data, analytics, and AI assets. That disconnect highlights the gap between stated priorities and the attention and resources actually devoted to proper data management.
Precisely reports that organizations that invested in data governance programs benefit from improved data quality (58 percent), improved quality of data analytics and insights (58 percent), increased collaboration (57 percent), increased regulatory compliance (50 percent), and faster access to relevant data (36 percent).
Is Your Data Ready for AI?
Plenty of executives want to “do AI” tomorrow, but if your data foundation is weak, you’re not ready.
In Are You Ready For AI, Kearney suggests boards and CXOs test readiness across four dimensions: Think, Build, Scale, and Govern. Data plays a central role in each. In practice, this means setting a clear AI vision tied to business value, ensuring your data and tech stack can actually support it, retooling operating models so AI doesn’t stay siloed in pilots, and putting ethics and guardrails in place from day one. Many organizations skip one or more of these steps, which is why AI programs stall or never scale.

Ketch, on the other hand, emphasizes the privacy side. Their three hygiene moves (stopping projects powered by dirty data, auditing consent tools, and putting the CTO in charge of privacy enforcement) highlight how compliance and trust are just as critical as technical capability. This is why the data itself must be clean and consent-ready. Without it, companies risk building AI on sand: eroding customer confidence, inviting regulatory penalties, and ultimately corrupting the very models they're trying to scale.
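To make the consent point concrete, here is a minimal sketch of the “stop projects powered by dirty data” idea: a gate that excludes any record lacking an affirmative, purpose-specific consent flag before it reaches model training. The record shape and field names are assumptions for illustration; this is not Ketch’s API.

```python
from dataclasses import dataclass

# Illustrative event shape; the field names are assumptions, not Ketch's API.
@dataclass
class CustomerEvent:
    user_id: str
    consented: bool   # affirmative opt-in captured at collection time
    purpose: str      # e.g. "analytics", "ai_training"

def consent_gate(events, purpose="ai_training"):
    """Keep only events the user explicitly permitted for this purpose.

    Anything without an affirmative, purpose-matched consent flag is
    treated as dirty and excluded from downstream training data.
    """
    return [e for e in events if e.consented and e.purpose == purpose]

events = [
    CustomerEvent("u1", True, "ai_training"),
    CustomerEvent("u2", False, "ai_training"),  # opted out: excluded
    CustomerEvent("u3", True, "analytics"),     # wrong purpose: excluded
]
clean = consent_gate(events)
print(f"{len(clean)} of {len(events)} events are consent-ready for training")
```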
Together, these frameworks underscore that AI readiness isn’t only about the tech. It’s about vision, infrastructure, discipline—and gaining reliable, authorized data you can actually use, without risking blowback.
How to Test and Fix Your Data
Diagnostic Steps to Assess Data Health:
- Validate source formats and values to ensure data matches expected patterns.
- Check transformations (ETL/ELT processes) for errors or unintended changes.
- Run integrity checks across datasets for consistency.
- Test for completeness, duplicates, and boundary cases (extremes or outliers).
- Automate monitoring, setting up alerts to flag emerging issues.
- Track KPIs like accuracy, completeness, timeliness, validity, and consistency.
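Several of these checks can be scripted in a few lines as a starting point. The Python sketch below scores a toy order table on validity, completeness, uniqueness, and boundary cases; the schema, regex, and 95 percent warning threshold are assumptions chosen for illustration.

```python
import pandas as pd

# Toy order extract; the schema and sample rows are illustrative assumptions.
orders = pd.DataFrame({
    "order_id": ["A-001", "A-002", "A-002", "BAD", "A-004"],
    "qty": [10, 5, 5, 0, 120000],
    "ship_date": ["2025-09-01", "2025-09-02", "2025-09-02", None, "2025-09-03"],
})

ID_PATTERN = r"^A-\d{3}$"  # expected order-ID format

checks = {
    # Validity: do IDs match the expected pattern?
    "validity": orders["order_id"].str.match(ID_PATTERN).mean(),
    # Completeness: share of rows with no missing fields.
    "completeness": orders.notna().all(axis=1).mean(),
    # Uniqueness: share of rows not flagged as exact duplicates.
    "uniqueness": (~orders.duplicated()).mean(),
    # Boundary cases: share of quantities inside a plausible range.
    "qty_in_range": orders["qty"].between(1, 10000).mean(),
}

for name, score in checks.items():
    status = "OK" if score >= 0.95 else "WARN"
    print(f"[{status:<4}] {name}: {score:.0%}")
```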
Fixing the Foundation: Gartner’s 12 Actions
A cleanup isn’t a one-off project — it’s an operating discipline. Gartner recommends 12 specific actions to improve data quality. Here they are, condensed into a practical checklist:
1. Identify critical data: Focus on data tied directly to business outcomes and key performance indicators.
2. Define shared standards: Establish common language, formats, and definitions across business units.
3. Build the business case: Show how poor data impacts revenue, compliance, and risk to secure investment.
4. Profile your data: Analyze datasets to uncover anomalies, duplicates, gaps, and inconsistencies.
5. Prioritize issues: Rank problems by business impact so resources go where they matter most.
6. Develop improvement plans: Create step-by-step remediation strategies with timelines and ownership.
7. Assign stewardship: Appoint data stewards from both IT and business functions to ensure accountability.
8. Form governance groups: Cross-functional teams should set policies, monitor compliance, and resolve conflicts.
9. Automate monitoring: Use tools to flag quality issues early and continuously, reducing manual effort (see the sketch after this list).
10. Integrate controls into workflows: Bake quality checks into day-to-day processes, not just audits.
11. Raise data literacy: Train employees on why data quality matters and how they can contribute.
12. Track lineage and impact: Map where data originates, how it changes, and who uses it to improve trust.
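For the automation step (action 9), a monitor doesn’t have to start as an enterprise tool. Here is a minimal sketch of threshold-based alerting wired into a load job; the KPI names and floors are assumptions (the 95 percent accuracy floor echoes the FAQ below), and a real deployment would pull measured KPIs from your pipeline rather than hard-coded values.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Illustrative KPI floors; tune these to your own data contracts.
THRESHOLDS = {"accuracy": 0.95, "completeness": 0.98, "timeliness": 0.90}

def monitor(kpis):
    """Compare measured KPIs to their floors and log an alert per breach."""
    breaches = [name for name, floor in THRESHOLDS.items()
                if kpis.get(name, 0.0) < floor]
    for name in breaches:
        logging.warning("data-quality breach: %s=%.1f%% (floor %.0f%%)",
                        name, kpis.get(name, 0.0) * 100, THRESHOLDS[name] * 100)
    return breaches

# Bake the check into the workflow itself (action 10), not a quarterly audit:
if monitor({"accuracy": 0.97, "completeness": 0.93, "timeliness": 0.91}):
    raise SystemExit("Blocking the load until a data steward resolves the breach.")
```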
FAQs
Q: What’s “good enough” data quality for AI?
A: Enough accuracy and consistency that forecasts and recommendations are reliable — typically 95 percent accuracy or higher in key datasets.
Q: How do we prove ROI on cleanup?
A: Track metrics like duplicate reduction, forecast accuracy, order cycle times, and campaign response rates before and after cleanup.
Q: Is dark data the same as unstructured data?
A: Not quite. Dark data is collected but unused. It is often unstructured (emails, logs, PDFs), but the key is that it sits idle, untapped for decision-making.
Q: Does data privacy really impact AI quality?
A: Yes. A 2025 Ketch study found that 88 percent of companies fail to fully honor user opt-outs, meaning their AI is often trained on unpermissioned data. When data is collected without proper consent, it creates dirty data that flows directly into AI systems.
"AI's effectiveness is only as strong as the data it learns from," says Vivek Vaidya, Ketch Co-founder and CTO. "When businesses unknowingly feed their AI models with data collected without proper consent, they risk corrupting their entire AI operation–undermining insights, eroding customer trust, and exposing themselves to significant regulatory risks.”
Q: What’s the board’s role in data quality?
A: Treating data quality as a governance and risk issue, not an IT task. Boards should ensure accountability, allocate resources, and tie data governance to strategy.
The Last Word
Dirty data shouldn’t be treated as an IT task, but as a board-level risk and priority.
The longer you ignore it, the longer you continue to leak profits and hinder progress. Address it, and you don’t just get cleaner, more actionable dashboards that build resilience — you set the stage for AI-powered supply chains that 10X progress and drive real competitive edge.