Use Levels of Data Quality in Data Supply Chains
Data Quality in Data Supply Chains
A Data Supply Chain describes the series of steps involved in transforming raw data into actionable information. These steps typically include operations such as acquiring and storing data, refining and enriching it, exploring and curating it, and finally distributing and managing it. In a data supply chain, raw “ingredients” are transformed into a “data stew” based on “recipes” developed by data scientists, analysts, and stewards. This stew is further refined and eventually utilized as metrics (or other insights) by senior leadership teams, finance, marketing, and engineers to drive business operations, personalize services, and enhance products.
Given the many dependencies in a data supply chain, various points of failure and risks emerge, including data security, privacy compliance, and adherence to business policies. Some use cases require an extremely high level of rigor (e.g., when reporting numbers to shareholders or calculating employee compensation). Other scenarios, such as certain marketing campaigns, may only need to assess general trends and can tolerate less precision and confidence in the numbers.
In situations that require high confidence, it is prudent for data consumers to formalize their data quality expectations with data producers. Service Level Agreements (SLAs) are commonly used to set expectations, and they can also be applied to data quality. SLAs should only be defined and measured for data that is critical to specific use cases or business scenarios. Investing in data quality for ungoverned or non-critical data is inefficient and wasteful. The cost of data quality investments can be significantly reduced when organizations apply a data quality framework solely to the data critical for their operations.
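As a concrete illustration, the sketch below encodes a hypothetical freshness-and-completeness SLA as a simple programmatic check. The dataset name, thresholds, and helper functions are illustrative assumptions, not any particular platform's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DataQualitySLA:
    """A hypothetical data quality SLA agreed between a producer and a consumer."""
    dataset: str
    max_staleness: timedelta   # how old the newest record may be
    min_completeness: float    # required fraction of populated key fields

def check_sla(sla: DataQualitySLA, last_updated: datetime, completeness: float) -> list[str]:
    """Return a list of SLA violations (an empty list means the SLA is met)."""
    violations = []
    if datetime.utcnow() - last_updated > sla.max_staleness:
        violations.append(f"{sla.dataset}: data is staler than {sla.max_staleness}")
    if completeness < sla.min_completeness:
        violations.append(f"{sla.dataset}: completeness {completeness:.1%} "
                          f"below target {sla.min_completeness:.1%}")
    return violations

# Example: a high-rigor dataset (e.g., shareholder reporting) with strict targets.
sla = DataQualitySLA("finance.revenue_daily", timedelta(hours=24), 0.995)
print(check_sla(sla, last_updated=datetime.utcnow() - timedelta(hours=30),
                completeness=0.98))
```

A check like this can run on a schedule so that producers learn about a breached expectation before consumers do.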
Use Levels of Data Quality
Data quality cannot be defined by a single, universal standard; it must be aligned with its intended use. In other words, the required level of data quality depends on the criticality and volume of decisions influenced by the data.
Marginal improvements in quality can yield significant business outcomes. Data quality must be measured before it can be improved, and it needs to be controlled to ensure improvements are sustained. Without consistent monitoring, data quality can deteriorate over time.
We can categorize data use into four levels (L0–L3), where the highest level of rigor is expected in L0.
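The levels are defined by intended use rather than by fixed numbers, but a hypothetical mapping from use level to a minimum quality target might look like the sketch below. All thresholds and example use cases are assumptions for illustration; the L0 and L3 endpoints echo the 99.5% vs. 70% example degrees mentioned later in this article.

```python
# Hypothetical mapping of use levels to minimum quality targets.
# The thresholds are illustrative assumptions, not prescribed values.
USE_LEVEL_TARGETS = {
    "L0": {"example": "shareholder reporting, compensation", "min_quality": 0.995},
    "L1": {"example": "operational dashboards",              "min_quality": 0.99},
    "L2": {"example": "internal analytics",                  "min_quality": 0.95},
    "L3": {"example": "exploratory trend analysis",          "min_quality": 0.70},
}

def required_quality(use_level: str) -> float:
    """Look up the minimum acceptable quality score for a given use level."""
    return USE_LEVEL_TARGETS[use_level]["min_quality"]

print(required_quality("L0"))  # 0.995: the most rigorous tier
```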
By tailoring data quality expectations to the intended use, organizations can ensure resources are allocated efficiently, focusing on critical areas while maintaining flexibility for experimental or less-critical applications.
Data Quality Dimension Categorization
The data quality literature provides many terms describing features of data that can be measured and assessed against a standard or expectation to quantify and describe the quality of data. DQ Dimensions are the categories or groupings into which data quality measurements are collected and reported. Use Levels indicate the degree of data quality (for example, 99.5% vs. 70%) needed for the data to be sufficient for its intended use.
According to Larry English, before one can measure and improve information quality, one must be able to define it in ways that are both meaningful and measurable. In one Data & AI investment, data quality dimensions were partitioned into three categories: Intuitive, Forewarning, and Empirical. Individuals who consume information from reports, dashboards, and cubes for decision making, such as financial analysts and senior leaders, may wish to begin with the two dimensions found in the ‘Intuitive’ category: Accuracy and Timeliness. These are essential characteristics of data that must be understood before the data is put to use.
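As a rough sketch of how these two intuitive dimensions can be made measurable, the snippet below scores a sample against a trusted reference (Accuracy) and checks arrival time against a deadline (Timeliness). The sample data, reference values, and deadline are illustrative assumptions, not part of any specific system.

```python
from datetime import datetime, timedelta

def accuracy_score(records: dict, reference: dict) -> float:
    """Fraction of values that match a trusted reference (source of truth)."""
    matched = sum(1 for key, value in records.items() if reference.get(key) == value)
    return matched / len(records) if records else 0.0

def is_timely(last_arrival: datetime, deadline: timedelta) -> bool:
    """True if the latest data arrived within the agreed window."""
    return datetime.utcnow() - last_arrival <= deadline

# Illustrative values only: a small sample checked against a reference extract.
sample    = {"acct-1": 120.50, "acct-2": 98.00, "acct-3": 45.25}
reference = {"acct-1": 120.50, "acct-2": 97.10, "acct-3": 45.25}
print(accuracy_score(sample, reference))                                       # ~0.67
print(is_timely(datetime.utcnow() - timedelta(hours=2), timedelta(hours=6)))   # True
```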
Those whose function is more closely aligned with data delivery, insight discovery, and data curation, such as data scientists, data analysts, developers, and engineers, may wish to begin with the five dimensions found in the ‘Empirical’ category: Completeness, Uniqueness, Validity, Conformity, and Precision. Empirical dimensions are measurable and repeatable, and they form the foundational building blocks used to support data quality claims.
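As a minimal sketch, the pandas snippet below computes three of these empirical measurements (Completeness, Uniqueness, Validity) on a toy DataFrame. The column names and the email-pattern validity rule are assumptions chosen for illustration.

```python
import pandas as pd

# Toy dataset; column names and the validity rule are illustrative assumptions.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "email":    ["a@x.com", None, "b@x.com", "not-an-email"],
})

# Completeness: fraction of non-null values in a column.
completeness = df["email"].notna().mean()

# Uniqueness: fraction of rows whose key is not a duplicate.
uniqueness = 1 - df["order_id"].duplicated().mean()

# Validity: fraction of values conforming to an expected pattern.
validity = df["email"].str.contains(r"^[^@\s]+@[^@\s]+$", na=False).mean()

print(f"completeness={completeness:.2f} uniqueness={uniqueness:.2f} validity={validity:.2f}")
```

Because each measurement is a simple, repeatable computation over the data itself, the same checks can be rerun after every pipeline run to support or refute a quality claim.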
The ‘Forewarning’ category contains one dimension, Consistency, which is often used as an early warning signal to spot outliers, anomalies, and other unusual conditions.
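One common way to operationalize Consistency as an early-warning signal is to compare a fresh measurement against its recent history, as in the sketch below. The z-score threshold and the daily row counts are assumed values for illustration.

```python
import statistics

def consistency_alert(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest measurement if it deviates sharply from recent history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

# Example: daily row counts for a feed; today's count drops sharply.
daily_row_counts = [10_120, 10_340, 10_055, 10_280, 10_190]
print(consistency_alert(daily_row_counts, latest=6_500))  # True: investigate upstream
```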
By categorizing DQ dimensions, organizations can tailor their approach to assessing and improving data quality based on the roles and needs of various stakeholders. This framework enables a more effective and targeted strategy for maintaining high data quality standards.
Summary
Data Quality refers to the ability of data to meet an organization's stated business, system, and technical requirements. In other words, it is the fitness of data for its intended purpose. As we address the challenges of increased reliance on data and the complexities of data processing, it becomes crucial to effectively manage the new risks that arise.
Our data supply chain comprises numerous nodes that combine and refine a wide range of data, including both structured and unstructured streams. Trustworthy data must serve as the foundation for any data-driven initiative. If the data cannot be trusted, the information derived from it cannot be trusted either.
When high confidence is required, data consumers should formalize their data quality expectations with data producers. SLAs are a common and effective method for managing "service" expectations, and the same approach applies to data quality. Organizations should define and measure data quality SLAs for their data supply chains, focusing only on the data that is critical to their use cases and business scenarios; applying a data quality framework solely to that critical data avoids wasteful investment in ungoverned or non-critical data and significantly reduces cost.