Data Quality Assessment, by Arkady Maydanchik
Imagine a group of prehistoric hunters armed with stone-tipped spears. Their primitive weapons made hunting large animals, such as mammoths, dangerous work. Over time, however, a new breed of hunters developed. They would stretch the skin of a previously killed mammoth on the wall and throw their spears, while observing which spear, thrown from which angle and distance, penetrated the skin the best. The data gathered helped them make better spears and develop better hunting strategies.
Initial Data Conversion
System Consolidations
Manual Data Entry
Batch Feeds
Real-Time Interfaces
Data Processing
Data Cleansing
Data Purging
Changes Not Captured
System Upgrades
New Data Uses
Loss of Expertise
Process Automation
Summary
Data Quality Assessment
Data Cleansing
Monitoring Data Integration Interfaces
Ensuring Data Quality in Data Conversion and Consolidation
Building Data Quality Meta Data Warehouse
Summary
Project Team
Project Plan Overview
Planning Phase
Preparation Phase
Implementation Phase
Fine-Tuning Phase
Ongoing Data Quality Monitoring
Summary
Introduction to Attribute Domain Constraints
Attribute Profiling
Optionality Constraints
Attribute Format Constraints
Valid Value Constraints
Precision Constraints
Summary
Relational Data Model Basics
Identity Rules
Reference Rules
Cardinal Rules
Inheritance Rules
Summary
Introduction to Historical Data
Basic Data Quality Rules for Historical Data
Advanced Data Quality Rules for Historical Data
Data Quality Rules for Event Histories
Summary
Introduction to State-Dependent Objects
Identifying State-Dependent Entities
Profiling State-Transition Models
Rules Derived from State-Transition Diagrams
Timeline Constraints
Advanced Rules
Summary
Introduction to Attribute Dependency Rules
Identifying Dependencies through Analysis
Identifying Dependencies through Data Profiling
Identifying Dependencies Across Data Sources
Summary
Project Scope and Rule Design
Selecting Optimal Rule Design
Rule Cataloguing
Rule Coding
Summary
Rule Imperfections
Rule Fine-Tuning Process
Identifying Rule Imperfections
Analyzing Imperfection Patterns
Eliminating False Positives
Handling False Negatives
Handling Uncertainty in Error Location
Summary
Error Catalogue Basics
Recording Missing Records
Errors Affecting Multiple Records
Error Groups
Subject-Level Error Tracking
Error Messages
Summary
Introduction to Aggregate Scores
Score Tabulation Process Overview
Building Score Catalogue
Tabulating Record-Level Scores
Adjusting Scores for Rule Imperfections
Tabulating Subject-Level Scores
Summary
Data Quality Assessment Meta Data
Data Quality Scorecard
Other DQMDW Functions and Reports
Summary
Basics of Recurrent Data Quality Assessment
Data Quality Changes on Atomic Level
Adding Time Dimension to DQMDW
Executing Assessment Runs Against Production Data
Summary
Quality data is the key to any advancement, whether it’s from the Stone Age to the Bronze Age or from the Information Age to whatever Age comes next. The success of corporations and government institutions largely depends on the efficiency with which they can collect, organize, and utilize data about products, customers, competitors, and employees. Fortunately, improving your data quality doesn’t have to be such a mammoth task.
DATA QUALITY ASSESSMENT is a must-read for anyone who needs to understand, correct, or prevent data quality issues in their organization. Skipping theory and focusing purely on what is practical and what works, this text contains a proven approach to identifying, warehousing, and analyzing data errors – the first step in any data quality program. Master techniques in:
This is one of those books that marks a milestone in the evolution of a discipline. Arkady’s insights and techniques fuel the transition of data quality management from art to science, from crafting to engineering. From deep experience, with thoughtful structure, and with engaging style, Arkady brings the discipline of data quality to practitioners.
David Wells, Director of Education, Data Warehousing Institute
Arkady Maydanchik is a recognized practitioner, author, and educator in the field of data quality and information integration. He is a frequent speaker at conferences and seminars, and teaches data quality courses through the Data Warehousing Institute and through his company, Data Quality Group LLC.