Integrating Data, by Bill Inmon, Patty Haines, and David Rapien
Overcome the challenges, appreciate the varieties, and apply the process of data integration.
Bill Inmon, the “father of the data warehouse,” has written 60 books published in nine languages. ComputerWorld named Bill one of the ten most influential people in the history of the computer profession.
Inaccuracy of data
Lack of integration
Spider web systems
Reasons for complexity
Transformation of data
Summary
Silos of data
Types of integration
Transforming data
Summary
Components of textual integration
Textual data architecture
Preparing textual data for analytics
Performing analytics on textual data
Summary
Summary
An intersection of data
Universal common connectors
Summary
Step 1: Scope
Step 2: Model
Step 3: Map
Step 4: Create a central pool of shared data
Advantages of the plan
Summary
Step 1: Select the scope
Step 2: Find ontologies/taxonomies
Step 3: Load the taxonomies
Step 4: Ingest raw text
Step 5: Determining analytical processes
An iterative process
Summary
Aim for true data integration
Identify the fans of data integration
Determine the data integration roles
Stress the benefits of data integration
Deploy a reusable process for new sources
Update data often
Define milestones
Summary
Taxonomies and ontologies
The purpose of data models and taxonomies
Data model and taxonomy differences
Summary
Levels of commonality
Analog/IoT data
Summary
Documentation components
Summary
A merger
Challenges
Structured data
Textual data
Summary
Plan
Educate
Management support
Learn all about data integration and become a data integration hero instead of following the masses and running in the opposite direction at the mere mention of the word “integration.”
Understand why organizations avoid data integration and often wind up with spider web environments containing siloed applications instead of an enterprise database that excites analysts and data scientists.
Distinguish the different types of integration: database, attribute, key, index, encoding, measurement, format, definition, KPI, calculations, summarization, selection criteria, data exclusion, lineage, and timing.
Apply identification, equivocation, and physical conversion levels of integration for both structured and textual data.
Leverage deidentification, proximity analysis, alternate spelling, stop word resolution, homographic resolution, stemming, taxonomical resolution, inline contextualization, classification, and acronym resolution.
Learn how to combine structured and textual data in the context of three levels of interaction.
Follow the steps of scope, model, and map in integrating structured data.
Follow the steps of scope, connect taxonomies, ingest raw text, and determine analytical processes in integrating textual data.
Apply integration best practices, including identifying integration roles, developing a reusable data integration process, and documenting the integration benefits.
Compare taxonomies with data models.
Know how data integration helps data science.
To reinforce all of the concepts within the book, we include a detailed case study on data integration.
Patty Haines is a senior advisor and data practitioner who provides expertise in managing and integrating data so that it is transformed and available to deliver value to the business community.
David Rapien is an Associate Professor – Educator of Information Systems and Business Analytics at the University of Cincinnati’s Lindner College of Business.