The Data Catalog: Sherlock Holmes Data Sleuthing for Analytics, by Bonnie O’Neil and Lowell Fryman
Apply this definitive guide to data catalogs and select the feature set needed to empower your data citizens in their quest for faster time to insight.
About data catalogs
Data catalog features overview
Data catalog benefits
Key points
Data scientist
Data administrator/curator
Key points
About metadata
Example: Alation
Business glossary
Data lineage
Permissions and roles
APIs
Key points
Data prep tools
Example: Boomi Unifi
Data analyst dashboard
Creating a prep job
Working with jobs
Comparing data sets
Ingesting and crawling
Certifying data sources
Key points
Data self-service
Data curation
Data governance
Data catalog role in data governance
Example: Collibra
Data quality
Data certification
Data governance access
Policies
Privacy and risk
Key points
About the data lake
Data lake data catalog
Example: Waterline Data from Hitachi Vantara
Fishing features
Avoid siloed data catalogs
Key points
Enterprise data catalogs
Example: IBM
Example: Informatica
Reference data support
Business glossary
Term relationships
Data quality support
Policies and rules
Workflow
Key points
Portal products: CKAN
Cloud providers: Microsoft (Azure)
Data virtualization and integration tools: Denodo
Business intelligence & data visualization tools: Tableau
Data and process modeling: erwin
Self-service data prep: Paxata
API service catalog: Ignite Platform
MDM: Reltio
Key points
Lineage benefits
Lineage capture
Lineage challenge
Lineage categories
Drill down
Inferred lineage
Key points
ML to the rescue
ML and the data catalog
Knowledge graph
Similarity
ML and the human
Intelligent semantic search
Domains
Classifications
Inferred joins
Inferred lineage
Key points
Feature categories
Data catalog features
Scoring
Key points
The power of deduction
Trends and innovations
Where would we like them to go?
Key points
The data catalog may be the most important breakthrough in data management in the last decade, ranking alongside the advent of the data warehouse. The latter enabled business consumers to conduct their own analyses to obtain insights themselves. The data catalog is the next wave of this, empowering business users even further to drastically reduce time to insight, despite the rising tide of data flooding the enterprise.
Use this book as a broad overview of the most popular machine learning (ML) data catalog products, and perform due diligence using its extensive features list. Consider graphical user interface (GUI) design issues such as layout and navigation, as well as scalability: how well the catalog will handle your current and anticipated data and metadata needs.
O’Neil & Fryman…present a typology which ranges from products that focus on data lineage, curation and search, data governance, data preparation, and of course, the core capability of finding and understanding the data. The authors emphasize that machine learning is being adopted in many of these products, enabling a more elegant data democratization solution in the face of the burgeoning mountain of data that is engulfing organizations.
Derek Strauss, Chairman/CEO, Gavroshe, and Former CDO, TD Ameritrade
This book is organized into three sections:
Bonnie O’Neil is a Principal Computer Scientist at The MITRE Corporation and is a well-known expert on all phases of data architecture including data catalogs, data quality, business metadata, and governance. She has assisted both Fortune 500 companies and government agencies in data management projects for over 30 years. She is a regular speaker and workshop/tutorial leader at many conferences, and the author of four books.
Lowell Fryman is an independent consultant specializing in implementing data governance programs and data catalogs. He has been a speaker, practitioner, and industry leader in data governance, analytics, and data quality, with hands-on experience in implementations across most industries.