Apply this definitive guide to data catalogs and select the feature set needed to empower your data citizens in their quest for faster time to insight.
The data catalog may be the most important breakthrough in data management in the last decade, ranking alongside the advent of the data warehouse. The latter enabled business consumers to conduct their own analyses to obtain insights themselves. The data catalog is the next wave of this, empowering business users even further to drastically reduce time to insight, despite the rising tide of data flooding the enterprise.
Use this book as a guide to the most popular machine learning (ML) data catalog products, and perform due diligence using its extensive features list. Consider graphical user interface (GUI) design issues such as layout and navigation, as well as scalability: how the catalog will handle your current and anticipated data and metadata needs.
O’Neil & Fryman…present a typology which ranges from products that focus on data lineage, curation and search, data governance, data preparation, and of course, the core capability of finding and understanding the data. The authors emphasize that machine learning is being adopted in many of these products, enabling a more elegant data democratization solution in the face of the burgeoning mountain of data that is engulfing organizations.
Derek Strauss, Chairman/CEO, Gavroshe, and Former CDO, TD Ameritrade
This book is organized into three sections:
- Chapters 1 and 2 reveal the rationale for a data catalog and share how data scientists, data administrators, and curators fare with and without a data catalog.
- Chapters 3-10 present the many different types of data catalogs.
- Chapters 11 and 12 provide an extensive feature list, current trends, and visions for the future.
Chapter 1: Introducing Data Catalogs
About data catalogs
Data catalog features overview
Data catalog benefits
Chapter 2: A Data Worker’s Dream
Chapter 3: The “Back Story”
Permissions and roles
Chapter 4: “Data Prep”
Data prep tools
Example: Boomi Unifi
Data analyst dashboard
Creating a prep job
Working with jobs
Comparing data sets
Ingesting and crawling
Certifying data sources
Chapter 5: Data Catalog as a Data Governance Platform
Data catalog role in data governance
Data governance access
Privacy and risk
Chapter 6: Fishing in the Data Lake
About the data lake
Data lake data catalog
Example: Waterline Data from Hitachi Vantara
Avoid siloed data catalogs
Chapter 7: One-Stop Shopping
Enterprise data catalogs
Reference data support
Data quality support
Policies and rules
Chapter 8: Data Catalog “Add-ons”
Portal products: CKAN
Cloud providers: Microsoft (Azure)
Data virtualization and integration tools: Denodo
Business intelligence & data visualization tools: Tableau
Data and process modeling: erwin
Self-service data prep: Paxata
API service catalog: Ignite Platform
Chapter 9: Data Lineage
Chapter 10: Machine Learning in the Data Catalog
ML to the rescue
ML and the data catalog
ML and the human
Intelligent semantic search
Chapter 11: Data Catalog Features
Data catalog features
Chapter 12: Conclusion
The power of deduction
Trends and innovations
Where would we like them to go?
About Bonnie and Lowell
Bonnie O’Neil is a Principal Computer Scientist at The MITRE Corporation and a well-known expert on all phases of data architecture, including data catalogs, data quality, business metadata, and governance. She has assisted both Fortune 500 companies and government agencies with data management projects for over 30 years. She is a regular speaker and workshop/tutorial leader at many conferences, and the author of four books.
Lowell Fryman is an independent consultant specializing in implementing data governance programs and data catalogs. He has been a speaker, practitioner, and industry leader in data governance, analytics, and data quality, with hands-on implementation experience across most industries.