Integrating Hadoop, by William McKnight and Jake Dolezal
Integrating Hadoop leverages the discipline of data integration and applies it to the Hadoop open-source software framework for storing data on clusters of commodity hardware.
INTRODUCING HADOOP
HADOOP DISTRIBUTIONS
ASSEMBLING THE INTEGRATION TEAM
ROLES AND RESPONSIBILITIES
OVERVIEW OF WORKLOADS FOR HADOOP IN THE ORGANIZATION
DATA PREPARATION
ACTIVE ARCHIVE
ANALYTICS
DATA QUALITY/GOVERNANCE
DATA VIRTUALIZATION
DATA LAKES AND BEYOND
IDENTIFYING DATA SOURCES FOR HADOOP
NOSQL DATABASES
LEGACY/RELATIONAL DATABASES
CLICKSTREAMS
SENSORS
APIS
DATA PROFILING
ANALYZING AND PROFILING SOURCE SYSTEMS AND DATA
CONTINUED NEED FOR MORE SPEED
PREFERENCE WITH HADOOP
BRING ALL DATA TOGETHER
KEEP ALL DATA NOW (DECIDE HOW TO USE IT LATER)
IS ETL DEAD?
ADVANTAGES OF DATA INTEGRATION TOOLS
METHODS OF DATA LOADING
BATCH
REAL TIME
SQOOP
NIFI
CHANGE DATA CAPTURE
PUSH VERSUS PULL
PATH TO PRODUCTION
WORKFLOW AND SCHEDULING
SUPPORT AND TROUBLESHOOTING
HOW-TO WITH TALEND BIG DATA
ONE-TIME BATCH
SCHEDULED BATCH (OOZIE)
RELATIONAL DUMP (SQOOP)
BIG DATA ELT
TRANSFORMATIONS
“UPSERTS” WITHIN HADOOP
IMPORTANCE OF DATA QUALITY IN HADOOP
STEWARDSHIP OF BIG DATA
FOLDING INTO EXISTING DATA GOVERNANCE PROCESS
METADATA
HADOOP EXTRACTS
RELATIONAL, OPERATIONAL, AND LEGACY
NOSQL
DATA WAREHOUSE
MDM HUB/360-DEGREE VIEW
HADOOP AND SOA
ADVANTAGES OF REAL-TIME COMPUTING
SPARK
SPARK BENCHMARKS
HOW AND WHERE TO USE SPARK
HDFS
S3
FILES
DATABASES
STREAMING ANALYTICS
STREAMING DATA TECHNOLOGY DISTINCTIONS
HADOOP AND MASTER DATA MANAGEMENT
INTEGRATING WITH MASTER DATA
DATA VIRTUALIZATION
MDM AND HADOOP DISCONNECTS
1. INTEGRATING DATA WITHOUT A BUSINESS PURPOSE
2. INTEGRATING DATA INTO HADOOP FOR AN ENTERPRISE DATA REPOSITORY
3. OVEREMPHASIS ON DATA INTEGRATION PERFORMANCE TO THE DETRIMENT OF QUERY PERFORMANCE FOR DATA USAGE
4. NOT REFINING DATA TO THE POINT OF USEFULNESS
5. IMPROPER NODE SPECIFICATION
6. OVER-RELIANCE ON OPEN SOURCE HADOOP
7. ETL INSTEAD OF ELT
8. USING MAPREDUCE TO LOAD HADOOP
9. USING SPARK THROUGH HIVE TO LOAD HADOOP
10. IGNORING THE QUALITY OF THE DATA BEING LOADED
CASE STUDIES AND TRENDS
PAYMENT PROCESSING
HEALTHCARE
TRENDS IN HADOOP AND SUMMARY OF IDEAS
LOADING HADOOP CLUSTERS WILL CONTINUE TO BE A TOP JOB
It is packed with the need-to-know for managers, architects, designers, and developers responsible for populating Hadoop in the enterprise, allowing you to harness big data.
Integrating Hadoop covers the gamut of the setup, architecture and possibilities for Hadoop in the organization.
William McKnight leads McKnight Consulting Group and is an internationally recognized authority in information management. His teams provide quick, agile approaches to information challenges faced by the Global 2000 and numerous midmarket companies. His unparalleled exposure to the market infuses ideas into client programs. William is the author of several books and a very popular speaker worldwide. He is a prolific writer with hundreds of articles and white papers published and has conducted numerous published benchmarks addressing big data questions. William is a distinguished entrepreneur, a former Fortune 50 technology executive, and a former software engineer. He provides clients with strategies, architectures, platform and tool selection, and complete programs to manage information, and has led several clients to business success with big data platforms.
Jake Dolezal has more than 18 years of experience in the Information Management field with expertise in business intelligence, analytics, data warehousing, statistics, data modeling and integration, data visualization, master data management, and data quality. Jake has experience across a broad array of industries, including healthcare, education, government, manufacturing, engineering, hospitality, and gaming. Jake earned his Doctorate in Information Management from Syracuse University. He is also a Certified Business Intelligence Professional through TDWI with an emphasis in Data Analysis. In addition, he is a certified leadership coach and has helped clients accelerate their careers and earn several executive promotions.