Technics Publications

Integrating Hadoop

$9.95
$24.95

Integrating Hadoop, by William McKnight and Jake Dolezal

Integrating Hadoop leverages the discipline of data integration and applies it to the Hadoop open-source software framework for storing data on clusters of commodity hardware.

Topics

1 HADOOP IN SUPPORT OF AN INFORMATION STRATEGY

INTRODUCING HADOOP
HADOOP DISTRIBUTIONS


2 PREPARING FOR INTEGRATION

ASSEMBLING THE INTEGRATION TEAM
ROLES AND RESPONSIBILITIES
OVERVIEW OF WORKLOADS FOR HADOOP IN THE ORGANIZATION
DATA PREPARATION
ACTIVE ARCHIVE
ANALYTICS
DATA QUALITY/GOVERNANCE
DATA VIRTUALIZATION
DATA LAKES AND BEYOND
IDENTIFYING DATA SOURCES FOR HADOOP
NOSQL DATABASES
LEGACY/RELATIONAL DATABASES
CLICKSTREAMS
SENSORS
APIS
DATA PROFILING
ANALYZING AND PROFILING SOURCE SYSTEMS AND DATA


3 ETL VERSUS ELT

CONTINUED NEED FOR MORE SPEED
PREFERENCE WITH HADOOP
BRING ALL DATA TOGETHER
KEEP ALL DATA NOW (DECIDE HOW TO USE IT LATER)
IS ETL DEAD?


4 LOADING DATA INTO HADOOP

ADVANTAGES OF DATA INTEGRATION TOOLS
METHODS OF DATA LOADING
BATCH
REAL TIME
SQOOP
NIFI
CHANGE DATA CAPTURE
PUSH VERSUS PULL
PATH TO PRODUCTION
WORKFLOW AND SCHEDULING
SUPPORT AND TROUBLESHOOTING
HOW-TO WITH TALEND BIG DATA
ONE-TIME BATCH
SCHEDULED BATCH (OOZIE)
RELATIONAL DUMP (SQOOP)


5 MANAGING BIG DATA

BIG DATA ELT
TRANSFORMATIONS
“UPSERTS” WITHIN HADOOP
IMPORTANCE OF DATA QUALITY IN HADOOP
STEWARDSHIP OF BIG DATA
FOLDING INTO EXISTING DATA GOVERNANCE PROCESS
METADATA


6 UNLOADING/DISTRIBUTING DATA FROM HADOOP

HADOOP EXTRACTS
RELATIONAL, OPERATIONAL, AND LEGACY
NOSQL
DATA WAREHOUSE
MDM HUB/360-DEGREE VIEW
HADOOP AND SOA


7 APACHE SPARK CLUSTER COMPUTING WITH HADOOP

ADVANTAGES OF REAL-TIME COMPUTING
SPARK
SPARK BENCHMARKS
HOW AND WHERE TO USE SPARK
HDFS
S3
FILES
DATABASES
STREAMING ANALYTICS


8 STREAMING DATA

STREAMING DATA TECHNOLOGY DISTINCTIONS


9 MASTER DATA MANAGEMENT AND BIG DATA

HADOOP AND MASTER DATA MANAGEMENT
INTEGRATING WITH MASTER DATA
DATA VIRTUALIZATION
MDM AND HADOOP DISCONNECTS


10 TOP 10 MISTAKES INTEGRATING HADOOP DATA

1. INTEGRATING DATA WITHOUT A BUSINESS PURPOSE
2. INTEGRATING DATA INTO HADOOP FOR AN ENTERPRISE DATA REPOSITORY
3. OVEREMPHASIS ON DATA INTEGRATION PERFORMANCE TO THE DETRIMENT OF QUERY PERFORMANCE FOR DATA USAGE
4. NOT REFINING DATA TO THE POINT OF USEFULNESS
5. IMPROPER NODE SPECIFICATION
6. OVER-RELIANCE ON OPEN SOURCE HADOOP
7. ETL INSTEAD OF ELT
8. USING MAPREDUCE TO LOAD HADOOP
9. USING SPARK THROUGH HIVE TO LOAD HADOOP
10. IGNORING THE QUALITY OF THE DATA BEING LOADED


11 CASE STUDIES AND TRENDS

CASE STUDIES IN BIG DATA INTEGRATION
PAYMENT PROCESSING
HEALTHCARE
TRENDS IN HADOOP AND SUMMARY OF IDEAS
LOADING HADOOP CLUSTERS WILL CONTINUE TO BE A TOP JOB

Integrating Hadoop is packed with need-to-know information for managers, architects, designers, and developers responsible for populating Hadoop in the enterprise. It shows how to harness big data in such a way that the solution:

  • Complies with (and even extends) enterprise standards
  • Integrates seamlessly with the existing information infrastructure
  • Fills a critical role within enterprise architecture.

Integrating Hadoop covers the full range of setup, architecture, and possibilities for Hadoop in the organization, including:

  • Supporting an enterprise information strategy
  • Organizing for a successful Hadoop rollout
  • Loading data into Hadoop and extracting data from it
  • Managing Hadoop data once it’s in the cluster
  • Using Spark, streaming data, and master data in Hadoop processes, with examples provided to reinforce concepts.

About William and Jake

William McKnight leads McKnight Consulting Group and is an internationally recognized authority in information management. His teams provide quick, agile approaches to information challenges faced by the Global 2000 and numerous midmarket companies, and his broad exposure to the market brings new ideas into client programs. The author of several books and a popular speaker worldwide, he has published hundreds of articles and white papers and has conducted numerous published benchmarks addressing big data questions. William is an entrepreneur, a former Fortune 50 technology executive, and a former software engineer. He provides clients with strategies, architectures, platform and tool selection, and complete programs to manage information, and he has led several clients to business success with big data platforms.

Jake Dolezal has more than 18 years of experience in the information management field, with expertise in business intelligence, analytics, data warehousing, statistics, data modeling and integration, data visualization, master data management, and data quality. Jake has worked across a broad array of industries, including healthcare, education, government, manufacturing, engineering, hospitality, and gaming. He earned his Doctorate in Information Management from Syracuse University and is a Certified Business Intelligence Professional through TDWI with an emphasis in Data Analysis. In addition, he is a certified leadership coach and has helped clients accelerate their careers and earn several executive promotions.
