Technics Publications

Machine Learning and Data Science


Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R, by Daniel D. Gutierrez

A practitioner’s tools have a direct impact on the success of his or her work. This book will provide the data scientist with the tools and techniques required to excel with statistical learning methods in the areas of data access, data munging, exploratory data analysis, supervised machine learning, unsupervised machine learning and model evaluation.


Chapter 1: Machine Learning Overview

Types of Machine Learning
Use Case Examples of Machine Learning
Acquire Valued Shoppers Challenge
Algorithmic Trading Challenge
Heritage Health Prize
Supply Chain
Risk Management
Customer Support
Human Resources
Google Flu Trends
Process of Machine Learning
Mathematics Behind Machine Learning
Becoming a Data Scientist
R Project for Statistical Computing
Using R Packages
Data Sets
Using R in Production


Chapter 2: Data Access

Managing Your Working Directory
Types of Data Files
Sources of Data
Downloading Data Sets From the Web
Reading CSV Files
Reading Excel Files
Using File Connections
Reading JSON Files
Scraping Data From Websites
SQL Databases
SQL Equivalents in R
Reading Twitter Data
Reading Data From Google Analytics
Writing Data


Chapter 3: Data Munging

Feature Engineering
Data Pipeline
Data Sampling
Revise Variable Names
Create New Variables
Discretize Numeric Values
Date Handling
Binary Categorical Variables
Merge Data Sets
Ordering Data Sets
Reshape Data Sets
Data Manipulation Using dplyr
Handle Missing Data
Feature Scaling
Dimensionality Reduction


Chapter 4: Exploratory Data Analysis

Numeric Summaries
Exploratory Visualizations
Density Plots
Missing Value Plots
Expository Plots


Chapter 5: Regression

Simple Linear Regression
Multiple Linear Regression
Polynomial Regression


Chapter 6: Classification

A Simple Example
Logistic Regression
Classification Trees
Naïve Bayes
K-Nearest Neighbors
Support Vector Machines
Neural Networks
Random Forests
Gradient Boosting Machines


Chapter 7: Evaluating Model Performance

Bias and Variance
Data Leakage
Measuring Regression Performance
Measuring Classification Performance
Cross Validation
Other Machine Learning Diagnostics
Get More Training Observations
Feature Reduction
Feature Addition
Add Polynomial Features
Fine-Tuning the Regularization Parameter


Chapter 8: Unsupervised Learning

Simulating Clusters
Hierarchical Clustering
K-Means Clustering
Principal Component Analysis

Machine learning and data science are large disciplines that require years of study to master. This book can be viewed as a set of essential tools for a long-term career in the data science field. It also recommends further study for building advanced skills in important data problem domains.

The R statistical environment was chosen for this book. R is a growing phenomenon worldwide, and many data scientists use it exclusively for their project work. All of the code examples in the book are written in R, and many popular R packages and data sets are used throughout.


About Daniel

Daniel D. Gutierrez is a practicing data scientist through his Santa Monica, Calif. consulting firm AMULET Analytics. Daniel also serves as Managing Editor for where he keeps a pulse on this dynamic industry.

