Machine Learning and Data Science, 2nd Edition: An Introduction to Statistical Learning Methods with R, by Daniel D. Gutierrez
Build real-world machine learning solutions from scratch using R—no advanced math or prior coding experience required.
Types of Machine Learning
Use Case Examples of Data Science
Porto Seguro’s Safe Driver Prediction
Netflix
Algorithmic Trading Challenge
Heritage Health Prize
Marketing
Sales
Supply Chain
Risk Management
Customer Support
Human Resources
Google Flu Trends
Process of Data Science
Mathematics Behind Machine Learning
Becoming a Data Scientist
R Project for Statistical Computing
RStudio
Using R Packages
Data Sets
Summary
Creating Variables of Atomic Classes with the Assignment Operator
Creating Vector Objects by Default
Creating Integer Sequences
Using the c() Combine Values Function
Using the vector() Constructor Function
Coercion – Implicit Transformation of Class
Casting – Explicit Transformation of Class
Using Matrices
Using Lists
Constructing Lists and Sublists
Using Factor Variables
Using Data Frames
Using Name Attributes
Using Multidimensional Arrays
Missing Values
Subsetting a Vector
Subsetting a Matrix
Subsetting and Slicing a List
Using the subset() Function
Common Operations on a Data Frame
Removing NA Values
Examining Inf and NaN Values
The Empty String
Vectorized Operations
IF Control Structure
FOR Control Structure
WHILE Control Structure
REPEAT Control Structure
User Defined Functions
Loop Functions
SWITCH Control Structure
Date and Time Handling
Random Sampling
Summary
Managing Your Working Directory
Types of Data Files
Sources of Data
Base R Data Sets
Downloading Data Sets From the Web
Reading CSV Files
Reading Excel Files
Using Connection Objects
Reading JSON Files
SQL Databases
R SQL
SQL Equivalents in R
API Data Access
Writing Data
Summary
Feature Engineering
Data Pipeline
Revising Variable Names
Creating New Variables
Discretizing Numeric Values
Date Handling
Creating Binary Categorical Variables
Merging Data Sets
Ordering Data Sets
Reshaping Data Sets
Data Manipulation Using Dplyr
Handling Missing Data
Feature Scaling
Summary
Probability Distributions
Performing Counts
Contingency Tables
Chi-squared Statistical Test
Summary Statistics
Statistical Functions
Variance
Covariance and Correlation
Calculating a Cumulative Sum
Detecting Outliers
Summary
Histograms
Boxplots
Barplots
Density Plots
Scatterplots
QQ-Plots
Big Data Techniques
Line Graphs
Missing Value Plots
Expository Plots
Introduction to ggplot2
Summary
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Summary
A Simple Example
Logistic Regression
Classification Trees
Naïve Bayes
K-Nearest Neighbors
Support Vector Machines
Neural Networks
Ensembles
Random Forests
Gradient Boosting Machines
XGBoost
Summary
Overfitting
Bias and Variance
Confounders
Data Leakage
Measuring Regression Performance
Measuring Classification Performance
ROC Curves
Cross Validation
Other Machine Learning Diagnostics
Summary
Clustering
Simulating Clusters
Hierarchical Clustering
K-Means Clustering
Extensive K-Means Example
Principal Component Analysis
Summary
This second edition of Machine Learning and Data Science offers an accessible, hands-on introduction to the core principles of machine learning, statistical modeling, and practical data science—without overwhelming readers with complex formulas or technical jargon. Perfect for beginners, analysts, and business professionals transitioning into data science, this book provides a complete project-based roadmap from data wrangling to model deployment using the powerful R programming language. Whether you’re analyzing marketing trends, predicting customer behavior, or detecting fraud, this book equips you with the foundation needed to solve real problems using machine learning.
Author and data scientist Daniel D. Gutierrez draws on his experience teaching at UCLA and years of industry practice to guide you through essential topics, including regression, classification, clustering, feature engineering, and model evaluation. You’ll explore supervised and unsupervised learning techniques, apply visualization strategies, and build intuitive workflows that mirror the data science process used by professionals across finance, healthcare, marketing, and more. Unlike overly theoretical texts, this guide emphasizes application—what to do, why to do it, and how to do it in R.
Inside, you’ll find step-by-step tutorials, use case examples from Kaggle competitions, and easy-to-follow code snippets that let you apply machine learning concepts immediately. Learn how to access and clean real-world data sets, implement algorithms like decision trees, random forests, logistic regression, and k-means clustering, and avoid common pitfalls such as data leakage and overfitting. Move from exploratory data analysis to powerful predictive modeling.
Whether you’re a student, aspiring data scientist, or working analyst seeking to expand your skills, this is your essential, beginner-friendly guide to statistical learning and machine learning with R.
Daniel D. Gutierrez is an independent consultant in data science and an AI industry analyst and influencer. He holds a BS degree in Mathematics and Computer Science from UCLA. His background in “data science” extends back well before this cool name was in vogue. His main channel is the Radical Data Science blog, where he keeps a pulse on this fast-paced industry. Daniel teaches the “Introduction to Data Science” class for UCLA Extension, where he trains the next generation of data scientists. He has written four “data” books, including the recent 2nd edition of his popular title “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.”