Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R, by Daniel D. Gutierrez
A practitioner’s tools have a direct impact on the success of his or her work. This book will provide the data scientist with the tools and techniques required to excel with statistical learning methods in the areas of data access, data munging, exploratory data analysis, supervised machine learning, unsupervised machine learning and model evaluation.
Types of Machine Learning
Use Case Examples of Machine Learning
Acquire Valued Shoppers Challenge
Netflix
Algorithmic Trading Challenge
Heritage Health Prize
Marketing
Sales
Supply Chain
Risk Management
Customer Support
Human Resources
Google Flu Trends
Process of Machine Learning
Mathematics Behind Machine Learning
Becoming a Data Scientist
R Project for Statistical Computing
RStudio
Using R Packages
Data Sets
Using R in Production
Summary
Managing Your Working Directory
Types of Data Files
Sources of Data
Downloading Data Sets From the Web
Reading CSV Files
Reading Excel Files
Using File Connections
Reading JSON Files
Scraping Data From Websites
SQL Databases
SQL Equivalents in R
Reading Twitter Data
Reading Data From Google Analytics
Writing Data
Summary
Feature Engineering
Data Pipeline
Data Sampling
Revise Variable Names
Create New Variables
Discretize Numeric Values
Date Handling
Binary Categorical Variables
Merge Data Sets
Ordering Data Sets
Reshape Data Sets
Data Manipulation Using Dplyr
Handle Missing Data
Feature Scaling
Dimensionality Reduction
Summary
Numeric Summaries
Exploratory Visualizations
Histograms
Boxplots
Barplots
Density Plots
Scatterplots
QQ-Plots
Heatmaps
Missing Value Plots
Expository Plots
Summary
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Summary
A Simple Example
Logistic Regression
Classification Trees
Naïve Bayes
K-Nearest Neighbors
Support Vector Machines
Neural Networks
Ensembles
Random Forests
Gradient Boosting Machines
Summary
Overfitting
Bias and Variance
Confounders
Data Leakage
Measuring Regression Performance
Measuring Classification Performance
Cross Validation
Other Machine Learning Diagnostics
Get More Training Observations
Feature Reduction
Feature Addition
Add Polynomial Features
Fine Tuning the Regularization Parameter
Summary
Clustering
Simulating Clusters
Hierarchical Clustering
K-Means Clustering
Principal Component Analysis
Summary
Machine learning and data science are large disciplines that take years of study to master. This book can be viewed as a set of essential tools for a long-term career in data science. Recommendations for further study are provided to help readers build advanced skills for tackling important data problem domains.
The R statistical environment was chosen for use in this book. R is a growing phenomenon worldwide, with many data scientists using it exclusively for their project work. All of the code examples in the book are written in R, and many popular R packages and data sets are used throughout.
Press release about the book | Book code and figures | GitHub with R code
Daniel D. Gutierrez is a practicing data scientist who works through AMULET Analytics, his consulting firm in Santa Monica, Calif. Daniel also serves as Managing Editor for insideBIGDATA.com, where he keeps his finger on the pulse of this dynamic industry.