**Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R**, by Daniel D. Gutierrez

A practitioner’s tools have a direct impact on the success of his or her work. This book will provide the data scientist with the tools and techniques required to excel with statistical learning methods in the areas of data access, data munging, exploratory data analysis, supervised machine learning, unsupervised machine learning and model evaluation.

Types of Machine Learning

Use Case Examples of Machine Learning

Acquire Valued Shoppers Challenge

Netflix

Algorithmic Trading Challenge

Heritage Health Prize

Marketing

Sales

Supply Chain

Risk Management

Customer Support

Human Resources

Google Flu Trends

Process of Machine Learning

Mathematics Behind Machine Learning

Becoming a Data Scientist

R Project for Statistical Computing

RStudio

Using R Packages

Data Sets

Using R in Production

Summary

Managing Your Working Directory

Types of Data Files

Sources of Data

Downloading Data Sets From the Web

Reading CSV Files

Reading Excel Files

Using File Connections

Reading JSON Files

Scraping Data From Websites

SQL Databases

SQL Equivalents in R

Reading Twitter Data

Reading Data From Google Analytics

Writing Data

Summary

Feature Engineering

Data Pipeline

Data Sampling

Revise Variable Names

Create New Variables

Discretize Numeric Values

Date Handling

Binary Categorical Variables

Merge Data Sets

Ordering Data Sets

Reshape Data Sets

Data Manipulation Using Dplyr

Handle Missing Data

Feature Scaling

Dimensionality Reduction

Summary

Numeric Summaries

Exploratory Visualizations

Histograms

Boxplots

Barplots

Density Plots

Scatterplots

QQ-Plots

Heatmaps

Missing Value Plots

Expository Plots

Summary

Simple Linear Regression

Multiple Linear Regression

Polynomial Regression

Summary

A Simple Example

Logistic Regression

Classification Trees

Naïve Bayes

K-Nearest Neighbors

Support Vector Machines

Neural Networks

Ensembles

Random Forests

Gradient Boosting Machines

Summary

Overfitting

Bias and Variance

Confounders

Data Leakage

Measuring Regression Performance

Measuring Classification Performance

Cross Validation

Other Machine Learning Diagnostics

Get More Training Observations

Feature Reduction

Feature Addition

Add Polynomial Features

Fine Tuning the Regularization Parameter

Summary

Clustering

Simulating Clusters

Hierarchical Clustering

K-Means Clustering

Principal Component Analysis

Summary

Machine learning and data science are large disciplines that take years of study to master. This book can be viewed as a set of essential tools for a long-term career in the data science field; recommendations for further study are provided to help build advanced skills for tackling important data problem domains.

The R statistical environment was chosen for use in this book. R is a growing phenomenon worldwide, with many data scientists using it exclusively for their project work. All of the code examples for the book are written in R. In addition, many popular R packages and data sets will be used.

**Press release about the book | Book code and figures | GitHub with R code**

Daniel D. Gutierrez is a practicing data scientist who works through his Santa Monica, Calif., consulting firm, AMULET Analytics. He also serves as Managing Editor of insideBIGDATA.com, where he keeps his finger on the pulse of this dynamic industry.
