Machine Learning Data Processing

The Engineer's Guide to Machine Learning

The definitive resource for all things Machine Learning in the galaxy

Machine learning is difficult, and there is a lot going on. During a project earlier this month I was doing some simple data processing, and I could not for the life of me recall the name of a bi-variate data exploration technique I had read about a few months back. It was frustrating not to have a source of information right at my fingertips.

After spending some time searching the internet, I wasn’t able to find anything that matched what I had in mind. So I've decided to create the Engineer's Guide to Machine Learning: an all-inclusive mind map covering most, if not all, of the concepts and methods that are useful to new and experienced machine learning engineers.

The Engineer's Guide to Machine Learning is broken up into five sections:

  • Machine Learning Data Processing
  • Machine Learning Concepts
  • Machine Learning Process
  • Machine Learning Mathematics
  • Machine Learning Models

Each of these sections will go in depth on the topics it covers. I’m working on the bible now. A new section will be released every week, accompanied by a corresponding “Cheat Sheet” for easy reference.

Here is a taste of the first section. The only thing you might need after I publish all the sections is a towel. Let me know if I should add anything!

Data Types

  • Nominal
  • Ordinal
  • Interval
  • Ratio

Data Exploration

  • Variable Identification
  • Uni-variate Analysis
  • Bi-variate Analysis
  • Multi-variate Analysis

Feature Cleaning

  • Missing Values
  • Special Values
  • Outliers
  • Obvious Inconsistencies

Feature Imputation

  • Hot-Deck
  • Cold-Deck
  • Mean-substitution
  • Regression
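
To make one of these concrete, here is a minimal sketch of mean-substitution in plain Python. It assumes missing values are represented as None; the function name mean_substitute and the sample ages are made up for illustration, not part of any library.

```python
from statistics import mean


def mean_substitute(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [20, None, 30, 40, None]
imputed = mean_substitute(ages)
print(imputed)
```

Mean-substitution keeps the column mean unchanged but shrinks its variance, which is one reason the other techniques in the list exist.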

Feature Engineering

  • Decomposition
  • Discretization
  • Reframe Numerical Quantities
  • Crossing
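
As a rough illustration of two of these ideas, the sketch below bins a numeric value (discretization) and combines two categorical values into one feature (crossing). The bin edges, the helper names discretize and cross, and the sample values are assumptions chosen for the example.

```python
from bisect import bisect_right


def discretize(value, edges):
    """Return the index of the bin that value falls into.

    edges must be sorted; a value equal to an edge goes into the higher bin.
    """
    return bisect_right(edges, value)


def cross(a, b):
    """Combine two categorical values into a single crossed feature."""
    return f"{a}_x_{b}"


# Hypothetical age bins: below 18, 18 to 64, 65 and over.
age_edges = [18, 65]
age_bins = [discretize(age, age_edges) for age in [10, 18, 40, 70]]
crossed = cross("weekend", "evening")
print(age_bins, crossed)
```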

Feature Selection

  • Correlations
  • Dimensionality Reduction
  • Importance

Feature Encoding

  • Label Encoding
  • One-Hot Encoding
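
Here is a small sketch of both encodings in plain Python. It assumes categories are numbered in sorted order, which is just one possible convention; the function names and the colors list are illustrative.

```python
def label_encode(values):
    """Map each category to an integer code, numbering categories in sorted order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with a single 1 at its category's position."""
    codes, categories = label_encode(values)
    return [[1 if i == code else 0 for i in range(len(categories))] for code in codes]


colors = ["red", "green", "blue", "green"]
codes, categories = label_encode(colors)
one_hot = one_hot_encode(colors)
print(categories, codes, one_hot)
```

Label encoding implies an ordering between the codes, so it tends to suit ordinal data, while one-hot encoding avoids that implied order at the cost of one column per category.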

Feature Normalization

  • Re-scaling
  • Standardization
  • Scaling to Unit Length
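
The three techniques can be sketched in a few lines of plain Python. This version assumes min-max re-scaling to the range 0 to 1, standardization with the population standard deviation, and the Euclidean norm for unit length; the function names are illustrative.

```python
import math
from statistics import mean, pstdev


def rescale(values):
    """Min-max re-scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


def scale_to_unit_length(values):
    """Divide a vector by its Euclidean norm so the result has length 1."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        raise ValueError("cannot scale a zero vector")
    return [v / norm for v in values]


rescaled = rescale([2, 4, 6])
standardized = standardize([2, 4, 6])
unit = scale_to_unit_length([3, 4])
print(rescaled, standardized, unit)
```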

Dataset Construction

  • Training Dataset
  • Test Dataset
  • Validation Dataset
  • Cross-Validation
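
To show how these pieces fit together, here is a minimal sketch of a shuffled train/validation/test split and a simple k-fold index generator. The 60/20/20 proportions, the fixed seed, and the helper names are assumptions for the example rather than recommended defaults.

```python
import random


def train_validation_test_split(rows, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and cut it into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    train = shuffled[n_test + n_validation:]
    return train, validation, test


def k_fold_indices(n_rows, k):
    """Yield (training_indices, held_out_indices) pairs, one per fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


train, validation, test = train_validation_test_split(range(10))
folds = list(k_fold_indices(6, 3))
print(len(train), len(validation), len(test))
```

In cross-validation every row is held out exactly once, so each fold's score is computed on data the model did not see during that round of training.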

Let’s also connect on Twitter, LinkedIn, or email.