Building a Machine Learning Model through Trial and Error

A step-by-step guide that includes suggestions on how to preprocess data.

By Seth DeLand, Product Marketing Manager, MathWorks

Figure 1: Preprocessing data includes removing any outliers, i.e., data points that lie outside the rest of the data.
After cleaning, divide the data into two sets. Half will be used to train the model, and the other half will be “holdout” data used for testing and cross-validation.
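The article's workflow is MATLAB-based; as an illustrative sketch of the same 50/50 split in Python with scikit-learn (the variable names and synthetic data here are hypothetical stand-ins for the cleaned sensor data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the cleaned phone sensor data:
# 200 samples, 6 features, 3 activity classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = rng.integers(0, 3, size=200)

# 50/50 split: half for training, half "holdout" for testing and validation.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
```

Stratifying keeps each activity's proportions the same in both halves, so the holdout set remains representative.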

Derive features using the preprocessed data

Raw data must be turned into information a machine learning algorithm can use. To do this, users must derive features that categorize the content of the phone data.

In this example, engineers and scientists must identify features that help the algorithm distinguish between walking (low frequency) and running (high frequency).
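One way to capture that frequency difference is a simple spectral feature. The following sketch (Python/NumPy rather than the article's MATLAB; the sampling rate and signals are assumed, illustrative values) extracts the dominant frequency and mean power from a 1-D sensor trace:

```python
import numpy as np

def frequency_features(signal, fs):
    """Return the dominant frequency (Hz) and mean spectral power of a 1-D trace."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    dominant = freqs[np.argmax(spectrum)]
    power = np.mean(spectrum ** 2)
    return dominant, power

fs = 50.0                                # assumed 50 Hz sampling rate
t = np.arange(0, 4, 1 / fs)
walking = np.sin(2 * np.pi * 1.5 * t)    # low-frequency motion (~1.5 Hz)
running = np.sin(2 * np.pi * 3.0 * t)    # higher-frequency motion (~3 Hz)

f_walk, _ = frequency_features(walking, fs)
f_run, _ = frequency_features(running, fs)
```

The dominant frequency alone already separates the two activities, which is exactly the kind of high-level information a classifier needs.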

Figure 2: Deriving features from the data turns raw data into high-level information that can be used in a machine learning model.
Build and train the model

Start with a simple decision tree.
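As a minimal sketch of this first model (again in Python/scikit-learn as a stand-in for the article's MATLAB workflow, with hypothetical synthetic features):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features (e.g. dominant frequency, signal power) for three
# activities: 0 = walking, 1 = running, 2 = dancing.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(50, 2)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)

# A shallow tree keeps the first model simple and interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
train_acc = tree.score(X, y)
```

Starting simple establishes a baseline: if the tree performs poorly, that is a signal to try something else, not a failure.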

Figure 3: The decision tree establishes parameters for classifications based on feature characteristics.
Plot the confusion matrix to observe its performance.
Figure 4: This matrix illustrates a model that has trouble differentiating between running and dancing
This confusion matrix indicates that the decision tree is not working well for this type of data, and that a different algorithm may be a better fit.
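Computing the matrix itself is a one-liner. A sketch with scikit-learn, using hypothetical true labels and predictions for the three activities:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions
# (0 = walking, 1 = running, 2 = dancing).
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 1, 2, 2])

# Rows are true classes, columns are predicted classes; off-diagonal
# entries in the running and dancing rows expose the confusion.
cm = confusion_matrix(y_true, y_pred)
```

Reading along each row shows where a class's samples actually went, which pinpoints the confused pair far faster than a single accuracy number.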

A K-nearest neighbors (KNN) algorithm stores all the training data, compares new points to the training data, and returns the most frequent class among the “K” nearest points. In this case, it yields higher accuracy.
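A sketch of the swap (Python/scikit-learn analogue, hypothetical synthetic data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Hypothetical well-separated features for three activities.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(60, 2)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 60)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# KNN keeps the training set and votes among the K nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
knn_acc = knn.score(X_te, y_te)
```

Because KNN makes no assumptions about class boundaries, it often handles shapes that a shallow decision tree cannot carve out.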

Figure 5: Switching to a KNN algorithm improves the accuracy, although further improvements are still possible.
Another option is a multiclass support vector machine (SVM).
Figure 6: The SVM does very well, with 99% accuracy for nearly every activity.
The SVM proves to be the best of the three models, illustrating how the goal was reached through trial and error.
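The final candidate follows the same pattern. A sketch with scikit-learn's `SVC`, which handles multiclass problems internally via a one-vs-one scheme (synthetic data again hypothetical):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Hypothetical features for the three activities.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(60, 2)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 60)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# An RBF-kernel SVM finds maximum-margin boundaries between classes.
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_tr, y_tr)
svm_acc = svm.score(X_te, y_te)
```

Trying all three candidates against the same test split makes the comparison fair: the data and metric stay fixed while only the algorithm changes.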

Improve the model

If the model can’t reliably classify between dancing and running, it needs to be improved. Models can be improved either by making them more complex (to better fit the data) or simpler (to reduce the chance of overfitting).

To simplify the model, the number of features can be reduced by the following means: a correlation matrix, so that highly correlated (and therefore redundant) features can be removed; principal component analysis (PCA), which eliminates redundancy by projecting onto fewer components; or sequential feature reduction, which removes features one at a time until there is no further improvement. To make the model more complex, engineers and scientists can merge multiple simpler models into a larger model or add more data sources.
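Two of those simplification routes can be sketched briefly in Python (the feature matrix below is a hypothetical example with one deliberately redundant column):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical feature matrix: three independent columns plus a fourth
# that is a near-copy of the first (i.e. redundant).
rng = np.random.default_rng(4)
base = rng.normal(size=(100, 3))
X = np.hstack([base, base[:, :1] * 2.0 + 0.01 * rng.normal(size=(100, 1))])

# Route 1: the correlation matrix flags highly correlated feature pairs,
# and one member of each pair can be dropped.
corr = np.corrcoef(X, rowvar=False)
redundant = [j for i in range(corr.shape[0]) for j in range(i + 1, corr.shape[1])
             if abs(corr[i, j]) > 0.95]

# Route 2: PCA keeps only the components explaining 99% of the variance,
# so the redundant direction collapses away.
pca = PCA(n_components=0.99)
X_reduced = pca.fit_transform(X)
```

Both routes shrink the feature set without discarding real information, which is the point of simplifying to avoid overfitting.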

Once trained and adjusted, the model can be validated with the “holdout” dataset set aside during preprocessing. If the model can reliably classify activities, then it is ready for the phone application.
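The final check can be sketched as follows (Python/scikit-learn analogue; data and model choice are hypothetical): the holdout half is touched only once, after all training and tuning are done.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical features/labels; in practice these come from preprocessing.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(60, 2)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 60)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Train on one half; score once on the untouched holdout half.
model = SVC().fit(X_train, y_train)
holdout_acc = accuracy_score(y_holdout, model.predict(X_holdout))
```

Because the holdout data played no role in training or tuning, its accuracy is an honest estimate of how the model will behave on the phone.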

Engineers and scientists training machine learning models for the first time will encounter challenges, but should realize that trial and error is part of the process. The workflow outlined above provides a road map for building machine learning models that can also be applied in varied areas such as predictive maintenance, natural language processing, and autonomous driving.

Explore these other resources to learn more about machine learning methods and examples.

  • Supervised Learning Workflow and Algorithms: Learn the workflow and steps in the supervised learning process.
  • MATLAB Machine Learning Examples: Get started with machine learning by exploring examples, articles, and tutorials.
  • Machine Learning with MATLAB: Download this ebook for a step-by-step guide providing machine learning basics along with advanced techniques and algorithms.