Are you familiar with Scikit-learn Pipelines?
They are an extremely simple yet very useful tool for managing machine learning workflows.
So there you have it: a simple implementation of Scikit-learn pipelines. In this particular case, our logistic regression-based pipeline with default parameters scored the highest accuracy.
As mentioned above, however, these results likely don't represent our best efforts. What if we did want to test out a series of different hyperparameters? Can we use grid search? Can we incorporate automated methods for tuning these hyperparameters? Can AutoML fit into this picture somewhere? What about using cross-validation?
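To give a rough idea of how grid search and cross-validation can wrap around a pipeline, here is a minimal sketch. It is not the approach the upcoming posts will necessarily take: the iris dataset, the `StandardScaler` step, the step names `scaler` and `clf`, and the candidate values for `C` are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris is used here only as a small stand-in dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A two-step pipeline: scale the features, then fit logistic regression.
# The step names "scaler" and "clf" are arbitrary labels.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Pipeline hyperparameters are addressed as <step name>__<parameter name>.
# These candidate values are illustrative, not tuned recommendations.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation over every candidate in the grid. The whole
# pipeline is refit inside each fold, so the scaler only ever sees the
# training portion of that fold.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, test_accuracy)
```

Because the search object treats the pipeline as a single estimator, the preprocessing and the classifier are tuned and validated together rather than as separate steps.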
Over the next couple of posts we will take a look at these additional issues, and see how these simple pieces fit together to make pipelines much more powerful than they may first appear to be given our initial example.
- 7 Steps to Mastering Data Preparation with Python
- Machine Learning Workflows in Python from Scratch Part 1: Data Preparation
- Machine Learning Workflows in Python from Scratch Part 2: k-means Clustering