By Sachin Abeywardana, Founder of DeepSchool.io
Docker for Data Science
Docker is a tool that simplifies the installation process for software engineers. Coming from a statistics background, I used to care very little about how to install software, and would occasionally spend a few days trying to resolve system configuration issues. Enter the godsend, Docker almighty.
Think of Docker as a lightweight virtual machine (I apologise to the Docker gurus for using that term). Generally, someone writes a *Dockerfile* that builds a *Docker Image* containing most of the tools and libraries you need for a project. You can use this image as a base and add any other dependencies your project requires. Its underlying philosophy is that if it works on my machine, it will work on yours.
This Dockerfile would install python3 (as a layer) on top of the …
Essentially, for each project you write all the pip install (etc.) commands into your Dockerfile instead of executing them locally.
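As a rough sketch of that workflow, the Dockerfile below installs python3 as a layer on top of an Ubuntu base image and then pip-installs a few common data science libraries. The ubuntu:22.04 base image and the package list are assumptions chosen for illustration; substitute whatever your project actually needs.

```dockerfile
# Assumed base image for this sketch; pinning a tag keeps builds reproducible.
FROM ubuntu:22.04

# Suppress interactive prompts from apt during the build.
ARG DEBIAN_FRONTEND=noninteractive

# Layer: install python3 and pip from the Ubuntu repositories,
# then clear the apt package lists to keep the image small.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Layer: the "pip install" commands you would otherwise run locally.
# This package list is only an example.
RUN pip3 install --no-cache-dir numpy pandas scikit-learn

# Work inside /app and start a Python shell by default.
WORKDIR /app
CMD ["python3"]
```

Building it with docker build -t ds-project . and starting it with docker run -it --rm ds-project should drop you into a Python shell with those libraries available (ds-project is a hypothetical image name). Because each RUN instruction produces its own layer, editing only the pip line lets Docker reuse the cached apt layer on rebuilds.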
I recommend reading the tutorial at https://docs.docker.com/get-started/ to get started with Docker. The learning curve is gentle (two days' work at most), and the gains are enormous.
Lastly, Docker Hub deserves a special mention. In my view, Docker Hub is what makes Docker truly powerful. It is to Docker what GitHub is to git: an open platform for sharing your Docker images. You can always build a Docker image locally using docker build ..., but it is good practice to push the image to Docker Hub so that the next person simply has to pull it for their own use.
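A sketch of the other side of that workflow: once an image has been pushed, a collaborator can reuse it as a base and add only their own dependencies in a short Dockerfile. The image name yourname/ds-base and the extra package are hypothetical placeholders; the sketch assumes the base image already provides pip3, and the commands in the comments assume you have a Docker Hub account and have run docker login.

```dockerfile
# On the author's machine (hypothetical image name):
#   docker build -t yourname/ds-base:latest .
#   docker login
#   docker push yourname/ds-base:latest
#
# Anyone else can then fetch the exact same image with:
#   docker pull yourname/ds-base:latest

# Reuse the shared image as the base instead of rebuilding it from scratch.
FROM yourname/ds-base:latest

# Add only the project-specific extras on top (example package;
# assumes pip3 is already installed in the base image).
RUN pip3 install --no-cache-dir matplotlib
```

Note that docker build pulls the FROM image automatically if it is not already present locally, so the explicit docker pull is mainly useful for running the shared image directly.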
Personally, I have not used any other containerisation tools. However, it should be noted that Docker is independent of Python and R; it goes beyond containerising applications for specific programming languages.
If you are enjoying my tutorials/blog posts, consider supporting me on https://www.patreon.com/deepschoolio or by subscribing to my YouTube channel https://www.youtube.com/user/sachinabey (or both!).
Bio: Sachin Abeywardana holds a PhD in Machine Learning and is the Founder of DeepSchool.io.
Original. Reposted with permission.