By Nick Walsh
Whether you're a novice data science enthusiast setting up TensorFlow for the first time, or a seasoned AI engineer working with terabytes of data, getting your libraries, packages, and frameworks installed is always a struggle.
While containerization tools like Docker have truly revolutionized reproducibility in software, they haven't quite caught on yet in the data science and AI communities, and for good reason! With constantly evolving machine learning frameworks and algorithms, it can be tough to find time to dedicate towards learning another developer tool, especially one that isn't directly linked to the model building process.
In this blog post, I'm going to show you how you can use one simple Python package to set up your environment for any of the popular data science and AI frameworks in just a few simple steps. The package, datmo, leverages Docker under the hood and streamlines the process to help you get running quickly and easily, without the steep learning curve.
Prerequisites:
- Install and launch Docker
- (If using GPU) Install CUDA 9.0
- (If using GPU) Install nvidia-docker (Step 3)
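Before moving on, it can help to confirm that the prerequisite tools are actually reachable from your terminal. The snippet below is a minimal sketch, not part of datmo: `check_prerequisites` is a hypothetical helper, and it assumes that the Docker and nvidia-docker installs put executables named `docker` and `nvidia-docker` on your PATH. It only checks that the executables exist, not that the Docker daemon is running.

```python
import shutil


def check_prerequisites(use_gpu=False):
    """Return a dict mapping each required executable to whether it is on PATH."""
    required = ["docker"]
    if use_gpu:
        # Assumption: the nvidia-docker install provides an
        # executable named `nvidia-docker`.
        required.append("nvidia-docker")
    # shutil.which returns the full path of the executable, or None if not found.
    return {tool: shutil.which(tool) is not None for tool in required}


if __name__ == "__main__":
    for tool, found in check_prerequisites(use_gpu=False).items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

Pass `use_gpu=True` if you plan to select the GPU drivers later in the setup.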
1. Install datmo
Just like any Python package, you can install datmo from your terminal with the following:
$ pip install datmo
2. Initialize a datmo project
In your terminal, cd to the folder you want to start building models in. Then, enter the following command:
$ datmo init
You'll then be asked for a name and description for your project -- feel free to name it whatever you'd like!
3. Start environment setup
After you enter a name and description, datmo will ask if you'd like to set up your environment. Type y and press Enter.
4. Select System Drivers (CPU or GPU)
The CLI will then ask which system drivers you'd like for your environment. If you don't plan on using a GPU, choose the CPU option.
5. Select an environment
Next, you'll choose one of the many pre-packaged environments. Simply respond to the prompt with the number or ID of the environment you want to use.
6. Select a language version (if applicable)
Many of the environments above have different versions depending on which language and version you plan on using.
For example, after selecting the keras-tensorflow environment, I'd be asked whether I want to use Python 2.7 or Python 3.5.
7. Launch your workspace
Now that you've selected your environment, it's time to launch your workspace. Choose the workspace you'd like to use, and enter its respective command in your terminal.
Jupyter Notebook -- $ datmo notebook
JupyterLab -- $ datmo jupyterlab
RStudio -- $ datmo rstudio (available in R-base environment)
Terminal -- $ datmo terminal
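If you'd rather launch a workspace from a Python script than type the command by hand, the workspace commands above can be wrapped as follows. This is only a sketch: `WORKSPACE_COMMANDS`, `workspace_command`, and `launch_workspace` are hypothetical names introduced for illustration, and the only datmo commands used are the four listed in this post. It assumes `datmo` is on your PATH and that you run it from a folder where `datmo init` has already been completed.

```python
import subprocess

# Workspace commands exactly as listed in this post.
WORKSPACE_COMMANDS = {
    "notebook": ["datmo", "notebook"],
    "jupyterlab": ["datmo", "jupyterlab"],
    "rstudio": ["datmo", "rstudio"],  # available in the R-base environment
    "terminal": ["datmo", "terminal"],
}


def workspace_command(name):
    """Return the argument list for the named workspace."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(WORKSPACE_COMMANDS[name])
    except KeyError:
        options = ", ".join(sorted(WORKSPACE_COMMANDS))
        raise ValueError(f"unknown workspace {name!r}; choose from: {options}") from None


def launch_workspace(name):
    """Run the workspace command and block until it exits."""
    return subprocess.run(workspace_command(name), check=True)
```

For example, `launch_workspace("notebook")` runs the same command as typing `datmo notebook` in your terminal.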
You're set! The first time you launch a workspace for a new environment, it will take a bit of time because it needs to fetch all of the resources, but it will be significantly faster on subsequent runs.
Once your workspace launches, you're good to start importing the packages and frameworks that were included in the environment you chose! For example, if you selected the keras-tensorflow environment, then import tensorflow will work out of the box in your Jupyter Notebook!
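A quick way to confirm this from inside the workspace is to check which packages are importable before you start building. The following is a minimal sketch using only the Python standard library; `importable` is a hypothetical helper, and the package names in the usage section are assumptions based on the keras-tensorflow environment name, so swap in whatever your chosen environment provides.

```python
import importlib.util


def importable(names):
    """Return a dict mapping each top-level package name to whether it can be imported."""
    result = {}
    for name in names:
        try:
            # find_spec locates the package without actually importing it.
            result[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            result[name] = False
    return result


if __name__ == "__main__":
    # Assumption: these are the packages the keras-tensorflow environment ships.
    for name, ok in importable(["tensorflow", "keras"]).items():
        print(f"{name}: {'available' if ok else 'not found'}")
```

Because `find_spec` does not execute the package, this check is fast even for heavy frameworks; a full `import tensorflow` is still the definitive test.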
If you're using TensorFlow, you can try this example from our docs for running your first TensorFlow graph.
If you'd like to help contribute, report issues, or request features, you can find us on GitHub here!
Bio: Nick Walsh is a developer evangelist and software engineer at Datmo, building developer tools to help make data scientists more efficient. He also mentors at student hackathons across the country as a coach for Major League Hacking.