Packaging Python Dependencies Using Conda
For the past few days I’ve been practicing a bit of Python while implementing some basic data structures, as well as reading about Decision Trees in Tom Mitchell’s Machine Learning text (which I’ll get to in an upcoming post). After brushing up on trees and search/traversal algorithms in general, I’m about ready to jump into implementing the decision tree.
I say “about” because as I think about how to implement one, I realize I’ll probably use NumPy or Pandas to accept input data. To do that I need to create a Python project and package its dependencies. I use Conda, so I worked out how to do it with Conda.
As of writing this post, I can start building the decision tree (or any other Python project) in a sandbox Python environment with all dependencies packaged and with automated tests! It’s taken me a few extra days to get to this point, with quite a few detours from the goal of learning ML stuff (including filing what I believe to be a small bug in conda). In the long term, getting acquainted with Python tooling and developing a functional workflow is worth the time invested.
Packaging Dependencies
Write an environment.yml file
Mine looks like this:
We’ll use Python 3 and have numpy, and pytest-xdist for testing.
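A file matching that description might look like the sketch below. It is a reconstruction from the sentence above, not the original file: the environment name ml-algorithms and the unpinned versions are assumptions, and pytest itself is left out because it gets pulled in as a dependency of pytest-xdist.

```shell
# Write a minimal environment.yml into the current directory.
# Hypothetical contents: the name and the lack of version pins are assumptions.
cat > environment.yml <<'EOF'
name: ml-algorithms
dependencies:
  - python=3
  - numpy
  - pytest-xdist
EOF
```

Pinning exact versions (for example numpy=1.x) would make the environment more reproducible, at the cost of having to update the pins by hand.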
Create the conda env
Running conda env create in the same directory as the above environment.yml file will create a conda environment named ml-algorithms. Here’s a link to more info on managing environments with conda.
Now we run conda activate ml-algorithms to activate the new environment.
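The two steps can be wrapped in a small helper, sketched below. The function name create_and_activate is made up for illustration, the environment name is assumed to be ml-algorithms, and conda activate assumes conda 4.4 or later with shell integration set up (for example via conda init).

```shell
# Hypothetical helper: create the environment from environment.yml, then activate it.
create_and_activate() {
  # -f names the spec file explicitly; environment.yml is also conda's default.
  conda env create -f environment.yml || return 1
  # Activate by the name declared in the file (assumed to be ml-algorithms).
  conda activate ml-algorithms
}
```

Run create_and_activate from the project directory; conda env list should then show the new environment marked as active.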
Ensure Tests Run In Correct Environment
After going through the above steps I performed this sanity check: I wrote a test that imported Pandas, which should not be available in the new environment. To my surprise the test did not fail. I learned that in order to run Pytest with the active environment’s Python rather than the system Python, we have to run it as a module by using python -m pytest rather than pytest as our command. Running Pytest this way resolved the issue, and the test that imported the non-existent Pandas package now failed as expected.
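One way to see the difference is to print which interpreter each form ends up using. The helper below is only a diagnostic sketch; the name show_test_interpreter is made up, and it assumes python is on the PATH.

```shell
# Hypothetical diagnostic: compare the interpreter behind `python -m pytest`
# with the location of the bare `pytest` executable.
show_test_interpreter() {
  # Interpreter that `python -m pytest` will run under.
  echo "python -m pytest uses: $(python -c 'import sys; print(sys.executable)')"
  # Where a bare `pytest` resolves, if it is on the PATH at all.
  echo "bare pytest resolves to: $(command -v pytest || echo 'not found')"
}
```

If the two paths point into different environments, a bare pytest is likely running tests against packages from the wrong one.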
Optimization
Having worked with RVM in the Ruby world I was used to automatically switching environments using the rvm_autoinstall_bundler_flag when entering a directory that specified one. I wanted a lightweight solution to automatically change Conda envs in a similar way, but found that no off-the-shelf solution seemed to exist. I found this script and modified it slightly to function similarly to how RVM does: switch to an environment if there’s an environment.yml file and otherwise use the base environment. Switching to the base environment is cheap enough, so in the .bashrc file I modified the cd command to do both using the script.
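The script itself is not reproduced here, so the following is a sketch of the behavior described above rather than the actual code. The function name conda_auto_env is hypothetical, the environment name is read from the first name: line of environment.yml, and conda activate is assumed to be available in the shell. CONDA_DEFAULT_ENV is the variable conda sets to the name of the active environment.

```shell
# Hypothetical sketch for ~/.bashrc: pick a conda env based on the current directory.
conda_auto_env() {
  local env_name
  if [ -f environment.yml ]; then
    # Take the env name from the first "name:" line of environment.yml.
    env_name=$(sed -n 's/^name:[[:space:]]*//p' environment.yml | head -n 1)
    # Only switch when a name was found and it is not already active.
    if [ -n "$env_name" ] && [ "$env_name" != "$CONDA_DEFAULT_ENV" ]; then
      conda activate "$env_name"
    fi
  elif [ "$CONDA_DEFAULT_ENV" != "base" ]; then
    # No environment.yml here: fall back to the base environment.
    conda activate base
  fi
}

# Wrap cd so the check runs after every successful directory change.
cd() {
  command cd "$@" && conda_auto_env
}
```

This only looks at the directory being entered, so moving into a subdirectory of a project switches back to base. Walking up parent directories to find an environment.yml would be a natural extension.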