For the past few days I’ve been practicing a bit of Python while implementing some basic data structures, as well as reading about Decision Trees in Tom Mitchell’s Machine Learning text (which I’ll get to in an upcoming post). After brushing up on trees and search/traversal algorithms in general I’m about ready to jump into implementing the decision tree.

I say “about” because as I’m thinking of how to implement one, I realize I’ll probably use Numpy or Pandas to accept input data. In order to do that I need to create a Python project and package its dependencies. I use Conda, so I figured out how to do it with Conda.

As of writing this post, I can start building the decision tree (or any other Python project) in a sandbox Python environment with all dependencies packaged and with automated tests! It’s taken me a few extra days to get to this point, with quite a few detours from the goal of learning ML stuff (including filing what I believe to be a small bug in conda). In the long term, getting acquainted with Python tooling and developing a functional workflow is worth the time invested.

Packaging Dependencies

Write an environment.yml file

Mine looks like this:

name: ml-algorithms
dependencies:
- python=3
- numpy
- pytest-xdist # makes pytest -f option available

This gives us Python 3, numpy for handling input data, and pytest-xdist for testing.

Create the conda env

Running conda env create in the same directory as the above environment.yml file will create a conda environment named ml-algorithms. The conda documentation has more info on managing environments.

Now we run conda activate ml-algorithms to activate the new environment (older conda releases use source activate ml-algorithms instead).
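A quick way to confirm the activation took effect is to ask the interpreter itself. This is a minimal sketch, not part of my original setup: the helper name active_conda_env is made up for illustration, and it assumes conda's activation has set the CONDA_DEFAULT_ENV environment variable in the current shell.

```python
import os
import sys


def active_conda_env():
    """Return (env_name, prefix) for the interpreter running this code.

    env_name comes from CONDA_DEFAULT_ENV, which conda's activation sets
    (assumption: it is None when no conda environment is active).
    prefix is the directory the running Python was installed into.
    """
    return os.environ.get("CONDA_DEFAULT_ENV"), sys.prefix


if __name__ == "__main__":
    name, prefix = active_conda_env()
    print("conda env:", name)
    print("interpreter prefix:", prefix)
```

If the printed prefix does not point inside the ml-algorithms environment, the shell is still using a different Python.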

Ensure Tests Run In Correct Environment

After going through the above steps I performed this sanity check: I wrote a test that imported Pandas, which is not listed in environment.yml and so should not be available. To my surprise the test did not fail. I learned that the bare pytest command was running under the system Python rather than the environment's Python. To run Pytest with the current environment's interpreter we have to run it as a script by using python -m pytest rather than pytest as our command. That resolved the issue, and the test that imported the uninstalled Pandas package now failed as expected.

Optimization

Having worked with RVM in the Ruby world I was used to automatically switching environments using the rvm_autoinstall_bundler_flag when entering a directory that specified one. I wanted a lightweight solution to automatically change Conda envs in a similar way, but found that no off-the-shelf solution seemed to exist. I found this script and modified it slightly to function similarly to how RVM does: switch to an environment if there’s an environment.yml file and otherwise use the base environment. Switching to the base environment is cheap enough, so in the .bashrc file I modified the cd command to do both using the script:

  source ~/.cfg/conda_auto_env.sh

  function cd () {
      builtin cd "$@" || return    # perform the actual cd; stop if it fails
      conda_auto_env               # then switch conda env for the new directory
  }
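The decision the script makes on every cd can be sketched in Python to make the rule explicit. This is only an illustration of the logic described above, not the shell script itself: env_name_for is a hypothetical name, and it assumes the environment name sits on a top-level name: line, as in my environment.yml.

```python
from pathlib import Path


def env_name_for(directory, default="base"):
    """Pick the conda env for a directory: the name from its environment.yml
    if one exists, otherwise the default (base) environment."""
    env_file = Path(directory) / "environment.yml"
    if not env_file.is_file():
        return default
    for line in env_file.read_text().splitlines():
        # Only a top-level "name:" key counts; indented keys are ignored.
        if line.startswith("name:"):
            value = line[len("name:"):].split("#", 1)[0].strip()
            return value or default
    return default


if __name__ == "__main__":
    print(env_name_for("."))
```

A real implementation would then activate the returned environment; here the function only reports which one would be chosen.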