Tuesday, June 2, 2015

Getting deep learning going with Python 3.4 and Anaconda

I wanted to test out how hard (or easy) it would be to re-create prior results using two technologies I've been itching to try -- Python 3.4 and Anaconda. Python 3.4 is, obviously, where things are headed. To date, I have never succeeded in getting all the packages I'd like to use installed under it.

Anaconda is an alternative Python distribution produced by Continuum Analytics. They sell various commercial products, but that's okay -- they make something free and super-useful for developers, and their commercial products solve enterprise-relevant problems.

The 'big sell' of Anaconda, as opposed to the standard distribution, is the ease of installing scientific packages on a variety of platforms. Spending a day trudging through installing the relevant base OS packages and getting the Python libraries to use them effectively is pretty dull work.

I set out to install Keras and IPython notebook. That is pretty much the end-game, so if that works, there's a valid path. Short answer: it worked out well, with only a few stumbles.

There are two operating system packages to install: Anaconda itself, obviously, and OpenBLAS (or, I think, any other BLAS implementation). There were still some imperfections, but everything went far, far better than the same process did with the standard Python approach.
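
(A quick aside, not part of the original install transcript: once a BLAS is in place, one way to check whether numpy actually picked it up is to print its build configuration. This is just a sketch -- the exact output format varies between numpy versions, but a working OpenBLAS link should show up in the library names.)

```python
import numpy as np

# Print the BLAS/LAPACK configuration numpy was built against.
# If OpenBLAS was found at build time, it should appear in the output.
np.__config__.show()
```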

Achieving success depended, somewhat strangely, on the order in which the packages were installed. My end game was to have the Keras library up and running. That's not in the Anaconda world, so you need pip to get the job done. A simple 'pip install keras' didn't work for me -- there were various complaints; I think it said there was no cython. Let's Redo From Start:

Take One
conda create -p ./new numpy
source activate ./new 
python setup.py install (yes I know I should use pip but :P )
... much compiling ...
warning: no previously-included files matching '*.pyo' found anywhere in distribution
Could not locate executable gfortran
Could not locate executable f95
Could not locate executable f90
Could not locate executable f77
Could not locate executable xlf90
Could not locate executable xlf
Could not locate executable ifort
Could not locate executable ifc
Could not locate executable g77
Could not locate executable g95
Could not locate executable pgfortran
don't know how to compile Fortran code on platform 'posix'
error: Setup script exited with error: library dfftpack has Fortran sources but no Fortran compiler found

Take Two

pip install theano
... much compiling ...
"object of type 'type' has no len()" in evaluating 'len(list)' (available names: [])
    error: library dfftpack has Fortran sources but no Fortran compiler found 
Take Three 

conda create -p ./new scipy  <-- finally I realised scipy was the stumbling block
source activate ./new 
python setup.py install 
... much compiling ... 
SUCCESS!!! W00t!
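
For anyone retracing this, here's a little sanity check I'd suggest running at the end (my addition, not part of the transcript above): it reports which pieces of the stack actually resolve in the active environment, without blowing up on the ones that don't.

```python
import importlib.util

# Check which of the packages this walkthrough relies on are importable
# in the current environment; find_spec returns None for missing ones
# instead of raising ImportError.
stack = ("numpy", "scipy", "theano", "keras")
status = {name: importlib.util.find_spec(name) is not None for name in stack}
for name, present in status.items():
    print(f"{name}: {'ok' if present else 'MISSING'}")
```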

The same route on pure Python, last time I tried it, was much more involved. I found that scipy didn't necessarily install with the relevant Fortran support, on which a lot of science packages depend. Getting the base libraries up and running -- and even finding out what they were -- was a real mission.
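
(You can probe for this yourself before starting a build. The snippet below mirrors the compiler search that the failed builds above were doing; the candidate names are the common ones from that error output, not an exhaustive list.)

```python
import shutil

# scipy's build needs a Fortran compiler on PATH; these are the usual
# suspects that the build system probes for.
candidates = ["gfortran", "f95", "f90", "f77", "g77", "g95", "ifort", "pgfortran"]
found = [c for c in candidates if shutil.which(c)]
print("Fortran compilers on PATH:", found or "none")
```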

Now, I'm not 100% sure anyone is to blame here. There will be reasons for each of the various packaging decisions along the way, and I haven't necessarily taken the time to understand my own environment properly. I'm just doing what every practically-minded person does: try installing the thing and see what pops.

Fewer things pop with Anaconda. I now have a functional Python 3.4 environment with all the latest and greatest machine learning tech that Python has to offer, and that is awesome.

Also, I haven't included the bit where I discovered I had to install OpenBLAS through MacPorts rather than through Anaconda. I've spared the reader that.

Happy hacking!