Monday, April 27, 2015

Interlude: Kaggle Scripting

I don't like to support specific companies. In general, I'm more interested in how people can be productive as independently and autonomously as possible.

However, Kaggle just did something pretty amazing.

They've launched a free online scripting environment for machine learning, with access to their competition data sets. That means that everything I've been writing about so far -- loading data, setting up your environment, using notebooks -- can be easily sidestepped for instant results. The best thing -- it supports Python!

I haven't yet determined where the limitations are; I've simply forked one of the sample scripts that I found, and it worked wonderfully. The execution time of the script was similar to running it locally on my 2011 MacBook Air, which is not bad for something totally free that runs on someone else's infrastructure. I was able to sidestep data downloading entirely, and "leap into the middle" by forking an existing script.

Here's a direct link to one such script:

http://www.kaggle.com/users/320654/cs822-am/otto-group-product-classification-challenge/benchmark

Note: I'm logged into Kaggle, so I'm not sure how this link will behave for people who aren't signed in.

My expectation is that for solutions which demand a lot of computational resources (or rely on the GPU for reasonable performance) or use unusual libraries, a self-service approach to computing will still win out, but this is an amazing way to get started.
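
To give a feel for the kind of short Python script that suits an environment like this, here is a minimal sketch of my own. It is not the benchmark script linked above. It computes class frequencies from a labelled CSV, which is about the simplest baseline you can build for a classification problem. The function name `class_priors`, the `target` column name and the tiny sample data are all assumptions made up for illustration; a real competition file may be laid out differently.

```python
import csv
import io
from collections import Counter


def class_priors(csv_file, label_column="target"):
    """Return {class_label: fraction_of_rows} for a CSV with a label column.

    `label_column` is an assumed name; change it to match the real data.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data rows found")
    return {label: n / total for label, n in counts.items()}


# Tiny made-up sample standing in for a competition training file.
SAMPLE = """id,feat_1,feat_2,target
1,0,3,Class_1
2,1,0,Class_2
3,0,0,Class_2
4,2,1,Class_2
"""

priors = class_priors(io.StringIO(SAMPLE))
for label in sorted(priors):
    print(label, priors[label])
```

To run it against a real file, you would pass an open file handle instead of the in-memory sample; where that file lives depends on the environment, so I have left the path out.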