Sunday, July 5, 2015

Quick example: A heat map of pedestrian counts


It might not look like it, but I have been super-busy lately working on data science and machine learning tech. I've been going on a bit of a vision quest trying to wrap my head around the whole thing. You know what -- I'm pretty lost. I've learned a lot of things, but I can also see how much deeper the rabbit-hole goes.

While that will bear fruit in time, I've decided to add a series of 'shorts' to the blog: small pieces I can genuinely finish quickly, even if some of them turn out too simple to be of wider interest. The point here isn't to blaze a trail, but to keep up my exercise.

The plot above was generated by this code (link goes to a notebook).

The City of Melbourne provides quite fine-grained pedestrian count information for major locations in my home town -- see http://www.pedestrian.melbourne.vic.gov.au/. I really applaud this effort. I'm very excited about anything that brings the physical world into the digital one. This data also updates in near-real-time, which is just wonderful.

Down the road I hope to use this to build some interesting prediction software, but for now I just want to explore the data. I'm also learning how to plot things.

Python has a number of libraries for this. My favourite in terms of API design is without doubt Seaborn, but it's slowwww. For speed, I recommend Bokeh, but I find it much clumsier to use. I'm also not a fan of its interactive JavaScript tools, because I think it's too easy to accidentally scroll away from the data entirely or otherwise misnavigate the chart. Please share your views on plotting; I'd really like to learn more about the range of opinions on these tools.
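The notebook itself isn't reproduced here, so what follows is only a minimal sketch of this kind of heat map, not the code behind the plot above. It assumes the counts have already been loaded into a pandas DataFrame with hypothetical `timestamp` and `count` columns (the real City of Melbourne export may be laid out differently), averages them into a day-of-week × hour-of-day grid, and hands that grid to Seaborn's `heatmap`.

```python
import numpy as np
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def hourly_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot raw readings into a day-of-week x hour-of-day table of mean counts.

    Assumes (hypothetically) one row per hourly reading, with a 'timestamp'
    column (anything pd.to_datetime understands) and a numeric 'count' column.
    """
    ts = pd.to_datetime(df["timestamp"])
    tidy = pd.DataFrame({
        "day": ts.dt.day_name(),          # e.g. "Monday"
        "hour": ts.dt.hour,               # 0..23
        "count": df["count"].to_numpy(),  # positional, avoids index alignment
    })
    grid = tidy.pivot_table(index="day", columns="hour",
                            values="count", aggfunc="mean")
    # Put the rows in calendar order rather than alphabetical order.
    return grid.reindex([d for d in DAY_ORDER if d in grid.index])


def plot_heatmap(grid: pd.DataFrame):
    """Draw the grid with Seaborn; imported lazily so hourly_grid works without it."""
    import seaborn as sns
    ax = sns.heatmap(grid, cmap="viridis")
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Day of week")
    ax.figure.tight_layout()
    return ax


if __name__ == "__main__":
    # Synthetic stand-in data: two weeks of hourly counts starting on a Monday.
    rng = np.random.default_rng(0)
    times = pd.date_range("2015-06-01", periods=24 * 14,
                          freq=pd.Timedelta(hours=1))
    demo = pd.DataFrame({"timestamp": times,
                         "count": rng.poisson(200, len(times))})
    print(hourly_grid(demo).shape)
```

Calling `plot_heatmap(hourly_grid(demo))` would then draw the chart. Averaging by day and hour is just one choice; summing, or keeping each sensor location as a separate panel, would be equally reasonable ways to slice the same data.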