Saturday, 9 January 2016

BALLCODE 2.0





UPDATE Jan 30 2016: The Kimono Labs scrape stopped working and Kimono provided no support. So I've had to ditch that, and my buddy Andreas helped me out with a Python script for pulling down the data. Thanks Andreas!

If you are a regular and long-time reader of my blog (there must be at least 3 of you out there) you may remember a viz I made a few years ago called BALLCODE. It looked like the picture below and was a summary of the whole NBA 2013/14 season. At the time I thought I had come up with an original concept, but it turned out that wasn't the case. Nonetheless I was very proud of this viz and it was even awarded 'most beautiful' tip from Tableau Public.


It also proved to be a very popular viz among other Tableau Public authors and has been the inspiration for a few great vizzes out there (and the lovely authors were kind enough to reference me). See, in the order they came out, work by Chris Jones, Matt Chambers and Craig Wortman.


I'm very flattered that these guys enjoyed my viz and wanted to experiment with the idea, taking it in different directions and to different sports. If there are any other examples out there, please let me know.

Now the MLB and NFL are all very good, but they aren't real sports like basketball, so I wanted to do my Ballcode viz again. In fact I intended to do it again at the end of last season but forgot. And now with the Golden State Warriors tearing up the league in an historic way, I didn't want to wait until the end of the season to start showing their streak.

So..... for Ballcode 2.0 I am going to be updating the viz as the season goes on. BUT there is quite a bit of data manipulation required and I didn't want to be copying and pasting data every week and pratting about in Excel, so I decided to create an automated data prep process. To do this I finally pulled my finger out and had a proper go with Alteryx.

The first thing I needed to do was to set up a connection to grab my data from the web. To do this I found the following very helpful - this blog post from Chris Love and this video from Alteryx.

I first used Kimono Labs to build an API that scrapes the data from basketball-reference.com and turns it into a CSV. This was pretty easy to do, and it was the first time I've actually managed to make a web scraper do what I want in a reasonable amount of time (but it is a very simple task). It looks like this when in action:



I then built out my Alteryx module to parse the data, tidy it and transform it to get two rows of data per game, one from the perspective of the winning team and one from the perspective of the losing team. Here's a screenshot of my module. I'm sure Alteryx experts will laugh at it, but I'm pretty chuffed. (Aside - I'm ill today and this is basically keeping me occupied while I sit on the sofa drinking orange juice).
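For anyone without Alteryx, the "two rows per game" reshaping can be roughed out in a few lines of Python. This is only a sketch of the idea, not the actual module: the column names (`date`, `visitor`, `visitor_pts`, `home`, `home_pts`), the sample rows and the function name `games_to_team_rows` are all made up for illustration, and the real scraped CSV may well be laid out differently.

```python
import csv
import io

# Made-up sample in an assumed column layout; teams and scores are placeholders.
SAMPLE_CSV = """date,visitor,visitor_pts,home,home_pts
2015-10-27,Team A,95,Team B,111
2015-10-28,Team C,102,Team A,99
"""


def games_to_team_rows(csv_text):
    """Turn one row per game into two rows per game, one per team."""
    rows = []
    for game in csv.DictReader(io.StringIO(csv_text)):
        sides = [
            (game["visitor"], int(game["visitor_pts"]), "away"),
            (game["home"], int(game["home_pts"]), "home"),
        ]
        for i, (team, pts, venue) in enumerate(sides):
            # The other entry in `sides` is this team's opponent.
            opp_team, opp_pts, _ = sides[1 - i]
            rows.append({
                "date": game["date"],
                "team": team,
                "opponent": opp_team,
                "venue": venue,
                "points_for": pts,
                "points_against": opp_pts,
                "result": "W" if pts > opp_pts else "L",
            })
    return rows


team_rows = games_to_team_rows(SAMPLE_CSV)
for row in team_rows:
    print(row["date"], row["team"], row["result"], row["points_for"], row["points_against"])
```

Each game ends up as one "W" row and one "L" row, which is the shape you need to plot a win/loss strip for every team.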

It spits out a TDE, so hopefully I should have no trouble running this every week and keeping the viz up to date. We'll see how that goes.

This time I also went to the effort of getting the official colours for each team from this great website, and created a new colour palette.
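If you want to do something similar, custom palettes in Tableau Desktop are usually defined in the Preferences.tps file in your My Tableau Repository folder. Here is a rough Python sketch that generates that XML from a list of team colours. The post doesn't say exactly how the palette was built, so treat this as one possible route; the palette name, the function name `build_palette_xml` and the hex codes below are placeholders, not the official team colours.

```python
from xml.sax.saxutils import escape, quoteattr

# Placeholder hex codes for illustration only; swap in the real team colours.
TEAM_COLOURS = {
    "Team A": "#112233",
    "Team B": "#445566",
    "Team C": "#778899",
}


def build_palette_xml(name, hex_codes):
    """Build the contents of a Preferences.tps file with one categorical palette."""
    lines = [
        "<?xml version='1.0'?>",
        "<workbook>",
        "  <preferences>",
        f'    <color-palette name={quoteattr(name)} type="regular">',
    ]
    for hex_code in hex_codes:
        lines.append(f"      <color>{escape(hex_code)}</color>")
    lines += [
        "    </color-palette>",
        "  </preferences>",
        "</workbook>",
    ]
    return "\n".join(lines)


palette_xml = build_palette_xml("NBA Teams (placeholder)", TEAM_COLOURS.values())
print(palette_xml)
```

Tableau assigns palette colours in order, so keeping the colours in the same order as the sorted team names makes it easier to match each team to its colour.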


BTW - I'm pretty confident in GSW breaking 72 wins.


