Wednesday, 25 May 2016

ReViz - Tornadoes!

I recently joined the wonderful ReViz project with Matt Chambers and Alex Duke, trying to fill the very large shoes left by Nelson Davis.

If you're not familiar with the ReViz project I encourage you to check out the page and see the work that's been done so far.

Matt recently kicked off a new ReViz by taking the Torndao dataset used in the 2012 Iron Viz contest,  which was won by Anya A'Hearn and finding a new story which showed the five deadliest single days of Tornado outbreaks recorded in the USA.

When I got a hold of the data, I have to be honest I found it really hard to come up with another engaging story within the dataset that hadn't yet been covered. So instead I've created something that I hope is more of a striking visual. Taking inspiration from the famous US wind map (see below) I wanted to show visually the volume and power of the tornados and how they are clustered around specific parts of the country.

Wind Map:



I also wanted to show both just how common tornadoes are but also the degree to which fatalities from tornadoes come from a small number of very big instances. I also included some light interaction to let people zoom into their state, or to look at how things change over time.

Unfortunately the resulting map is pretty slow to load, and so to mitigate that a bit I only included the most recent 10 years of data. The result is below:




Thursday, 12 May 2016

The Usefulness of Unions in Tableau

Recently in some of my projects at work I've been making use of creating data unions before I bring the data into Tableau for ease of use and efficiency. This topic also came up at a recent NY Tableau User Group and seems to be a popular approach, so I thought I'd write up a quick how to on using 'unioned' data to your benefit in Tableau, with a couple of examples.

Some of this will become less relevant when Tableau introduces cross database joins, and cross data source filters. But none the less it might be helpful for a while longer. It can help a lot getting around issues you may have with data blending, or with one to many joins or with avoiding having to resort to Parameters (which don't update with the data).

Example 1 - Dealing with data at different levels of aggregation

Let's say you have two lots of sales data

A. Sales per day for the last 6 weeks
B. Sales per month for the last 3 years

And you want to build a dashboard that allows you to switch between the two and/or you want to see data at both levels but have quick filters apply to both sets of data. Well the answer is easy - union that data!


So take the data from this:

A.

B.



(note American date format mm/dd/yy)

To something like this:





And then you can use 'Date Type' as a filter that lets you switch between the two:



And if you use 'exact date' as your date field, you can easily switch between Daily and Monthly views in a time series chart:






The main benefit to this is flexibility to the user, but also as you are using a single data source you don't need to worry about blends or joins, and you know that any filters will work with all charts.

Another benefit is that this can be a very efficient way of working when you are dealing with very large databases. Joins can be costly from a performance point of view, and blends can be both costly and cause functionality constraints, so using a single table with a filter can speed things up.


Example 2 - Dealing with data at different stages or that has some shared fields

In this case imagine you have two sets of data for application for, and sales of, mortgages. The data shares some fields.

A. Application data
B. Sales data

A.

B.


Again, there are different ways you could deal with this data to create a dashboard, or series of dashboards. But again using blending or joins might cause some difficulty with data source filters, or if you end up with one to many joins. So a possible approach again is to Union the data and use similar fields together, and just leave blank values that don't apply to that part of the data. The resulting data might look something like this:



And again, you can use your Type field to easily switch between views, or to show both views on the same dashboard or view with a single filter for Product that doesn't rely on (undynamic) Parameters.






And that's it! You can easily extend the logic here to build out more complex data sets for different scenarios. This approach can also help with performance against 'big' data, especially if you make you Type field numeric and use it as a sorted index in your database.

Just be careful to make sure you know how you are using the Type, or similar, field in each view so that you don't double count.