Sunday 16 February 2014

Tableau Tip - The End of Time.... series based calculations

OK time for another little Tableau tip which might come in useful. This is one I now use all the time. The idea of using an index to hide time based values was first brought to my attention about 18 months ago by Tom Brown of The Information Lab (if I remember correctly - there is a chance it was Craig Bloodworth or Andy Cotgreave but I'm pretty sure it was Tom).

The situation is this: You've got a time series based table calculation going on that you are interested in showing, something like a running total, YTD total or difference from last month, however you don't want to show all the data - instead you only want to show the latest month.

Your starting position might look something like this below, with year, month, sum of sales, monthly difference in sales and running total.



Now lets say you only want to show the latest month, in this case December 2013. Well the first thing that springs to mind might be to filter, but that messes with the calculations PLUS its not going to automatically update to the latest month when the data updates:



Now the next idea you have might be to HIDE the 'non latest month' data, and that will solve the issue of the filters messing up the table calculations, but its still going to leave you with a problem when you get a new months worth of data you want to automatically show. So this is what you can do....

First create an index field:



now make this discrete


then pop the index field into your viz just after the 'month' pill



now comes the clever bit. To show just the latest month you want to hide everything else. To do this you need to make the Index value along side the latest month be equal to something unique and that will stay constant as the data updates, and then hide everything else. One way to do this is to sort year and month descending, and then highlight all the index fields that are not 1 and click hide. This can work fine for calculations of difference along time BUT does not work for running sums as it flips the year around to calculate from December running to January. So the safe bet is to take the following step:

Create another calculated field that looks like this



this creates a Boolean True/False variable that you can now add to your view and hide the Falses:



then unclick 'Show Header' for both 'index' and 'index match'



and you are left with the view you want, which will automatically update to the latest month as your data refreshes:



And then you can do what you want with that view, change the aliases, add it to a dashboard etc....


If anyone has an alternative approach to this I would love to hear what it is, as with so many things in Tableau there are probably multiple ways to achieve the same goal.



20 comments:

  1. Hi Peter,

    There are definitely multiple ways to go about this, here's the one that I use: A table calc with the formula LAST()==0, the Compute Using is set the Date, and then put on the Filters Shelf to filter for True.

    FIRST(), LAST(), and SIZE() are really useful table calcs. FIRST() provides the offest to the first address (aka row) in the partition, LAST() provides the offset to the last address in the partition, and SIZE() returns the number of addresses in the partition. LAST() returns 0 for the last address in the partition, and with Tableau's default date sort being ascending order, the last address will be the latest date, so LAST()==0 will only return True for the latest date.

    I use the Filters Shelf rather than Hide to make the viz more discoverable to other Tableau Desktop users. There's no visual indicator that anything's been hidden, but we can always see the pills on the Filters Shelf and interrogate them.

    Also, for views like this another thing that I do is set up a Relative Date filter on the date dimension in the view, to pull up as many dates as I need and a little bit more for insurance, and more importantly to *not* pull up dates that I don't need. This is because table calc filters like LAST()==0 are applied after the queries have been made to the data source, and Tableau is faster when it's not retrieving data we don't need. For example, I might have a data set that is updated monthly with three years of data in it, but have a view that shows month over month changes for the past 12 months. I'll set up a Relative Date filter for the last 14 months, 12+1 to make sure the month over month calculations are accurate, and an extra month just in case the data load is a bit late. This filter is processed as part of a WHERE clause in the data source, and then the LAST()==0 filter is only applied to the reduced data set.

    Jonathan

    ReplyDelete
    Replies
    1. Really useful post Pete and great comments as ever Jonathan. I flipped the LAST()==0 to hide incomplete months by adding it to filters and it will hide incomplete months. Again, there is probably an easier way to find this but really quick and simple method to stop a line chart of volumes over time not showing a massive dip when only part way through a month or week.

      Delete
    2. Would it be possible for you do walk through the necessary actions using the Last()==0 method like Peter has done to show how it works?

      I am a new user and would like to try this out.

      I am basically in this exact situation… “…I might have a data set that is updated monthly with three years of data in it, but have a view that shows month over month changes for the past 12 months. I'll set up a Relative Date filter for the last 14 months”

      Delete
    3. Hi,

      Hi, I'm just getting to this thread...here's a link to the KB article on relative date filters: http://downloads.tableausoftware.com/quickstart/feature-guides/relative_dates.pdf.

      And here's a quick implentation of a relative date filter combined with a LAST() filter on Tableau Public: https://public.tableausoftware.com/views/relativedateandlastfilter/AddLastFilter.

      The steps are:

      1) Set up your worksheet.
      2) Add the relative date filter on your date dimension, using the KB article. Validate that the results of the worksheet that you want to display are still unchanged.
      3) Create a calculated field with the formula LAST()==0.
      4) Drag that field onto the Rows Shelf, and filter for True. Assuming that your date dimension and your date dimension is the only dimension on Columns, Tableau's default Compute Using will work for you.***
      5) Validate that the result for that last date is correct, then drag the pill from Rows to the Filters Shelf, and Filter for True.

      *** If your view has a different arrangement of dimensions, then the Compute Using of the LAST()==0 pill will need to be set differently and exactly what that will be is dependent on that arrangement. You can experiment to find out what might work, or you can get more help by posting a packaged workbook with some sample data to the Tableau community http://community.tableausoftware.com/ or Tableau Talk http://tableautalk.com/.

      Cheers,

      Jonathan

      Delete
  2. Hi Jonathan, thank you for the comment. I really love seeing the different ways in which people have worked around problems, and your method requires one less step which is definitely useful! By the way to make mine flexible to choose other end dates I use separate date filter.

    I have a question for you about filters. The reason I used Hide was my (wrong) assumption that using a filter would muck up the table calcs, the same way that filtering on date would, or any other filter for that matter. So my question is, if LAST()==0 is FALSE for all but one date, and you filter on LAST()==0 =TRUE then how are the other date values still around to run the table calculations running sum, difference etc....?

    ReplyDelete
    Replies
    1. oh and filtering on the index match on my example also works. It confuses me why that keeps the calc in tact but if I used a filter like "if month([Order Date])=12 and year ([Order Date])=2013 then 1 else 0" end it would not.

      Delete
    2. The short answer is that not all filters are applied at the same time, so sometimes filtering will muck up table calcs and sometimes it won't, depending on what kind of filter you are using. Here's a rough order of operations:

      Drag a pill or do something to trigger a view update
      …Tableau builds the query(ies) to the data source(s), trying to do as much of the computation there as possible…
      Apply data source filters (added to the WHERE clause of every query in SQL data sources)
      Apply context filters (creates a temp table with the data needed)
      Apply Top & conditional filters on discrete dimensions (includes Top & Conditional Sets) (uses a sub-query that is joined to the results)
      Apply regular filters on dimensions, including general & wildcard filters for discrete dimensions & Sets (& Filter Actions), and range filters on continuous dimensions. (WHERE clause)
      Calculating results for calculated dimensions & aggregate measures, including computation of aggregate measures & calculated fields that don’t depend on data blends
      Apply filters on aggregates (that don’t depend on blends) (HAVING clause)
      ...data is returned to Tableau...
      Data blending, including lining up the domains of the linking dimension(s) which can have a "filtering" effect on the data.
      Densification, including domain padding and domain completion
      Computation of calculated fields that do depend on data blends
      Apply filters using calculated fields that do depend on data blends
      Computation of table calculations & R scripts
      Applying filters based on table calcs & R scripts
      (in a separate pipeline, computation of grand totals, including table calcs inside grand totals)
      (in yet another separate pipeline, forecasting)
      …lots more stuff for mapping, Hiding data, view layout, z-ordering of the displayed elements…
      Displaying the view

      [As of last count, there are over 100 separate steps in the Tableau computation & rendering pipeline. I've explored this pretty deeply for the book Joe and I are writing, and I'm amazed at how much Tableau can do in that split-second between us dropping a pill and the view being rendered).

      Using an IF statement like the one you described is a row-level calculation that will be computed in the data source, and therefore Tableau won't have the needed data to compute a result, and it fails. On the other hand, the table calculation filter is applied *after* the table calcs have been computed, so it returns an accurate result. For this reason, some folks call table calc filters "late" filters, because they are later in the pipeline. Does that make sense?

      With this knowledge, another alternative to have a fixed end date for filtering would be to do something like LOOKUP(DATETRUNC('month',MIN([Order Date])),0) == #2013-12-01#, where the date could be replaced with a parameter. LOOKUP() is a great calc for wrapping other results in a table calculation and then using those in a filter, one other nice feature of LOOKUP() is that Tableau will let us filter table calc discretes (table calc blue pills), while it doesn't let us filter on regular aggregate discretes (regular blue pills). The potential downside is that as a table calculation, LOOKUP() requires an aggregate or constant value as an argument and therefore is dependent on having the right level of granularity in the view.

      A couple of other useful posts on this general topic are:

      http://community.tableausoftware.com/message/139603#139603 - Joe's original post on the order of operations
      http://www.theinformationlab.co.uk/2013/08/12/cascading-quick-filters-in-tableau/ - A fantastic post by Andrew Ball about nesting filters

      Jonathan

      Delete
    3. thanks Jonathan, that was a seriously helpful answer and is going to come in useful for loads of things, and I really appreciate it! I am really looking forward to yours and Joe's book to help get to the bottom of things like this. Unfortunately I still come across things where I just have to use trial and error to get it to work as certain actions can seem inconsistent.

      I guess one of the disadvantages of Tableau's user interface is that it can be so easy on the surface you don't (or at least I don't!) think about things in the same structured and ordered way as I would writing SQL. This deeper level of understanding is definitely where I need to go next to become more efficient.

      Delete
    4. This technique is SO CLOSE to what I need to do! I know my last day of data in my data source, and I have a billion table calcs going on. One line (on a multiple-line-graph viz) is historical data, and it contains a full year's worth of data whereas current year's data runs to some date that is less than today(). (Oh, would that make things easier for me if it were always current!) If I try adding one more element to filtering such that I try to chop off historical data beyond this-year's last-date in the history's range, the calc to sum up the history data doesn't function properly, just as Jonathan noted might happen.

      If there was a way to use LOOKUP to display only data less-or-equal to the maxDate calc I create, I'd be there.

      Delete
    5. Joe,

      If you post something to the Tableau forums and ping me I'll talk a look, I'm quite sure we can get what you need.

      Delete
  3. Very helpful! I have a data set by date and only am looking for the end of the series - perfect timing!

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. This was great. I used for a running sum viz of new customers by day.

    Two comments:
    1) I was able to get a trailing 90 days with the slight index match formula tweal of [index]>=window_max([index])-90. And the 90 value here can be turned in to parameter.
    2) When I added index and index match pills to the worksheet, my viz was altered in a bad way. Luckily, I get the same desired filtering effect by just adding them to the DETAIL shelf

    ReplyDelete
    Replies
    1. Hi Ryan, I'm really pleased to hear that this helped you, and your 90 day tweak sounds like a nice twist on the idea. On point 2, I suspect this may be to do with where the index dimension pill was placed in relation to the other dimension pills on the shelf (i.e. before or after). But anyway, detail did the trick so all good!

      Delete
  6. Hi PeterG/Jonathan,

    Thanks for post and comments lot of leaning.

    I have quick a question i want to give date field in filter and when ever user select the month we should be able to view month value for selected month and as well as YTD for selected. I tried doing this but when filter the month table calculation on YTD is lost and end up with selected month value

    Thanks in Advance for any help.

    Regards
    Bhujang

    ReplyDelete
  7. May I ask if it would be possible to get a detailed explanation of applying this principle to a different type of data?

    For example, I would like to see the US Sales totals, and have the ability to filter it to a US state without the ability to select a US territory (Guam, Puerto Rico, etc), but to have the US territory sales remain in the US national totals. How could I do this?

    ReplyDelete
    Replies
    1. If you could include detailed step by step instructions, I would greatly appreciate it. Thank you in advance for your assistance.

      Delete
  8. I've actually been working on a blog post on this very topic, I'll have it up later this week and will link to it.

    ReplyDelete
  9. Hi! is there a way to show last 12 month only in a moving 24 month dara base?
    Thanks!

    ReplyDelete

Note: only a member of this blog may post a comment.