Wednesday, 7 June 2017

London Data17 - Bethany Lyons - Data Prep for Time based analysis

Basically this is awesome scaffolding techniques

Data often captures events rather than the historical setting (ie you have a data point on Mon and Wednesday but you don’t have a data point on Tues - where were you? Probably where you were on Monday as you haven’t travelled)

Aim of the session is to determine whether a problem is solved by creating data or calculating new data. There is a trade-off between performance in Tableau vs time to create data

Scenario 1: A subscription company - how much money is earned each month?
Well the data is about when the subscription is sold. So taking one row and then multiplying it across multiple rows.
Solution is creating a cross database join against an Excel sheet. Create a file that has 24 rows (longest subscription) and then join on a ‘key field’ that is just the number 1.
Create calc that takes the month of the start date and uses DATEADD() of the duration (in months) of the contract. Create a filter calc [number of months] <= duration. Use this filter as an extract filter to limit the amount of data.

Comparison to future revenue: self-join the excel scaffold file again and this time look at when the revenue is earned compared to current month (watch the talk to see this technique)

Scenario 2: what’s the maximum number of employees we have at anyone time if we know when we hire and fire employees

You need to understand the status of each employee at anyone time. The data would be set up as one row per employee, hire date and fire date (three columns in total)

Solution: you need to look at this on a date basis. Cumulatively, look at how many have been hired by a certain date, how many were employed and when they were fired.
Pivot you data to have employee ID, event date and what the event was
This allows you to use running total to achieve this

If you want to look at turnover rate then you need to create a row per employee per date. Bethany use a record per day. Use a filter to look at the days between hire date and fire date. This method allows you to capture people who worked within the year at any point. (not everyone is fired on the first day of the year). So now you can look fired employees divided by countd of employees within that year

Scenario 3: Can’t use above option due to data size. Internet of Things creates too many rows. So need to know time the device was on / off or idle.

Solution: Calc seconds in full days and then look at seconds in the partial days


  1. Hi, Thanks for sharing these great techniques. How can we get the video of Bethany Lyons - Data Prep for Time based analysis?

  2. Hi Yalin, if you attended the conference, the material is made available to attendees but sadly if you didn't attend these won't be in the public domain unless Bethany writes a blog post.

  3. Thanks for the reply. I understand that. I just wanted to learn detail info about Scenario 2. I have an ongoing project similar to that.