Wednesday, 7 June 2017

London Data17 - Bethany Lyons - Data Prep for Time based analysis

Basically this is awesome scaffolding techniques

Data often captures events rather than the historical setting (ie you have a data point on Mon and Wednesday but you don’t have a data point on Tues - where were you? Probably where you were on Monday as you haven’t travelled)

Aim of the session is to determine whether a problem is solved by creating data or calculating new data. There is a trade-off between performance in Tableau vs time to create data

Scenario 1: A subscription company - how much money is earned each month?
Well the data is about when the subscription is sold. So taking one row and then multiplying it across multiple rows.
Solution is creating a cross database join against an Excel sheet. Create a file that has 24 rows (longest subscription) and then join on a ‘key field’ that is just the number 1.
Create calc that takes the month of the start date and uses DATEADD() of the duration (in months) of the contract. Create a filter calc [number of months] <= duration. Use this filter as an extract filter to limit the amount of data.

Comparison to future revenue: self-join the excel scaffold file again and this time look at when the revenue is earned compared to current month (watch the talk to see this technique)

Scenario 2: what’s the maximum number of employees we have at anyone time if we know when we hire and fire employees

You need to understand the status of each employee at anyone time. The data would be set up as one row per employee, hire date and fire date (three columns in total)

Solution: you need to look at this on a date basis. Cumulatively, look at how many have been hired by a certain date, how many were employed and when they were fired.
Pivot you data to have employee ID, event date and what the event was
This allows you to use running total to achieve this

If you want to look at turnover rate then you need to create a row per employee per date. Bethany use a record per day. Use a filter to look at the days between hire date and fire date. This method allows you to capture people who worked within the year at any point. (not everyone is fired on the first day of the year). So now you can look fired employees divided by countd of employees within that year

Scenario 3: Can’t use above option due to data size. Internet of Things creates too many rows. So need to know time the device was on / off or idle.


Solution: Calc seconds in full days and then look at seconds in the partial days

Tuesday, 6 June 2017

What’s Next for Tableau - Andrew Beers


“This next hour isn’t about me, it’s all about you”

Three principles and how Tableau is answering them:

  1. People who know the data should ask the questions
Profile: bring domain knowledge rather than the SQL-coding skills to get to the data to answer the questions

2. Software should be designed for deeper thinking
Profile: Not simplistic products, simple ways to do complex things
After the first five minutes of simplicity there has to be more [complexity]...to let you have the impact you want to have

3. Analytics at Scale can drive change
Profile: Analytics at scale can solve the big problems in the world

Last 12 months in the Tableau world has gone from 10.0 to 10.3
Half of the 130 improvements came from ideas on the Ideas forum
Loving that Tableau is on a ‘quarterly’ cadence

So how does Tableau help us achieve this?

Connect
Data Prep: “If you can’t connect to the data then you can’t analyse it”
So need to connect to all data (pdf, cloud files, Eloqua, Service Now etc)
Hybrid engine: Fast Live & In-memory, Cross DB Federation
Data Prep: Self Service, export to csv

PDF - shows all the tables in the pdf when loaded into Tableau. Can pivot
JSON - Tableau detects the structure. When connection to JSON, if you have multiple rows then Tableau will create the LoD measures removing the duplication for you

Answer More Questions
More ways to change lines (regular, stepped or jump lines)
Tooltip selections: Using dimensions in the tooltip to enable highlighting other other marks that allow the common elements to be highlighted.
Dashboard element sizing - ‘Distribute Evenly’
Python / R integration - use SCRIPT_REAL function

Collaboration at Scale
Sharing / Governance
Discussion
Altering
Anywhere Web & Mobile
Tableau is now watching how the whole community is using the common joins done to ‘recommend’ what tables to join on
Alerting - can schedule regularity of alerts

Big changes coming soon:
Linux
Hyper
Maestro

Linux: cost saving / ease of management / technical skills of IT / more security control
Hyper: support for very large data sizes (billions of rows), faster queries, much faster extract creation, scalable (mid summer public beta)
Maestro: Complete visibility for immediate understanding (coordinated views aid understanding)



Saturday, 25 March 2017

Iron Viz - We can't breathe

It’s that time again with IronViz feeder competitions starting up again. Every year, I take this opportunity to explore a technique I haven’t used before, or alternatively, a subject that I care about.

As a keen cyclist in London, Air Quality is a really important subject to me. With increased cycling infrastructure, the move to electronic cars and banning older vehicles from the city centre, I was expecting to download a set of data that showed dramatic falls in the levels of pollutants that are all around us Londoners. Nothing could be further from the truth.



Using data from londonair.org.uk, I looked at the Borough level of detail as there are only three or four monitoring points per Borough. The data proved a challenge with sporadic measurement across an inconstant set of pollutants. This meant that finding trends and understanding geographical patterns using the new Spatial file connector in Tableau was a good challenge.

The biggest break-through I had was whilst researching the subject, was coming across the European legal limits that have come in to force across different pollutants, at various points since 2005. This is shown as the reference line on the trend data to show how often the limits are broken across London.

The answer in many cases is the daily average often doesn’t fall below some of the limits. Worrying stuff!

The spatial mapping allows the reader to see the impact on the city centre. Although Lambeth, outside of the true centre of the City, seems to be hit particularly hard by pollution levels.

As with any IronViz entry, it’s a chance to use a little imagination and have some fun with charting. When I played with some charts, it looked like exhaust fumes so, I kept it and features it at the top of my visualisation.

Overall, the visualisation has led me to want to dig in further in to this subject and what I can do to start making a difference as currently there isn’t any improvement of any note.