Wednesday, 7 June 2017

London Data17 - Bethany Lyons - Data Prep for Time based analysis

Basically this is awesome scaffolding techniques

Data often captures events rather than the historical setting (ie you have a data point on Mon and Wednesday but you don’t have a data point on Tues - where were you? Probably where you were on Monday as you haven’t travelled)

Aim of the session is to determine whether a problem is solved by creating data or calculating new data. There is a trade-off between performance in Tableau vs time to create data

Scenario 1: A subscription company - how much money is earned each month?
Well the data is about when the subscription is sold. So taking one row and then multiplying it across multiple rows.
Solution is creating a cross database join against an Excel sheet. Create a file that has 24 rows (longest subscription) and then join on a ‘key field’ that is just the number 1.
Create calc that takes the month of the start date and uses DATEADD() of the duration (in months) of the contract. Create a filter calc [number of months] <= duration. Use this filter as an extract filter to limit the amount of data.

Comparison to future revenue: self-join the excel scaffold file again and this time look at when the revenue is earned compared to current month (watch the talk to see this technique)

Scenario 2: what’s the maximum number of employees we have at anyone time if we know when we hire and fire employees

You need to understand the status of each employee at anyone time. The data would be set up as one row per employee, hire date and fire date (three columns in total)

Solution: you need to look at this on a date basis. Cumulatively, look at how many have been hired by a certain date, how many were employed and when they were fired.
Pivot you data to have employee ID, event date and what the event was
This allows you to use running total to achieve this

If you want to look at turnover rate then you need to create a row per employee per date. Bethany use a record per day. Use a filter to look at the days between hire date and fire date. This method allows you to capture people who worked within the year at any point. (not everyone is fired on the first day of the year). So now you can look fired employees divided by countd of employees within that year

Scenario 3: Can’t use above option due to data size. Internet of Things creates too many rows. So need to know time the device was on / off or idle.

Solution: Calc seconds in full days and then look at seconds in the partial days

Tuesday, 6 June 2017

What’s Next for Tableau - Andrew Beers

“This next hour isn’t about me, it’s all about you”

Three principles and how Tableau is answering them:

  1. People who know the data should ask the questions
Profile: bring domain knowledge rather than the SQL-coding skills to get to the data to answer the questions

2. Software should be designed for deeper thinking
Profile: Not simplistic products, simple ways to do complex things
After the first five minutes of simplicity there has to be more [complexity] let you have the impact you want to have

3. Analytics at Scale can drive change
Profile: Analytics at scale can solve the big problems in the world

Last 12 months in the Tableau world has gone from 10.0 to 10.3
Half of the 130 improvements came from ideas on the Ideas forum
Loving that Tableau is on a ‘quarterly’ cadence

So how does Tableau help us achieve this?

Data Prep: “If you can’t connect to the data then you can’t analyse it”
So need to connect to all data (pdf, cloud files, Eloqua, Service Now etc)
Hybrid engine: Fast Live & In-memory, Cross DB Federation
Data Prep: Self Service, export to csv

PDF - shows all the tables in the pdf when loaded into Tableau. Can pivot
JSON - Tableau detects the structure. When connection to JSON, if you have multiple rows then Tableau will create the LoD measures removing the duplication for you

Answer More Questions
More ways to change lines (regular, stepped or jump lines)
Tooltip selections: Using dimensions in the tooltip to enable highlighting other other marks that allow the common elements to be highlighted.
Dashboard element sizing - ‘Distribute Evenly’
Python / R integration - use SCRIPT_REAL function

Collaboration at Scale
Sharing / Governance
Anywhere Web & Mobile
Tableau is now watching how the whole community is using the common joins done to ‘recommend’ what tables to join on
Alerting - can schedule regularity of alerts

Big changes coming soon:

Linux: cost saving / ease of management / technical skills of IT / more security control
Hyper: support for very large data sizes (billions of rows), faster queries, much faster extract creation, scalable (mid summer public beta)
Maestro: Complete visibility for immediate understanding (coordinated views aid understanding)

Saturday, 25 March 2017

Iron Viz - We can't breathe

It’s that time again with IronViz feeder competitions starting up again. Every year, I take this opportunity to explore a technique I haven’t used before, or alternatively, a subject that I care about.

As a keen cyclist in London, Air Quality is a really important subject to me. With increased cycling infrastructure, the move to electronic cars and banning older vehicles from the city centre, I was expecting to download a set of data that showed dramatic falls in the levels of pollutants that are all around us Londoners. Nothing could be further from the truth.

Using data from, I looked at the Borough level of detail as there are only three or four monitoring points per Borough. The data proved a challenge with sporadic measurement across an inconstant set of pollutants. This meant that finding trends and understanding geographical patterns using the new Spatial file connector in Tableau was a good challenge.

The biggest break-through I had was whilst researching the subject, was coming across the European legal limits that have come in to force across different pollutants, at various points since 2005. This is shown as the reference line on the trend data to show how often the limits are broken across London.

The answer in many cases is the daily average often doesn’t fall below some of the limits. Worrying stuff!

The spatial mapping allows the reader to see the impact on the city centre. Although Lambeth, outside of the true centre of the City, seems to be hit particularly hard by pollution levels.

As with any IronViz entry, it’s a chance to use a little imagination and have some fun with charting. When I played with some charts, it looked like exhaust fumes so, I kept it and features it at the top of my visualisation.

Overall, the visualisation has led me to want to dig in further in to this subject and what I can do to start making a difference as currently there isn’t any improvement of any note.

Thursday, 10 November 2016

Data16 - Keep it simple stupid by Chris Love

Keep it simple stupid

“Our world is unbounded by complexity”. Tolomy was living at a time of rapid development and built up an enourmous amount of data. He looked at the data and he mapped planetary movements. But he had the earth at the centre of the movement. It made sense.

The helio-centric model made more sense but faced more significant challenges due to the challenges to the culture this theory made.

Made originally but an engineer captain Sankey to look at steam power and energy loss throughout the process of a steam engine.
Minard’s Napolean’s march into Russia is probably the first that is largely used and most infamous

Sankeys are tough to build in Tableau. There are a lot of steps and a lot of techniques. Chris uses the example of Pablo’s Spanish migration and how a basic set of small multiple histogram show up the trends a lot more clearly

It's not just Pablo. A lot of people have used and built a lot of sankeys. It's not wrong but is it the best way to achieve what you are trying to achieve.

Chris highlights Joe Radburn as he is looking at complex subjects but is visualising them simply. This is arguable just a challenging skill to master.

Kuhn - new discoveries are only made when you don't have a preconception as to what the answer is and the best tool to achieve that preconception. Only exploration will lead to discovery.

Data sources should allow people to explore for themselves, or dashboards to let them answer their curiosity, story points are heavily guided.

Design for mobile is making us think more simply about the message we are trying to convey.

Wednesday, 9 November 2016

Pimp my Viz: Tokyo Drift - Jewel Loree

Jewel Loree

Taking crazy tips on how to create cool dashboards

Viz 1: Jewel’s Pokemon Go viz
For device designer, text doesn't resize but images do so take images of your titles
Helper buttons are great and useful - create a basic view of just a shape and the customise the tooltip
Jewel’s way to build unit charts is great - she's going to post the calcs on her blog and I will add a photo later [add photo here - remind me if I don’t]
Bring custom shapes in in the same order as the dimension so you can just click assign palette to save having to allocate them all individually
Mapbox used to create a similar map style to match Pokemon Go’s maps
To have different vizzes on your device specific dashboards, then just have the floating element just catching the edge of the dash (please make this easier tableau)

Viz 2: Data Breaches by Marc Schonwadt
There is a lot of custom formatting that you can do with text.
Check out the layout tab to see how someone has built their viz
Hiding index menus using layout containers (set up the hidden element as sitting at negative whatever the width of the layout container.
Jewel just copied across the dashboard which pulled in Robert Rouse’s technique dash to take his helper data with it. Make sure you don’t fit the width as it no longer collapse.

Everyday Pimping
Use custom colours
Do more with the Marks Card
Create custom headers
Always fix your tooltips
Thoughtful interactions

Putting a label on the bar and take away the header. Put the rows on the label and align the text to the left. Then fatten your bars.
Sort your bubble charts but sorting your dimension and sort on what is setting the size
Create custom header called Canva

Data16 - Developers on Stage

Automatic drill - level of detail goes deeper as you zoom in
Map scaling added in to the maps
Spatial file connector coming in 10.2 beta
Python integration - using python scripts in calculated fields
Tooltip selection - click on the categorical fields in the tooltip to highlight by that selections
Date filters - filter to latest date rather than being stuck on what you originally published as.
Step and Jump lines - squarer trend lines
Advanced conditional formatting

Dashboards and Stories
Distribute evenly to space out your objects on a dashboard
Can add margin around all objects on your dashboard. No more blanks to separate your charts
Expressive text editor - add images and URLs in to text objects (including tooltips)
Web authoring has more right click functionality than before
Story points on the web
Full screen viz on the web

Direct linking from subscription emails and condition warnings
Smooth tooltips for mobile. Also easier selection of small marks in a movie cuz. Like selecting where to enter text on a phone when holding down your finger.
Commenting on the go (not just in Desktop and Server)
Offline interactivity

Joins on calculated fields
Database unions
New data connectors: pdfs, JSON

Server client library makes it easier to write simple scripts
RET API: JSON and CORS support
GetData() 3rd Party Charting libraries
Mobile App bootstrap - on GitHub now

Tuesday, 8 November 2016

Cross Database Joins - Bethany Lyons and Alex Ross

Bethany Lyons and Alex Ross

The unexpected solution to many tough analytics problems

Bethany has looked forward to delivering this session for a long time as CDJ (Cross Database Joins) can be used to solve so much

Often most analysts have read only access so don't have the chance to create data
Identify (create vs deriving)
Understand (complexity and volume vs performance)
Use - focus on CDJs as the solution

Example 1
Taking a 12 month subscription from just one row and spreading it across the 12 months (on a monthly basis) to show the monthly revenue. Use a simple excel scaffold of month number and a key of 1 to create the product join.

To create the filter of finding when the contracts will earn revenue up to, Bethany used a Boolean filter calf but then added it to the data source filters to cut down on the processing done locally on your machine.

V10.2 adding joins based on calcs

Example 2

Counting staff employed at any single point in time - use missing values to fill the gaps on the table calcs

A scaffold of all dates is needed though and the just return those that are after hire and pre-termination date.

What if the scaffold creates huge amounts of rows?
I.e. If you have seconds a tool is active, you can scaffold on a day level and the create a calc that then counts a full day’s worth of seconds (86,400) but if a partial day then use a datediff() calc