3 Peaks 2014 – as told by charts


3 Peaks 2014 was held over the weekend, in much more pleasant weather than last year’s scorching heat.

Here are some charts that show a bit about this year’s ride, and also how it compared to last year’s.

This year’s 235km ride attracted 1922 entrants, up from 1543 last year. This chart shows that, despite the larger field this year, there were a lot fewer DNFs in 2014. The weather no doubt played a role in this. A total of 1639 people finished.

finishers1
In percentage terms you can see that the completion rate was about 10% higher this year.
finishers2
Broken down by gender, you can see that women who entered had a slightly higher completion rate than men. Men were more likely to DNS and women were more likely to DNF.
finishers3
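A breakdown like this is just a cross-tabulation of gender against ride status. A minimal sketch, assuming a results data frame with hypothetical gender and status columns:

# Hypothetical column names - the real results file will differ
status_by_gender <- table(results$gender, results$status)
# Row-wise proportions: share of each gender that finished, DNF-ed or DNS-ed
round(prop.table(status_by_gender, margin = 1), 2)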

On to finishing times. Here's a chart showing finish times in 30-minute intervals. Six absolute weapons managed to crack the 8-hour mark and 75 cracked the 9-hour mark.

times1

And the accompanying density plot:
times2

Times by Gender (stacked histogram) – You can see here that a lot more men participated than women.

times3

times4

The ECDF chart below (empirical cumulative distribution function) shows the cumulative distribution of finish times by gender.

times5
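If you want to build an ECDF chart like this yourself, ggplot2's stat_ecdf does the heavy lifting. A minimal sketch, again assuming hypothetical finish_hours and gender columns:

library(ggplot2)
# One cumulative-distribution curve per gender
ggplot(results, aes(x = finish_hours, colour = gender)) +
  stat_ecdf() +
  labs(x = "Finish time (hours)", y = "Proportion of riders")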

And on to the coveted average speed section. A total of four people broke the holy grail of 30km/h (including stops!). I was lucky enough to suck one of these gents’ wheels in the Sydney Rapha Gentlemen’s Race last year – I cannot imagine how painful attempting that would have been on Sunday.

times5

The ride results contain data on how long it took to do the three monster climbs. I subtracted the time spent climbing the three major climbs from the total time and called this the time ‘descending’ (optimistic, I know). While there is a positive relationship between time spent climbing and ‘descending’, the dispersion of the points shows the enormous amount of variation.

prop1

If you divide the time spent climbing by the time spent 'descending', you can see the same variation presented in a different way.

prop21
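As a rough sketch of how the two derived variables above are put together (all column names here are hypothetical; the official results use their own labels):

# 'Descending' = total elapsed time minus the three timed climbs (hypothetical columns)
results$climbing   <- results$tawonga_gap + results$hotham + results$back_of_falls
results$descending <- results$total_time - results$climbing
# The ratio in the second chart: climbing time relative to everything else
results$climb_ratio <- results$climbing / results$descending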

The following charts investigate how a rider's time up Tawonga Gap relates to their overall performance. Nobody who took more than 30 minutes to get up Tawonga Gap broke 9 hours, and nobody who took more than 35 minutes broke 10 hours. Events like this are all about finding the right bunch to be a part of, so you have to make sure you don't miss the bus on the first climb.

predict1

predict2

If we focus on just the fastest people up Tawonga Gap (those who did it in under 28 minutes), you can see that the top finishers at the end of the day weren't the fastest up TG. They all rode together while a few others went up the road (and potentially blew up). EDIT: Andy's comment tells us nobody went up the road from the front bunch on TG, so the few posting faster times must have been slower on the Falls descent and trying to catch the lead group, or they started in a later wave.

predict31

Now, let’s have a look how people who participated last year and came back to face the challenge again this year.

The following two charts show how last year's performance is related to this year's. The first chart shows proportional ranking and the second shows a rider's placing; there were a lot more entrants this year, so a given place is proportionally better in 2014.
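Proportional ranking here just means a rider's place divided by the size of that year's field, which is what makes the two years comparable. A minimal sketch of the comparison, assuming hypothetical res2013 and res2014 data frames keyed by rider name:

# Rank riders within each year, then scale by field size (hypothetical columns)
res2013$prop_rank <- rank(res2013$total_time) / nrow(res2013)
res2014$prop_rank <- rank(res2014$total_time) / nrow(res2014)
# Keep only the riders who appear in both years and compare
both <- merge(res2013, res2014, by = "name", suffixes = c("_2013", "_2014"))
plot(both$prop_rank_2013, both$prop_rank_2014)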

dual1

Three Peaks 2014

This chart compares performance up Tawonga Gap in 2013 vs 2014. On average, people improved over the year, but the effect size is small.

dual3

Seventy people who did not finish in 2013 came back to tackle the event again in 2014. Nine of these 70 didn't show up on Sunday, but the good news is that 52 of the 70 finished. The chart below groups people by their race 'status' from last year (finished, DNF, DNS) to see what happened to them this year. Those that finished last year were very likely to finish again this year. Those that DNS-ed last year were more likely to DNS again in 2014.

dual4

Three peaks is a massive event, drawing people from all over Australia.

The following chart shows which state participants come from. Given where the event is held, the biggest representation obviously comes from Victoria, but A LOT of people come from NSW for this event, most of them from Sydney as we'll see in a second. I know that 20+ come down from my old club alone.

travel1

Using suburb names I could determine how far people travelled (as the crow flies) to get to 3 Peaks. In short, people travelled a long way to put themselves through hell. The four distinct lumps in the following density plot show people travelling from Melbourne, Sydney, Brisbane and WA respectively.
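The distances are great-circle ('as the crow flies') calculations. A minimal sketch using the geosphere package, assuming each suburb has already been geocoded to a longitude/latitude pair (column names hypothetical):

library(geosphere)
# Approximate lon/lat of Falls Creek, the event start
falls_creek <- c(147.28, -36.87)
# Great-circle distance in km from each rider's home suburb to the start line
riders$distance_km <- distHaversine(cbind(riders$lon, riders$lat), falls_creek) / 1000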

travel1

Reading OECD.Stat into R

EDIT (17-8-14): I no longer use the XML2R package to pull SDMX data. There is a newer package called rsdmx which makes the task described below a lot easier. You can find it and some examples here – https://github.com/opensdmx/rsdmx
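For completeness, the rsdmx equivalent of everything below is only a couple of lines, something along these lines (using the same trade union density query as the worked example further down):

library(rsdmx)
url <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD/OECD?startTime=1960&endTime=2012"
# readSDMX parses the SDMX-ML response; as.data.frame flattens it into a tidy data frame
dat <- as.data.frame(readSDMX(url))
head(dat)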

OECD.Stat is a commonly used statistics portal in the research world, but there are no easy ways (that I know of) to query it straight from R. There are two main benefits of querying OECD.Stat straight from R:

1. Create reproducible analysis (something that is easily lost if you have to download Excel files)

2. Make tweaks to analysis easily

There are three main ways I could see to collect data from OECD.Stat:

  1. Find the unique name for each dataset and scrape the html from the dataset’s landing page (e.g. the unique URL for pension operating expenses is http://stats.oecd.org/Index.aspx?DataSetCode=PNNI_NEW). This probably would have been the easiest way to scrape the data but it doesn’t offer the flexibility that the other two options do.
  2. Use the OECD Open Data API. This was the avenue I explored initially, but it doesn't seem that the API functionality is fully built yet.
  3. Use the SDMX query provided under the export tab on the OECD.Stat site. This URL query can be easily edited to change the selected countries, observation range and even datasets.

I went with option 3, using the SDMX query.

To get the query that you need to use in R, navigate to your dataset and click

export -> SDMX (XML) (as per picture below)

sdmx2

Then, copy everything in the ‘SDMX DATA URL’ box

Query

In the example below I am using the trade union density dataset.

Getting the SDMX URL as described above for the trade union dataset gives us a very long URL, as it contains a lot of countries. I cut it down in this example for clarity to:

http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD/OECD?startTime=1960&endTime=2012

The important parts of the URL above are:

  1. UN_DEN – This is the Trade Union Density dataset code
  2. AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD – Unsurprisingly, this is the list of countries we are querying; you can delete countries, or add more if you know the ISO country codes.
  3. startTime=1960&endTime=2012 – Change the date range as you please.

Note that many datasets have a lot more options on offer so there is usually a bunch more junk after the dataset code in the URL.
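Because the query is just a URL, it's also easy to assemble it programmatically and tweak the countries or date range from inside R. A small sketch:

# Build the SDMX query from its parts: dataset code, country list and date range
dataset   <- "UN_DEN"
countries <- c("AUS", "CAN", "FRA", "DEU", "NZL", "GBR", "USA", "OECD")
query_url <- sprintf("http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/%s/%s/OECD?startTime=%d&endTime=%d",
                     dataset, paste(countries, collapse = "+"), 1960, 2012)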

The following R code creates a melted data frame, ready for use in analysis, ggplot or rCharts. I make use of Carson Sievert's XML2R package. All you need to do is paste your own SDMX URL into the relevant spot.

library(XML2R)

file <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD/OECD?startTime=1960&endTime=2012"

obs <- XML2Obs(file)
tables <- collapse_obs(obs)

# The data we care about is stored in the following three nodes
# We only care about the country variable in the keys node
keys <- tables[["MessageGroup//DataSet//Series//SeriesKey//Value"]]
dates <- tables[["MessageGroup//DataSet//Series//Obs//Time"]]
values <- tables[["MessageGroup//DataSet//Series//Obs//ObsValue"]]

# Extract the country part of the keys table
# Have to use both COU and COUNTRY as OECD don't use a standard name
country_list <- keys[keys[,1]== "COU" | keys[,1]== "COUNTRY"]
# The country names are stored in the middle third of the above list
country_list <- country_list[(length(country_list)*1/3+1):(length(country_list)*2/3)]

# Bind the existing date and value vectors
dat <- cbind.data.frame(as.numeric(dates[,1]),as.numeric(values[,1]))
colnames(dat) <- c('date', 'value')

# Add the country variable
# This code maps a new country each time the diff(dat$date)<=0 ...
# ...as there are a different number of readings for each country
# This is not particularly robust
dat$country <- c(country_list[1], country_list[cumsum(diff(dat$date) <= 0) + 1])
#created this as too many sig figs make the rChart ugly
dat$value2 <- signif(dat$value,2)

head(dat)

This should create a data frame for nearly all annualised OECD.Stat data. You will need to do some work with the dates if you want to use more frequently reported data (e.g. quarterly, monthly data).
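For example, quarterly SDMX periods typically come through as strings like "1990-Q1", which as.numeric() will turn into NAs. One option (assuming that format; check your own dataset) is to convert them with the zoo package:

library(zoo)
# Convert "1990-Q1" style periods into numeric year-quarters: 1990.00, 1990.25, ...
dat$date <- as.numeric(as.yearqtr(dates[, 1], format = "%Y-Q%q"))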

Once the data is set up like this, it is very easy to visualise.

library(ggplot2)
ggplot(dat) + geom_line(aes(date,value,colour=country))

Rplot

Or, just as easily but a bit cooler, use rCharts to make it interactive:

# library(devtools)
# install_github('rCharts', 'ramnathv', ref = 'dev')
library(rCharts)

n1 <- nPlot(value2 ~ date, group = "country", data = dat, type = "lineChart")
n1$chart(forceY = c(0))

#To publish to a gist on github use this code, it will produce the url for viewing
#(you need a github account)
n1$publish('Union Density over time', host = 'gist')

Link to the interactive is here (or click on the image below)

Union Density