RStudio shortcuts (Windows) – for cleaner and faster coding

RStudio has a number of keyboard shortcuts that make for cleaner and faster coding. I put all the Windows shortcuts that I use onto a single page so that I can pin them next to my computer.

RStudio shortcuts (PDF)

You can also access the list of shortcuts with Shift + Alt + K so the sheet may be redundant to many. However, I found that having a physical copy next to my computer helped a lot while I was still learning the shortcuts.

Some favourites of mine are:

Using code sections/chunks – Use Ctrl+Shift+R to insert a code section and a popup box will appear for you to name that section. Ctrl+Alt+T runs the current code section. When you are done working on a code section you can ‘fold’ it up to improve the readability of your file (Alt+L is fold current code section, Alt+O is fold all sections).
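For illustration, a code section in RStudio is just a comment line that ends in at least four dashes (equals signs or hashes also work), which is what Ctrl+Shift+R inserts for you. A minimal sketch, with made-up section names and data:

```r
# Load data ----
scores <- c(12, 15, 9, 21)   # hypothetical example data

# Summarise ----
mean_score <- mean(scores)
print(mean_score)
```

With the cursor inside either section, the fold and run shortcuts above act on that section only.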

Re-running code quickly – Ctrl + Shift + P re-runs the region of code that was last executed, including any changes you have made to it since.

Deleting/moving stuff faster – Ctrl+D deletes an entire line. Ctrl + backspace deletes the current word, as in most word processing software. Alt + up/down moves the current line of code up or down in the editor, while Shift+Alt+up/down copies lines up/down.

Switch between plots – To toggle between plots use Ctrl+Shift+PgUp/PgDn (It’s a lot faster than using the arrows above the plots!)

Reading OECD.Stat into R

EDIT (17-8-14): I no longer use the XML2R package to pull SDMX data. There is a new package called rsdmx which makes the task described below a lot easier. You can find it and some examples here – https://github.com/opensdmx/rsdmx

OECD.Stat is a commonly used statistics portal in the research world but there are no easy ways (that I know of) to query it straight from R. There are two main benefits of querying OECD.Stat straight from R:

1. Create reproducible analysis (something that is easily lost if you have to download Excel files)

2. Make tweaks to analysis easily

There are three main ways I could see to collect data from OECD.Stat:

  1. Find the unique name for each dataset and scrape the html from the dataset’s landing page (e.g. the unique URL for pension operating expenses is http://stats.oecd.org/Index.aspx?DataSetCode=PNNI_NEW). This probably would have been the easiest way to scrape the data but it doesn’t offer the flexibility that the other two options do.
  2. Use the OECD Open Data API. This was the avenue I explored initially but it doesn’t seem that the API functionality is fully built yet.
  3. Use the SDMX query provided under the export tab on the OECD.Stat site. This URL query can be easily edited to change the selected countries, observation range and even datasets.

I went with option 3, using the SDMX query.

To get the query that you need to use in R, navigate to your dataset and click

export -> SDMX (XML) (as per picture below)

[Screenshot: the export menu with the SDMX (XML) option]

Then, copy everything in the ‘SDMX DATA URL’ box

[Screenshot: the ‘SDMX DATA URL’ box]

In the example below I am using the trade union density dataset.

Getting the SDMX URL as described above for the trade union dataset gives us a very long URL as it contains a lot of countries. For clarity, I cut it down in this example to:

http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD/OECD?startTime=1960&endTime=2012

The important parts of the URL above are

  1. UN_DEN – This is the Trade Union Density dataset code
  2. AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD – Unsurprisingly, this is the list of countries we are querying. You can delete countries or, if you know the ISO country codes, add to it.
  3. startTime=1960&endTime=2012 – Change the date range as you please.

Note that many datasets have a lot more options on offer so there is usually a bunch more junk after the dataset code in the URL.
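Since the three parts above are the only things that change, the URL can also be assembled in R rather than edited by hand. This is only a sketch: the helper name build_sdmx_url is made up for illustration, and it assumes a dataset whose URL has the same simple layout as the trade union example (datasets with extra options will need more path segments).

```r
# Pieces of the query; the values are the ones used in this post
base_url   <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData"
dataset    <- "UN_DEN"
countries  <- c("AUS", "CAN", "FRA", "DEU", "NZL", "GBR", "USA", "OECD")
start_year <- 1960
end_year   <- 2012

# build_sdmx_url is a hypothetical helper, not part of any package.
# The "/OECD" segment after the countries is copied from the example URL.
build_sdmx_url <- function(dataset, countries, start_year, end_year) {
  paste0(base_url, "/", dataset, "/",
         paste(countries, collapse = "+"),
         "/OECD?startTime=", start_year, "&endTime=", end_year)
}

sdmx_url <- build_sdmx_url(dataset, countries, start_year, end_year)
print(sdmx_url)
```

Dropping a country or changing the date range then only means changing one of the variables at the top.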

The following R code creates a melted data frame, ready for use in analysis, ggplot2 or rCharts. I make use of Carson Sievert’s XML2R package. All you need to do is paste your own SDMX URL into the relevant spot.

library(XML2R)

file <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD/OECD?startTime=1960&endTime=2012"

obs <- XML2Obs(file)
tables <- collapse_obs(obs)

# The data we care about is stored in the following three nodes
# We only care about the country variable in the keys node
keys <- tables[["MessageGroup//DataSet//Series//SeriesKey//Value"]]
dates <- tables[["MessageGroup//DataSet//Series//Obs//Time"]]
values <- tables[["MessageGroup//DataSet//Series//Obs//ObsValue"]]

# Extract the country part of the keys table
# Have to use both COU and COUNTRY as OECD don't use a standard name
country_list <- keys[keys[,1]== "COU" | keys[,1]== "COUNTRY"]
# The country names are stored in the middle third of the above list
country_list <- country_list[(length(country_list)*1/3+1):(length(country_list)*2/3)]

# Bind the existing date and value vectors
dat <- cbind.data.frame(as.numeric(dates[,1]),as.numeric(values[,1]))
colnames(dat) <- c('date', 'value')

# Add the country variable
# This code maps a new country each time the diff(dat$date)<=0 ...
# ...as there are a different number of readings for each country
# This is not particularly robust
dat$country <- c(country_list[1], country_list[cumsum(diff(dat$date) <= 0) + 1])
# Rounded copy of the values, as too many significant figures make the rChart ugly
dat$value2 <- signif(dat$value,2)

head(dat)
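The country-labelling step above is the least obvious part, so here is the same trick on a toy vector. The years and country codes are made up; only the diff()/cumsum() logic is taken from the code above.

```r
# Toy stand-in for the stacked date column: three countries with
# different numbers of yearly readings (made-up values)
date <- c(1960, 1961, 1962,   # first country
          1960, 1961,         # second country
          1961, 1962, 1963)   # third country
country_list <- c("AUS", "CAN", "FRA")

# diff() is <= 0 wherever the year fails to increase, which happens at
# the first row of each new country; cumsum() turns those flags into a
# running series index
new_series <- diff(date) <= 0
idx <- c(1, cumsum(new_series) + 1)
country <- country_list[idx]
print(data.frame(date, country))
```

This also shows why the approach is fragile: if one country's series starts in a later year than the previous country's series ends, the year never drops, no boundary is detected and the rows get the wrong label.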

This should create a data frame for nearly all annual OECD.Stat data. You will need to do some work with the dates if you want to use more frequently reported data (e.g. quarterly or monthly data), because as.numeric() cannot convert period labels that are not plain years.
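As a rough idea of the date work involved, the sketch below converts quarterly labels into proper dates. It assumes the labels look like "2010-Q3"; check what your own dataset actually returns, as I have not verified the format for every OECD.Stat dataset.

```r
# Hypothetical quarterly period labels, assumed to look like "2010-Q3"
periods <- c("2010-Q1", "2010-Q2", "2010-Q3", "2010-Q4", "2011-Q1")

year    <- as.integer(substr(periods, 1, 4))
quarter <- as.integer(sub(".*Q", "", periods))

# First month of each quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
month <- (quarter - 1L) * 3L + 1L
quarter_start <- as.Date(sprintf("%d-%02d-01", year, month))
print(quarter_start)
```

Using the first day of each quarter is an arbitrary choice; any consistent date within the quarter will plot the same way.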

Once the data is set up like this it is very easy to visualise.

library(ggplot2)
ggplot(dat) + geom_line(aes(date,value,colour=country))

[Plot: trade union density over time, one line per country]

Or, just as easy but a bit cooler, use rCharts to make it interactive:

# library(devtools)
# install_github('ramnathv/rCharts', ref = 'dev')
library(rCharts)

n1 <- nPlot(value2 ~ date, group = "country", data = dat, type = "lineChart")
n1$chart(forceY = c(0))

#To publish to a gist on github use this code, it will produce the url for viewing
#(you need a github account)
n1$publish('Union Density over time', host = 'gist')

A link to the interactive chart is here (or click on the image below)

[Image: Union Density interactive chart]

Mapping Australian electoral divisions with ggplot2

I’ve seen some creative visualisations of issues surrounding the Australian election recently, though not as many maps as I expected. ‘ggplot2’ is the go-to package for plotting in R so I thought I’d see if I could plot the Australian electoral divisions with ggplot2. By using the Australian Electoral Commission’s GIS mapping coordinates and mutilating Hadley Wickham’s tutorial it was a pretty easy process.

1. Download the AEC boundary GIS data (warning: 24 MB).

2. Extract the file to your R working directory.

3. Run code…

The data.frame this process creates has 2.5 million observations so mapping can take a while. I’m sure there are much more efficient ways to map GIS data but I wanted to stick to ggplot2 in this instance.

require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")
require("rgeos")

#I upped my memory limit as the file we are going to map is pretty large
#(memory.limit() only has an effect on Windows)
memory.limit(6000)

australia = readOGR(dsn=".", layer="COM20111216_ELB_region")
australia@data$id = rownames(australia@data)
#This step isn't in the tutorial, need to do this due to a couple of errors in the AEC GIS data.
australia.buffered = gBuffer(australia, width=0, byid=TRUE)
australia.points = fortify(australia.buffered, region="id")
australia.df = join(australia.points, australia@data, by="id")

#This will show you the variables in the dataset
head(australia@data)

ggplot(australia.df) +
  aes(long, lat, group = group, fill = ELECT_DIV) +
  #Don't want a legend with 150 entries so suppress the legend
  #(show.legend replaces the deprecated show_guide argument)
  geom_polygon(show.legend = FALSE) +
  geom_path(color = "white") +
  #Limit the x-axis, otherwise Christmas Island stretches the map over a lot of ocean
  coord_equal(xlim = c(110, 155))

This gives you

[Map: all Australian electoral divisions, coloured by division]

While it’s a nice picture, it’s of little use as it is impossible to see small electorates.

State-by-state mapping might be more useful. Here is some code to map the ACT. I suggest anyone experimenting should play around with mapping the ACT data as it doesn’t take long to process.

ggplot(subset(australia.df, STATE == "ACT")) +
  aes(long,lat,group=group,fill=ELECT_DIV)+
  geom_polygon() +
  geom_path(color="white") +
  #include limits to stop Jervis Bay being plotted
  coord_equal(xlim=c(148.5,149.5))

Which gives:
[Map: the ACT electoral divisions]

To include your own data for mapping, just add it to the australia@data data.frame, matching on australia@data$ELECT_DIV. The charts look good, but to make them really eye-catching I suggest you take them into Inkscape.
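A minimal sketch of that step, using small stand-in data frames rather than the real AEC file. The division names, the turnout column and its values are all illustrative. I use match() here instead of merge() because merge() can reorder rows, and the rows of australia@data need to stay lined up with the polygons.

```r
# Stand-in for australia@data (the real table has more columns)
divisions <- data.frame(ELECT_DIV = c("Canberra", "Fraser"),
                        STATE = c("ACT", "ACT"),
                        stringsAsFactors = FALSE)

# Hypothetical data to map, deliberately in a different row order
my_data <- data.frame(ELECT_DIV = c("Fraser", "Canberra"),
                      turnout = c(0.93, 0.95),
                      stringsAsFactors = FALSE)

# match() looks up each division without reordering the rows of `divisions`
idx <- match(divisions$ELECT_DIV, my_data$ELECT_DIV)
divisions$turnout <- my_data$turnout[idx]
print(divisions)
```

With the real data, add the column to australia@data before the join() step so that it is carried through to australia.df, then use it as the fill aesthetic. Divisions missing from your own data will come through as NA.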