Rhea Myers

Exploring Art Data 3

Let’s look at how much the “Grants For The Arts” programme of Arts Council England (ACE) gives to each region.

First of all we’ll need the data. That’s available from data.gov.uk under the new CC-BY compatible Crown Copyright here. It’s in XLS format, which R doesn’t load on GNU/Linux, but we can convert that to comma-separated values using OpenOffice.org Calc.

Next we’ll need a map to plot the data on. Ideally we’d use a Shapefile of the English regions, which R would be able to load and render easily, but there isn’t a freely available one. There’s a public domain SVG map of the English regions here, but R doesn’t load SVG either. We can convert the SVG to a table of co-ordinates that we can plot from R using a Python script:

from BeautifulSoup import BeautifulStoneSoup
import re
# We know that the file consists of a single top-level g
# containing a flat list of path elements.
# Each path consists of subpaths only using M/L/z
# So use this knowledge to extract the polylines
# Convert svg class names to gfta region names
names = {"east-midlands":"East Midlands", "east-england":"East of England",
"london":"London", "north-east":"North East",
"north-west":"North West", "south-east":"South East",
"south-west":"South West", "west-midlands":"West Midlands",
"yorkshire-and-humber":"Yorkshire and The Humber"}
svg = open("map/England_Regions_-_Blank.svg")
soup = BeautifulStoneSoup(svg)
# Get the canvas size, to use for flipping the y co-ordinate
height = float(soup.svg["height"])
# Get the containing g
g = soup.find("g")
# Get the translate in the transform
transform = re.match(r"translate\((.+), (.+)\)", g["transform"])
transform_x = float(transform.group(1))
transform_y = float(transform.group(2))
# Get the paths in the g
paths = g.findAll("path")
for path in paths:
# Get the id and convert to region name
region_name = names[path["id"]]
# Get the path data to process
path_d = path["d"]
# Split around M commands to get subpaths
path_d_subpaths = path_d.split("M")
# Keep a count of the subpaths within the id so we can identify them
subpath_count = 0
for subpath in path_d_subpaths:
# The split will result in a leading empty string
if subpath == "":
subpath_count = subpath_count + 1
# Split around the L commands to get a list of points
# The first M point already has its command letter removed
points = subpath.split("L")
for point in points:
# Remove trailing z if present
cleaned_point = point.split()[0]
# Split out the point components and translate them
(x, y) = cleaned_point.split(",")
transformed_x = float(x) + transform_x
flipped_y = height + (height - float(y))
transformed_y = flipped_y + transform_y
# Write a line in the csv
print "%s,%s,%s,%s" % (region_name, subpath_count, transformed_x,

Now we can load the grants data and the map into R, calculate the total value of grants for each region, and colour each region of the map accordingly.

Here’s the R code:

<tt>## The data used to plot a map of the English regions
colClasses=c("factor", "integer", "numeric", "numeric"))
## Plot the English regions in the given colours
## See levels(england$region) for the region names
## colours is a list of region="#FF00FF" colours for regions
## range.min and range.max are for the key values
## main.title is the main label for the plot
## key.title is the title for the key
plotEnglandRegions<-function(colours, range.min, range.max, main.title,
## Reasonable values for the window size
plot.window(c(0, 600),
c(0, 600))
## For each regionname
if (region %in% levels(england$region)){
## For each subpath of each region
lapply(1:max(england$subpath[england$region == region]),
## Get the points of that subpath
subpath.points<-england[england$region == region &
england$subpath == subpath,]
## And colour it the region's colour
polygon(subpath.points$x, subpath.points$y,
## Colour Scale
## Turn off scientific notation (for less than 10 digits)
## Sort the colours so they match the values
## The by is set to fit the number of colours and the value range
legend("topright", legend=seq(from=range.min, to=range.max,
by=((range.max - range.min) / (length(colours) - 1))),
## Load the region award data
colClasses=c("integer", "character", "character", "character",
"character", "factor", "factor", "factor",
"factor", "factor"))
## region$Award.amount contains commas
region$Award.amount<-gsub(",", "", region$Award.amount)
## And we want it as a number
## Get the totals by region
region.totals<-tapply(region$Award.amount, list(region$Region), sum)
## But we don't want the "Other" region
region.totals<-region.totals[names(region.totals) != "Other"]
## Calculate the range of colours
## The minimum value, to the nearest lowest million
## The highest vvalue, to the nearest highest million
## The darkest colour (in a range of 0.0 to 1.0)
## How to get the range of colours between that and 1.0
colour.multiplier<-(1.0 - colour.base) / (value.max - value.min)
## Make the colour levels
colour.base + (i - value.min) * colour.multiplier})
colours<-rgb(levels, 0, 0)
## Add the region names to the colours
## Plot each region in the given colour
plotEnglandRegions(colours, value.min, value.max, "Grants For The Arts 2009/10",
"Total awards in £")</tt>

And here’s the resulting map:

gtfa.png Who can point out the methodological flaw in this visualisation? ;-)