Google Maps Overlays with TileMill, QGIS and R


I had been meaning to replicate the cool maps that Brian Boyer made when he was at the Chicago Tribune (for example, his population density map of Chicago).

I'm glad to be able to say that I successfully completed his tutorial a few weeks ago, with Australian data. I finally found a few minutes to upload the modified code onto Amazon's S3 service. The main obstacle for anyone following the tutorial was generating the images on Mac OS X 10.8.2. As most of the pieces of software are constantly being updated, it was easier to fire up Ubuntu and run invar, an open-source Python script that generates the images, than to install it on the Mac.

No surprises as to what population density looks like in Australia, but it's pretty nonetheless. Extensions would be to look at other kinds of data, perhaps at the postcode level. Implementing the dot-density layering used in some of the Chicago Tribune's other maps would be great too.
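The dot-density idea is simple to sketch: draw one dot per N residents, scattered at random within each region. A toy version in plain Python follows (a real map would scatter dots inside each region's polygon rather than its bounding box; the regions and figures below are made up):

```python
import random

random.seed(1)

# Dot-density sketch: one dot per `people_per_dot` residents, scattered
# uniformly inside each region's bounding box. Regions and populations
# are hypothetical.
regions = {
    # name: (min_lon, min_lat, max_lon, max_lat, population)
    "Inner": (144.9, -37.9, 145.0, -37.8, 52000),
    "Outer": (145.0, -37.9, 145.3, -37.7, 9000),
}

def dot_density(regions, people_per_dot=1000):
    """Return (lon, lat) dots, population // people_per_dot per region."""
    dots = []
    for name, (x0, y0, x1, y1, pop) in regions.items():
        for _ in range(pop // people_per_dot):
            dots.append((random.uniform(x0, x1), random.uniform(y0, y1)))
    return dots

dots = dot_density(regions)
print(len(dots))  # 61 dots: 52 for Inner + 9 for Outer
```

The dots would then be rendered as a layer on top of the base tiles.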

Update: So the time has come for me to replicate these maps with new data. I'll document as I go this time round.

1. Merging spatial data with shapefiles can be done more easily with either R's merge() or the mmqgis plugin in QGIS than with postgres.

2. Use carto to convert the TileMill .mml file to a Mapnik .xml file: carto project.mml > project.xml

3. Install pip and then invar (the ivtile arguments are the Mapnik XML file, the output directory, two lat/lon corners of the bounding box, the minimum and maximum zoom levels, and -p for the number of processes):

  • sudo apt-get install python-mapnik mapnik-utils libmapnik0.7
  • sudo apt-get install python-pip
  • sudo apt-get install python-greenlet
  • sudo pip install invar
  • ivtile -h
  • ivtile project.xml . -9.3 96.9 -43 159 10 16 -p 2
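The attribute join in step 1 (R's merge() or mmqgis) amounts to matching a table of statistics to shapefile records on a shared region code. A minimal sketch of that join in plain Python, with made-up field names and values:

```python
# Sketch of the attribute join behind merge()/mmqgis: match statistics to
# shapefile records on a shared region code. All field names and values
# here are hypothetical.

# attribute records as they might come out of a shapefile's .dbf table
shapefile_records = [
    {"SLA_CODE": "205051093", "SLA_NAME": "Melbourne (C) - Inner"},
    {"SLA_CODE": "205051094", "SLA_NAME": "Melbourne (C) - Remainder"},
]

# statistics to merge in, keyed by the same region code
density = {"205051093": 4521.3, "205051094": 1873.9}

def attribute_join(records, stats, key, new_field):
    """Left join: copy stats[record[key]] into each record under new_field."""
    joined = []
    for rec in records:
        rec = dict(rec)  # don't mutate the input
        rec[new_field] = stats.get(rec[key])  # None if no match
        joined.append(rec)
    return joined

merged = attribute_join(shapefile_records, density, "SLA_CODE", "DENSITY")
print(merged[0]["DENSITY"])  # 4521.3
```

Once joined, the density field can drive the choropleth styling in TileMill.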


ggmap is a fantastic package for R. I've previously blogged about RgoogleMaps, but ggmap supersedes it (and appears to build on it, judging by what gets pulled in when you install it). It's the equivalent of ggplot for maps and it is very easy to use. In this example I overlay some data points on a map. You may have seen David Kahle's earlier work using ggplot with Houston's crime data.


Choropleth Map Time Lapse with R, ggplot and ffmpegX

Melbourne Unemployment 2008 to 2011 - Revisited

I have previously created a choropleth map showing unemployment rates for Victoria. After seeing Drew Conway's animated clock via R-bloggers, I thought I'd give it a go with my map on unemployment rates. Not surprisingly, unemployment rates remain low in the inner east of Melbourne, while the north and west, and the outer south east, tend to have higher rates.


I recommend you go have a look at my post on ggmap. It's the best package in R for showing points on a map.

Simon Jackman's maps of Australia's 2010 federal election were one of the reasons I wanted to learn R. His paper "The Spatial Concentration of the Green Vote" has a nice explanation of how he created them. The percentage voting Labor on a two-party-preferred basis is shown for metropolitan Sydney below. You can generate different variations on his website.


Unfortunately, since I'm still a novice, I've had to start at the beginning, so I've been going through Markus Loecher's RgoogleMaps tutorial. Here's where I'm at, plotting the first 20 schools in Melbourne:

Early days, but at least it feels like a start. I lost a few hours because I had longitude and latitude around the wrong way. The other gotcha was that the variable names in mymarkers had to be "lat" and "lon" (I was using "lng").

library(foreign)      # read.dta
library(RgoogleMaps)  # qbbox, GetMap.bbox, MaxZoom, PlotOnStaticMap

schools <- read.dta("C:/schools/schools.dta")
mymarkers <- schools[1:20, c("lat", "lng")]
names(mymarkers) <- c("lat", "lon")  # RgoogleMaps expects "lat" and "lon"
mymarkers$size <- "tiny"
mymarkers$col <- "red"
mymarkers$char <- ""
# get the bounding box:
bb <- qbbox(lat = mymarkers[, "lat"], lon = mymarkers[, "lon"])
# download the map:
MyMap <- GetMap.bbox(bb$lonR, bb$latR, destfile = "schools.png", GRAYSCALE = TRUE, markers = mymarkers)
# determine the max zoom, so that all points fit on the plot
zoom <- min(MaxZoom(latrange = bb$latR, lonrange = bb$lonR))
tmp <- PlotOnStaticMap(MyMap, lat = mymarkers[, "lat"], lon = mymarkers[, "lon"], cex = 1.5, pch = 20, col = c("blue", "green", "red"), add = FALSE)

While looking for ways to overlay points on a map, I discovered that David Kahle has created some really cool maps with ggplot and google maps [1]. It looks like he is working on an R package to generalise those maps, which is awesome [2].


What do you know? Grockit/Kaggle

I entered a contest on Kaggle which asks us to predict whether a student will get a particular question correct. I'm pretty sure this was a blind alley, but I looked at how the time taken to answer a question was related to getting it correct. The data come from Grockit and are from exam preparation tests. These graphs are produced from the full training set of ~500 MB of data. Students who answer very quickly, in less than 10 seconds, are more likely to get the question wrong. Students who take a long time are also more likely to get questions wrong.

 Histograms of time taken to answer a question by Track

Density of time taken by whether question answered correctly

Densities of time taken by track and by answered correctly.  

Yes, I recoded track 2 to track 0 and track 3 to track 1 since their numbers were so low.
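The bucketing behind those graphs can be sketched with the standard library alone. The field names, thresholds and toy rows below are made up (the real analysis ran over the full training set):

```python
from collections import defaultdict

# Toy rows mimicking the Grockit training data (hypothetical field names):
# time taken in seconds, and whether the answer was correct (0/1).
rows = [
    {"time_taken": 4,   "correct": 0},
    {"time_taken": 8,   "correct": 0},
    {"time_taken": 35,  "correct": 1},
    {"time_taken": 50,  "correct": 1},
    {"time_taken": 140, "correct": 0},
]

def bucket(seconds):
    """Group answer times into rough bands for a histogram."""
    if seconds < 10:
        return "<10s"    # rushed answers
    elif seconds < 120:
        return "10s-2m"
    return ">2m"         # long deliberation

counts = defaultdict(lambda: [0, 0])  # bucket -> [n_wrong, n_right]
for row in rows:
    counts[bucket(row["time_taken"])][row["correct"]] += 1

for b, (wrong, right) in sorted(counts.items()):
    print(f"{b}: {right}/{wrong + right} correct")
```

On the toy data this reproduces the pattern described above: the very fast and very slow buckets skew wrong, the middle bucket skews right.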

svn on subdomain and on https

After setting up svn on Apache, I set about putting svn on a subdomain and also on https (port 443). This is handy because checking out over http (port 80) sometimes fails due to firewalls or proxy servers.


<Location /svn>
# Uncomment this to enable the repository
DAV svn

# Set this to the path to your repository
SVNPath /home/brendan/svn

AuthType Basic
AuthName "Subversion Repository"
AuthUserFile /etc/apache2/dav_svn.passwd
Require valid-user
</Location>

NameVirtualHost *:443

<VirtualHost *:443>
<Location /svn>
DAV svn
SVNPath /home/brendan/svn
AuthType Basic
AuthName "Subversion Repos"
AuthUserFile /etc/apache2/dav_svn.passwd
Require valid-user
</Location>

SSLEngine on
SSLCertificateFile /etc/apache2/ssl/apache.pem
SSLProtocol all
</VirtualHost>


svn post-commit: svnnotify & svn update to another server

So I've set up SVN to send me an email every time someone commits to the repository and, in addition, added instructions to the post-commit hook to update the development server (and/or production server).

file: post-commit


  • The --to argument is the recipient. For multiple recipients, the format is "first" "second" "third" email addresses.

  • I am using a gmail account to send the notification messages after each commit; PASSWORD needs to be replaced by the actual password. Replace my gmail account with your gmail account. Clearly, you could use any email account, not just gmail.

  • project is the name of the checked out repository


#!/bin/sh

# Subversion invokes post-commit with the repository path and the
# revision number just committed.
REPOS="$1"
REV="$2"

# svn update the production server
/usr/bin/svn update /home/brendan/project

# send emails to everyone about the last commit
echo $REPOS
echo $REV

# the svnnotify path may differ on your system; the addresses are placeholders
for email in "first@example.com" "second@example.com"
do
/usr/bin/svnnotify \
--repos-path "$REPOS" \
--revision "$REV" \
--handler HTML::ColorDiff \
--smtp smtp.gmail.com \
--smtp-user USER@gmail.com \
--smtp-pass PASSWORD \
--to $email \
--subject-prefix [ATLANTAS]
done

# update the development server (post-commit-svn, below, takes the same two arguments)

file: post-commit-svn


  • Once again, PASSWORD needs to be replaced by the actual password. This time it is the ssh password for the development server.

  • SERVER needs to be replaced by an IP address or domain name (the address of the server running ssh)

#!/usr/bin/expect -f

# AUTHOR: Meitar Moscovitz
# DATE : Thu Jun 21 16:32:42 EDT 2007

# the path to the svn executable on the remote web server
set SVNBIN /usr/bin/svn

# our network is slow, set a long timeout
set timeout 30

# The post-commit hook is invoked after a commit. Subversion runs
# this hook by invoking a program (script, executable, binary, etc.)
# named 'post-commit' (this file) with the
# following ordered arguments:
# [1] REPOS-PATH (the path to this repository)
# [2] REV (the number of the revision just committed)
# Note that Subversion does not provide this program with an environment
# of any kind. That means this program lacks a current working directory,
# a home directory, a $PATH, and so on.

set REPOS [lindex $argv 0]
set REV [lindex $argv 1]

# the remote host, user, ssh password and working copy to update
set HOST SERVER
set USER brendan
set PASS PASSWORD
set WC /home/brendan/project

# Define error codes
set E_NO_SSH 1 ;# can't find a usable SSH on our system
set E_NO_CONNECT 2 ;# failure to connect to remote server (timed out)
set E_WRONG_PASS 3 ;# password provided does not work
set E_UNKNOWN 25 ;# unexpected failure

# find the SSH binary on our system
set SSHBIN /usr/bin/ssh

spawn $SSHBIN $USER@$HOST -t "$SVNBIN update $WC"

expect {
    "continue connecting (yes/no)? " { send "yes\r"; exp_continue }
    -nocase "password:" { send "$PASS\r" }
    timeout {
        send_error "\nWe have timed out after $timeout seconds while trying to connect to $HOST!\n"
        exit $E_NO_CONNECT
    }
}

expect {
    -nocase "password:" {
        ;# if we are asked for the password again, then we have provided the wrong password
        send_error "\nCan not log in to $HOST because the password provided for user $USER has been rejected.\n"
        exit $E_WRONG_PASS
    }
    -re "revision ([0-9]+)\\." {
        if {$REV == $expect_out(1,string)} {
            send_user "\nSuccessfully updated $WC on $HOST to revision $REV.\n"
        } else {
            send_user "\nUpdated repository to revision $expect_out(1,string), but svn reports that we are at revision number $REV.\n"
            send_error "CAUTION: Repository updated to revision $expect_out(1,string), but committed revision $REV.\n"
        }
    }
    default {
        send_error "An unexpected error has occurred. The process at spawn ID $spawn_id has produced the following output:\n"
        send_error $expect_out(buffer)
        exit $E_UNKNOWN
    }
}


Install Python Django on Apache Ubuntu 10.04

I can tell you that moving from LAMP to Python/Django has been a terrific headache. Here is the guide that I wish had existed before I started out.

1. Install the relevant packages

sudo apt-get install apache2 subversion
sudo apt-get install mysql-server python-mysqldb
sudo apt-get install libapache2-mod-wsgi

2. Download Django


svn co ~/django-trunk/

You can check the code out to any location, not just "~/django-trunk/".

3. Tell Python where Django is.

python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()"
ln -s /home/brendan/django-trunk/django /DIR-FROM-PREV-LINE/django
ln -s /home/brendan/django-trunk/django/bin/ /usr/local/bin

sudo python setup.py install

4. For a production site, edit /etc/apache2/sites-available/ (don't forget to run a2ensite); for localhost, edit /etc/apache2/sites-available/default.
'mysite' is the directory for the contents of your new website; include the code below.

Change the IP address if appropriate. We configure Apache to serve the static content; this is not recommended by Django, but it will do for now.



<VirtualHost *:80>
# uncomment next 3 lines for production
# ServerName
# ServerAlias
# WSGIProcessGroup mysite

WSGIDaemonProcess mysite user=mysite group=mysite threads=10 python-path=/usr/lib/python2.6/site-packages
WSGIScriptAlias / /home/brendan/mysite/mysite.wsgi

Alias /static/ /home/brendan/mysite/static/

# this is to allow apache to serve static content
<Directory "/home/brendan/mysite/static/">
Order deny,allow
Allow from all
</Directory>

<Directory /home/brendan/mysite/>
Order deny,allow
Allow from all
</Directory>

ErrorLog /var/log/apache2/error.log
LogLevel warn
CustomLog /var/log/apache2/access.log combined
</VirtualHost>

5. Make the mysite user the owner of the site directory:

chown -R mysite /home/brendan/mysite

6. Create the mysite.wsgi referred to above, with the following contents (once again, mysite is the name of your new website):

import os, sys
# using os we can determine the location of this mysite.wsgi file,
# then add that directory to sys.path so settings.py can be imported
ROOT = os.path.dirname(os.path.realpath(__file__))
sys.path.append(ROOT)
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings' # this is your settings.py file
os.environ['PYTHON_EGG_CACHE'] = '/tmp'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

7. Edit the details in settings.py within /home/brendan/mysite/, and you're done.

ADMINS = (
    # ('Brendan', ''),
)
DATABASE_ENGINE = 'django.db.backends.mysql'
DATABASE_NAME = 'mysite_db'
DATABASE_USER = 'user'
TIME_ZONE = 'Australia/Melbourne'
MEDIA_ROOT = '/home/brendan/mysite/static/'
ADMIN_MEDIA_PREFIX = '/static/admin/'
TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.load_template_source',
    # ...
)
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    # ...
)
ROOT_URLCONF = 'mysite.urls'
TEMPLATE_DIRS = (
    "/home/brendan/mysite/templates/",
)
INSTALLED_APPS = (
    'django.contrib.auth',
    # ...
)


Django tutorial (part 1)

Convert Excel Tables to Word

Last year I wrote a macro in Excel which automates the process of copying tables from Excel to Word documents. The macro checks each worksheet and copies each table across to a Word document, separated by page breaks. It uses the first cell, "A1", in each worksheet to name the table in the Word document.
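The macro's logic is language-independent: walk the worksheets, take A1 as the caption, copy the remaining rows, and emit a page break between tables. A rough sketch of that loop in Python over plain lists (the actual macro is VBA inside Excel; the sheet names and data below are made up):

```python
# Model each worksheet as a list of rows; cell "A1" (rows[0][0]) holds
# the table's caption, the remaining rows are the table itself.
workbook = {
    "Sheet1": [["Table 1: Sales", ""], ["Year", "Total"], ["2011", "100"]],
    "Sheet2": [["Table 2: Costs", ""], ["Year", "Total"], ["2011", "60"]],
}

def tables_for_word(workbook):
    """Return (caption, body) pairs, one per worksheet, caption from A1."""
    document = []
    for sheet_name, rows in sorted(workbook.items()):
        caption = rows[0][0]  # cell "A1" names the table
        body = rows[1:]       # the table contents
        document.append((caption, body))
    return document

doc = tables_for_word(workbook)
for caption, body in doc:
    print(caption)  # becomes the table heading in the Word document
    # ...write the table, then insert a page break before the next one
```

In the real macro the same loop runs over Workbook.Worksheets, pasting each used range into Word and inserting a page break after it.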