Reverse Engineering the Monad

I've been banging my head against understanding monads in Haskell for a while, but I'm now delighted to have finally cracked the puzzle. There are a myriad of tutorials, and I find it surprising that they do not all point to You Could Have Invented Monads as the go-to tutorial. Where Dan Piponi (sigfpe) succeeds when others have failed is in demonstrating monads by example before formalising the definition. The other key to its success is that exercises are provided along with the examples to make the concepts concrete.

There are, however, some gaps, which are not problematic for the reader until you actually sit down to do the exercises. This blog post is intended as commentary should you get stuck working through that tutorial. I have included some notes on monads which should help you avoid the red herrings that are thrown into introductory tutorials.

Please read the following after you have read You Could Have Invented Monads.

I will start by repeating the error of those who have gone before me (explaining monads by providing a formal definition) and introduce some generalised notation. The key to learning monads is interpreting type signatures.

f and g are functions that map from some type to the same type, like Int to Int. 

f, g :: a -> a

f' and g' are analogues of f and g, with the addition that the result is wrapped in a container. The examples of containers in the tutorial are tuples (Integer, String), lists [a], and random-number states StdGen -> (a, StdGen).

f', g' :: a -> m b
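
To make the notation concrete, here is a minimal sketch using the first container from the tutorial, a value paired with a String log. The type alias Debuggable and the particular function bodies are my own illustration, not the tutorial's code:

type Debuggable a = (a, String)

f', g' :: Int -> Debuggable Int
f' x = (x + 1, "f' was called. ")
g' x = (2 * x, "g' was called. ")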

Next up is bind. Partially applied, bind f' maps one container to another (m a to m b). The point of bind is to let you chain two functions of type (a -> m b), f' and g'. This is where the type signatures are useful. The inputs to bind are actually (a -> m b) and (m a), not (a -> m b) and (a -> m b). What gives? Well, bind combines f' with g' x, the result of applying g' to an argument, which is a container with a value inside. The type signature shows the output of bind is also a container with a value in it.

bind f' :: m a -> m b

=> bind :: (a -> m b) -> m a -> m b

note: bind chains from right to left in the tutorial, just like ordinary function application: where plain functions nest as f (g x) or h (f (g x)), the monadic versions nest as bind f' (g' x) or bind h' (bind f' (g' x)).
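
Continuing the sketch, a bind for this (a, String) container applies the function to the value and concatenates the logs (an illustrative implementation, nothing official):

bind :: (a -> Debuggable b) -> Debuggable a -> Debuggable b
bind f (x, log1) = let (y, log2) = f x in (y, log1 ++ log2)

-- chaining as in the note above:
-- bind f' (g' 3)  ==  (7, "g' was called. f' was called. ")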

unit is the analogue of the identity function, where f . id = f and id . f = f.

unit :: a -> m a

lift niftily converts a regular function of type (a -> b) into a monadic one of type (a -> m b).

lift f :: a -> m b

=> lift :: (a -> b) -> a -> m b
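
For the same sketch container, unit pairs a value with an empty log, and lift is then just unit composed with the plain function (the empty-string choice is part of the sketch's assumptions):

unit :: a -> Debuggable a
unit x = (x, "")

lift :: (a -> b) -> a -> Debuggable b
lift f = unit . f

-- e.g. lift (+1) 3  ==  (4, "")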

A digression

Having defined a bind function that connects an (a -> m b) function to a container (m a), wouldn't another useful function be one that connects two such monadic functions directly?

Enter the compose function (Thompson).

>@> :: (a -> m b) -> (b -> m c) -> (a -> m c)

The monadic compose function is introduced in formal definitions of monads because it is easier to show associativity with this function than with bind. Associativity is one of the requirements for qualifying as a monad.
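
One way to define the compose operator for the sketch container is in terms of bind, so that the left-hand function runs first and its unpacked result is fed to the right-hand one:

(>@>) :: (a -> Debuggable b) -> (b -> Debuggable c) -> a -> Debuggable c
f >@> g = bind g . f

-- (g' >@> f') 3  ==  bind f' (g' 3)  ==  (7, "g' was called. f' was called. ")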

Next, let's define a function that connects two containers.

It is a monadic sequencing operator which throws away the unpacked value of type a before executing the action of type m b (Yet Another Monad Tutorial - part 3).

>> :: (m a) -> (m b) -> (m b)

It allows two containers to be combined while throwing away the value inside the first one.
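
For the sketch container this sequencing operator also falls out of bind; I've named it (>>.) here only to avoid clashing with the Prelude's (>>):

(>>.) :: Debuggable a -> Debuggable b -> Debuggable b
ma >>. mb = bind (\_ -> mb) ma

-- (3, "first. ") >>. (7, "second. ")  ==  (7, "first. second. ")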

The three exercises

There are only three exercises for each Monad in the tutorial.

1) define bind. It has the type signature defined earlier.

1b) define an operator * such that unit * f' = f' * unit = f'

note that:

 f' :: a -> m b

also, the operator * composes two monadic functions of type (a -> m b) and (b -> m c); it can be defined in terms of bind.

2) define unit.

2b) define lift as lift f = unit . f

3) show that lift f * lift g = lift (f . g)

note: equivalent to f'' * g'' where f'' = lift f and g'' = lift g

And that's it. Answers to follow in subsequent posts.

...

For the record, I tried Real World Haskell, Learn You a Haskell, and Write Yourself a Scheme. I also watched Brian Beckman's Don't Fear the Monad, which made sense at the time but which I had forgotten soon after watching. As for other useful references, I found this to be a good companion piece that provides an intuitive explanation. Mike Vanier's series (I read parts 1, 2 and 3) is also useful, but probably only after you have grasped most of the material.

Why are monads confusing? Because they are a very abstract concept that is best explained in a non-abstract way. Aside from that: notation and partial function application, mainly. I also think they should be called "containers for side effects", or something like that.

Movable Type on Ubuntu with Nginx and fcgiwrap

The recent proliferation of static site generators has piqued my interest in leaving WordPress, which is sometimes unresponsive. Admittedly, this isn't hard given my short attention span. I thought, "Here it is: the reason my obscure writings won't turn into a hugely influential blog is that the website is too slow. It needs to be static pages so it can scale to handle the audience of the whole internet", or something like it.

While deciding which Python site generator to use, I came across this discussion on HN.

http://news.ycombinator.com/item?id=89663

The comment that caught my eye was the link to Jeff Atwood's explanation of why he uses Movable Type. His explanation is, roughly, that he chooses the best tool for the job. What I heard was: the benefits of a static site generator, without gluing everything together and writing HTML and Markdown. This was the beginning of another wild goose chase, better known as the path to yak shaving. Here are brief working instructions for Movable Type on Ubuntu with fcgiwrap and nginx.

1. Download MT5...

  $ wget http://movabletype.org/stable/MT-5.0-en.zip

2. Unzip it

e.g. unzip MT-5.0-en.zip -d mt

3. Make some directories and then add symbolic links:

cd /var/www/
mkdir cgi-bin
mkdir html
ln -s /home/ubuntu/mt  /var/www/cgi-bin/mt
ln -s /home/ubuntu/mt-static /var/www/html/mt-static

4. Install nginx, spawn-fcgi and fcgiwrap

sudo apt-get install nginx spawn-fcgi libfcgi0ldbl
sudo apt-get install build-essential libfcgi-dev

5. Modify /etc/init.d/fcgiwrap to set the user and group to www-data or ubuntu. I know it's hacky and a security risk, but it works.

#FCGI_APP Variables
FCGI_CHILDREN="1"
FCGI_SOCKET="/var/run/$NAME.socket"
FCGI_USER="ubuntu"
FCGI_GROUP="ubuntu"
# Socket owner/group (will default to FCGI_USER/FCGI_GROUP if not defined)
FCGI_SOCKET_OWNER="ubuntu"
FCGI_SOCKET_GROUP="ubuntu"

 6. Create a socket file for fcgiwrap and set its permissions such that fcgiwrap can access it.

touch /var/run/fcgiwrap.socket
sudo chown ubuntu:ubuntu /var/run/fcgiwrap.socket

7. Nginx config file. This was the biggest pain point for me until I found link [4]:

server {
     server_name mt.bhoung.com;

     root /home/ubuntu/blog/;

     location / {
        index index.html index.htm;
     }

     location ~ ^/cgi-bin/.*\.cgi$ {
        gzip off; #gzip makes scripts feel slower since they have to complete before getting gzipped
        fastcgi_pass unix:/var/run/fcgiwrap.socket;
        fastcgi_index index.cgi;
        fastcgi_param SCRIPT_FILENAME /var/www/$fastcgi_script_name;
        fastcgi_param QUERY_STRING $query_string;
        fastcgi_param REQUEST_METHOD $request_method;
        fastcgi_param CONTENT_TYPE $content_type;
        fastcgi_param CONTENT_LENGTH $content_length;
        fastcgi_param GATEWAY_INTERFACE CGI/1.1;
        fastcgi_param SERVER_SOFTWARE nginx;
        fastcgi_param SCRIPT_NAME $fastcgi_script_name;
        fastcgi_param REQUEST_URI $request_uri;
        fastcgi_param DOCUMENT_URI $document_uri;
        fastcgi_param DOCUMENT_ROOT $document_root;
        fastcgi_param SERVER_PROTOCOL $server_protocol;
        fastcgi_param REMOTE_ADDR $remote_addr;
        fastcgi_param REMOTE_PORT $remote_port;
        fastcgi_param SERVER_ADDR $server_addr;
        fastcgi_param SERVER_PORT $server_port;
        fastcgi_param SERVER_NAME $server_name;
     }

     # for static files
     location /mt-static {
        root /var/www/html;
     }
}

8. As I was importing my entries from WordPress, I hit a file size limitation in nginx.

# vi /usr/local/nginx/conf/nginx.conf
client_max_body_size 2M;

* I know I've skipped a few steps, like the MySQL setup and the mt-config.cgi file... I'll get back to it a bit later.

[1] http://www.movabletype.org/documentation/installation/
[2] https://help.ubuntu.com/community/FcgiWrap
[3] http://library.linode.com/web-servers/nginx/perl-fastcgi/ubuntu-10.04-lucid
[4] http://wiki.nginx.org/SimpleCGI
[5] http://www.cyberciti.biz/faq/linux-unix-bsd-nginx-413-request-entity-too-large/
[6] http://www.livejournal.com/doc/server/lj.install.perl_setup.modules.html

Google Maps Overlays with TileMill, QGIS and R

Check it out: http://maps.bhoung.com/popden/index.html

I had been meaning to replicate the cool maps that Brian Boyer made when he was at the Chicago Tribune (population density map of Chicago: http://media.apps.chicagotribune.com/census-2010/census-demo/index.html).

I'm glad to be able to say that I successfully completed his tutorial a few weeks ago, with Australian data. I finally found a few minutes to upload the modified code to Amazon's S3 service. The main obstacle for anyone interested in following the tutorial was generating the images on Mac OS X 10.8.2. As most of the pieces of software are constantly being updated, it was easiest to fire up Ubuntu and run invar, an open-source Python script, to generate the images rather than trying to install it on the Mac.

No surprises as to what population density looks like in Australia, but it's pretty nonetheless. Extensions would be to look at other kinds of data, perhaps at the postcode level. Also, implementing the layering with dots as per some of the Chicago Tribune's other maps would be great too.

Update: So the time has come for me to replicate these maps with new data. I'll document as I go this time round.

1. Merging spatial data with shapefiles can be done more easily with either R (http://gis.stackexchange.com/questions/6839/best-way-to-add-attribute-data-to-a-shape-file) or the mmqgis plugin in QGIS than by using Postgres.

2. Use carto to convert the TileMill .mml file to an .xml file.

3. Install pip and then invar:

https://github.com/mapnik/mapnik/wiki/UbuntuInstallatio

  • sudo apt-get install python-mapnik mapnik-utils libmapnik0.7
  • sudo apt-get install python-pip
  • sudo apt-get install python-greenlet
  • sudo pip install invar
  • ivtile -h
  • ivtile project.xml . -9.3 96.9 -43 159 10 16 -p 2

ggmap

ggmap is a fantastic R package. I've previously blogged about RgoogleMaps, but ggmap supersedes it. It's the equivalent of ggplot for maps and it is very easy to use. In this example I overlay some data points on a map. You may have seen David Kahle's earlier work using ggplot with Houston's crime data here.

 

Retail Volumes April 2012

From all reports, the retail (and services) market in Melbourne has been pretty miserable. It isn't borne out in the official numbers yet, though.

Choropleth Map Time Lapse with R statistics, ggplot and ffmpegx

Melbourne Unemployment 2008 to 2011 - Revisited

I have previously created a choropleth map showing unemployment rates for Victoria. After seeing Drew Conway's animated clock via R-bloggers, I thought I'd give it a go with my map on unemployment rates. Not surprisingly, unemployment rates remain low in the inner-east of Melbourne, while the north and west, and outer south east tend to have higher rates.

rgooglemaps

I recommend you go and have a look at my post on ggmap. It's the best R mapping package for plotting points on a map.

Simon Jackman's maps of Australia's 2010 federal election were one of the reasons I wanted to learn R. His paper "The Spatial Concentration of the Green Vote" has a nice explanation of how he created them. The percentage voting Labor on a two-party-preferred basis is shown for metropolitan Sydney below. You can generate different variations on his website.

 

Unfortunately, since I'm still a novice, I've had to start at the beginning, so I've been going through Markus Loecher's RgoogleMaps tutorial. Here's where I'm at, plotting the first 20 schools in Melbourne:

setwd("C:/schools/")

 
require(foreign)
require(RgoogleMaps)
 
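# load the school coordinates, keep the first 20 rows, and set the marker style for the static map request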
schools <- read.dta("C:/schools/schools.dta")
mymarkers <- schools[1:20, c("lat","lng")]
names(mymarkers) <- c("lat","lon")
 
mymarkers$size <- "tiny"
mymarkers$col <- "red"
mymarkers$char <- ""      
 
 
#get the bounding box:
bb <- qbbox(lat = mymarkers[,"lat"], lon = mymarkers[,"lon"])
 
#download the map:
MyMap <- GetMap.bbox(bb$lonR, bb$latR, destfile = "schools.png", GRAYSCALE =T, markers = mymarkers);
 
#determine the max zoom, so that all points fit on the plot
zoom <- min(MaxZoom(latrange=bb$latR,lonrange=bb$lonR));
 
#plot:
png("OverlayTest.png",640,640);
plot.new()
tmp <- PlotOnStaticMap(MyMap,lat = mymarkers[,"lat"], lon = mymarkers[,"lon"], cex=1.5,pch=20,col=c('blue', 'green', 'red'), add=F);
dev.off()

While looking for ways to overlay points on a map, I discovered that David Kahle has created some really cool maps with ggplot and google maps [1]. It looks like he is working on an R package to generalise those maps, which is awesome [2].

 
[1] https://github.com/hadley/ggplot2/wiki/Crime-in-Downtown-Houston,-Texas-:-Combining-ggplot2-and-Google-Maps
[2] https://github.com/dkahle

Grockit on Kaggle

I entered a contest on Kaggle which asks us to predict whether a student will get a certain question correct. I'm pretty sure this was a blind alley, but I looked at how the time taken to answer a question relates to getting it correct. The data come from Grockit's exam preparation tests. These graphs are produced from the full training set of ~500 MB of data. Students who answer very quickly, in less than 10 seconds, are more likely to get the question wrong. Students who take a long time are also more likely to get questions wrong.

 Histograms of time taken to answer a question by Track

Density of time taken by whether question answered correctly

Densities of time taken by track and by answered correctly.