I've been banging my head against understanding monads in Haskell for a while, but I'm delighted to have finally cracked the puzzle. There are a myriad of tutorials, and I find it surprising that the guides do not all point to You Could Have Invented Monads as the go-to tutorial. Where Dan Piponi (sigfpe) succeeds when others have failed is in demonstrating monads by example before formalising the definition. The other key to its success is that exercises are provided along with the examples to make the concepts concrete.
There are, however, some gaps, which are not problematic for the reader until you actually sit down to do the exercises. This blog post is intended as commentary should you get stuck working through that tutorial. I have also included some documentation on monads that should help you avoid the red herrings thrown into introductory tutorials.
I will start by repeating the errors of those who have gone before me (explaining monads by providing a formal definition) and introduce some generalised notation. The key to learning monads is interpreting type signatures.
f and g are functions that map from some type to the same type, like Int to Int.
f, g :: a -> a
f', g' are analogues of f and g, with the addition that the result is placed into a container. The examples of containers in the tutorial are tuples (Integer, String), lists [a], and randomised states StdGen -> (a, StdGen).
f', g' :: a -> m b
Next up is bind. bind f' takes one container (m a) and produces another (m b). So bind lets us chain together two functions of type (a -> m b), f' and g'. This is where the type signatures are useful. The inputs to bind are actually (a -> m b) and (m a), not (a -> m b) and (a -> m b). What gives? Well, bind combines f' and g' a: g' is first given an argument, producing a container with a value in it, and bind f' then accepts that container as its input. The type signature shows the output of bind is also a container with a value in it.
bind f' :: m a -> m b
=> bind :: (a -> m b) -> m a -> m b
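As a concrete sketch, here is bind for the tutorial's debuggable (value, String) container (the example functions f' and g' below are my own, not from the tutorial):

```haskell
-- f' and g' are "debuggable" functions: they return a value
-- plus a log message in a (value, String) container.
f', g' :: Int -> (Int, String)
f' x = (x + 1, "f' was called. ")
g' x = (x * 2, "g' was called. ")

-- bind converts f' :: a -> (b, String) into a function that
-- accepts a container (a, String) and accumulates the logs.
bind :: (a -> (b, String)) -> (a, String) -> (b, String)
bind f' (x, log) = let (y, log') = f' x in (y, log ++ log')

-- chaining: feed g''s output container straight into bind f'
result :: (Int, String)
result = bind f' (g' 3)
```

Here g' 3 produces (6, "g' was called. "), and bind f' unpacks the 6, applies f', and concatenates the two logs.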
note: bind is applied right to left in the tutorial, mirroring ordinary composition f (g x) or h (f (g x)); the monadic equivalents are bind f' (g' x) and bind h' (bind f' (g' x)).
unit is the analogue of the identity function, where f . id = f and id . f = f.
unit :: a -> m a
lift niftily converts a regular function of type (a -> b) into its monadic analogue (a -> m b).
lift f :: a -> m b
=> lift :: (a -> b) -> a -> m b
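A minimal sketch of unit and lift for the same (value, String) container, assuming an empty string is the "log nothing" value:

```haskell
-- unit wraps a plain value in the container, logging nothing
unit :: a -> (a, String)
unit x = (x, "")

-- lift turns a plain function into a debuggable one
lift :: (a -> b) -> (a -> (b, String))
lift f = unit . f

lifted :: (Int, String)
lifted = lift (+ 1) 41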
Having defined a bind function that connects a (a -> m b) function to a container/monad (m a), wouldn't another useful function be one that connects two (a -> m b) functions?
Enter the compose function (Thompson).
(>@>) :: (a -> m b) -> (b -> m c) -> (a -> m c)
The monad compose function is introduced during formal definitions of monads because it is easier to show associativity with this function than with bind. Associativity is one of the laws a structure must satisfy to qualify as a monad.
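For the (value, String) container, compose can be written in terms of bind; a sketch (the function names f' and g' here are my own):

```haskell
bind :: (a -> (b, String)) -> (a, String) -> (b, String)
bind f' (x, log) = let (y, log') = f' x in (y, log ++ log')

-- left-to-right monadic composition, matching the signature above
(>@>) :: (a -> (b, String)) -> (b -> (c, String)) -> (a -> (c, String))
f' >@> g' = bind g' . f'

f', g' :: Int -> (Int, String)
f' x = (x + 1, "inc. ")
g' x = (x * 2, "dbl. ")

composed :: (Int, String)
composed = (f' >@> g') 3
```

Note the order: f' runs first, then its output container is fed through bind g'.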
Next, let's define a function that connects two containers.
It is a monadic sequencing operator which throws away the unpacked value of type a before executing the action of type m b (Yet Another Monad Tutorial - part 3).
(>>) :: m a -> m b -> m b
It allows two containers to be combined while discarding the first container's value.
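For the (value, String) container, this sequencing operator keeps both logs but drops the first value. A sketch, named thenOp here to avoid clashing with the Prelude's (>>):

```haskell
-- discard the first value, keep the second, concatenate the logs
thenOp :: (a, String) -> (b, String) -> (b, String)
thenOp (_, log) (y, log') = (y, log ++ log')

ex :: (Int, String)
ex = thenOp (1, "first. ") (2, "second. ")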
The three exercises
There are only three exercises for each Monad in the tutorial.
1) define bind. It has the type signature defined earlier.
1b) define an operator * such that unit * f' = f' * unit = f'
f' :: a -> m b
also, the operator * composes two monadic functions, and can be defined in terms of bind.
2) define unit. 2b) define lift as lift f = unit . f
3) show that lift f * lift g = lift (f . g)
note: equivalent to f'' * g'' where f'' = lift f and g'' = lift g
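Without giving away the tutorial's answers, the law in exercise 3 can be checked against a different container, Maybe (hiding the Prelude's numeric * so the composition operator can reuse the name):

```haskell
import Prelude hiding ((*))

unit :: a -> Maybe a
unit = Just

lift :: (a -> b) -> (a -> Maybe b)
lift f = unit . f

-- (*) composes monadic functions right to left, like (.)
(*) :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c)
(f' * g') x = case g' x of
                Nothing -> Nothing
                Just y  -> f' y

-- the law: lift f * lift g gives the same result as lift (f . g)
check :: Bool
check = (lift (+ 1) * lift (+ 2)) 3 == lift ((+ 1) . (+ 2)) 3
```

Both sides evaluate to Just 6: lifting and composing gives the same answer as composing and then lifting.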
And that's it. Answers to follow in subsequent posts.
For the record, I tried Real World Haskell, Learn You a Haskell, and Write Yourself a Scheme. I also watched Brian Beckman's Don't Fear the Monad, which made sense at the time, but I had forgotten it soon after watching. As for other useful references, I found this to be a good companion piece that provides an intuitive explanation. Mike Vanier's series (I read parts 1, 2 and 3) is also useful, but probably only after grasping most of the above.
Why are monads confusing? Because a monad is a very abstract concept that is best explained in a non-abstract way. Aside from that: notation and partial function application, mainly. I also think they should be called containers for side effects, or something like that.
The recent proliferation of static site generators has piqued my interest in leaving Wordpress, which is sometimes unresponsive. Admittedly, this isn't hard given my short attention span. I thought, "Here it is: the reason my obscure writings won't turn into a hugely influential blog is that the website is too slow. It needs to be static pages so it can scale to handle the audience of the whole internet", or something like that.
While deciding which Python site generator to use, I came across this discussion at HN.
The comment that caught my eye was the link to Jeff Atwood's explanation of why he uses Movable Type. His explanation is along the lines that he chooses the best tool for the job. What I heard was: the benefits of a static site generator, without gluing everything together and writing HTML and Markdown. This began another wild goose chase, better known as the path to yak shaving. Here are brief working instructions for Movable Type on Ubuntu with fcgiwrap and nginx.
I'm glad to be able to say that I successfully completed his tutorial a few weeks ago, with Australian data. I finally found a few minutes to upload the modified code onto Amazon's S3 service. The main obstacle for anyone interested in following the tutorial was generating the images on Mac OS X 10.8.2. As most of the pieces of software are constantly being updated, it was easier to fire up Ubuntu and run invar, an open-source Python script that generates the images, than to try to install it on the Mac.
No surprises as to what population density looks like in Australia, but it's pretty nonetheless. Extensions would be to look at other kinds of data, perhaps at the postcode level. Implementing the layering with dots, as in some of the Chicago Tribune's other maps, would be great too.
Update: So the time has come for me to replicate these maps with new data. I'll document as I go this time round.
ggmap is a fantastic R package. I've previously blogged about RgoogleMaps, but ggmap supersedes it. It's the equivalent of ggplot2 for maps, and it is very easy to use. In this example I overlay some data points on a map. You may have seen David Kahle's earlier work using ggplot2 with Houston's crime data here.
Melbourne Unemployment 2008 to 2011 - Revisited
I have previously created a choropleth map showing unemployment rates for Victoria. After seeing Drew Conway's animated clock via R-bloggers, I thought I'd give it a go with my map on unemployment rates.
Not surprisingly, unemployment rates remain low in the inner-east of Melbourne, while the north and west, and outer south east tend to have higher rates.
I recommend you go have a look at my post on ggmap. It's the best mapping package in R for plotting points on a map.
Simon Jackman's maps of Australia's 2010 federal election were one of the reasons I wanted to learn R. His paper "The Spatial Concentration of the Green Vote" has a nice explanation of how he created them. The percentage voting Labor on a two-party-preferred basis is shown for metropolitan Sydney below. You can generate different variations on his website.
Unfortunately, since I'm still a novice, I've had to start at the beginning, so I've been going through Markus Loecher's RgoogleMaps tutorial. Here's where I'm at, plotting 20 schools in Melbourne:
library(foreign)      # for read.dta
library(RgoogleMaps)  # for qbbox, MaxZoom, GetMap.bbox, PlotOnStaticMap

schools <- read.dta("C:/schools/schools.dta")

# take the first 20 schools and set up the marker attributes
mymarkers <- schools[1:20, c("lat", "lng")]
names(mymarkers) <- c("lat", "lon")
mymarkers$size <- "tiny"
mymarkers$col <- "red"
mymarkers$char <- ""

# get the bounding box:
bb <- qbbox(lat = mymarkers[, "lat"], lon = mymarkers[, "lon"])

# determine the max zoom, so that all points fit on the plot
zoom <- min(MaxZoom(latrange = bb$latR, lonrange = bb$lonR))

# download the map:
MyMap <- GetMap.bbox(bb$lonR, bb$latR, destfile = "schools.png",
                     GRAYSCALE = TRUE, markers = mymarkers, zoom = zoom)

# overlay the points on the static map
tmp <- PlotOnStaticMap(MyMap, lat = mymarkers[, "lat"], lon = mymarkers[, "lon"],
                       cex = 1.5, pch = 20, col = c("blue", "green", "red"), add = FALSE)
While looking for ways to overlay points on a map, I discovered that David Kahle has created some really cool maps with ggplot2 and Google Maps. It looks like he is working on an R package to generalise those maps, which is awesome.
I entered a contest on Kaggle which asks us to predict whether a student will get a particular question correct. I'm pretty sure this was a blind alley, but I looked at how the time taken to answer a question relates to getting it correct. The data come from Grockit and consist of exam-preparation tests. These graphs are produced from the full training set of ~500 MB of data. Students who answer very quickly, in less than 10 seconds, are more likely to get the question wrong. Students who take a long time are also more likely to get questions wrong.
Histograms of time taken to answer a question by Track
Density of time taken by whether question answered correctly
Densities of time taken by track and by answered correctly.