Thursday, August 18, 2011

i am not a monad i am a free algebra! (Pt 1)


This post is a part of a series of posts attempting make explicit some folklore and wisdom of the commons that computer scientists enjoy but neither mainstream programmers nor mainstream mathematicians have exposure to. Both of these cultures would benefit from this common wisdom and so it feels like telling this story is of a certain utility. The caveat is that unlike the other posts the pay-off is cumulative not immediate. i'm not going to present some eye-catching bauble as with the post on Conway Games. Instead, you're going to have to steep in this stuff a bit before the goodness kicks in.

What is a free algebra?

In true agile style, let’s begin with a concrete example. Let’s begin with a monoid. The free monoid over some generators, g1, …, gN, is just all the finite sequences we can make from the elements g1, …, gN. The smallest such is the empty sequence. The set of next such smallest are just the elements themselves. Then we have all sequences of length two, of which there are [exercise: how many and why?]. And so on. Someone coming from computer science will notice that this is also the set of all words over the set G = { g1, …, gN }, sometimes written G*. This set of sequences is equipped with a natural binary operation which is closed on the set, namely concatenation. That is, if w1 and w2 are in G*, then w1w2 is also in G*. If g1 through gN were characters (such as those in the ISO character set), then we would recognize G* as the set of strings.

We call this monoid ‘free’ because there are no constraints on the operation of concatenating elements. We simply generate all the possible combinations of characters in any order. Notice that one way to represent or model this data type would be to use List. That is, the type List[G] is exactly isomorphic to G*, down to using list concatenation (l1 ++ l2) for word concatenation (w1w2). The relationship between List[G] and G* is exactly the relationship between monad and monoid, but i get ahead of myself. The point i wish to make is that from a certain vantage we can treat List[G] as a concrete model of G*. Now, if we were to move from List[G] to Set[G], removing the possibility of duplicates and ignoring order, this would be the same as if we had introduced some constraints on (a polymorphic implementation of) the operation of concatenation (on the type Set[G]). Specifically, we would be requiring that for any words, w, w1 and w2 we have
  • w ++ w == w (that is, if w in Set[G], then w U w = w)
  • w1 ++ w2 == w2 ++ w1 (that is, if w1, w2 in Set[G], then w1 U w2 = w2 U w1)

In other words, constraints are identities.

Notice that constraints also do two other things. They shrink down the size of the collection of inhabitants of the type. For generators g1, …, gN, there are an infinite number of lists; yet, for the same collection of generators there are exactly 2^N sets. While 2^N might look large, it is dwarfed by the countable infinity. Likewise, we cannot see certain structure. To make this point clear, note that we might have a concrete implementation of Set[G] that actually uses List[G] to hold the elements of the collection, but makes certain that with operations that add to the collection, for example, they never allow duplicates.

Algebras and data types

Returning to the monoid as a freely generated algebra, we can recognize that a monoid is also an abstract data type. We can write it down.

trait Monoid[G] {
def identity : G
def compose( g1 : G, g2 : G ) : G
}

It obeys certain laws, namely, if MG : Monoid[G] then for all g, g1, g2, g3 : G, we require
  • MG.compose( MG.identity, g ) == g == MG.compose( g, MG.identity)
  • MG.compose( MG.compose( g1, g2 ), g3 ) == MG.compose( g1, MG.compose( g2, g3 ) )

In this case, the (set of inhabitants of the) type G is called the carrier (set) of the monoid, Monoid[G]. (Notice the distinction between Monoid[G] and G. There might be many, many monoid operations that work with some carrier set G. Mathematicians and computer scientists might abuse terminology and call G a monoid, but they are very aware that the monoid has additional structure, namely the binary operation, compose, and that there may be many well defined compositions over G. This point will become crucial later when we talk about how to represent and think about monads, and more generally when we think about how to structure code in such a way that we can get substantive reuse.)

An observation that is vitally important to get across is that there is a canonical way to implement the free monoid over G, and that is List[G]. Moreover, this solution is entirely syntactic. There is no additional calculation in List( g1, …, gK ) other than to record the elements of the container and their order. This is important because it is true of all free algebras: they always have a canonical syntactic representation.

This observation is even more important in computer science than in other mathematical disciplines because of the relationships between algebras and data types. There’s a reason why functional languages enjoy algebraic data types. There’s more to be said about this, but we have to wait for the next post.

No comments: