Thursday, August 18, 2011

i am not a monad i am a free algebra! (Pt 2)


This post is a part of a series of posts attempting make explicit some folklore and wisdom of the commons that computer scientists enjoy but neither mainstream programmers nor mainstream mathematicians have exposure to. Both of these cultures would benefit from this common wisdom and so it feels like telling this story is of a certain utility. The caveat is that unlike the other posts the pay-off is cumulative not immediate. i'm not going to present some eye-catching bauble as with the post on Conway Games. Instead, you're going to have to steep in this stuff a bit before the goodness kicks in.

The Language connection

The relationship between List[G] and Set[G] is a microcosm in which we see a reflection of a very general phenomenon. For all of the situations the web programmer is likely to face all of the algebras (data types) they will deal with will have a presentation in terms of some freely generated data structure (e.g. List) and a (possibly empty) set of constraints on that (such as the equations on Set that forget order and remove duplicates). In fact, a very large number of practical, day-to-day data types will ultimately boil down to a structure that could be represented by List[G] plus some constraints -- this fact corresponds to the way in which many algebras “factor through” List[G]. It also gives some indication of the "universality" of List as a data type -- one of the primary reasons why it worked so well when paired with the core functional apparatus of the lambda calculus in early functional languages, such as Lisp and Scheme.

Keeping that observation in the back of our minds let's turn to the evident connection to language processing. The connection to languages -- both generating and recognizing, i.e. parsing -- mentioned above is very strong and also a part of the folklore at the intersection of the computer science and meta-mathematics communities. Looking at the example of monoid, this should not be shocking. The free monoid is exactly isomorphic to the structure generated by the Kleene star-like operation of BNF grammars. More generally, a grammar is just another way of specifying a freely generated structure. The constructors of a context-free grammar (CFG), for example, are more general than word concatenation and hence generate more structures than the generators and relations presentations of classical algebras like monoids, groups, rings, fields, etc. The principle, however, is the same: they represent a means of freely generating structures from a simple set of rules for generating new structures from old ones. It’s just that the rules allow constructs other than simple concatenation.

One new thing that arises in this context is that the dual operation to generation, namely recognition or parsing, gets a lot more interesting in this setting and has close connections to phenomena a functional programmer is already familiar with, such as pattern matching, and perhaps more surprisingly type-checking. We'll talk more about that later in the series.

A nod to Curry, Howard and Chomsky

Of particular interest to the computer scientist and programmer is the fact that grammar constructors have close correspondence to type constructors. Again, this is not an accident! For the computer scientist and logician this is another piece of folklore and wisdom of the commons to be exploited to win friends and influence computers. The programmer, though, might not be as familiar with these correspondences which arise from the intimate connections between types, logical formulae and algebras. To spell out the relationships just a bit, there are families of isomorphisms between types and logical formulae that fall under the rubric, the Curry-Howard isomorphism, or propositions-as-types paradigm. For many practical purposes we can view logical formulae and types as essentially two different ways of looking at the same thing. Again, more on this later, just bear in mind that the two notions are somehow effectively the same. Instead, we want to spend a little more time with the relationship between logical formulae and algebras.

The relationship between logical formulae, or more generally, logics and algebras is of the same kind as the one described in the previous post between List[G] and the free monoid over G. List[G] is a model of the algebraic theory of (the free) monoid (over G). In the same way, algebras are models of the logics in which we write our logical formulae. Different logics correspond to different algebras. For example, most programmers are pretty familiar with Boolean operations, such as &. As such they are aware that if

val A1 : Boolean = …;
val A2 : Boolean = …;

that is A1, A2 are of type Boolean, then A1 & A2 == A2 & A1 and also that A1 & A1 == A1. But wait a minute? Aren’t those the same equations we saw holding for Set[G]? Indeed they are! The Set data structure is a model of Boolean logic. In fact, it’s a bit more complex than that. There are such things as Boolean algebras. Boolean algebras are models of Boolean logics and Set is a kind of canonical model that can be seen to underlie all the Boolean algebras. That is, Set[G] plays in the role of the ‘purely syntactical’ model of the Boolean algebras which in turn model the logics (while Set[G] as we know has a purely syntactical model in terms of List[G]).

This sort of relationship is not restricted to the Boolean situation. For example, J-Y Girards's classical linear logic (CLL) sports two distinct flavors of conjunctions (respectively, disjunctions) as well as an involutive negation (meaning that ~~A is logically equivalent to A -- a fact that doesn't hold in many logics important to computing). We get canonical purely syntactic models of formulae in CLL via algebraic structures known as quantales. Fortunately, i don't have to trot out the very fancy definition of a quantale to convey it to a modern programmer. Instead, i can note that given some carrier set G, every quantale has a purely syntactic model in terms of the set of sequences of subsets of G. This we can write down! It's simply the data type List[Set[G]]. There, that's the canonical free quantale. As an ordinary programmer you were already an expert in classical linear logic; you just didn't know it!

In fact, this is a very real pay-off from this kind of thinking. It begins to organize and simplify the veritable zoo of data structures and provide some coherence to the various means of reasoning about them. There are actually a handful of principles and operations at work that give rise to a great deal of the complexity we see at play in day-to-day programming. Most programmers feel this in their bones, but don't quite have a way to express it effectively. This is one way.

More specifically, we teased apart some of these relationships in order to see that in all the cases we've seen so far every structure, be it algebra, type or formula ultimately boils down to -- or in the words of the trade -- factors through a purely syntactical model. On reflection, we've said that a type is (another way of looking at the content of) a logical formulae of some logic -- which in turn has some canonical purely syntactic algebraic model. This fact is what underlies the correspondence between the operators of various grammars types (such as CFG) and type constructors. Ultimately, we need a concrete representation of the types over which we compute in order to do calculations (such as type conformance, or building a new type from a parametric type constructor and a type parameter). Since we know that these type gadgets ultimately factor through some purely syntactical representation what better device than grammars for describing these? Grammars, after all, are our compositional mechanism for describing, computing with and reasoning about syntax.

5 comments:

A Breaking Change said...

Beautiful, and with refreshing precision!

I'm slightly confused about this statement: "while Set[G] as we know has a purely syntactical model in terms of List[G]"

If, by reference to your earlier post, I take "syntactic model" to mean ". . . this solution is entirely syntactic. There is no additional calculation in List( g1, …, gK ) other than to record the elements of the container and their order" -- it is clear how "List[G]" "models" or "implements" Monoid[G], because there are no additional calculations other than to record [existence of] elements and order, i.e., location in the list.

And then I look at "we might have a concrete implementation of Set[G] that actually uses List[G] to hold the elements of the collection, but makes certain that with operations that add to the collection, for example, they never allow duplicates"

and I see "...with operations...[that] never allow duplicates" and I think "Aha, those are 'additional calculations other than recording existence and location of elements'"

and I can't see how List[G] provides a purely syntactic model (or implementation) of Set[G]. The additional duplicate-removing operations and de-ordering operations seem more than syntactic to my eyes. Where have I gone wrong?

Thanks again for this great conversation!

leithaus said...

Dear ABC,

Thanks for your comments and question. Essentially, by purely syntactic model i mean a generators-relations-style presentation which spans the gamut from traditional Universal Algebra presentations to the more modern grammar + structural equivalence presentation. The key point is that you have a free structure together with bi-directional rewrite rules (relations). This is entirely syntactically representable and constitutes a canonical "symbolic" approach to computing.

Best wishes,

--greg

A Breaking Change said...

Very clear. Then, let me see if I can write something BNF-ish to model a set as a list. I think I can do it in Mathematica, because of its heavy pattern matching. Untested, it might look something like this
Adjoin[x_,{}]:={x};
Adjoin[x_,{x_,xs___}]:={x,xs}
Adjoin[x_,{y_,xs___}]:=Append[xs,Adjoin[x,xs]]

Rule 2 says if you see a dup at head of list, don't append. Rule 3 recurses via Mathematica's twisty lack of proper cons, but I hope you can see what I'm trying to express. Y combinator would make it cleaner if longer. When I get back to a machine with an actual installation of Mathematica, I'll test this. Meantime, if you know an expression of this idea in another more mainstream language, I'd be grateful to see it.

A Breaking Change said...

That last line should have said
Append[y, Adjoin[x, xs]]

A Breaking Change said...

Here is a cleaned-up version of the syntactic "Adjoin" that builds sets into lists by pattern-matching and rewrite. It took a minute with real Mathematica. I used "Join" instead of the backward "Append."

Adjoin[x_, {}] := {x};
Adjoin[x_, {x_, xs___}] := {x, xs};
Adjoin[x_, {y_, xs___}] := Join[{y}, Adjoin[x, {xs}]];

Adjoin[1, {}] ~~> {1}

Adjoin[1, {1}] ~~> {1}

Adjoin[1, {1, 2}] ~~> {1,2}

Adjoin[1, {2, 1}] ~~> {2,1}