Sunday, April 26, 2009

The law of attraction

One of the more interesting aspects of Milner's π-calculus is its generalization of application. In the λ-calculus, application comes about by lexical proximity.

M[v] ::= // a term is either
v // a (variable) mention, or
λ v . M[v] // an abstraction, or
M[v] M[v] // an application

(With some minor trepidation we adopt a non-standard variation of EBNF notation to allow an explicit representation of the grammar as parametric in a type; in this case the grammar is parametric in the type that represents the terminal v for variables. For someone familiar with programming in Haskell, this should present no particular challenge.)

In this model, two terms "interact" when one occurs immediately to the left of another. We understand this notion of interaction by beta-reduction.

(λ x. M) N → M[N/x]
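To make the grammar and the reduction rule concrete, here is a sketch in Haskell: the parametric term type, a deliberately naive substitution (it does not rename binders to avoid capture), and a single top-level beta step. All function names are my own choices, not anything from the text.

```haskell
-- the term grammar, parametric in the type v of variables
data M v = Var v | Abs v (M v) | App (M v) (M v)
  deriving (Eq, Show)

-- M[N/x]: replace free occurrences of x in a term with n
-- (naive: no renaming of binders, so capture is possible in general)
subst :: Eq v => M v -> v -> M v -> M v
subst (Var y)     x n | y == x    = n
                      | otherwise = Var y
subst (Abs y m)   x n | y == x    = Abs y m              -- x is shadowed
                      | otherwise = Abs y (subst m x n)
subst (App m1 m2) x n = App (subst m1 x n) (subst m2 x n)

-- one beta step at the root: (λx.M) N → M[N/x]
beta :: Eq v => M v -> Maybe (M v)
beta (App (Abs x m) n) = Just (subst m x n)
beta _                 = Nothing
```

For example, `beta (App (Abs "x" (Var "x")) (Var "y"))` reduces the identity applied to y, yielding `Just (Var "y")`.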

In the π-calculus, the site of interaction is named and terms are free to float around one another in a mixture. In the following higher-order presentation

P[v] ::= // a process is either
0 // a termination, or
v?(v)P[v] // a reception, or
v!(P[v]) // a transmission, or
P[v] | P[v] // a (parallel) composition

we have application corresponding to

x?(y)P | x!(Q)

where we have more subtle rules governing interaction. The most primitive of these is a synchronization+substitution rule, known as the communication rule, or comm rule for short.

x?(y)P | x!(Q) → P[Q/y]
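In the same parametric style, the higher-order process grammar can be sketched as a Haskell datatype; the constructor names here are my own choices.

```haskell
-- the process grammar, parametric in the type n of names
data P n
  = Nil              -- 0     : termination
  | Recv n n (P n)   -- x?(y)Q: reception on x, binding y in Q
  | Send n (P n)     -- x!(Q) : transmission of the process Q on x
  | Par (P n) (P n)  -- Q | R : parallel composition
  deriving (Eq, Show)

-- the redex x?(y)P | x!(Q), with strings standing in for names
redex :: P String
redex = Par (Recv "x" "y" Nil) (Send "x" Nil)
```

The comm rule fires exactly when, as in `redex`, a reception and a transmission sit in parallel on the same name.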

Meanwhile the mixing occurs via structural equivalence rules of the form

P | (Q | R) ≡ (P | Q) | R
P | Q ≡ Q | P

allowing terms of the form

x?(y)P | R | x!(Q)

to reconfigure to terms of the form

x?(y)P | x!(Q) | R

Terms of this form make it clear that we also need to consider reduction in context, and the π-calculus provides explicit rules for this, because not all contexts afford reduction. So, we have reduction in the context of parallel composition (aka mixture)

P → P'
------------
P|Q → P'|Q

But, we do not have reduction under a reception (resp., transmission).

Now, a very interesting characteristic of the π-calculus is that this sort of term reconfiguration provided by the mixing laws is not governed by any additional structure. Nothing prevents a term of the type in which a reduction is possible from reconfiguring to a term in which the potential redex is "broken up" by an intervening process. Similarly, nothing compels a term in which a potential redex is "broken up" to reconfigure to one in which the redex is "armed."

When using the π-calculus to model physical processes, such as in the work by the algebraic biology communities, sometimes a preferential rate is applied to different channels to mimic reaction rates in chemical equations, as in stochastic versions of the π-calculus. There is, however, no machinery to model attractiveness or repulsion, per se. That is, there is no machinery to model any forces that might compel term reconfiguration towards states where redexes are armed. Moreover, the machinery that is advocated to model stochastic interpretations of the essential non-determinism of the π-calculus is ad hoc. In fact, it has no relationship to the π-calculus itself, which has been put forward as a foundational model of computing. This brings us to the crux of why i have chosen, in the presentation of the syntax, to be explicit about the fact that the type of process is parametric in the type of entity standing in the role of channel/port/name. That is, the syntax P[v] requires some type to be provided for v before it can be made concrete.

It turns out that there is a very natural candidate for this type that allows a minimal presentation, namely processes themselves. Or more accurately, codes for processes. We introduce a type

C[p] ::= @p // transcription

then we can set

RP = P[C[RP]]

We can also note that, had we known that the type v comes with a notion of transcription, we could have equipped our processes with a notion of performance (= de-transcription). Retroactively, then, we set

P[v] ::= // a process is either
0 // a termination, or
v?(v)P[v] // a reception, or
v!(P[v]) // a transmission, or
*v // a performance, or
P[v] | P[v] // a (parallel) composition

C[p] ::= @p // a transcription

RP = P[C[RP]] // tie the knot
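One way to tie this knot in Haskell (names and constructors are my own) is to make transcription a newtype wrapper, so that names are exactly codes of processes:

```haskell
-- the process grammar, now including performance
data Proc n = Stop | Recv n n (Proc n) | Send n (Proc n) | Run n
            | Par (Proc n) (Proc n)
  deriving (Eq, Show)

-- C[p] ::= @p : transcription
newtype Name = Quote (Proc Name)
  deriving (Eq, Show)

-- RP = P[C[RP]] : the knot is tied by the recursion through Name
type RProc = Proc Name

-- performance undoes transcription: *(@P) behaves as P
perform :: Name -> RProc
perform (Quote p) = p
```

The recursion between `Proc` and `Name` is exactly the equation RP = P[C[RP]]; `perform` and `Quote` witness de-transcription and transcription, respectively.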

Now, we might express a much more subtle idea of interaction as synchronization+substitution.

s = @S, t = @T, ∀ R. S|T →* R => R →* 0
----------------------------------------
s?(y)P | t!(Q) → P[@Q/y]

This "comm" rule says that if (this is all the stuff above the horizontal line) s is the transcription of some process S, and t the transcription of some process T, and whenever S and T, run in parallel composition together, (eventually) reduce to a state R, that state (eventually) reduces to 0,

then (this is all the stuff below the horizontal line) s and t can be used to synchronize for an interaction that results in a substitution. This basic idea can be used to provide a template for different laws of attraction (resp., repulsion). Before we discuss that thought, however, the attentive reader might be wondering about something. If the "comm" rule is the most basic of the rules governing reduction, but it mentions reduction in its hypothesis, is this rule well-founded? Will it ever allow reduction to take place?

The answer is 'yes'! Note that we require that 0 is the unit of the parallel composition monoid, so 0|0 = 0; similarly, in zero steps we get from termination to termination, or in symbols, 0 →* 0. From these two facts we have that @0 can be used as a channel to synchronize with itself. Taking one step further, we see that @(@0?(y)0) can be used to synchronize with @(@0!(0)). More generally, the reader may verify that processes which cancel each other out, matching every output with an input, provide codes for channels that can be used to synchronize.
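To see this bootstrapping concretely, here is a fuel-bounded Haskell sketch of the reflective comm rule. All names are my own, and the premise is approximated: we check that S|T can reach 0 within a bounded number of steps, rather than the rule's stronger ∀-condition, and only top-level redexes are explored. It is an illustration, not a complete semantics.

```haskell
-- reflective processes: names are codes of processes
data Proc n = Stop | Recv n n (Proc n) | Send n (Proc n) | Run n
            | Par (Proc n) (Proc n)
  deriving (Eq, Show)
newtype Name = Quote (Proc Name) deriving (Eq, Show)
type RProc = Proc Name

-- replace the name x by n throughout a process (x plays the bound variable)
subst :: RProc -> Name -> Name -> RProc
subst Stop _ _ = Stop
subst (Recv s y p) n x =
  Recv (swap s) y (if y == x then p else subst p n x)   -- y shadows x
  where swap z = if z == x then n else z
subst (Send s q) n x = Send (if s == x then n else s) (subst q n x)
subst (Run z)    n x = Run (if z == x then n else z)
subst (Par p q)  n x = Par (subst p n x) (subst q n x)

-- view a process as a "soup" of parallel components
flat :: RProc -> [RProc]
flat (Par p q) = flat p ++ flat q
flat Stop      = []
flat p         = [p]

par :: [RProc] -> RProc
par = foldr Par Stop

-- all one-step reducts: pick a receiver and a sender whose channels match
steps :: Int -> RProc -> [RProc]
steps fuel p =
  [ par (subst body (Quote q) y : others)
  | let soup = zip [0 :: Int ..] (flat p)
  , (i, Recv s y body) <- soup
  , (j, Send t q)      <- soup
  , i /= j
  , match fuel s t
  , let others = [ r | (k, r) <- soup, k /= i, k /= j ]
  ]

-- the comm-rule premise: do the transcribed processes cancel each other?
match :: Int -> Name -> Name -> Bool
match 0 _ _ = False
match fuel (Quote s) (Quote t) = annihilates (fuel - 1) (Par s t)

-- can p reduce (in zero or more steps) to 0, within the fuel bound?
annihilates :: Int -> RProc -> Bool
annihilates _ p | null (flat p) = True
annihilates 0 _ = False
annihilates fuel p = any (annihilates (fuel - 1)) (steps fuel p)
```

Under this sketch `match` succeeds for @0 with itself (the base case: 0|0 is already 0), and one level up for the codes of @0?(y)0 and @0!(0), mirroring the examples above; the fuel bound is what keeps the self-referential premise from running forever.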

Now, we introduce a new spin on this idea. Notice that to have a consequential reduction (one that occurs in the consequent of the comm rule) we need to check a hypothetical reduction (one that occurs in the hypothesis of the rule). The proposal, then, is that attractiveness corresponds to minimal hypothetical reduction. Thus, @0 is maximally attracted to itself, while @(@0?(y)0) is somewhat less attracted to @(@0!(0)). By the same token, repulsion will correspond to increases in hypothetical reduction. We are familiar with processes that "go non-linear". For example, taking D(x) = x?(y)(*y|x!(y)), the term

D(x) | x!(P|D(x))

produces P's ad nauseam. Therefore, @D(x) and @(x!(P|D(x))) can be seen as repulsing each other because they keep increasing the amount of hypothetical reduction that must occur to determine whether a synchronization may take place.

Note that under this interpretation there is no specific need to wire attraction or repulsion into an operational account of mixing. Instead, for a process of the form

P ≡ P1 | P2 | ... | Pn

we can consider all of the possible reductions as being evaluated simultaneously. Those synchronizations that involve less hypothetical reduction simply result in much denser consequential reduction in the labeled transition system corresponding to the synchronization tree of the process. Intuitively, if before we can commit to doing some actual work we must consider some hypothetical work, which in turn causes us to hypothesize further, we are led away from the possibility of actual engagement. The position of maximal attractiveness is one in which the hypothetical work is literally a 'no-brainer': there is no hypothetical work to consider, and thus engagement is immediate.

This proposal has many fewer moving parts than we find in the current stochastic versions of the π-calculus. Instead, we rely only on the original intuitions set forward by Milner, et al., plus one: the idea of channels/ports/names as transcriptions, or codes, of processes. Yet, despite having many fewer moving parts, we arrive at an effective representation of the essential features of stochastic semantics. In fact, in one respect the account is more compelling. The stochastic representation is arbitrary in its assignment of rates to channels; the reflective account given here is not. Rates of interaction are determined by the internal structure of the channels on which a given process is free to interact. On the other hand, there are infinitely many channels, with a structure that can be faithfully embedded into a dense subset of the reals. This affords an effective representation of a continuum of rates.

Further, it is simply too attractive not to mention the apparent alignment with quantum mechanics. The density of consequential reduction bears a resemblance to the probability amplitude of the observation of certain states. An evaluation of this idea would involve a calculation in which it is possible to compositionally interpret <a|M|b> as expressions in the π-calculus, something like:

[<a|M|b>]π = [<a|]π|[M]π|[|b>]π

such that the density of consequential reduction reflected the probability amplitude of the observation of certain states. Lest this seem entirely idle hypothesizing, note that the reflective structure of our version of the π-calculus allows for an unusual duality between contexts and processes. More specifically, if contexts are processes with holes in them, i.e.

K[v] ::= // a context is either
◊ // a location, or
v?(v)K[v] // a reception, or
v!(K[v]) // a transmission, or
K[v] | P[v] // a (parallel) composition

and K[P] := K[P/◊], then we have a duality not normally available in the ordinary π-calculus. Namely,

K · P := @(K[P])

That is, we can define an operation analogous to an inner product in which we have the following correspondences
  • Process <-> Vector (written as a ket in Dirac notation, |b> above)
  • Context <-> Dual (i.e. a map from Vectors to Scalars, written as a bra in Dirac notation, <a| above)
  • Name <-> Scalar (the result of the calculation <a|M|b>)
The definition of the operation simply fills the hole in the context with the process and then transcribes the resulting process. In the context of evaluating whether or not there is an actual correspondence between our proposed law(s) of attraction and quantum mechanics, we note that we now have process-theoretic candidates for essentially all of the components of the calculation of probability amplitude, <a|M|b>. The notion of an operator, M, is essentially dictated by the given candidate interpretations and the laws that these quantities must respect; but that calculation would make this already overly lengthy post even longer. Suffice it to say that the compositional interpretation hinges on two observations.
  1. Contexts are in 1-1 correspondence with higher-order processes of a single variable. That is, K |-> u?(x)K[*x]. Therefore, an inner product calculation, K · P, can be uniformly interpreted as a parallel composition. In other words, [<a|b>]π = [<a|]π|[|b>]π.
  2. The second observation is more subtle and has to do with an idea i sometimes call the iterated experiment. The key point is that [<a|M|b>]π is interpreted as a scalar which is interpreted as a name, but we must terminate in a process. The natural solution is that M is associated with a channel, c, on which we make the observation. This channel represents the continuation to which the observation is returned. Therefore, our calculation is refined as something like
[<a|M|b>]π(c) = [<a|]π(c)|[M]π(c)|[|b>]π(c)
In the language of the intuitions set out above, most of the interpretation involves very little additional hypothetical engagement. So, Occam's razor carves this correspondence out as one worthy of actual attention and consequential engagement.
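The context/process duality above can be sketched directly in Haskell. The types are restated so the sketch stands alone, and all function names are my own: `fill` is K[P], `inner` is K · P := @(K[P]), and `ctxAsProc` is the translation K |-> u?(x)K[*x] from observation 1.

```haskell
-- reflective processes and names, as before
data Proc n = Stop | Recv n n (Proc n) | Send n (Proc n) | Run n
            | Par (Proc n) (Proc n)
  deriving (Eq, Show)
newtype Name = Quote (Proc Name) deriving (Eq, Show)
type RProc = Proc Name

-- contexts: processes with a hole in them
data Ctx = Hole                 -- ◊, the location
         | KRecv Name Name Ctx  -- v?(v)K
         | KSend Name Ctx       -- v!(K)
         | KPar Ctx RProc       -- K | P
  deriving (Eq, Show)

-- K[P] := K[P/◊]: fill the hole with a process
fill :: Ctx -> RProc -> RProc
fill Hole          p = p
fill (KRecv s y k) p = Recv s y (fill k p)
fill (KSend s k)   p = Send s (fill k p)
fill (KPar k q)    p = Par (fill k p) q

-- K · P := @(K[P]): context (bra) applied to process (ket) yields a name (scalar)
inner :: Ctx -> RProc -> Name
inner k p = Quote (fill k p)

-- K |-> u?(x)K[*x]: a context as a higher-order process of one variable
ctxAsProc :: Name -> Name -> Ctx -> RProc
ctxAsProc u x k = Recv u x (fill k (Run x))
```

Note how the triple Process/Context/Name lines up with the ket/bra/scalar correspondences listed above: `inner` consumes a context and a process and returns a name.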


third relation said...

Out of curiosity, do you know if this EBNF variant is ISO-14977 compliant?

leithaus said...

The variant i'm using is just syntactic sugar for a Haskell data type definition. Thus,

M[v] ::= v | lambda v.M[v] | M[v] M[v]

corresponds to

data M v = Var v | Abs v (M v) | App (M v) (M v)

i use a '2d' syntax in which newline + indentation replaces the vertical bar that represents choice. i do this because parallel composition is traditionally written as a vertical bar, and so there is a collision of notation. The modification to EBNF is readily graspable to those who are used to working with it, while there is no immediately obvious alternative notation for parallel composition.

However, when i make concrete syntax for a language like this i do something like the following.

P ::=
| V '?' '(' V (',' V)* ')' ';' P
| V '!' '(' P* ')'
| '*' V
| '{' P (',' P)* '}'
V ::= '@' P

This is very close to ISO-14977 compliant and makes sequencing look like sequencing in languages like C or Java. An alternative is to color the braces and give a polymorphic interpretation to ';'. In sequencing braces {$| - |$} it means sequence, and in parallelizing braces {*| - |*} it means parallel composition. This alternative is close to Haskell's idea of do-notation, but with two specific control-flow monads called out.

third relation said...

Ah, I see. That's useful to me, thanks for explaining it!

third relation said...

The idea of EBNF makes sense, but I was thinking that if your EBNF variant satisfied the ISO constraints then it might have a commercial side. Or perhaps the standard may evolve to include variants such as this one?