Friday, February 25, 2011

An argument for Datalog as the data manipulation language in a KV-DB

i just checked in my Prolog Term -> XQuery compiler. The main idea can be expressed in a simple example. The prolog term

  • seven(one("a", "b", "c"), two(X, Y), two("d", "e"))

compiles to the XQuery term

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
for $xqV1 in root/seven
let $xqV2 := $xqV1/*[0] ,
$xqV7 := $xqV1/*[1] ,
$xqV9 := $xqV1/*[2]
where ( count($xqV1/*) = 3 )
and ( for $xqV3 in $xqV1/one
let $xqV4 := $xqV3/*[0] ,
$xqV5 := $xqV3/*[1] ,
$xqV6 := $xqV3/*[2]
where ( count($xqV3/*) = 3 )
and ( $xqV4 = a )
and ( $xqV5 = b )
and ( $xqV6 = c )
and ( $xqV2 = $xqV3 )
return $xqV3 )
and ( for $xqV8 in $xqV1/two
where ( count($xqV8/*) = 2 )
and ( $xqV7 = $xqV8 )
return $xqV8 )
and ( for $xqV10 in $xqV1/two
let $xqV11 := $xqV10/*[0] ,
$xqV12 := $xqV10/*[1]
where ( count($xqV10/*) = 2 )
and ( $xqV11 = d )
and ( $xqV12 = e )
and ( $xqV9 = $xqV10 )
return $xqV10 )
return $xqV1

The interpretation is: find all the XML elements in the DB that (when turned into the obvious prolog terms) unify with the element. So, let us look at the element again

seven(one("a", "b", "c"), two(X, Y), two("d", "e"))

This may be interpreted as a prolog pattern with X and Y as variables. This pattern unifies with the term

seven(one("a", "b", "c"), two(true, 10), two("d", "e"))

Producing the substitution X := true, Y := 10.

Next, notice the obvious and evident isomorphism between the term

seven(one("a", "b", "c"), two(true, 10), two("d", "e"))

-- which is a ground term -- and the XML element


<seven>
<one>
<String>a</String>
<String>b</String>
<String>c</String>
</one>
<two>
<Bool>true</Bool>
<Int>10<Int>
</two>
<two>
<String>d</String>
<String>e</String>
</two>
</seven>
Let's call the Prolog -> XML direction of the morphism p and the opposite direction, x. The XQuery generated for our pattern, ptn, will pick out exactly those elements that unify with the pattern -- as if we had selected a xml element, e, from the document, checked

unifies( x( e ), ptn )

and if it does, added it to the result set. Of course, we could implement the approach just like that, but then we would lose all the efficiency of the XQuery engine. So, we use a little category theory and pull the unification algorithm through the p direction of the iso. This gives us a compiler from a prolog pattern to XQuery. Running the resulting algorithm on our pattern, ptn, gives the following XQuery (which is generated by the code in here beginning at line 265).

The beauty of this capability is that -- when coupled with previous idea about a parametrically polymorphic broker -- now the application programmer is completely free to elide distinctions between data being shipped around the internet as messages and data in store. They write programs of the form

for( event <- channel.get( "seven(one("a", "b", "c"), two(X, Y), two("d", "e"))" ) ) {
// event handling code here ...
}

And they don't really care if channel is a RabbitMQ queue or an instance of the eXist XML DB. Their code will block until such time as the data is available. We provide verbs for consuming the data (get) and just reading the data (fetch and subscribe). In the case of the latter the difference is whether the act of reading is linear (one-off) or replicated (it will keep reading from the channel).

(As an aside, flatMap and filter comes into play if they want to do complex event-handling code. For example,

for( askEvent <- askChannel.get( askPattern ); bidEvent <- bidChannel.get( bidPattern ) if (meetsTradeRequirements( askEvent, bidEvent ) ) ) {
issueTradeRequest( askEvent, bidEvent )
}

This code pattern will issueTradeRequests between buyer and seller just when the trade requirements are met between ask and bid.)

To my mind the expansion in the Prolog to XQuery direction (resp, compression in the opposite) is a pretty clear demonstration of the expressiveness in Prolog as a data manipulation language and supports the idea of using DataLog as the core of a NoSQL solution (as i do in my Unification-based K-V-DB found on github, here.)

Of course, it's also a powerful argument for a programming style that's all monads all the time. If you look at the data language here -- XQuery FLWOR expressions -- it's pure monadic structure. Likewise, the threading and session management of the broker is hidden behind another for-comprehension. The key point is that it all just composes!

Now, in up-coming posts we borrow another idea from Prolog: backtracking. We'd like the filter capability to be able to pull information from the streams in an explorative fashion and essentially undo this if the conditions aren't met. One of the tricky parts in this is exploring the streams in a fashion that will minimize some of the gotchas associated with interleaved examination of the stream contents. For this we will employ one of Oleg Kiselyov's insights.

No comments: