Friday, February 25, 2011

An argument for Datalog as the data manipulation language in a KV-DB

i just checked in my Prolog Term -> XQuery compiler. The main idea can be expressed in a simple example. The Prolog term

  • seven(one("a", "b", "c"), two(X, Y), two("d", "e"))

compiles to the XQuery term

for $xqV1 in root/seven
let $xqV2 := $xqV1/*[0] ,
$xqV7 := $xqV1/*[1] ,
$xqV9 := $xqV1/*[2]
where ( count($xqV1/*) = 3 )
and ( for $xqV3 in $xqV1/one
let $xqV4 := $xqV3/*[0] ,
$xqV5 := $xqV3/*[1] ,
$xqV6 := $xqV3/*[2]
where ( count($xqV3/*) = 3 )
and ( $xqV4 = a )
and ( $xqV5 = b )
and ( $xqV6 = c )
and ( $xqV2 = $xqV3 )
return $xqV3 )
and ( for $xqV8 in $xqV1/two
where ( count($xqV8/*) = 2 )
and ( $xqV7 = $xqV8 )
return $xqV8 )
and ( for $xqV10 in $xqV1/two
let $xqV11 := $xqV10/*[0] ,
$xqV12 := $xqV10/*[1]
where ( count($xqV10/*) = 2 )
and ( $xqV11 = d )
and ( $xqV12 = e )
and ( $xqV9 = $xqV10 )
return $xqV10 )
return $xqV1

The interpretation is: find all the XML elements in the DB that (when turned into the obvious Prolog terms) unify with the pattern. So, let us look at the pattern again

seven(one("a", "b", "c"), two(X, Y), two("d", "e"))

This may be interpreted as a Prolog pattern with X and Y as variables. This pattern unifies with the term

seven(one("a", "b", "c"), two(true, 10), two("d", "e"))

producing the substitution X := true, Y := 10.
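This unification step can be sketched in a few lines of Scala. The term language and the one-way matcher below are a hypothetical miniature (not the compiler's actual representation): pattern variables bind to ground subterms, and repeated variables must bind consistently.

```scala
// Minimal term language: functor applications plus typed leaves and variables.
sealed trait Term
case class Fn(name: String, args: List[Term]) extends Term // seven(...), one(...)
case class Str(s: String) extends Term
case class IntT(i: Int) extends Term
case class BoolT(b: Boolean) extends Term
case class Var(name: String) extends Term                  // X, Y

// One-way matching of a pattern against a ground term, threading a substitution.
def matchTerm(ptn: Term, grnd: Term, s: Map[String, Term] = Map.empty): Option[Map[String, Term]] =
  (ptn, grnd) match {
    case (Var(x), g) =>
      s.get(x) match {
        case Some(bound) => if (bound == g) Some(s) else None // repeated variable: must agree
        case None        => Some(s + (x -> g))                // fresh variable: bind it
      }
    case (Fn(f, as), Fn(g, bs)) if f == g && as.length == bs.length =>
      as.zip(bs).foldLeft(Option(s)) {                        // match argument lists pairwise
        case (acc, (p, t)) => acc.flatMap(matchTerm(p, t, _))
      }
    case (p, g) => if (p == g) Some(s) else None              // ground leaves must be equal
  }

val ptn = Fn("seven", List(
  Fn("one", List(Str("a"), Str("b"), Str("c"))),
  Fn("two", List(Var("X"), Var("Y"))),
  Fn("two", List(Str("d"), Str("e")))))
val grnd = Fn("seven", List(
  Fn("one", List(Str("a"), Str("b"), Str("c"))),
  Fn("two", List(BoolT(true), IntT(10))),
  Fn("two", List(Str("d"), Str("e")))))

val subst = matchTerm(ptn, grnd) // Some(Map(X -> true, Y -> 10))
```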

Next, notice the evident isomorphism between the term

seven(one("a", "b", "c"), two(true, 10), two("d", "e"))

-- which is a ground term -- and the XML element


<seven>
<one>
<String>a</String>
<String>b</String>
<String>c</String>
</one>
<two>
<Bool>true</Bool>
<Int>10</Int>
</two>
<two>
<String>d</String>
<String>e</String>
</two>
</seven>

Let's call the Prolog -> XML direction of the isomorphism p and the opposite direction x. The XQuery generated for our pattern, ptn, will pick out exactly those elements that unify with the pattern -- as if we had selected an XML element, e, from the document, checked

unifies( x( e ), ptn )

and if it does, added it to the result set. Of course, we could implement the approach just like that, but then we would lose all the efficiency of the XQuery engine. So, we use a little category theory and pull the unification algorithm through the p direction of the iso. This gives us a compiler from a Prolog pattern to XQuery. Running the resulting algorithm on our pattern, ptn, gives the XQuery shown above (which is generated by the code in here beginning at line 265).
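The p direction of the isomorphism is mechanical: a functor application becomes an element named by the functor, and typed leaves are wrapped in type-tagged elements (String, Bool, Int), exactly as in the XML example above. A minimal sketch (the term ADT and function names are illustrative, not the repo's actual code):

```scala
// Ground terms only: no variables on the p side of the iso.
sealed trait Term
case class Fn(name: String, args: List[Term]) extends Term
case class Str(s: String) extends Term
case class IntT(i: Int) extends Term
case class BoolT(b: Boolean) extends Term

// p : ground Prolog term -> XML, following the wrapping convention above.
def p(t: Term): String = t match {
  case Fn(f, as) => s"<$f>" + as.map(p).mkString + s"</$f>"
  case Str(s)    => s"<String>$s</String>"
  case IntT(i)   => s"<Int>$i</Int>"
  case BoolT(b)  => s"<Bool>$b</Bool>"
}

val xml = p(Fn("two", List(BoolT(true), IntT(10))))
// "<two><Bool>true</Bool><Int>10</Int></two>"
```

The x direction is the evident inverse: strip the type-tag wrappers back into typed leaves and read element names back as functors.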

The beauty of this capability is that -- when coupled with the previous idea about a parametrically polymorphic broker -- the application programmer is completely free to elide distinctions between data being shipped around the internet as messages and data in store. They write programs of the form

for( event <- channel.get( """seven(one("a", "b", "c"), two(X, Y), two("d", "e"))""" ) ) {
// event handling code here ...
}

And they don't really care whether channel is a RabbitMQ queue or an instance of the eXist XML DB. Their code will block until such time as the data is available. We provide verbs for consuming the data (get) and for just reading the data (fetch and subscribe). In the case of the latter two, the difference is whether the act of reading is linear (one-off) or replicated (it will keep reading from the channel).
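The verb semantics can be illustrated against a toy in-memory channel (a hypothetical ToyChannel class standing in for the broker; the real broker blocks and suspends a continuation rather than returning Option):

```scala
import scala.collection.mutable

// A toy channel distinguishing consuming reads (get) from copying reads (fetch).
class ToyChannel[A] {
  private val data = mutable.Queue[A]()
  def put(a: A): Unit = data.enqueue(a)
  def get(): Option[A]   = if (data.isEmpty) None else Some(data.dequeue()) // consumes the datum
  def fetch(): Option[A] = data.headOption                                  // leaves the datum in place
}

val ch = new ToyChannel[String]
ch.put("seven(...)")
val peeked = ch.fetch() // datum still in the channel afterwards
val taken  = ch.get()   // datum removed from the channel
val empty  = ch.get()   // nothing left
```

subscribe is then fetch's replicated cousin: instead of a one-off read, the reader stays registered and keeps receiving matching data.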

(As an aside, flatMap and filter come into play if they want to write complex event-handling code. For example,

for( askEvent <- askChannel.get( askPattern ); bidEvent <- bidChannel.get( bidPattern ) if (meetsTradeRequirements( askEvent, bidEvent ) ) ) {
issueTradeRequest( askEvent, bidEvent )
}

This code pattern will issueTradeRequests between buyer and seller just when the trade requirements are met between ask and bid.)

To my mind the expansion in the Prolog-to-XQuery direction (resp. compression in the opposite direction) is a pretty clear demonstration of the expressiveness of Prolog as a data manipulation language and supports the idea of using Datalog as the core of a NoSQL solution (as i do in my unification-based K-V-DB found on github, here).

Of course, it's also a powerful argument for a programming style that's all monads all the time. If you look at the data language here -- XQuery FLWOR expressions -- it's pure monadic structure. Likewise, the threading and session management of the broker is hidden behind another for-comprehension. The key point is that it all just composes!

Now, in upcoming posts we borrow another idea from Prolog: backtracking. We'd like the filter capability to be able to pull information from the streams in an exploratory fashion and essentially undo this if the conditions aren't met. One of the tricky parts is exploring the streams in a fashion that minimizes the gotchas associated with interleaved examination of the stream contents. For this we will employ one of Oleg Kiselyov's insights.

Monday, February 14, 2011

Channel-based communication, monadically

The last time we checked in we looked at messaging from a monadic perspective. This time we want to look at a broader class of communication from the point of view of the monadic design pattern. At the heart of this idea is recognizing that the current rage for key-value databases is actually a situation with a wide range of potential interpretations, all captured by a single parametrically polymorphic pattern. Consumers consume data from keyed locations in the store while producers deposit data to keyed locations in the store. The basic control abstraction that supports this situation is the delimited continuation. If a consumer asks for data at keyed locations where no data exists, the store grabs the consumer's thread's continuation up to the point of the request and stores it at the keyed location. When a producer provides data to some or all of these locations, the store first checks to see if there are any continuations waiting on these keys and -- if there are -- makes the data available to them first. This essentially resumes the consumer's thread's flow of control at the point in its computation where it requested the data from the keyed location.
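The parking-and-resuming dance can be modeled in miniature with plain closures standing in for captured delimited continuations (a sketch under that simplification; the ToyStore name and its API are illustrative):

```scala
import scala.collection.mutable

// A toy keyed store: consumers either get data now or park a "continuation".
class ToyStore[K, V] {
  private val data    = mutable.Map[K, V]()
  private val waiters = mutable.Map[K, List[V => Unit]]()

  def consume(key: K)(k0: V => Unit): Unit =
    data.remove(key) match {
      case Some(v) => k0(v)                                        // data present: resume immediately
      case None    => waiters(key) = k0 :: waiters.getOrElse(key, Nil) // park the continuation at the key
    }

  def produce(key: K, v: V): Unit =
    waiters.remove(key) match {
      case Some(k0 :: rest) =>                                     // waiters get the data first
        if (rest.nonEmpty) waiters(key) = rest
        k0(v)                                                      // resume the parked consumer
      case _ => data(key) = v                                      // nobody waiting: store the datum
    }
}

val store = new ToyStore[String, Int]
var seen  = List[Int]()
store.consume("answers")(v => seen ::= v) // no data yet: the consumer parks
store.produce("answers", 42)              // resumes the parked consumer with 42
```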

This basic flow of control abstraction is subject to variation in numerous ways including
  • varying the means of matching a keyed-location request to a keyed-location submission;
  • varying whether a request consumes the data, removing it from the location, or simply passes a copy to the requester;
  • varying whether a stored continuation is removed from the location when data is supplied or is simply passed the data.
The last variation covers a simple pub/sub mechanism: when continuations are persistent (i.e. not removed on matching data production), this is a simple subscription mechanism. When data is persisted, this is storage; when it is not, this is message-passing. When key-matching has a certain level of sophistication, this arrangement supports a basic broadcast capability. If you look at the code sample and the traces in the screenshots you will see that we've used regular expressions as the basic matching mechanism. In the specialK repo we use unification.

All of these variations can be captured in a single configurable mechanism -- the configuration of which can be treated as policy governing communication between producers and consumers. To my mind this is really the pattern underlying the web. The web, in microcosm, is a parametrically polymorphic key-value db. Let's look at some code, however, before getting too philosophical.

The core of the implementation is in the code below. The full implementation is in the sansBNFC branch of the specialK repo.


package com.biosimilarity.lift.model.store

import com.biosimilarity.lift.lib._

import scala.concurrent.{Channel => Chan, _}
import scala.concurrent.cpsops._

import scala.util.continuations._
import scala.collection.MapProxy
import scala.collection.mutable.Map
import scala.collection.mutable.HashMap

trait MonadicTupleSpace[Place,Pattern,Resource]
// extends MapLike[Place,Resource, This]
extends MonadicGenerators with FJTaskRunners
{
  self : WireTap with Journalist =>

  type RK = Option[Resource] => Unit @suspendable
  //type CK = Option[Resource] => Unit @suspendable
  type Substitution <: Function1[Resource,Option[Resource]]

  case class IdentitySubstitution( )
  extends Function1[Resource,Option[Resource]] {
    override def apply( rsrc : Resource ) = Some( rsrc )
  }

  case class PlaceInstance(
    place : Place,
    stuff : Either[Resource,List[RK]],
    subst : Substitution
  )

  def theMeetingPlace : Map[Place,Resource]
  def theChannels : Map[Place,Resource]
  def theWaiters : Map[Place,List[RK]]
  def theSubscriptions : Map[Place,List[RK]]

  def fits( ptn : Pattern, place : Place ) : Boolean
  def fitsK( ptn : Pattern, place : Place ) : Option[Substitution]
  def representative( ptn : Pattern ) : Place

  //def self = theMeetingPlace

  override def itergen[T]( coll : Iterable[T] ) =
    Generator {
      gk : ( T => Unit @suspendable ) =>
        val collItr = coll.iterator

        while( collItr.hasNext ) {
          gk( collItr.next )
        }
    }

  def locations(
    map : Either[Map[Place,Resource],Map[Place,List[RK]]],
    ptn : Pattern
  ) : List[PlaceInstance] = {
    def lox[Trgt,ITrgt](
      m : Map[Place,Trgt],
      inj : Trgt => ITrgt
    ) : List[(Place,ITrgt,Substitution)] = {
      ( ( Nil : List[(Place,ITrgt,Substitution)] ) /: m )(
        {
          ( acc, kv ) => {
            val ( k, v ) = kv
            fitsK( ptn, k ) match {
              case Some( s ) =>
                acc ++ List[(Place,ITrgt,Substitution)]( ( k, inj( v ), s ) )
              case None => acc
            }
          }
        }
      )
    }
    val triples =
      map match {
        case Left( m ) => {
          lox[Resource,Either[Resource,List[RK]]](
            m, ( r ) => Left[Resource,List[RK]]( r )
          )
        }
        case Right( m ) => {
          lox[List[RK],Either[Resource,List[RK]]](
            m, ( r ) => Right[Resource,List[RK]]( r )
          )
        }
      }
    triples.map(
      ( t ) => {
        val ( p, e, s ) = t
        PlaceInstance( p, e, s )
      }
    )
  }

  // val reportage = report( Luddite() ) _

  def mget(
    channels : Map[Place,Resource],
    registered : Map[Place,List[RK]],
    consume : Boolean
  )( ptn : Pattern )
  : Generator[Option[Resource],Unit,Unit] =
    Generator {
      rk : ( Option[Resource] => Unit @suspendable ) =>
        shift {
          outerk : ( Unit => Unit ) =>
            reset {
              val map = Left[Map[Place,Resource],Map[Place,List[RK]]]( channels )
              val meets = locations( map, ptn )

              if ( meets.isEmpty ) {
                val place = representative( ptn )
                tweet( "did not find a resource, storing a continuation: " + rk )
                registered( place ) =
                  registered.get( place ).getOrElse( Nil ) ++ List( rk )
                rk( None )
              }
              else {
                for(
                  placeNRrscNSubst <- itergen[PlaceInstance]( meets )
                ) {
                  val PlaceInstance( place, Left( rsrc ), s ) = placeNRrscNSubst

                  tweet( "found a resource: " + rsrc )
                  if ( consume ) {
                    channels -= place
                  }
                  rk( s( rsrc ) )

                  //shift { k : ( Unit => Unit ) => k() }
                }
              }
              tweet( "get returning" )
              outerk()
            }
        }
    }

  def get( ptn : Pattern ) =
    mget( theMeetingPlace, theWaiters, true )( ptn )
  def fetch( ptn : Pattern ) =
    mget( theMeetingPlace, theWaiters, false )( ptn )
  def subscribe( ptn : Pattern ) =
    mget( theChannels, theSubscriptions, true )( ptn )

  def putPlaces(
    channels : Map[Place,Resource],
    registered : Map[Place,List[RK]],
    ptn : Pattern,
    rsrc : Resource
  ) : Generator[PlaceInstance,Unit,Unit] = {
    Generator {
      k : ( PlaceInstance => Unit @suspendable ) =>
        // Are there outstanding waiters at this pattern?
        val map = Right[Map[Place,Resource],Map[Place,List[RK]]]( registered )
        val waitlist = locations( map, ptn )

        waitlist match {
          // Yes!
          case waiter :: waiters => {
            tweet( "found waiters waiting for a value at " + ptn )
            val itr = waitlist.toList.iterator
            while( itr.hasNext ) {
              k( itr.next )
            }
          }
          // No...
          case Nil => {
            // Store the rsrc at a representative of the ptn
            tweet( "no waiters waiting for a value at " + ptn )
            channels( representative( ptn ) ) = rsrc
          }
        }
    }
  }

  def mput(
    channels : Map[Place,Resource],
    registered : Map[Place,List[RK]],
    consume : Boolean
  )( ptn : Pattern, rsrc : Resource ) : Unit @suspendable = {
    for( placeNRKsNSubst <- putPlaces( channels, registered, ptn, rsrc ) ) {
      val PlaceInstance( wtr, Right( rks ), s ) = placeNRKsNSubst
      tweet( "waiters waiting for a value at " + wtr + " : " + rks )
      rks match {
        case rk :: rrks => {
          if ( consume ) {
            for( sk <- rks ) {
              spawn {
                sk( s( rsrc ) )
              }
            }
          }
          else {
            registered( wtr ) = rrks
            rk( s( rsrc ) )
          }
        }
        case Nil => {
          channels( wtr ) = rsrc
        }
      }
    }
  }

  def put( ptn : Pattern, rsrc : Resource ) =
    mput( theMeetingPlace, theWaiters, false )( ptn, rsrc )
  def publish( ptn : Pattern, rsrc : Resource ) =
    mput( theChannels, theSubscriptions, true )( ptn, rsrc )
}

import java.util.regex.{Pattern => RegexPtn, Matcher => RegexMatcher}

object MonadicTSpace
extends MonadicTupleSpace[String,String,String]
with WireTap
with Journalist
with ConfiggyReporting
with ConfiggyJournal
{
  override type Substitution = IdentitySubstitution

  override val theMeetingPlace = new HashMap[String,String]()
  override val theChannels = new HashMap[String,String]()
  override val theWaiters = new HashMap[String,List[RK]]()
  override val theSubscriptions = new HashMap[String,List[RK]]()

  override def tap [A] ( fact : A ) : Unit = {
    reportage( fact )
  }

  def representative( ptn : String ) : String = {
    ptn
  }

  def fits( ptn : String, place : String ) : Boolean = {
    RegexPtn.matches( ptn, place ) || RegexPtn.matches( place, ptn )
  }

  def fitsK(
    ptn : String,
    place : String
  ) : Option[Substitution] = {
    if ( fits( ptn, place ) ) {
      Some( IdentitySubstitution() )
    }
    else {
      None
    }
  }
}
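
The bidirectional regex match that MonadicTSpace uses for fits can be exercised in isolation: a pattern fits a place if either string, read as a regex, matches the other, so a stored datum at a concrete place can be found by a regex request and vice versa.

```scala
import java.util.regex.{Pattern => RegexPtn}

// The same predicate as MonadicTSpace.fits, extracted for demonstration.
def fits(ptn: String, place: String): Boolean =
  RegexPtn.matches(ptn, place) || RegexPtn.matches(place, ptn)

val a = fits("seven.*", "sevenEleven") // request is the regex: matches
val b = fits("sevenEleven", "seven.*") // place is the regex: also matches
val c = fits("six.*", "sevenEleven")   // neither direction matches
```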