Saturday, May 29, 2010

Zip tease

The last posting reviewed Huet's original proposal for the Zipper. Before doing a technical deep dive on some of the consequences of that idea i thought i important to provide some context on how the idea has been developed. Since it was brought to light there have been two important variations rung which i discuss below.

Two kinds of genericity
It turns out that Huet's discovery can be made to work on a much wider class of structures than "just" trees. Intuitively speaking, if their type arguments are "zippable", then virtually all of the common functional data type constructors, including sequencing constructors like product, and branching constructors, like summation or "casing", result in "zippable" types. That is, there are procedures for deriving a notion of zipper capable of traversing and mutating the structure. Essentially, there are two strategies to achieve this genericity: one is based on structural genericity and the other on procedural genericity.

Genericity of structure

The former approach relies on being able to define a notion of context for any "reasonable" data structure. Not surprisingly, it turns out that we can give a good definition of "reasonable". What is surprising is that the resulting definition is amenable to an operation that perfectly mimics the notion of derivative from Newton's calculus. The operation is an operation on types. This allows us to give a type-level definition of the notion of location -- just as we did with trees, but now for any type.

We can use Scala's type notation to see where the new genericity has been added. The type of trees in the example is already polymorphic: Tree[A]. That's what having that type parameter
A means. The navigation trait is therefore also parametric in A. The navigation trait, however, is hardcoded in the container type, Tree[A]. When we add this second level of genericity, the navigation trait will have to take a second, higher-kinded type parameter for the container because it will work on any container within a range of reasonably defined containers.

The use case we have been considering -- navigating and mutating an in-memory representation of a tree -- is then extended to navigating and mutating an in-memory representation of an arbitrary data structure. Moreover, the code is purely functional -- with all of the attendant advantages of purely functional code we have been observing
since forevever. Obviously, in the context of the web, this particular use case is of considerable interest. Nearly, every web application is of this form: navigating a tree or graph of pages. Usually, that graph of pages is somehow homomorphic, i.e. an image of, the graph of some
underlying domain data structure, like the data structures of employee records in a payroll system, or the social graph of a social media application like Twitter. Many web applications, such as so-called content management systems, also support the mutation of the graph of
pages. So, having a method of generating this functionality from the types of the underlying data domain, be they web pages, or some other domain data type, is clearly pertinent to the most focused of application developers.

And yet, the notion of a derivative of data types is irresistibly intriguing. It's not simply that it has many other applications besides web navigation and update. That a calculational device that an Englishman discovered some 400+ years ago in his investigations for providing a mathematical framework for gravitation and other physical phenomena should be applicable to structuring computer programs is as surprising as it is elegant and that makes it cool.

Genericity of control

The latter approach to generically constructing zippers is just as rich in terms of the world of ideas it opens up as it is in the imminent practicality of its immediate applications. The key insight is to abstract on control, rather than on form. Not surprisingly, then the central tool is the (delimited) continuation. To be clear, in this approach, originally developed by Oleg Kiselyov, navigation is reifed as a function and supplied as a parameter. In this sense, it is not automagically deriving mechanism for navigation, as does the structural approach. The semantics of mutation, on the other hand, is provided with a powerful generative mechanism. More specifically, a dial is provided for the visibility of mutation with respect to different threads of control. In other words, fine-grained constrol on the transactional semantics of mutating the data structure is provided. This is exceptionally powerful because, as we have mentioned in many previous posts, the transactional semantics is one of the principal places on which performance of a system -- especially a high-volume system -- hinges; but, by being based on a form of monad, namely delimited continuations, the abstraction gets the compiler involved. This has the effect of enlisting the compiler in maintaining discipline and sanity on transaction semantics -- which is vitally important when supplying a fine-grained control on something as performance-critical as the semantics and visibility of update.

23 comments:

Ben Hutchison said...

Hi Greg,

Im curious what zippers are "good for", in practical terms?

I appreciate that they allow constant time updates to large immutable tree-like data structures, that might otherwise require eg a logN copy of all nodes leading back to the root.

But the price of doing this at that the modified Zipper is a "subjective" view of the data from the focus, or viewpoint, of the modifying user. That seems to require doing work before sharing with, or passing to, other users/threads/downstream-consumers of the data, who may have, or want, another focus.

Luc Duponcheel said...

Hi Ben,

one of the things zippers are good for is replacing structural recursion with tail recursion (that can be optimized using standard techniques)

see:
http://www.cs.nott.ac.uk/~ctm/CJ.pdf

Luc

Ben Hutchison said...

Oh dear, I know I'm in too deep when Im redirected to a Conor McBride paper ;)

Alex said...

Hi,

thanks for your blog post! Is there perhaps a more refined theory behind these data type derivatives? How does it perfectly mimic Newton's calculus?

Thanks,
Alex

清民 said...

幽默並不是諷刺,它或許帶有溫和的嘲諷,卻不傷人,它可能是以別人,也可以用自己為對象。........................................

寶堅強皓 said...

人生是故事的創造與遺忘。 ..................................................

曾文詠 said...

出遊不拘名勝,有景就是好的............................................................

茂俊雅俊憲珊 said...

As a man sows, so he shall reap...................................................

samskivert said...

For a more formal presentation, you can once again consult CMB:

http://strictlypositive.org/diff.pdf

許美玉 said...

Necessity is the mother of invention...................................................................

建月 said...

真是太有道理了~~我支持你~~~.................................................................

江婷 said...

幸福不是一切,人還有責任。.................................................................

亦妮 said...

死亡是悲哀的,但活得不快樂更悲哀。......................................................................

葉婷 said...

成熟,就是有能力適應生活中的模糊。.................................................................

雲亨雲亨雲亨 said...

人生是故事的創造與遺忘。............................................................

群平群平 said...

Hello~Nice meet you~~............................................................

王名仁 said...

向著星球長驅直進的人,反比踟躕在峽路上的人,更容易達到目的。............................................................

吳婷婷 said...

友誼能增進快樂,減少痛苦............................................................

新順 said...

留言是種美德-感謝分享..................................................................

淑芳淑芳 said...

晚上好,很喜歡你的blog哦...............................................................

吳婷婷 said...

People throw stones only at trees with fruit on them.............................................................

楊儀卉 said...

噴泉的高度,不會超過它的源頭。一個人的事業也是如此,它的成就絕不會超過自己的信念。..................................................................

江桂宸江桂宸 said...

臨淵羨魚,不如退而結網。............................................................