comp.lang.lisp needs to stop complaining about overcommit

or: How I got sucked back into SBCL hacking on Christmas.
: david@radon:~; ps -o pid,vsz,rss,comm -p `pidof sbcl`
 1019 570424  4424 sbcl
: david@radon:~; ps -o pid,vsz,rss,comm -p `pidof sbcl`
 1019 963320 404204 sbcl
: david@radon:~; ps -o pid,vsz,rss,comm -p `pidof sbcl`
 1019 3988328 825804 sbcl


Ten years of Closure HTML

More than a decade ago, Gilbert Baumann started writing the Closure web browser. It includes a great HTML parser, written entirely in Lisp.

Released today, Closure HTML is a stand-alone version of the parser.

It supports HTML 4, understands malformed HTML, and can (optionally) be used in conjunction with Closure XML and its data structures.

An easy way to get started with Closure HTML itself is with its LHTML builder, which represents HTML elements as simple Lisp lists.
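Because LHTML is just nested lists, ordinary list functions are enough to process it. As a rough illustration, here is a minimal serializer in plain Common Lisp. It assumes that an element has the shape (tag attributes . children), where tag is a keyword, attributes is a list of (name value) pairs, and each child is a string or another element. The helper names (lhtml-to-string and friends) are hypothetical and not part of Closure HTML, and the sketch ignores details such as empty elements and doctypes.

```lisp
;;; Hypothetical helpers, not part of Closure HTML.  Assumes LHTML
;;; elements of the form (tag attributes . children).

(defun escape-html (string)
  "Return STRING with &, <, > and double quotes replaced by entities."
  (with-output-to-string (out)
    (loop for char across string
          do (case char
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (t (write-char char out))))))

(defun write-lhtml (node out)
  "Write NODE, either a string or an LHTML-style element, to stream OUT."
  (if (stringp node)
      (write-string (escape-html node) out)
      (destructuring-bind (tag attributes &rest children) node
        (let ((name (string-downcase (symbol-name tag))))
          (format out "<~A" name)
          ;; Each attribute is assumed to be a (name value) list.
          (loop for (attribute value) in attributes
                do (format out " ~A=\"~A\""
                           (string-downcase (symbol-name attribute))
                           (escape-html value)))
          (write-char #\> out)
          (dolist (child children)
            (write-lhtml child out))
          ;; Always emit an explicit close tag (no void-element handling).
          (format out "</~A>" name)))))

(defun lhtml-to-string (node)
  "Serialize the LHTML-style NODE into a string of HTML."
  (with-output-to-string (out)
    (write-lhtml node out)))
```

For example, (lhtml-to-string '(:p ((:class "note")) "a < b " (:b nil "bold"))) yields the string <p class="note">a &lt; b <b>bold</b></p>.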

Together, the two parsers can be used to turn HTML into XHTML or vice versa, and in particular to parse HTML into DOM or STP. Even for users who only parse and work with XHTML internally, the new code can be useful to emit normal HTML 4 as the last step of processing.


Forking SBCL for Dummies

http://repo.or.cz/ hosts a git repository for SBCL and makes it easy to publish your own fork of SBCL.

Here is a quick step-by-step guide for anyone planning to have his repository hosted there. (All of this will be painfully obvious to the git experts.)

  • Register a user account. All you need is an SSH public key, no questions asked.
  • Create the fork. Find the SBCL project and go to "fork". Enter a project name for the fork and an admin password.

Done. Now you have a fork, but you need to initialize it first.

  • Go to your "Project Settings" page and add yourself as a user.

Otherwise you cannot push to your own project.

  • Push into the fork. One way to do this is to clone the normal SBCL repository, then use
    git push --all ssh://yourusername@repo.or.cz/srv/git/sbcl/yourprojectname.git

Don't forget the --all, which instructs git to push all refs. Whatever a ref is, anyway.



CLOAK is unfinished, slow, buggy, unmaintained, and in need of a rewrite -- and now you can hack it yourself!

Doesn't that sound exciting?  Of course it does.

Prerequisites. Only Linux/x86 is supported [1]. You will need several hours of spare CPU time, about 1 GB of RAM, and lots of disk space. Compilation involves building SBCL and Classpath first, so make sure to install all required dependencies beforehand. Debian users can run

# apt-get install sbcl subversion cvs wget jikes
# apt-get build-dep classpath
to do so.

Build script. Grab cloak using git [edit: needs git 1.5, no idea why]:

$ git clone http://www.lichteblau.com/git/cloakbuild.git
and compile it using clbuild-like commands:
$ ./build update
$ ./build world

Usage. The bin directory contains scripts called java, javac (courtesy of ecj), javap, and javah that run Lisp with the right arguments.

$ ./bin/java -version
CLOAK Virtual Machine, running on SBCL (Linux 2.6.22 X86)

Copyright (C) 2003-2007 David Lichteblau

Technically it is a precompiler, and to avoid unpleasant surprises at run time, you might want to run

./bin/precompile foo.jar
before starting anything non-trivial.

Finally, read cloak/TODO and start hacking.

What's new? Compared to the big binary tarball available previously, this one comes with sources only and has been updated for current SBCL and for Classpath 0.91 (which is still ancient, but a small step forward). The scripts in bin/ are also new.

[1] No AMD64 support yet. For now, use an x86 chroot instead.


A new data structure for XML

As mentioned earlier, I set out to define an alternative to the W3C's Document Object Model, inspired heavily by Java's XOM, but made for Common Lisp.

The result is STP, a data structure for XML that is full-featured and uses CLOS, but is more natural than DOM and gets namespaces right. Its implementation cxml-stp is available as an add-on library for Closure XML.

(For most purposes, it should be preferable to other alternatives, but DOM fans -- in case there are any -- can be assured that DOM support in cxml will not go away either.)

Read more about STP in the tutorial.


Macros for XSLT

Gary King is confused because [XSLT] seems so ridiculously verbose. Others have already suggested mad higher-order function tricks using XSLT 2.0.

My solution: Macros. If XSLT lacks an element to do what you want (creating a text node with a newline, in Gary's case), just invent the feature you need and send your XSLT stylesheet through another XSLT stylesheet to implement it.


Say you have demo.xsl which wants to use

  <x:br/>

to emit a newline. (In this example, `x' is simply the namespace for our extensions.) Write an additional stylesheet macros.xsl and send the original demo.xsl through the macro stylesheet to generate the actual XSLT source code. A macro template for <x:br> would be as simple as:
  <xsl:template match="x:br">
    <_xsl:text>&#10;</_xsl:text>
  </xsl:template>

In the macro stylesheet, xsl is the namespace of the "macro definition" and _xsl is the namespace of the "macro expansion". (If you care about details, the trick is to use xsl:namespace-alias to make the XSLT processor believe they are different namespaces.)


For a more interesting example of macro use, suppose we want to repeat some code a given number of times. Doing this kind of iteration involves a recursive template call, which we want to hide. We will define a macro <x:dotimes> that can be used like this:
  <x:dotimes var="i" count="3">
    <xsl:value-of select="$i"/>
  </x:dotimes>

Our macro stylesheet replaces each use of <x:dotimes> with a template call, and adds a recursive template as a top-level element:
  <xsl:template match="xsl:stylesheet">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:for-each select="//x:dotimes">
        <_xsl:template name="x:dotimes_{generate-id()}">
          ... recursive template definition here ...
        </_xsl:template>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="x:dotimes">
    <_xsl:call-template name="x:dotimes_{generate-id()}">
      ... parameters elided for brevity ...
    </_xsl:call-template>
  </xsl:template>

Download the full macros.xsl and demo.xsl to try the example. To run it with xsltproc, use the Makefile in the same directory.


There's exactly one way to do it

XOM is a DOM alternative written in Java and for Java -- in contrast to DOM, which feels wrong in almost every language.

Key phrases:
  • "Comatose lists"
  • This is a cathedral, not a bazaar
  • There's exactly one way to do it
  • The Wrong Side of 80/20

Lots of good ideas waiting to be stolen. Stay tuned for a Common Lisp adaptation.


clbuild has a new home

As Planet Lisp readers should already know, Luke Gorrie's clbuild has been evolving slowly in at least three different darcs repositories for the last few months.

Someone must have decided that it was time to give it a new home and created a shiny new clbuild project on common-lisp.net.

This completely empty project looked a little sad and lonely though, so I gave it a home page.

Hope you don't mind -- or if you do, you shouldn't have given me write access...


Relax NG for cxml

Relax NG is the friendly schema validation standard for XML, with clean namespace support, data types, and uniform treatment of attributes. It is closed under union (and therefore allows ambiguous schemas), and it offers a compact non-XML syntax.

Released today, cxml-rng is my implementation of Relax NG in Common Lisp, as an extension to Closure XML.

Learn Relax NG through van der Vlist's book, read The Design of Relax NG by James Clark, pump subtrees, or try cxml-rng yourself.


Federal trojan horse

(Apologies for non-Lisp content.)

The German police are going to install surveillance software on suspects' computers.

They are going to install it online through the Internet without anyone noticing, and of course they will do it without exploiting security vulnerabilities. Here is what the president of the Federal Criminal Police Office has to say about it (in German):

taz: How will the "Online-Search" of a computer work technically then?

Ziercke: Naturally I cannot discuss that publicly.


clbuild on cygwin

Step 1: SBCL

Install SBCL using its Windows .msi installer.

Step 2: Cygwin and darcs

Since clbuild is a shell script, you need to install cygwin, even though SBCL itself does not depend on it.

Get it from cygwin.org. Make sure to select all packages that clbuild uses to download software. You will need at least cvs, subversion, and wget. In addition, you might want to install emacs (for slime) and X (for CLIM with the CLX backend).

[EDIT: Don't use emacs from cygwin, install the native Windows port of Emacs instead.--2007-06-24]

darcs is not included with cygwin, but a cygwin port exists, so download it manually from darcs.net and add it to your $PATH.

Step 3: clbuild

Cygwin support is new in my clbuild tree, so until another clbuild hacker merges those changes, fetch it from:

$ darcs get http://www.lichteblau.com/blubba/clbuild

Step 4: Bleeding edge

ASDF as included with SBCL 1.0.3 does not work with clbuild, so you need to replace it with a version including my patch for Windows shortcut support.

Download asdf.lisp and asdf.fasl and copy them into the asdf/ subdirectory of your SBCL installation, replacing the original versions. (diff)

[EDIT: the asdf.fasl linked there isn't up-to-date anymore, but asdf.lisp and asdf.diff are still there, including some fixes. Drop them into your source tree and recompile SBCL. --2007-06-24]

Run clbuild

That's it. Now just run clbuild:

$ cd clbuild
clbuild$ chmod +x clbuild
clbuild$ ./clbuild build

To run CLIM applications using the CLX backend, start an X server first and set $DISPLAY accordingly. (It appears to be necessary to specify an IP address in $DISPLAY so that CLX does not attempt a unix domain socket connection.)

clbuild$ X&
clbuild$ export DISPLAY=
clbuild$ ./clbuild listener

Optional: Gtkairo

To try CLIM's gtkairo backend instead, download GTK+ from gimp-win.sf.net. (For some reason, the installer is wrapped in a zip file.)

Add the bin directory of that GTK+ installation to your PATH and configure clbuild to use gtkairo:

clbuild$ export PATH="/cygdrive/c/Programme/gtk-2.10/bin:$PATH"
clbuild$ export CLIM_BACKEND=gtkairo
clbuild$ ./clbuild listener

ObScreenshot of the listener.

(Expect to find some gtkairo/Windows repainting bugs though.)


Klacks parsing

Closure XML has been based on a SAX-like API for several years now (in addition to the DOM implementation on top of that). But although the pervasive use of SAX within CXML itself has been a success story, most users seem to prefer DOM usage over SAX handler hacks. Anyone who has ever parsed a non-trivial schema using SAX knows why: Maintaining separate start-element and end-element methods is very inconvenient. Code ends up dispatching on tag names using huge case forms while doing all bookkeeping manually in slots of the handler instance.

Starting with the current release of CXML, there is now a new parser interface called Klacks.

Similar to StAX, the new interface is more convenient than SAX, while still providing the same features as the old one, including validation.

Basically, the klacks parser can be used as a (rather sophisticated) tokenizer, and you get to write a recursive descent parser based on that.

SAX and StAX are Java's protocols for XML parsing. They are sometimes referred to as low-level interfaces for "expert" use only (the suggested alternative being something like DOM), but their real purpose is to parse XML without building an in-memory representation.

Low-level or not, they are the right choice when parsing into application-defined data structures or when performing simple on-the-fly transformation of XML data as it is being read.

In SAX, an XML parser processes the entire document in one go, emitting events as it sees them. User code needs to implement its own handler class, with methods for the events it cares about. This model is known as "push-based".
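To make the push model concrete, here is a toy sketch in plain Common Lisp. It does not use cxml's actual SAX interface: push-parse walks a hypothetical list of pre-tokenized events instead of real XML, and the handler generic functions are illustrative names. It also shows the bookkeeping problem mentioned earlier: start and end events arrive in separate method calls, so the handler has to remember in a slot whether it is currently inside a title element.

```lisp
;;; A toy push parser.  Events are pre-tokenized (kind data) lists that
;;; stand in for a real XML tokenizer.  All names are hypothetical and
;;; unrelated to the cxml SAX API.

(defgeneric handle-start (handler name))
(defgeneric handle-end (handler name))
(defgeneric handle-text (handler text))

(defun push-parse (events handler)
  "Call HANDLER's methods for each event in EVENTS, in document order."
  (dolist (event events)
    (destructuring-bind (kind data) event
      (ecase kind
        (:start (handle-start handler data))
        (:end (handle-end handler data))
        (:text (handle-text handler data))))))

;; Collecting the text of all title elements needs manual state,
;; because the parser, not the handler, controls the flow.
(defclass title-collector ()
  ((inside-title-p :initform nil :accessor inside-title-p)
   (titles :initform '() :accessor titles)))

(defmethod handle-start ((h title-collector) name)
  (when (string= name "title")
    (setf (inside-title-p h) t)))

(defmethod handle-end ((h title-collector) name)
  (when (string= name "title")
    (setf (inside-title-p h) nil)))

(defmethod handle-text ((h title-collector) text)
  (when (inside-title-p h)
    (push text (titles h))))
```

Here push-parse stays in control the whole time; the handler only ever sees one event at a time and must reconstruct any context from its own slots.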

In contrast, the "pull-based" StAX parsing model is similar to working with an input stream. User code starts by creating an input stream object for the XML document, then reads events from that stream one by one. (Klacks uses the term source instead of stream, to avoid confusion with Common Lisp streams.)

API design choices. StAX distinguishes between a high-level API, which creates a Java object for each event, and a low-level API, which just returns an integer constant indicating the type of event and has separate methods to access the current event's data.

Klacks has just one set of functions for both purposes, since it seemed more lispy to use multiple values. Instead of returning just a keyword indicating the event type, the main klacks functions always include useful event data as additional return values.
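The following toy sketch illustrates both the pull model and the multiple-values style in plain Common Lisp. It does not use cxml: the source reads from a hypothetical list of pre-tokenized events, and peek-event, consume-event and parse-element are illustrative names, not klacks functions. Each call returns the event type as its primary value and the event data as a secondary value, and parse-element is a small recursive descent parser built on top of this "tokenizer".

```lisp
;;; A toy pull source over pre-tokenized (kind data) events.  The names
;;; are hypothetical; they only mimic the idea behind klacks.

(defstruct (event-source (:constructor make-event-source (events)))
  events)

(defun peek-event (source)
  "Return the type and data of the next event without consuming it."
  (let ((event (first (event-source-events source))))
    (values (first event) (second event))))

(defun consume-event (source)
  "Return the type and data of the next event and advance past it."
  (multiple-value-prog1 (peek-event source)
    (pop (event-source-events source))))

(defun parse-element (source)
  "Recursive descent: read one element and return (name . children)."
  (multiple-value-bind (kind name) (consume-event source)
    (assert (eq kind :start))
    (let ((children '()))
      (loop
        (multiple-value-bind (kind data) (peek-event source)
          (ecase kind
            (:end (consume-event source)
             (return (cons name (nreverse children))))
            (:text (consume-event source)
             (push data children))
            ;; Nested element: recurse; the call stack tracks nesting.
            (:start (push (parse-element source) children))))))))
```

Compared to the push version, no handler slots are needed: user code decides when to read the next event, and the nesting of elements is mirrored by the nesting of function calls.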

Java's StAX also includes classes for XML serialization. No such extension was needed for CXML, since it already supports convenient serialization using SAX events. The with-element macro and related functions make generation of those events easy.

Simple klacks parsing example:
* (defparameter *source* (cxml:make-source "<example>text</example>"))
* (klacks:peek-next *source*)
:START-DOCUMENT
* (klacks:peek-next *source*)
:START-ELEMENT
NIL                      ;namespace URI
"example"                ;local name
"example"                ;qualified name
* ...


McCLIM's tab layout

Paolo Amoroso used to blog about McCLIM commits. But he stopped posting updates some months ago, so here is my own report, starting with the tab-layout.

The original tab-layout, written by Max-Gerd Retzlaff, implements a CLIM pane similar to what GTK+ calls GtkNotebook, using only portable CLIM mechanisms to do so.

With my changes, committed a few weeks ago, the tab-layout's architecture is now closer to that of other gadgets in CLIM, which are split into an abstract superclass and several frame-manager-specific subclasses. This new version allows the Gtkairo backend to implement its own subclass using a native GtkNotebook.

One aspect of the tab-layout that still stands out is the use of presentations and commands. While both are fundamental CLIM concepts, other gadgets defined in CLIM 2 come without any integration into the presentation system, so there was some discussion about turning the tab-layout into a "proper" gadget using simple callbacks instead of commands.

In the end, I decided to keep the use of presentations in the generic tab layout and hack its Gtkairo version to simulate them, too. To me, the greatest advantage of this implementation is that, thanks to presentations, there is an easy way to define context menus for tab pages in a CLIMy way.

You can try the new-and-improved tab-layout using the simple demo code included with clim-examples (start it using (clim-demo::demodemo)), or in one of the two real applications that currently feature tabs: beirc, and the web browser Closure, which now supports tabbed browsing!