musings of a Lispnik

Daniel Janus's blog

Combining virtual sequences or, Sequential Fun with Macros or, How to Implement Clojure-Like Pseudo-Sequences with Poor Man’s Laziness in a Predominantly Imperative Language

| Comments

Sequences and iteration

There are a number of motivations for this post. One stems from my extensive exposure to Clojure over the past few years: this was, and still is, my primary programming language for everyday work. Soon, I realized that much of the power of Clojure comes from a sequence abstraction being one of its central concepts, and a standard library that contains many sequence-manipulating functions. It turns out that by combining them it is possible to solve a wide range of problems in a concise, high-level way. In contrast, it pays to think in terms of whole sequences, rather than individual elements.

Another motivation comes from a classical piece of functional programming humour, The Evolution of a Haskell Programmer. If you don’t know it, go check it out: it consists of several Haskell implementations of factorial, starting out from a straightforward recursive definition, passing through absolutely hilarious versions involving category-theoretical concepts, and finally arriving at this simple version that is considered most idiomatic:

1
fac n = product [1..n]

This is very Clojure-like in that it involves a sequence (a list comprehension). In Clojure, this could be implemented as

1
2
(defn fac [n]
  (reduce * 1 (range 1 (inc n))))

Now, I thought to myself, how would I write factorial in an imperative language? Say, Pascal?

1
2
3
4
5
6
7
8
9
function fac(n : integer) : integer;
var
  i, res : integer;
begin
  res := 1;
  for i := 1 to n do
    res := res * i;
  fac := res;
end;

This is very different from the functional version that works with sequences. It is much more elaborate, introducing an explicit loop. On the other hand, it’s memory efficient: it’s clear that its memory requirements are O(1), whereas a naïve implementation of a sequence would need O(n) to construct it all in memory and then reduce it down to a single value.

Or is it really that different? Think of the changing values of i in that loop. On first iteration it is 1, on second iteration it’s 2, and so on up to n. Therefore, one can really think of a for loop as a sequence! I call it a “virtual” sequence, since it is not an actual data structure; it’s just a snippet of code.

To rephrase it as a definition: a virtual sequence is a snippet of code that (presumably repeatedly) yields the member values.

Let’s write some code!

To illustrate it, throughout the remainder of this article I will be using Common Lisp, for the following reasons:

  • It allows for imperative style, including GOTO-like statements. This will enable us to generate very low-level code.
  • Thanks to macros, we will be able to obtain interesting transformations.

Okay, so let’s have a look at how to generate a one-element sequence. Simple enough:

1
2
(defmacro vsingle (x)
 `(yield ,x))

The name VSINGLE stands for “Virtual sequence that just yields a SINGLE element”. (In general, I will try to define virtual sequences named and performing similarly to their Clojure counterparts here; whenever there is a name clash with an already existing CL function, the name will be prefixed with V.) We will not concern ourselves with the actual definition of YIELD at the moment; for debugging, we can define it just as printing the value to the standard output.

1
2
(defun yield (x)
  (format t "~A~%" x))

We can also convert a Lisp list to a virtual sequence which just yields each element of the list in turn:

1
2
3
4
5
(defmacro vseq (list)
  `(loop for x in ,list do (yield x)))

(defmacro vlist (&rest elems)
  `(vseq (list ,@elems)))

Now let’s try to define RANGE. We could use loop, but for the sake of example, let’s pretend that it doesn’t exist and write a macro that expands to low-level GOTO-ridden code. For those of you who are not familiar with Common Lisp, GO is like GOTO, except it takes a label that should be established within a TAGBODY container.

1
2
3
4
5
6
7
8
9
10
11
12
13
(defmacro range (start &optional end (step 1))
  (unless end
    (setf end start start 0))
  (let ((fv (gensym)))
    `(let ((,fv ,start))
       (tagbody
        loop
          (when (>= ,fv ,end)
            (go out))
          (yield ,fv)
          (incf ,fv ,step)
          (go loop)
       out))))

Infinite virtual sequences are also possible. After all, there’s nothing preventing us from considering a snippet of code that loops infinitely, executing YIELD, as a virtual sequence! We will define the equivalent of Clojure’s iterate: given a function fun and initial value val, it will repeatedly generate val, (fun val), (fun (fun val)), etc.

1
2
3
4
5
6
7
(defmacro iterate (fun val)
  (let ((fv (gensym)))
    `(let ((,fv ,val))
       (tagbody loop
          (yield ,fv)
          (setf ,fv (funcall ,fun ,fv))
          (go loop)))))

So far, we have defined a number of ways to create virtual sequences. Now let’s ask ourselves: is there a way, given code for a virtual sequence, to yield only the elements from the original that satisfy a certain predicate? In other words, can we define a filter for virtual sequences? Sure enough. Just replace every occurrence of yield with code that checks whether the yielded value satisfies the predicate, and only if it does invokes yield.

First we write a simple code walker that applies some transformation to every yield occurrence in a given snippet:

1
2
3
4
5
6
(defun replace-yield (tree replace)
  (if (consp tree)
      (if (eql (car tree) 'yield)
          (funcall replace (cadr tree))
          (loop for x in tree collect (replace-yield x replace)))
      tree))

We can now write filter like this:

1
2
3
(defmacro filter (pred vseq &environment env)
  (replace-yield (macroexpand vseq env)
                 (lambda (x) `(when (funcall ,pred ,x) (yield ,x)))))

It is important to point out that since filter is a macro, the arguments are passed to it unevaluated, so if vseq is a virtual sequence definition like (range 10), we need to macroexpand it before replacing yield.

We can now verify that (filter #'evenp (range 10)) works. It macroexpands to something similar to

1
2
3
4
5
6
7
8
9
(LET ((#:G70192 0))
  (TAGBODY
    LOOP (IF (>= #:G70192 10)
           (PROGN (GO OUT)))
         (IF (FUNCALL #'EVENP #:G70192)
           (PROGN (YIELD #:G70192)))
         (SETQ #:G70192 (+ #:G70192 1))
         (GO LOOP)
    OUT))

concat is extremely simple. To produce all elements of vseq1 followed by all elements of vseq2, just execute code corresponding to vseq1 and then code corresponding to vseq2. Or, for multiple sequences:

1
2
(defmacro concat (&rest vseqs)
  `(progn ,@vseqs))

To define take, we’ll need to wrap the original code in a block that can be escaped from by means of return-from (which is just another form of goto). We’ll add a counter that will start from n and keep decreasing on each yield; once it reaches zero, we escape the block:

1
2
3
4
5
6
7
8
9
10
(defmacro take (n vseq &environment env)
  (let ((x (gensym))
        (b (gensym)))
    `(let ((,x ,n))
       (block ,b
         ,(replace-yield (macroexpand vseq env)
                         (lambda (y) `(progn (yield ,y)
                                             (decf ,x)
                                             (when (zerop ,x)
                                               (return-from ,b)))))))))

rest (or, rather, vrest, as that name is taken) can be defined similarly:

1
2
3
4
5
(defmacro vrest (vseq &environment env)
  (let ((skipped (gensym)))
    (replace-yield
     `(let ((,skipped nil)) ,(macroexpand vseq env))
     (lambda (x) `(if ,skipped (yield ,x) (setf ,skipped t))))))

vfirst is another matter. It should return a value instead of producing a virtual sequence, so we need to actually execute the code — but with yield bound to something else. We want to establish a block as with take, but our yield will immediately return from the block once the first value is yielded:

1
2
3
4
5
(defmacro vfirst (vseq)
  (let ((block-name (gensym)))
   `(block ,block-name
      (flet ((yield (x) (return-from ,block-name x)))
        ,vseq))))

Note that so far we’ve seen three classes of macros:

  • macros that create virtual sequences;
  • macros that transform virtual sequences to another virtual sequences;
  • and finally, vfirst is our first example of a macro that produces a result out of a virtual sequence.

Our next logical step is vreduce. Again, we’ll produce code that rebinds yield: this time to a function that replaces the value of a variable (the accumulator) by result of calling a function on the accumulator’s old value and the value being yielded.

1
2
3
4
5
(defmacro vreduce (f val vseq)
  `(let ((accu ,val))
     (flet ((yield (x) (setf accu (funcall ,f accu x))))
       ,vseq
       accu)))

We can now build a constructs that executes a virtual sequence and wraps the results up as a Lisp list, in terms of vreduce.

1
2
3
4
5
(defun conj (x y)
  (cons y x))

(defmacro realize (vseq)
 `(nreverse (vreduce #'conj nil ,vseq)))

Let’s verify that it works:

1
2
3
4
5
CL-USER> (realize (range 10))
(0 1 2 3 4 5 6 7 8 9)

CL-USER> (realize (take 5 (filter #'oddp (iterate #'1+ 0))))
(1 3 5 7 9)

Hey! Did we just manipulate an infinite sequence and got the result in a finite amount of time? And that without explicit support for laziness in our language? How cool is that?!

Anyway, let’s finally define our factorial:

1
2
(defun fac (n)
  (vreduce #'* 1 (range 1 (1+ n))))

Benchmarking

Factorials grow too fast, so for the purpose of benchmarking let’s write a function that adds numbers from 0 below n, in sequence-y style. First using Common Lisp builtins:

1
2
(defun sum-below (n)
  (reduce #'+ (loop for i from 0 below n collect i) :initial-value 0))

And now with our virtual sequences:

1
2
(defun sum-below-2 (n)
  (vreduce #'+ 0 (range n)))

Let’s try to time the two versions. On my Mac running Clozure CL 1.7, this gives:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CL-USER> (time (sum-below 10000000))
(SUM-BELOW 10000000) took 8,545,512 microseconds (8.545512 seconds) to run
                    with 2 available CPU cores.
During that period, 2,367,207 microseconds (2.367207 seconds) were spent in user mode
                    270,481 microseconds (0.270481 seconds) were spent in system mode
5,906,274 microseconds (5.906274 seconds) was spent in GC.
 160,000,016 bytes of memory allocated.
 39,479 minor page faults, 1,359 major page faults, 0 swaps.
49999995000000

CL-USER> (time (sum-below-2 10000000))
(SUM-BELOW-2 10000000) took 123,081 microseconds (0.123081 seconds) to run
                    with 2 available CPU cores.
During that period, 127,632 microseconds (0.127632 seconds) were spent in user mode
                    666 microseconds (0.000666 seconds) were spent in system mode
 4 minor page faults, 0 major page faults, 0 swaps.
49999995000000

As expected, SUM-BELOW-2 is much faster, causes less page faults and presumably conses less. (Critics will be quick to point out that we could idiomatically write it using LOOP’s SUM/SUMMING clause, which would probably be yet faster, and I agree; yet if we were reducing by something other than + — something that LOOP has not built in as a clause — this would not be an option.)

Conclusion

We have seen how snippets of code can be viewed as sequences and how to combine them to produce other virtual sequences. As we are nearing the end of this article, it is perhaps fitting to ask: what are the limitations and drawbacks of this approach?

Clearly, this kind of sequences is less powerful than “ordinary” sequences such as Clojure’s. The fact that we’ve built them on macros means that once we escape the world of code transformation by invoking some macro of the third class, we can’t manipulate them anymore. In Clojure world, first and rest are very similar; in virtual sequences, they are altogether different: they belong to different world. The same goes for map (had we defined one) and reduce.

But imagine that instead of having just one programming language, we have a high-level language A in which we are writing macros that expand to code in a low-level language B. It is important to point out that the generated code is very low-level. It could almost be assembly: in fact, most of the macros we’ve written don’t even require language B to have composite data-types beyond the type of elements of collections (which could be simple integers)!

Is there a practical side to this? I don’t know: to me it just seems to be something with hack value. Time will tell if I can put it to good use.

Color your own Europe with Clojure!

| Comments

This is a slightly edited translation of an article I first published on my Polish blog on January 19, 2011. It is meant to target newcomers to Clojure and show how to use Clojure to solve a simple real-life problems.

The problem

Some time ago I was asked to prepare a couple of differently-colored maps of Europe. I got some datasets which mapped countries of Europe to numerical values: the greater the value, the darker the corresponding color should be. A sample colored map looked like this:

I began by downloading an easily editable map from Wikipedia Commons, calculated the required color intensities for the first dataset, launched Inkscape and started coloring. After half an hour of tedious clicking, I realized that I would be better off writing a simple program in Clojure that would generate the map for me. It turned out to be an easy task: the remainder of this article will be an attempt to reconstruct my steps.

SVG

The format of the source image is SVG. I knew it was an XML-based vector graphics format, I’d often encountered images in this format on Wikipedia — but editing it by hand was new to me. Luckily, it turned out that the image has a simple structure. Each country’s envelope curve is described with a path element that looks like this:

1
2
3
4
<path
   id="pl"
   class="eu europe"
   d="a long list of curve node coordinates" />

An important thing to note here is the id attribute — this is the two-letter ISO-3166-1-ALPHA2 country code. In fact, there is an informative comment right at the beginning of the image that explains the naming conventions used. Having such a splendid input was of great help.

Just like HTML, SVG uses CSS stylesheets to define the look of an element. All that is needed to color Poland red is to style the element with a fill attribute:

1
2
3
4
5
<path
   id="pl"
   style="fill: #ff0000;"
   class="eu europe"
   d="a long list of curve node coordinates" />

Now that we know all this, let’s start coding!

XML in Clojure

The basic way to handle XML in Clojure is to use the clojure.xml namespace, which contains functions that parse XML (on a DOM basis, i.e., into an in-memory tree structure) and serialize such structures back into XML. Let us launch a REPL and start by reading our map and parsing it:

1
2
3
4
5
6
> (use 'clojure.xml)
nil
> (def m (parse "/home/nathell/eur/Blank_map_of_Europe.svg"))
[...a long while...]
Unexpected end of file from server
  [Thrown class java.net.SocketException]

Hold on in there! What’s that SocketException doing here? Firefox displays this map properly, so does Chrome, WTF?! Shouldn’t everything work fine in such a great language as Clojure?

Well, the language is as good as its libraries — and when it comes to Clojure, one can stretch that thought further: Clojure libraries are as good as the Java libraries they use under the hood. In this case, we’ve encountered a feature of the standard Java XML parser (from javax.xml package). It is restrictive and tries to reject invalid documents (even if they are well-formed). If the file being parsed contains a DOCTYPE declaration, the Java parser, and hence clojure.xml/parse, tries to download the DTD schema from the given address and validate the document against that schema. This is unfortunate in many aspects, especially from the point of view of the World Wide Web Consortium, since their servers hold the Web standards. One can easily imagine the volume of network traffic this generates: W3C has a blog post about it. Many Java programmers have encountered this problem at some time. There are a few solutions; we will go the simplest way and just manually remove the offending DOCTYPE declaration.

1
2
3
4
> (def m (parse "/home/nathell/eur/bm.svg"))
#'user/m
> m
[...many screenfuls of numbers...]

This time we managed to parse the image. Viewing the structure is not easy because of its sheer size (as expected: the file weighs in at over 0,5 MB!), but from the very first characters of the REPL’s output we can make out that’s it a Clojure map (no pun intended). Let’s examine its keys:

1
2
> (keys m)
(:tag :attrs :content)

So the map contains three entries with descriptive names. :tag contains the name of the XML elements, :attrs is a map of attributes for this element, and content is a vector of its subelements, each in turn being represented by similarly structured map (or a string if it’s a text node):

1
2
3
4
5
6
> (:tag m)
:svg
> (:attrs m)
{:xmlns "http://www.w3.org/2000/svg", :width "680", :height "520", :viewBox "1754 161 9938 7945", :version "1.0", :id "svg2"}
> (count (:content m))
68

Just for the sake of practice, let’s try to write the serialized representation of the parsed back as XML. The function emit should be able to do it, but it prints XML to standard output. We can use the with-out-writer macro from the namespace clojure.contrib.io to dump the XML to a file:

1
2
3
4
> (use 'clojure.contrib.io)
nil
> (with-out-writer "/tmp/a.svg" (emit m))
nil

We try to view a.svg in Firefox and…

Error parsing XML: not well-formed
Area: file:///tmp/a.xml
Row 15, column 44: Updated to reflect dissolution of Serbia & Montenegro: http://commons.wikimedia.org/wiki/User:Zirland
                 -------------------------------------------^

It turns out that using clojure.xml/emit is not recommended, because it does not handle XML entities in comments correctly; we should use clojure.contrib.lazy-xml instead. For the sake of example, though, let’s stay with emit and manually remove the offending line once again (we can safely do it, since that’s just a comment).

Coloring Poland

We saw earlier that our main XML node contains 68 subnodes. Let’s see what they are — tag names will suffice:

1
2
> (map :tag (:content m))
(:title :desc :defs :rect :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :g :path :path :g :path :path :path)

So far, so good. Seems that all country descriptions are contained directly in the main node. Let us try to find Poland:

1
2
3
4
> (count (filter #(and (= (:tag %) :path)
                       (= ((:attrs %) :id) "pl"))
                 (:content m)))
1

(This snippet of code filters the list of subnodes of m to pick only those elements whose tag name is path and value of attribute id is pl, and returns the length of such list.) Let’s try to add a style attribute to that element, according to what we said earlier. Because Clojure data structures are immutable, we have to define a new top-level element which will be the same as m, except that we will set the style of the appropriate subnode:

1
2
3
4
5
6
7
8
9
10
> (def m2 (assoc m
                :content
                (map #(if (and (= (:tag %) :path)
                               (= ((:attrs %) :id) "pl"))
                        (assoc % :attrs (assoc (:attrs %) :style "fill: #ff0000;"))
                        %)
                     (:content m))))
#'user/m2
> (with-out-writer "/tmp/a.svg" (emit m2))
nil

We open the created file and see a map with Poland colored red. Yay!

Generalization

We will generalize our code a bit. Let us write a function that colors a single state, taking a path element (subnode of svg) as an argument:

1
2
3
4
5
6
(defn color-state
  [{:keys [tag attrs] :as element} colorize-fn]
  (let [state (:id attrs)]
    (if-let [color (colorize-fn state)]
      (assoc element :attrs (assoc attrs :style (str "fill:" color)))
      element)))

This function is similar to the anonymous one we used above in the map call, but differs in some respects. It takes two arguments. As mentioned, the first one is the XML element (destructured into tag and attrs: you can read more about destructuring in the appropriate part of Clojure docs), and the second argument is… a function that should take a two-letter country code and return a HTML color description (or nil, if that country’s color is not specified — color-state will cope with this and return the element unchanged).

Now that we have color-state, we can easily write a higher-level function that processes and writes XML in one step:

1
2
3
4
5
(defn save-color-map
  [svg colorize-fn outfile]
  (let [colored-map (assoc svg :content (map #(color-state % colorize-fn) (:content svg)))]
    (with-out-writer out
      (emit colored-map))))

Let’s test it:

1
2
> (save-color-map m {"pl" "#00ff00"} "/tmp/a.svg")
nil

This time Poland is green (we used a country→color map as an argument to color-state, since Clojure maps are callable like functions). Let’s try to add blue Germany:

1
2
> (save-color-map m {"pl" "#00ff00", "de" "#0000ff"} "/tmp/a.svg")
nil

It works!

Problem with the UK

Inspired by our success, we try to color different countries. It mostly works, but the United Kingdom remains gray, regardless of whether we specify its code as “uk” or “gb”. We resort to the source of our image, and the beginning comment once again proves helpful:

Certain countries are further subdivided the United Kingdom has gb-gbn for Great Britain and gb-nir for Northern Ireland. Russia is divided into ru-kgd for the Kaliningrad Oblast and ru-main for the Main body of Russia. There is the additional grouping #xb for the “British Islands” (the UK with its Crown Dependencies – Jersey, Guernsey and the Isle of Man)

Perhaps we have to specify “gb-gbn” and “gb-nir”, instead of just “gb”? We try that, but still no luck. After a while of thought: oh yes! Our initial assumption that all the country definitions are path subnodes of the toplevel svg node is false. We have to fix that.

So far we have been doing a “flat” transform of the SVG tree: we only changed the subnodes of the toplevel node, but no deeper. We should change all the path elements (and g, if we want to color groups of paths like the UK), regardless of how deep they occur in the tree.

We can use a zipper to do a depth-first walk of the SVG tree. Let us define a function that takes a zipper, a predicate that tells whether to edit the node in question, and the transformation function to apply to the node if the predicate returns true:

1
2
3
4
(defn map-zipper [f pred z]
  (if (zip/end? z)
    (zip/root z)
    (recur f pred (-> z (zip/edit #(if (pred %) (f %) %)) zip/next)))))

Now we rewrite save-color-map as:

1
2
3
4
5
(defn save-color-map
  [svg colorize-fn outfile]
  (let [colored-map (map-zipper #(color-state % colorize-fn) (fn [x] (#{:g :path} (:tag x))) (zip/xml-zip svg))]
    (with-out-writer out
      (emit colored-map))))

This time the UK can be colored.

Colorizers

We have automated the process of styling countries to make them appear in color, but translating particular numbers to RGB is tedious. In the last part of this article we will see how to ease this: we are going to write a colorizer, i.e., a function suitable for passing to color-state and save-color-map (so far we’ve been using maps for this).

Let’s start by writing a functions that translates a triplet of numbers into a HTML RGB notation, because it will be easier for us to work with integers than with strings:

1
2
3
(defn htmlize-color
  [[r g b]]
  (format "#%02x%02x%02x" r g b))

Now we insert a call to htmlize-color into the appropriate pace in color-state:

1
2
3
4
5
6
(defn color-state
  [{:keys [tag attrs] :as element} colorize-fn]
  (let [state (:id attrs)]
    (if-let [color (colorize-fn state)]
      (assoc element :attrs (assoc attrs :style (str "fill:" (htmlize-color color))))
      element)))

Now imagine we have a table with numeric values for states, like this:

StateValue
Poland20
Germany15
Netherlands30

We want to have a function that assigns colors to states, such that the intensity of a color should be proportional to the value assigned to a given state. To be more general, assume we have two colors, c1 and c2, and for a given state, for each of the R, G, B components we assign a value proportional to the difference between the state’s value and the smallest value in the dataset, normalized to lie between c1 and c2.

This sounds complex, but I hope an example will clear things up. This is the Clojure implementation of the described algorithm:

1
2
3
4
5
6
7
8
(defn make-colorizer
  [dataset ranges]
  (let [minv (apply min (vals dataset))
        maxv (apply max (vals dataset))
        progress (map (fn [[min-col max-col]] (/ (- max-col min-col) (- maxv minv))) ranges)]
    (into {}
          (map (fn [[k v]] [(.toLowerCase k) (map (fn [progress [min-color _]] (int (+ min-color (* (- v minv) progress)))) progress ranges)])
               dataset))))

Let us see how it works on our sample data:

1
2
> (make-colorizer {"pl" 20, "de" 15, "nl" 30} [[0 255] [0 0] [0 0]])
{"pl" (85 0 0), "de" (0 0 0), "nl" (255 0 0)}

The second argument means that the red component is to range between 0 and 255, and the green and blue components are to be fixed at 0.

Like we wanted, Germany ends up darkest (because it has the least value), the Netherlands is lightest (because it has the greatest value), and Poland’s intensity is one third that of the Netherlands (because 20 is in one third of the way between 15 and 30).

Wrapping up

The application we created can be further developed in many ways. One can, for instance, add a Web interface for it, or write many different colorizers (e.g., discrete colorizer: fixed colours for ranges of input values, or a temperature colorizer transitioning smoothly from blue through white to red — to do this we would have to pass through the HSV color space).

What is your idea to improve on it? For those of you who are tired of pasting snippets of code into the REPL, I’m putting the complete source code with a Leiningen project on GitHub. Forks are welcome.

Meet my little friend createTree

| Comments

I’ve recently been developing an iPhone application in my spare time. I’m not going to tell you what it is just yet (I will post a separate entry once I manage to get it into the App Store); for now, let me just say that I’m writing it in JavaScript and HTML5, using PhoneGap and jQTouch to give it a native touch.

After having written some of code, I began testing it on a real device and encountered a nasty issue. It turned out that some of the screens of my app, containing a dynamically-generated content, sometimes would not show up. I tried to chase the problem down, but it seemed totally random. Finally, I googled up this blog post that gave me a clue.

My code was using jQuery’s .html() method (and hence innerHTML under the hood) to display the dynamic content. It turns out that, on Mobile Safari, using innerHTML is highly unreliable (at least on iOS 4.3, but this seems to be a long-standing bug). Sometimes, the change just does not happen. I changed one of my screens, to build and insert DOM objects explicitly, and sure enough, it started to work predictably well.

So I had to remove all usages of .html() from my app. The downside to it was that explicit DOM-building code is much more verbose than the version that constructs HTML and then sets it up. It’s tedious to write and contains much boilerplate.

To not be forced to change code, the above-quoted article advocates using a pure-JavaScript HTML parser outputting DOM to replace jQuery’s .html() method. I considered this for a while, but finally decided against it — I didn’t want to include another big, complex dependency that potentially could misbehave at times (writing HTML parsers is hard).

Instead, I came up with this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
function createTree(tree) {
    if (typeof tree === 'string' || typeof tree === 'number')
        return document.createTextNode(tree);
    var tag = tree[0], attrs = tree[1], res = document.createElement(tag);
    for (var attr in attrs) {
        val = attrs[attr];
        if (attr === 'class')
            res.className = val;
        else
            $(res).attr(attr, val);
    }
     for (var i = 2; i < tree.length; i++)
        res.appendChild(createTree(tree[i]));
    return res;
}

This is very similar in spirit to .html(), except that instead of passing HTML, you give it a data structure representing the DOM tree to construct. It can either be a string (which yields a text node), or a list consisting of the HTML tag name, an object mapping attributes to their values, and zero or more subtrees of the same form. Compare:

Using .html():

1
2
var html = '<p>This is an <span class="red">example.</span></p>';
$('#myDiv').html(html);

Using createTree:

1
2
3
4
var tree = ['p', {},
            'This is an ',
            ['span', {'class': 'red'}, 'example.']];
$('#myDiv').empty().append(createTree(tree));

A side benefit is that it is just as easy to build up a tree dynamically as it is to create HTML, and the code often gets clearer. Note how the createTree version above does not mix single and double quotes which is easy to mess up in the .html() version.

A quirk with JavaScript closures

| Comments

I keep running into this obstacle every now and then. Consider this example:

1
2
3
4
5
6
> q = []
[]
> for (var i = 0; i < 3; i++)
    q.push(function() { console.log(i); });
> q[0]()
3

I wanted an array of three closures, each printing a different number to the console when called. Instead, each prints 3 (or, rather, whatever the value of the variable i happens to be).

I am not exactly sure about the reason, but presumably this happens because the i in each lambda refers to the variable i itself, not to its binding from the creation time of the function.

One solution is to enforce the bindings explicitly on each iteration, like this:

1
2
3
4
for (var i = 0; i < 3; i++)
  (function(v) {
    q.push(function() { console.log(v); });
  })(i);

Or use Underscore.js, which is what I actually do:

1
2
3
_([1,2,3]).each(function(i) {
  q.push(function() { console.log(i); });
});

The Dijkstran wheel of fortune: SPSS, Excel, VBA

| Comments

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

       — Edsger W. Dijkstra, EWD 498

I like to think of myself somewhat egotistically as a counterexample to the above Dijkstra’s statement. Granted, some of my code is definitely of poor quality, and I dare not call myself a good programmer. But, having started with BASIC on a Commodore 64, then proceeding to learn Pascal (of the Turbo/Borland flavour), then C, x86 assembly, OCaml, Smalltalk, Java, C++, Haskell, Common Lisp, Clojure, and a couple of other languages, with a few enlightenments achieved along the way, I do think I managed to regenerate from the mental wounds that BASIC had inflicted upon me. And now I feel a strange sensation, now that the Dijkstran wheel of fortune has made a full spin: I’ve spend the last few days writing BASIC code. I’ve written several Excel macros in Visual Basic for Applications.

Why the strange selection of a language? Well, this was simply the best tool for the job. What I needed to do was postprocess the output of some statistical analyses performed in SPSS running under Windows, altering the way the results were presented. SPSS can export data to HTML, Word, and Excel; of these three, the latter is most convenient, because it preserves the structure of the output tables most thoroughly. (In principle, HTML does too, and in fact my first stab was with Clojure, but I stopped after realizing just how much ad-hoc, throwaway code that parses the SPSS-generated HTML, munges it several times to and fro, and then outputs back HTML I’d have to write). So I went the Excel way, and in this post I’d like to share my mixed feelings from that encounter.

Visual Basic the language is icky. It is certainly a step forward from the BASIC I remember from decades ago, in that I didn’t have to number my lines, and it is possible to structure the code nicely so that it doesn’t contain any GOTOs, GOSUBs or RETURNs. And it has this object-oriented feel to it. But compared to modern languages, programming in it resembles voluntarily putting on handcuffs, and then jumping around to avoid stumbling over the logs it throws under your legs. Not quite so big and scary logs as C++ does, but still. I mean, why on earth does VB have to distinguish between expressions and statements? Many languages do, but in most of them an expression is at least a valid statement. Not so in VB. Also, VB is still line-oriented: whether or not you require an End If in the conditional statement depends on whether it fits in one line or not. But my biggest pain was with the assignments. VB makes a distinction between reference assignments and other assignments, requiring a Set statement in the first case, and disallowing it in the second. So, Set myCell = thatOtherCell but foo = 42. Worse, forgetting the Set in the first case does not result in an error, which makes such bugs very hard to debug. Yurgh.

Also, the IDE built into Excel for developing VB macros is mediocre. There is an editor, which highlights the syntax and automatically reformats the code, inserting spaces as appropriate, which is nice. It slaps me in face with a modal dialog whenever I make a syntax error and move off the line, which is not so nice. There is a REPL of sorts, taking the form of an “Immediate” window, into which you can type statements (not expressions, remember?) and tap Enter to execute them. You can also Debug.Print to them, like to a JavaScript console. It is not reachable by Ctrl-Tab from the editor, so I ended up using mouse much more often than normally. I want my Emacs back!

On the other hand, I find the object-oriented API for actually accessing the spreadsheets quite well-designed and pleasant to use. You just grab the object representing your worksheets from the global Worksheets object (indexable by number or by name), and from there you access your cells. The basic object you work with is the Range object, representing either a single cell or a bunch of them; you can get or set cell values, change the formatting, call Offset to navigate around as if with cursor keys. You also can search for specific content in the sheet. Simple enough, easy to use and pick up; and above all, allows to get the job done without getting in the way much.

As for SPSS itself: it sucks. In fact, it sucks so great and in so many different ways that it merits its own blog entry (which will follow someday). For now, I’ll only note down the things pertaining to Excel interop; hopefully it will save somebody’s time.

Problem is, SPSS 19’s Excel export is buggy. In fact, it’s so unreliable that I’ve wasted more hours struggling with it than actually writing my macros. (We’re talking SPSS 19 here; I’ve also tried version 17, with the same results.) It exports small data chunks fine, but the larger your output, the more likely it is that Excel alerts about unreadable content in your file. Excel then offers to repair the data, which mostly succeeds, but inevitably loses the formatting — which for me was a no-no.

So, after long hours of experimentation and attempting different workarounds, I found that it is much, much more reliable to just copy your data and paste it into Excel directly, without exporting to a temporary file. Just do Edit -> Copy special and select Excel BIFF format, to make sure you’re copying the right data. If Excel complains about not being able to understand the copied content (turn on the Clipboard preview to find out), save your output to .spv, restart SPSS, re-run your syntax and try again. With luck, it will eventually work. At least for me it did.

Hello world, again

| Comments

I’ve been quiet on the front of blogging in English recently. But that doesn’t mean I’ve given up.

After more than a year, I had become tired of maintaining a Blosxom installation. I greatly admire Blosxom, its minimalism and extensibility, but the default installation is just too minimal for my needs. And the plugins tend to have rough edges. Like the Disqus comments that I’ve enabled at one time on the otherwise static blog pages: the correct number of comments appears in some places but not all; besides, they just don’t feel right.

So I’ve embarked on an experiment with a blogging platform, namely Posterous. I’ve started a blog in Polish there to comment on local affairs in my mother tongue and to popularize Clojure among Polish programmers. And after a few months, I consider this experiment successful. Posterous supports Markdown, which I grew accustomed to while using Blosxom. It automatically syntax-highlights snippets of Clojure code that I post, which is a big win. It is highly customizable, easy to use (blogging via email FTW!), and lets me control my data. It does have its deficiencies, but on the whole it gets in the way less. So I’m switching to Posterous for “Musings of a Lispnik” too.

It is unclear for me how to migrate the old content to new platform, so for now I’ll leave it as is under a temporary address, while posting new things exclusively here. (Update 2012-09-24: After another migration, this time to Octopress, I’ve merged the old contents back where it belongs.)

In the near future, I plan to translate a few articles about Clojure I’d written in Polish and post their English versions here. Stay tuned!

Keyword arguments

| Comments

There’s been an ongoing debate about how to pass optional named arguments to Clojure functions. One way to do this is the defnk macro from clojure.contrib.def; I hesitate to call it canonical, since apparently not everyone uses it, but I’ve found it useful a number of times. Here’s a sample:

1
2
3
4
5
6
7
8
user> (use 'clojure.contrib.def)
nil
user> (defnk f [:b 43] (inc b))
#'user/f
user> (f)
44
user> (f :b 100)
101

This is an example of keyword arguments in action. Keyword arguments are a core feature of some languages, notably Common Lisp and Objective Caml. Clojure doesn’t have them, but it’s pretty easy to emulate their basic usage with macros, as defnk does.

But there’s more to Common Lisp’s keyword arguments than defnk provides. In CL, the default value of a keyword argument can be an expression referring to other arguments of the same function. For example:

1
2
3
4
5
6
7
8
9
CL-USER> (defun f (&key (a 1) (b a))
           (+ a b))
F
CL-USER> (f)
2
CL-USER> (f :a 45)
90
CL-USER> (f :b 101)
102

I wish defnk had this feature. Or is there some better way that I don’t know of?

Sunflower

| Comments

The program I’ve been writing about recently has come to a point where I think it can be shown to the wide public. It’s called Sunflower and has its home on GitHub. It’s nowhere near being completed, and of alpha quality right now, but even at this stage it might be useful.

Just as sunflower seed kernels come wrapped in hulls, most HTML documents seen in the wild come wrapped in noise that is not really part of the document itself. Take any news site: a document from such a site contains things such as advertisements, header, footer, and many links. Now suppose you have many documents grabbed from the same site. Is it possible to somehow automate the extraction of the document “essences”?

Sunflower to the rescue. It relies on the assumption that documents coming from the same source have the same structure. It presents a list of strings to the user, and asks to pick those that are contained in the text essence. Then it finds the coordinates of the smallest HTML subtree that contains all those strings, and uses those coordinates to extract information from all documents. And it comes with a nice, easily understandable GUI for that.

This technique works remarkably well for many collections, although not all. An earlier, proof-of-concept implementation (in Common Lisp) has been used to extract many press texts for the National Corpus of Polish.

I’ve given up on the symbol-capturing approach to wizards I’ve presented in my previous posts. Inspired by the DOM tree in Web apps, with a bag of elements with identifiers, I now have a central bag of Swing widgets (implemented as an atom) identified by keywords. This bag contains tidbits of the mutable state of Sunflower. This means that I can write callback functions like this:

1
2
3
4
5
6
#(with-components [strings-model selected-dir]
   (.removeAllElements strings-model)
   (let [p (-> selected-dir htmls first parse)]
     (add-component :parsed p)
     (doseq [x (strings p)]
       (.addElement strings-model x))))

Name and conquer: having parts of state explicitly named mean that I can reliably access them from just about anywhere. This reduces confusion and allows for less tangled, more self-contained and understandable code.

A case for symbol capture

| Comments

Clojure by default protects macro authors from incidentally capturing a local symbol. Stuart Halloway describes this in more detail, explaining why this is a Good Thing. However, sometimes this kind of symbol capture is called for. I’ve encountered one such case today while hacking a Swing application.

As I develop the app, I find new ways to express Swing concepts and interact with Swing objects in a more Clojuresque way, so a library of GUI macros and functions gets written. One of them is a wizard macro for easy creation of installer-like wizards, where there is a sequence of screens that can be navigated with Back and Next buttons at the bottom of the window.

The API (certainly not finished yet) currently looks like this:

(wizard & components)

where each Swing component corresponding to one wizard screen can be augmented by a supplementary map, which can contain, inter alia, a function to execute upon showing the screen in question.

Now, I want those functions to be able to access the Back and Next buttons in case they want to disable or enable them at need. I thus want the API user to be able to use two symbols, back-button and next-button, in the macro body, and have them bound to the corresponding buttons.

It is crucial that these bindings be lexical and not dynamic. If they were dynamic, they would be only effective during the definition of the wizard, but not when my closures are invoked later on. Thus, my implementation looks like this:

1
2
3
4
(defmacro wizard [& panels]
  `(let [~'back-button (button "< Back")
         ~'next-button (button "Next >")]
   (do-wizard ~'back-button ~'next-button ~(vec panels))))

where do-wizard is a private function implementing the actual wizard creation, and the ~'foo syntax forces symbol capture.

By the way, if all goes well, this blog post should be the first one syndicated to Planet Clojure. Hello, Planet Clojure readers!