musings of a Lispnik

Daniel Janus's blog

You already use Lisp syntax


Unix Developer: I’m not going to touch Lisp. It’s horrible!

Me: Why so?

UD: The syntax! This illegible prefix syntax that nobody else uses. And just look at all these parens!

Me: Well, many people find it perfectly legible, although most agree that it takes some time to get accustomed to. But I think you’re mistaken. Lots of people are using Lisp syntax on a daily basis…

UD: I happen to know no one who does.

Me: …without actually realizing this. In fact, I think you yourself are using it.

UD: Wait, what?!

Me: And the particular variant of Lisp syntax you’re using is called Bourne shell.

UD: Now I don’t understand. What on earth does the shell have to do with Lisp?

Me: Just look: in the shell, you put the name of the program first, followed by the arguments, separated by spaces. In Lisp it’s exactly the same, except that you put an opening paren at the beginning and a closing paren at the end.

Shell: run-something arg1 arg2 arg3

Lisp: (run-something arg1 arg2 arg3)

UD: I still don’t get the analogy.

Me: Then you need a mechanism for expression composition — putting the output of one expression as an input to another. In Lisp, you just nest the lists. And in the shell?

UD: Backticks.

Me: That’s right. Or $(), which has the advantage of being more easily nestable. Let’s try arithmetic. How do you do arithmetic in the shell?

UD: expr. Or the Bash builtin let. For example,

$ let x='2*((10+4)/7)'; echo $x
4
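
Me: By the way, here’s your let example with the composition made explicit, one program call per operation via expr (a sketch; expr wants spaces around its operators and its * escaped from the shell). Note how the backticks need ever more escaping as they nest, while $() nests cleanly:

$ echo `expr 2 \* \`expr \\\`expr 10 + 4\\\` / 7\``
4
$ echo $(expr 2 \* $(expr $(expr 10 + 4) / 7))
4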

Now wouldn’t it be in line with the spirit of Unix — to have programs do just one thing — if we had one program to do addition, another to do subtraction, yet another multiplication, and a fourth division?

It’s trivial to write it in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
  int mode = -1, cnt = argc - 1, val;
  char **args = argv + 1;
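  /* dispatch on the last character of the program's name */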
  switch (argv[0][strlen(argv[0]) - 1]) {
    case '+': mode = 0; break;
    case '-': mode = 1; break;
    case 'x': mode = 2; break;
    case 'd': mode = 3; break;
  }
  if (mode == -1) {
    fprintf(stderr, "invalid math operation\n");
    return 1;
  }
  if ((mode == 1 || mode == 3) && !cnt) {
    fprintf(stderr, "%s requires at least one arg\n", argv[0]);
    return 1;
  }
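  /* + and x start from their identity element; - and d consume the first argument */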
  switch (mode) {
    case 0: val = 0; break;
    case 2: val = 1; break;
    default: val = atoi(*args++); cnt--; break;
  }
  while (cnt--) {
    switch (mode) {
      case 0: val += atoi(*args++); break;
      case 1: val -= atoi(*args++); break;
      case 2: val *= atoi(*args++); break;
      case 3: val /= atoi(*args++); break;
    }
  }
  printf("%d\n", val);
  return 0;
}

This dispatches on the last character of its name, so it can be symlinked to +, -, x and d (I picked unusual names for multiplication and division to make them legal and avoid escaping).
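
To try it out (a sketch: it assumes the source is saved as calc.c, a name I just made up):

$ cc -o calc calc.c
$ for op in + - x d; do ln -s calc "$op"; done
$ PATH=.:$PATH
$ + 1 2 3
6
$ d 100 5 2
10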

Now behold:

$ x 2 $(d $(+ 10 4) 7)
4

UD: Wow, this sure looks a lot like Lisp!

Me: And yet it’s the shell. Our two basic rules — program-name-first and $()-for-composition — allowed us to explicitly specify the order of evaluation, so there was no need to do any fancy parsing beyond what the shell already provides.

UD: So is the shell a Lisp?

Me: Not really. The shell is stringly typed: a program takes textual parameters and produces textual output. To qualify as a Lisp, it would have to have a composite type: a list or a cons cell to build lists on top of. Then, you’d be able to represent code as this data structure, and write programs to transform code to other code.

But the Tao of Lisp lingers in the shell syntax.


I know I’ve glossed over many details here, like the shell syntax for redirection, globbing, subprocesses, the fact that programs have standard input in addition to command-line arguments, pipes, etc. — all these make the analogy rather weak. But I think it’s an interesting way to teach Lisp syntax to people.

DOS debugging quirk


While hacking on Lithium, I’ve noticed an interesting thing. Here’s a sample DOS program in assembly (TASM syntax):

.model tiny
.code
  org 100h

N equ 2

start:
  mov bp,sp
  mov ax,100
  mov [bp-N],ax
  mov cx,[bp-N]
  cmp cx,ax
  jne wrong
  mov dx,offset msg
  jmp disp
wrong:
  mov dx,offset msg2
disp:
  mov ah,9
  int 21h
  mov ax,4c00h
  int 21h

msg db "ok$"
msg2 db "wrong$"
end start

If you assemble, link and then execute it normally, typing prog in the DOS command line, it will output the string “ok”. But if you trace through the program in a debugger instead, it will say “wrong”! What’s wrong?

The problem is in lines 10-11 of the listing, the mov [bp-N],ax / mov cx,[bp-N] pair (instructions 3 and 4). Here’s what happens when you trace through this program in DOS 6.22’s DEBUG.EXE.

In instruction 3, we set the word at SS:0xFFFC (that is, [bp-N]) to 100. When about to execute the following instruction, we would expect that word to still hold the value 100, because nothing that could have changed it has happened in between. Instead, the debugger reports the old value, 0x0D8A, as if instruction 3 had not been executed at all — and, interestingly, after actually executing instruction 4, CX ends up with yet another value, 0x7302!

Normally, thinking of DOS .COM programs, you assume a 64KB-long chunk of memory that the program has all to itself: the code starts at 0x100, the stack grows from 0xFFFE downwards (at any given time, the region from SP to 0xFFFE contains the data currently on the stack), and all memory in between is free for the program to use however it deems fit. It turns out that, when debugging, this is not the case: debuggers need to manipulate the region just underneath the program’s stack in order to handle the tracing/breakpoint interrupt traps. (Strictly speaking, memory below SP is never safe: any interrupt, hardware or software, pushes the return address and flags onto the current stack. The debugger’s traps merely turn an unlikely race into a certainty.)

I’ve verified that both DOS’s DEBUG and Borland’s Turbo Debugger 5 do this. The unsafe-to-touch amount of space below SP that they need, however, varies. Manipulating the N constant in the original program, I’ve determined that DEBUG only needs 8 bytes below SP, whereas for TD it is a whopping 18 bytes.
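
The moral: anything below SP is fair game for interrupt handlers, debuggers included. If a program needs BP-relative scratch space, the robust fix is to actually allocate it by moving SP out of the way first. A sketch of the fixed prologue, in the same TASM syntax:

start:
  mov bp,sp
  sub sp,N        ; reserve the scratch area: SP now equals bp-N,
                  ; so interrupt/debugger pushes land below it
  mov ax,100
  mov [bp-N],ax   ; no longer below SP, hence safe
  mov cx,[bp-N]   ; reads back 100, even under a debugger
  add sp,N        ; release the scratch space when done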

2048: A close look at the source


The dust has now mostly settled on 2048. Yet, in all the deluge of variants and clones that has swept through Hacker News, little has been written about the experience of modifying the game. As I too have jumped on the 2048-modding bandwagon, it’s time to fill that gap, because, as we shall see, the code more than deserves a close look.

I’ll start with briefly describing my variant. It’s called “words oh so great” (a rather miserable attempt at a pun on “two-oh-four-eight”) and is a consequence of a thought I had, being an avid Scrabble player, after seeing the 3D and 4D versions: “what if we mashed 2048 and Scrabble together?” The answer suggested itself automatically.

Letters instead of number tiles: that was obvious. And you use them to form words. Merging was less clear-cut: merging two identical tiles, as in the original, just wouldn’t make sense here, so drop the concept of merging altogether and make the tiles disappear when you form a word. In Scrabble, the minimum length of a word is two, but allowing two-letter words here would mean too many words formed accidentally, so make it at least three. And 16 squares sounds like too tight a space, so increase the grid to 5x5. And there you have the modified rules.

I cloned the Git repo, downloaded an English word list (EOWL), and set to work. It took me just over three hours from the initial idea to putting the modified version online and submitting a link to HN. I think three hours is not bad, considering that I significantly changed the game mechanics. And, in my opinion, this is a testament to the quality of Gabriele Cirulli’s code.

The code follows the MVC pattern, despite not relying on any frameworks or libraries. The model comprises the Tile and Grid classes, which lay out the universe of the game and some basic rules governing it, and the GameManager, which implements the game mechanics: how tiles move around, when they can merge, when the game ends, and so on. It also uses a helper class called LocalStorageManager to keep the score and save it in the browser’s local storage.

The view part is called an “actuator” in 2048 parlance. The HTMLActuator takes the game state and updates the DOM tree accordingly. It also uses a micro-framework for animations. The controller takes the form of a KeyboardInputManager, whose job is to receive keyboard events and translate them to changes of the model.

The GameManager also contains some code to tie it all together — not really a part of the model as in MVC. Despite this slight inconsistency, the separation of concerns is very neatly executed in 2048’s code; I would even go so far as to say that it could be used as a demonstration in teaching MVC to people.

The only gripe I had with the code is that it violates the DRY principle in several places. Specifically, to change the board size to 5x5, I had to modify as many as three places: the HTML (it contains the initial definition of the DOM, including 16 empty divs making up the grid, which is unfortunate — I’d change it to set up the DOM at runtime during initialization); the model (the instantiation of GameManager); and the .scss file from which the CSS is generated.
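
For illustration, the instantiation looks roughly like this (quoted from memory, so treat it as a sketch of js/application.js rather than the exact code); the 5 is my change:

// js/application.js (approximately): the board size is the first argument
window.requestAnimationFrame(function () {
  new GameManager(5, KeyboardInputManager, HTMLActuator, LocalStorageManager);
});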

While on this topic, let me add that 2048’s usage of SASS is a prime example of its capabilities. It is very instructive to see how the sizing and positioning of the grid, and the styling of the tiles down to the glow, are done programmatically. I was aware of the existence of SASS before, but never got around to exploring it. Now I’m sold on it.

To sum up: 2048 rocks. And it’s fun to modify. Go try it.

Lithium revisited: A 16-bit kernel (well, sort of) written in Clojure (well, sort of)


Remember Lithium? The x86 assembler written in Clojure, and a simple stripes effect written in it? Well, here’s another take on that effect.

And here is the source code:

(do (init-graph)
    (loop [x 0 y 0]
      (put-pixel x y (let [z (mod (+ (- 319 x) y) 32)]
                       (if (< z 16) (+ 16 z) (+ 16 (- 31 z)))))
      (if (= y 200)
        nil
        (if (= x 319)
          (recur 0 (inc y))
          (recur (inc x) y)))))

I implemented this several months ago and pushed it to GitHub, and development has pretty much stalled since then. After seeing this post on HN today, I’ve decided to give Lithium a little more publicity, in the hope that it will provide me with a boost of motivation. Because what we have here is pretty similar to Rustboot: it’s a 16-bit kernel written in Clojure.

Well, sort of.

After writing a basic assembler capable of building bare binaries of simple x86 real-mode programs, I’ve decided to make it a building block of a larger entity. So I’ve embarked on a project to implement a compiler for a toy Lisp-like language following the paper “An Incremental Approach to Compiler Construction”, doing it in Clojure and making the implemented language similar to Clojure rather than to Scheme.

(Whether it actually can be called Clojure is debatable. It’s unclear what the definition of Clojure the language is. Is running on JVM a part of what makes Clojure Clojure? Or running on any host platform? Is ClojureScript Clojure? What about ClojureCLR, or clojure-py?)

So far I’ve only gotten to step 7 of 24 or so, but that’s already enough to have a working loop/recur implementation, and it was trivial to throw in some graphical mode 13h primitives to be able to implement this effect.

By default I’m running Lithium programs as DOS .COM binaries under DOSBox, but technically, the code doesn’t depend on DOS in any way (it doesn’t ever invoke interrupt 21h) and so it can be combined with a simple bootloader into a kernel runnable on the bare metal.
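
For the curious, such a bootloader can be very small indeed. A sketch (NASM syntax, untested; it assumes the Lithium binary, with its .COM-style org 100h, sits in the sectors right after the boot sector, and it omits stack setup for brevity):

; boot.asm: load 16 sectors from disk to 0800:0100 and jump there
        org 0x7C00
        bits 16
        mov ax, 0x0800
        mov es, ax
        mov ds, ax
        mov bx, 0x0100      ; load target: 0800:0100, matching org 100h
        mov ah, 0x02        ; BIOS int 13h: read sectors
        mov al, 16          ; sector count; adjust to the binary's size
        mov ch, 0           ; cylinder 0
        mov cl, 2           ; start at sector 2 (sector 1 is this one)
        mov dh, 0           ; head 0; DL still holds the boot drive
        int 0x13
        jmp 0x0800:0x0100   ; hand control to the kernel
        times 510 - ($ - $$) db 0
        dw 0xAA55           ; boot signature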

The obligatory HOWTO on reproducing the effect: install DOSBox and Leiningen, check out the code, launch a REPL with lein repl, execute the following forms, and enjoy the slowness with which individual pixels are painted:

(require 'lithium.compiler)
(in-ns 'lithium.compiler)
(run! (compile-program "/path/to/lithium/examples/stripes-grey.clj"))

My top three iOS apps for mapping


Living in London means that I now have a whole lot of new area to explore by cycling or walking. I try to take every opportunity to spend a free day or weekend out. One of the most important things when on the move is knowing where you are, where to go, and how to get there — and for that, you need a map. As I soon learned, the maps to use in the UK are the Ordnance Survey ones (either the Landranger/Explorer series, or maps by another publisher, such as AA, based on OS data). However, the Landranger series alone encompasses over 200 1:50000 maps, at some £8 each, and when that level of detail is not enough, there are more than 400 Explorer maps on top of that. Not only does this get pricey after a while, but the sheer volume of map juggling also quickly becomes impractical when you cycle a lot outside of town.

So I’ve turned to my trusty old iPhone 3GS as a mapping device instead, and set out to assemble a set of mapping apps that would do the job for me. In this post, I’d like to share my list.

I briefly thought of directly using OS maps on the iPhone via the Outdoors GPS GB app; it does meet my requirement of being accessible off-network, but the pricing of individual maps is on par with the paper version, so I ruled it out.

Instead, I am using this trio now:

  1. The official National Cycle Network app by Sustrans. Besides being free, it has the advantage of detailing every numbered national cycle route, as well as most local routes (which often predate the NCN or are not yet integrated into the network). At high detail levels, the data seem to be OS-sourced, which is good.

    It downloads maps from the Internet on demand, but you can also save a map portion for future use. The app asks you how much detail you want, tells you how large the download will be, then proceeds to get the data. The nuisance here is that you can only download 40 MB in one go, which corresponds to an area stretching for approximately 50-60 km at 1:50000 (and correspondingly smaller at 1:25000), so it takes a lot of tapping and downloading if you’re planning a longer trip.

    The other downsides are that the app is a little shaky at times, and that GPS positioning sometimes shows your position slightly off. I mitigate this by using it in combination with the next one…

  2. …which is MapsWithMe. The tagline “Offline Mobile Maps” sums it up: it’s just maps, easily downloadable, covering the entire world, and nothing else. It really does one thing well. The map data source is OpenStreetMap, so all the maps are free as well; one ‘Download’ tap and you’ve got the whole country covered, once and for all. It also displays your GPS position much more reliably than the NCN app. On the other hand, it can’t offer quite the same level of detail, and doesn’t know anything about cycle routes, but it’s still highly convenient.

    My typical flow when cycling in the UK is: check my position with MapsWithMe, then optionally switch to NCN, locate the same position on the map by hand and see where the route goes. I’ve also done one continental three-day trip, from Dunkirk in France to Hoek van Holland in the Netherlands, using just MapsWithMe to navigate, and it worked out very well.

  3. Unlike the other two, the last app I want to point out, GPS2OS, is paid. And it’s more than worth its meager price, despite being next to useless when cycling. But when hiking, especially in remote mountainous areas, it can literally be a lifesaver. Here’s the thing: my basic navigation tools in harsh conditions are a compass and a plain ol’ paper map, with the iPhone treated only as a supplementary aid (you never know when the battery will die). However, instead of indicating latitude and longitude in degrees/minutes/seconds, OS maps use their own grid. So you cannot directly use the default Compass app, which reports your position in degrees, with them; you need a tool just like this one to do the coordinate translation. It works very well; it helped me find my way down from the summit of Ben Macdui in dense mist during my recent holiday in Scotland.

One final tip: when you want to conserve battery as much as possible, airplane mode is a real saver. However, GPS doesn’t seem to work when airplane mode is on. So the next best thing is to remove the SIM card (you can then reinsert it, just don’t enter the PIN), so that the phone doesn’t keep trying to connect to cellular networks. And keep it warm in a pocket beside your body: cold devices discharge much faster.

Lithium: an x86 assembler for Clojure


Ah, the golden days of childhood’s hackage. Don’t you have fond memories of them?

I got my first PC when I was 10. It was a 486DX2/66 with 4 megs of RAM and a 170 meg HDD; it ran DOS and had lots of things installed on it, notably Turbo Pascal 6. I hacked a lot on it. These were pre-internet days when knowledge was hard to come by, especially for someone living in a small town in Poland; my main sources were the software I had (TP’s online help was of excellent quality), a couple of books, and a popular computing magazine that published articles on programming. From the latter, I learned how to program the VGA: how to enter mode 13h, draw pixels on screen, wait for vertical retrace, manipulate the palette, and how to combine these things into neat effects. One of the very first things I discovered was that when you plot every pixel using the sum of its coordinates modulo 40 as its color, you get a nice-looking diagonal-stripes effect. Because of the initially incomprehensible inline assembly snippets appearing all over the place, I eventually learned x86 assembly, too.
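
The stripes effect itself, as best I can reconstruct it from memory (Turbo Pascal, untested after all these years), went something like this:

program Stripes;
var
  x, y : word;
begin
  asm                        { enter mode 13h: 320x200, 256 colors }
    mov ax, $0013
    int $10
  end;
  for y := 0 to 199 do
    for x := 0 to 319 do
      Mem[$A000:y*320+x] := (x + y) mod 40;   { VGA memory starts at A000 }
  ReadLn;
  asm                        { back to text mode }
    mov ax, $0003
    int $10
  end;
end.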

Back to 2012: I’d long been wanting to hack on something just for pure fun, a pet side project. Writing code for the bare metal is fun because it’s just about as close as you can get to wielding the ultimate power. And yet, since Clojure is so much fun too, I wanted the project to have something to do with Clojure.

So here’s Lithium, an x86 16-bit assembler written in pure Clojure and capable of assembling a binary version of the stripes effect.

To try it, clone the git repo to your Linux or OS X machine, install DOSBox, launch a REPL with Leiningen, change to the lithium namespace and say:

(run! "/home/you/lithium/src/stripes.li.clj")

FAQ

(Well, this is not really a FAQ, since nobody has actually asked me any questions about Lithium yet. It’s more in anticipation of questions that may arise.)

What’s the importance of this?

None whatsoever. It’s just for fun.

How complete is it?

Very incomplete. To even call it pre-pre-alpha would be an exaggeration. It’s currently little more than the bare minimum required to assemble stripes.li.clj. As for output formats, it only produces bare binaries (similar to DOS .COMs), and that’s unlikely to change anytime soon.

Do you intend to continue developing it?

Absolutely. I will try to make it more complete, add 32- and possibly 64-bit modes, see how to add a macro system (since the input is s-expressions, it should be easy to produce Clojure macros to write assembly), write something nontrivial in it, and see how it can be used as a backend for some higher-level language compiler (I’m not sure yet which language that will turn out to be).

How to call a private function in Clojure


tl;dr: Don’t do it. If you really have to, use (#'other-library/private-function args).


A private function in Clojure is one that has been defined using the defn- macro, or equivalently by setting the metadata key :private to true on the var that holds the function. It is normally not allowed in Clojure to call such functions from outside of the namespace where they have been defined. Trying to do so results in an IllegalStateException stating that the var is not public.

It is possible to circumvent this and call the private function, but it is not recommended. That the author of the library decided to make a function private probably means that he considers it an implementation detail, subject to change at any time, and that you should not rely on it being there. If you think it would be useful to have this functionality as part of the public API, your best bet is to contact the library author and discuss the change, so that it may be included officially in a future version.

Contacting the author, however, is not always feasible: she may not be available, or you might be in a hurry. In this case, several workarounds are available. The simplest is to use (#'other-library/private-function args), which works in Clojure 1.2.1 and 1.3.0 (it probably works in other versions of Clojure as well, but I haven’t checked that).
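
Concretely, with made-up namespace and function names:

(ns mylib.core)

(defn- secret [x]          ; private: visible only inside mylib.core
  (* 2 x))

(ns user (:require [mylib.core]))

;; (mylib.core/secret 21)  ; => IllegalStateException:
;;                         ;    var: #'mylib.core/secret is not public
(#'mylib.core/secret 21)   ; => 42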

Why does this work? When the Clojure compiler encounters a form (sym args), it invokes analyzeSeq on that form. If the form’s first element is a symbol, it proceeds to analyze that symbol. One of the first operations in that analysis is checking whether the symbol names an inline function, by calling isInline. That function looks into the metadata of the Var named by the symbol in question; if that var is not public, it throws an exception.

On the other hand, #' is the reader macro for var. So our workaround is equivalent to ((var other-library/private-function) args). In this case, the first element of the form is not a symbol but a form that evaluates to a var, so the compiler has nothing to run the privateness check against, and the code compiles down to a call on a Var object.

Here’s the catch: Vars are callable, just like functions. They implement IFn, and when a var is called, it delegates the call to the IFn object it is holding. (This has recently been discussed on the Clojure group.) Since that delegation does not check the var’s privateness either, the net effect is that we are able to call a private function this way.

Lifehacking: How to get cheap home equipment using Clojure


I moved to London last September. Like many new Londoners, I have been changing accommodation fairly quickly: I am already past one removal, with another looming in a couple of months. My current flat was largely unfurnished when I moved in, so I had to buy some basic homeware. I didn’t want to invest much in it, since it’d only be for a few months. Luckily, it is not hard to do that cheaply: many people are moving out and getting rid of their stuff, so quite often you can search for the desired item on Gumtree and find a cheap one a short bike ride away.

Except when there isn’t. In this case, it’s worthwhile to check again within a few days as new items are constantly being posted. Being lazy, I’ve decided to automate this. A few hours and a hundred lines of Clojure later, gumtree-scraper was born.

I’ve packaged it using lein uberjar into a standalone jar, which, when run, produces a gumtree.rss that is included in my Google Reader subscriptions. This way, whenever something I’m interested in appears, I get notified within an hour or so.

It’s driven by a Google spreadsheet. I’ve created a sheet that has three columns (item name, minimum price, maximum price) and made it available to anyone who knows the URL. This way I can edit it from pretty much anywhere without touching the script. Each time the script is run (by cron), it downloads that spreadsheet as a CSV that looks like this:

hand blender,,5
bike rack,,15

For each row, the script queries Gumtree’s “For Sale” category within London with the given price range, fetches each result, and transforms it into an RSS entry.

Gumtree has no API, so I’m screen-scraping to retrieve all the data. I’m actually scraping the mobile version, because the structure of its pages is much simpler; a technical twist here is that the mobile version is only served to actual browsers, so I’m supplying a custom User-Agent, pretending to be Safari. For the actual scraping, the code uses Enlive; it works out nicely.
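
The fetching part might look something like the following sketch; the User-Agent string and the selector are illustrative stand-ins, not the actual ones from gumtree-scraper:

(require '[net.cgrand.enlive-html :as html])

(def user-agent    ; anything Safari-ish will do
  "Mozilla/5.0 (iPhone; U; CPU like Mac OS X) AppleWebKit/420 Safari/419.3")

(defn fetch-page [url]
  (let [conn (.openConnection (java.net.URL. url))]
    (.setRequestProperty conn "User-Agent" user-agent)
    (html/html-resource (.getInputStream conn))))

;; then, e.g., (html/select (fetch-page url) [:div.listing :a])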

About half of the code is RSS generation — mostly XML emitting. I’d use clojure.xml/emit but it’s known to produce malformed XML at times, so I include a variant that should work.

In case anyone wants to try it out, be aware that the location and category are hardcoded in the search URL template; if you want, change the template line in get-page. The controller spreadsheet URL, however, is not hardcoded; it’s built up from the spreadsheet.key system property. Here’s the wrapper script, actually run by cron, that I use:

#!/bin/bash
if [ "`ps ax | grep java | grep gumtree`" ]; then
  echo "already running, exiting"
  exit 0
fi
cd "`dirname $0`"
java -Dspreadsheet.key=MY_SECRET_KEY -jar $HOME/gumtree/gumtree.jar
cp $HOME/gumtree/gumtree.rss $HOME/public_html

Now let me remove that entry for a blender — I bought one yesterday for £4…

Ever wanted to programmatically file a lawsuit? In Poland, you can.


This has somehow escaped me: just over a year ago, the Sixth Civil Division of the Lublin-West Regional Court in Lublin, Poland, opened its online branch. It serves the entire territory of Poland and is competent to hear lawsuits concerning payment claims. Basic information is available in English. The online court has proven immensely popular, having processed about two million cases in its first year of operation.

And the really cool thing is, they have an API.

It’s SOAP-based and has a publicly available spec. (Due to the way their web site is constructed, I cannot link to the spec directly; this last link leads to a collection of files related to the web service. The spec is called EpuWS_ver.1.14.1.pdf; it’s in Polish only, but it should be easy to run it through Google Translate.) There are a couple of XML schemas as well, plus the spec contains links to a WSDL and some code samples (in C#) at the end.

To actually use the API, you need to get yourself an account of the appropriate type (there are two types corresponding to two groups of methods one can use: that of a bailiff and of a mass plaintiff). You then log on to the system, where you can create an API key that is later used for authentication. They throttle the speed down to 1 req/s per user to mitigate DoS attacks.

The methods include FileLawsuits, FileComplaints, SupplyDocuments, GetCaseHistory and so on (the actual names are in Polish). To give you an example, the FileLawsuits method returns a structure that consists of, inter alia, the amount of court fee to pay, the value of the matter of dispute (both broken down into individual lawsuits), and a status code with a description.

iOS app, anyone?

Combining virtual sequences or, Sequential Fun with Macros or, How to Implement Clojure-Like Pseudo-Sequences with Poor Man’s Laziness in a Predominantly Imperative Language


Sequences and iteration

There are a number of motivations for this post. One stems from my extensive exposure to Clojure over the past few years: this was, and still is, my primary programming language for everyday work. Soon, I realized that much of the power of Clojure comes from a sequence abstraction being one of its central concepts, and a standard library that contains many sequence-manipulating functions. It turns out that by combining them it is possible to solve a wide range of problems in a concise, high-level way. In other words, it pays to think in terms of whole sequences rather than individual elements.
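
To give a taste of that style, here is a Clojure one-liner that picks the first five odd numbers out of an infinite sequence; we will reconstruct this very computation with macros later in this article:

(->> (range) (filter odd?) (take 5))   ; => (1 3 5 7 9)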

Another motivation comes from a classical piece of functional programming humour, The Evolution of a Haskell Programmer. If you don’t know it, go check it out: it consists of several Haskell implementations of factorial, starting out from a straightforward recursive definition, passing through absolutely hilarious versions involving category-theoretical concepts, and finally arriving at this simple version that is considered most idiomatic:

fac n = product [1..n]

This is very Clojure-like in that it involves a sequence (the range [1..n]). In Clojure, this could be implemented as

(defn fac [n]
  (reduce * 1 (range 1 (inc n))))

Now, I thought to myself, how would I write factorial in an imperative language? Say, Pascal?

function fac(n : integer) : integer;
var
  i, res : integer;
begin
  res := 1;
  for i := 1 to n do
    res := res * i;
  fac := res;
end;

This is very different from the functional version that works with sequences: it is much more elaborate, introducing an explicit loop. On the other hand, it’s memory-efficient: its memory requirements are clearly O(1), whereas a naïve sequence implementation would need O(n) memory to construct the whole sequence and then reduce it down to a single value.

Or is it really that different? Think of the changing values of i in that loop. On first iteration it is 1, on second iteration it’s 2, and so on up to n. Therefore, one can really think of a for loop as a sequence! I call it a “virtual” sequence, since it is not an actual data structure; it’s just a snippet of code.

To rephrase it as a definition: a virtual sequence is a snippet of code that (presumably repeatedly) yields the member values.

Let’s write some code!

To illustrate this, throughout the remainder of this article I will be using Common Lisp, for the following reasons:

  • It allows for imperative style, including GOTO-like statements. This will enable us to generate very low-level code.
  • Thanks to macros, we will be able to obtain interesting transformations.

Okay, so let’s have a look at how to generate a one-element sequence. Simple enough:

(defmacro vsingle (x)
 `(yield ,x))

The name VSINGLE stands for “Virtual sequence that just yields a SINGLE element”. (In general, I will try to define virtual sequences named and performing similarly to their Clojure counterparts here; whenever there is a name clash with an already existing CL function, the name will be prefixed with V.) We will not concern ourselves with the actual definition of YIELD at the moment; for debugging, we can define it just as printing the value to the standard output.

(defun yield (x)
  (format t "~A~%" x))

We can also convert a Lisp list to a virtual sequence which just yields each element of the list in turn:

(defmacro vseq (list)
  `(loop for x in ,list do (yield x)))

(defmacro vlist (&rest elems)
  `(vseq (list ,@elems)))

Now let’s try to define RANGE. We could use loop, but for the sake of example, let’s pretend that it doesn’t exist and write a macro that expands to low-level GOTO-ridden code. For those of you who are not familiar with Common Lisp, GO is like GOTO, except it takes a label that should be established within a TAGBODY container.

(defmacro range (start &optional end (step 1))
  (unless end
    (setf end start start 0))
  (let ((fv (gensym)))
    `(let ((,fv ,start))
       (tagbody
        loop
          (when (>= ,fv ,end)
            (go out))
          (yield ,fv)
          (incf ,fv ,step)
          (go loop)
       out))))

Infinite virtual sequences are also possible. After all, there’s nothing preventing us from considering a snippet of code that loops infinitely, executing YIELD, as a virtual sequence! We will define the equivalent of Clojure’s iterate: given a function fun and initial value val, it will repeatedly generate val, (fun val), (fun (fun val)), etc.

(defmacro iterate (fun val)
  (let ((fv (gensym)))
    `(let ((,fv ,val))
       (tagbody loop
          (yield ,fv)
          (setf ,fv (funcall ,fun ,fv))
          (go loop)))))

So far, we have defined a number of ways to create virtual sequences. Now let’s ask ourselves: is there a way, given code for a virtual sequence, to yield only those elements of the original that satisfy a certain predicate? In other words, can we define a filter for virtual sequences? Sure enough. Just replace every occurrence of yield with code that checks whether the yielded value satisfies the predicate, and invokes yield only if it does.

First we write a simple code walker that applies some transformation to every yield occurrence in a given snippet:

(defun replace-yield (tree replace)
  (if (consp tree)
      (if (eql (car tree) 'yield)
          (funcall replace (cadr tree))
          (loop for x in tree collect (replace-yield x replace)))
      tree))

We can now write filter like this:

(defmacro filter (pred vseq &environment env)
  (replace-yield (macroexpand vseq env)
                 (lambda (x) `(when (funcall ,pred ,x) (yield ,x)))))

It is important to point out that since filter is a macro, the arguments are passed to it unevaluated, so if vseq is a virtual sequence definition like (range 10), we need to macroexpand it before replacing yield.

We can now verify that (filter #'evenp (range 10)) works. It macroexpands to something similar to

(LET ((#:G70192 0))
  (TAGBODY
    LOOP (IF (>= #:G70192 10)
           (PROGN (GO OUT)))
         (IF (FUNCALL #'EVENP #:G70192)
           (PROGN (YIELD #:G70192)))
         (SETQ #:G70192 (+ #:G70192 1))
         (GO LOOP)
    OUT))

concat is extremely simple. To produce all elements of vseq1 followed by all elements of vseq2, just execute code corresponding to vseq1 and then code corresponding to vseq2. Or, for multiple sequences:

(defmacro concat (&rest vseqs)
  `(progn ,@vseqs))
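
By analogy with filter, a map over virtual sequences is just as short. (A sketch; this one is not part of the original set and is not used below.)

(defmacro vmap (f vseq &environment env)
  (replace-yield (macroexpand vseq env)
                 (lambda (x) `(yield (funcall ,f ,x)))))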

To define take, we’ll need to wrap the original code in a block that can be escaped from by means of return-from (which is just another form of goto). We’ll add a counter that will start from n and keep decreasing on each yield; once it reaches zero, we escape the block:

(defmacro take (n vseq &environment env)
  (let ((x (gensym))
        (b (gensym)))
    `(let ((,x ,n))
       (block ,b
         ,(replace-yield (macroexpand vseq env)
                         (lambda (y) `(progn (yield ,y)
                                             (decf ,x)
                                             (when (zerop ,x)
                                               (return-from ,b)))))))))

rest (or, rather, vrest, as that name is taken) can be defined similarly:

(defmacro vrest (vseq &environment env)
  (let ((skipped (gensym)))
    (replace-yield
     `(let ((,skipped nil)) ,(macroexpand vseq env))
     (lambda (x) `(if ,skipped (yield ,x) (setf ,skipped t))))))

vfirst is another matter. It should return a value instead of producing a virtual sequence, so we need to actually execute the code — but with yield bound to something else. We want to establish a block as with take, but our yield will immediately return from the block once the first value is yielded:

(defmacro vfirst (vseq)
  (let ((block-name (gensym)))
   `(block ,block-name
      (flet ((yield (x) (return-from ,block-name x)))
        ,vseq))))

Note that so far we’ve seen three classes of macros:

  • macros that create virtual sequences;
  • macros that transform virtual sequences into other virtual sequences;
  • and finally, vfirst is our first example of a macro that produces a result out of a virtual sequence.

Our next logical step is vreduce. Again, we’ll produce code that rebinds yield: this time to a function that replaces the value of a variable (the accumulator) with the result of calling a function on the accumulator’s old value and the value being yielded.

(defmacro vreduce (f val vseq)
  `(let ((accu ,val))
     (flet ((yield (x) (setf accu (funcall ,f accu x))))
       ,vseq
       accu)))

We can now build, in terms of vreduce, a construct that executes a virtual sequence and wraps the results up as a Lisp list.

(defun conj (x y)
  (cons y x))

(defmacro realize (vseq)
 `(nreverse (vreduce #'conj nil ,vseq)))

Let’s verify that it works:

CL-USER> (realize (range 10))
(0 1 2 3 4 5 6 7 8 9)

CL-USER> (realize (take 5 (filter #'oddp (iterate #'1+ 0))))
(1 3 5 7 9)

Hey! Did we just manipulate an infinite sequence and get the result in a finite amount of time? And that without explicit support for laziness in our language? How cool is that?!

Anyway, let’s finally define our factorial:

(defun fac (n)
  (vreduce #'* 1 (range 1 (1+ n))))

Benchmarking

Factorials grow too fast, so for the purpose of benchmarking let’s write a function that adds numbers from 0 below n, in sequence-y style. First using Common Lisp builtins:

(defun sum-below (n)
  (reduce #'+ (loop for i from 0 below n collect i) :initial-value 0))

And now with our virtual sequences:

(defun sum-below-2 (n)
  (vreduce #'+ 0 (range n)))

Let’s try to time the two versions. On my Mac running Clozure CL 1.7, this gives:

CL-USER> (time (sum-below 10000000))
(SUM-BELOW 10000000) took 8,545,512 microseconds (8.545512 seconds) to run
                    with 2 available CPU cores.
During that period, 2,367,207 microseconds (2.367207 seconds) were spent in user mode
                    270,481 microseconds (0.270481 seconds) were spent in system mode
5,906,274 microseconds (5.906274 seconds) was spent in GC.
 160,000,016 bytes of memory allocated.
 39,479 minor page faults, 1,359 major page faults, 0 swaps.
49999995000000

CL-USER> (time (sum-below-2 10000000))
(SUM-BELOW-2 10000000) took 123,081 microseconds (0.123081 seconds) to run
                    with 2 available CPU cores.
During that period, 127,632 microseconds (0.127632 seconds) were spent in user mode
                    666 microseconds (0.000666 seconds) were spent in system mode
 4 minor page faults, 0 major page faults, 0 swaps.
49999995000000

As expected, SUM-BELOW-2 is much faster, causes fewer page faults, and presumably conses less. (Critics will be quick to point out that we could idiomatically write it using LOOP’s SUM/SUMMING clause, which would probably be faster still, and I agree; yet if we were reducing by something other than + — something that LOOP does not have built in as a clause — this would not be an option.)
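
For reference, that LOOP spelling is:

(defun sum-below-3 (n)
  (loop for i from 0 below n sum i))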

Conclusion

We have seen how snippets of code can be viewed as sequences and how to combine them to produce other virtual sequences. As we are nearing the end of this article, it is perhaps fitting to ask: what are the limitations and drawbacks of this approach?

Clearly, this kind of sequence is less powerful than “ordinary” sequences such as Clojure’s. The fact that we’ve built them on macros means that once we leave the world of code transformation by invoking some macro of the third class, we can’t manipulate them anymore. In the Clojure world, first and rest are very similar; with virtual sequences, they are altogether different: they belong to different worlds. The same goes for map (sketched above as vmap) and reduce.

But imagine that instead of having just one programming language, we have a high-level language A in which we write macros that expand to code in a low-level language B. It is important to point out that the generated code is very low-level. It could almost be assembly: in fact, most of the macros we’ve written don’t even require language B to have any composite data types beyond the element type of the collections (which could be simple integers)!

Is there a practical side to this? I don’t know: to me it just seems to be something with hack value. Time will tell if I can put it to good use.