Sequences and iteration
There are a number of motivations for this post. One stems from my extensive exposure to Clojure over the past few years: this was, and still is, my primary programming language for everyday work. Soon, I realized that much of the power of Clojure comes from a sequence abstraction being one of its central concepts, and a standard library that contains many sequence-manipulating functions. It turns out that by combining them it is possible to solve a wide range of problems in a concise, high-level way. In contrast, it pays to think in terms of whole sequences, rather than individual elements.
Another motivation comes from a classical piece of functional programming humour, The Evolution of a Haskell Programmer. If you don’t know it, go check it out: it consists of several Haskell implementations of factorial, starting out from a straightforward recursive definition, passing through absolutely hilarious versions involving category-theoretical concepts, and finally arriving at this simple version that is considered most idiomatic:
This is very Clojure-like in that it involves a sequence (a list comprehension). In Clojure, this could be implemented as
Now, I thought to myself, how would I write factorial in an imperative language? Say, Pascal?
1 2 3 4 5 6 7 8 9
This is very different from the functional version that works with sequences. It is much more elaborate, introducing an explicit loop. On the other hand, it’s memory efficient: it’s clear that its memory requirements are O(1), whereas a naïve implementation of a sequence would need O(n) to construct it all in memory and then reduce it down to a single value.
Or is it really that different? Think of the changing values of
that loop. On first iteration it is 1, on second iteration it’s 2, and
so on up to n. Therefore, one can really think of a
for loop as a
sequence! I call it a “virtual” sequence, since it is not an actual
data structure; it’s just a snippet of code.
To rephrase it as a definition: a virtual sequence is a snippet of code that (presumably repeatedly) yields the member values.
Let’s write some code!
To illustrate it, throughout the remainder of this article I will be using Common Lisp, for the following reasons:
- It allows for imperative style, including GOTO-like statements. This will enable us to generate very low-level code.
- Thanks to macros, we will be able to obtain interesting transformations.
Okay, so let’s have a look at how to generate a one-element sequence. Simple enough:
VSINGLE stands for “Virtual sequence that just yields a
SINGLE element”. (In general, I will try to define virtual sequences
named and performing similarly to their Clojure counterparts here;
whenever there is a name clash with an already existing CL function,
the name will be prefixed with
V.) We will not concern ourselves
with the actual definition of
YIELD at the moment; for debugging, we
can define it just as printing the value to the standard output.
We can also convert a Lisp list to a virtual sequence which just yields each element of the list in turn:
1 2 3 4 5
Now let’s try to define
RANGE. We could use
loop, but for the sake
of example, let’s pretend that it doesn’t exist and write a macro that
expands to low-level GOTO-ridden code. For those of you who are not
familiar with Common Lisp,
GO is like GOTO, except it takes a label
that should be established within a
1 2 3 4 5 6 7 8 9 10 11 12 13
Infinite virtual sequences are also possible. After all, there’s
nothing preventing us from considering a snippet of code that loops
YIELD, as a virtual sequence! We will define
the equivalent of Clojure’s
iterate: given a function
val, it will repeatedly generate
(fun (fun val)), etc.
1 2 3 4 5 6 7
So far, we have defined a number of ways to create virtual sequences.
Now let’s ask ourselves: is there a way, given code for a virtual
sequence, to yield only the elements from the original that satisfy a
certain predicate? In other words, can we define a
virtual sequences? Sure enough. Just replace every occurrence of
yield with code that checks whether the yielded value satisfies the
predicate, and only if it does invokes
First we write a simple code walker that applies some
transformation to every
yield occurrence in a given snippet:
1 2 3 4 5 6
We can now write
filter like this:
1 2 3
It is important to point out that since
filter is a macro, the
arguments are passed to it unevaluated, so if
vseq is a virtual
sequence definition like
(range 10), we need to macroexpand it
We can now verify that
(filter #'evenp (range 10)) works. It
macroexpands to something similar to
1 2 3 4 5 6 7 8 9
concat is extremely simple. To produce all elements of
followed by all elements of
vseq2, just execute code corresponding
vseq1 and then code corresponding to
vseq2. Or, for multiple
take, we’ll need to wrap the original code in a block
that can be escaped from by means of
return-from (which is just
another form of
goto). We’ll add a counter that will start from
n and keep decreasing on each
yield; once it reaches zero, we
escape the block:
1 2 3 4 5 6 7 8 9 10
rest (or, rather,
vrest, as that name is taken) can be defined
1 2 3 4 5
vfirst is another matter. It should return a value instead of
producing a virtual sequence, so we need to actually execute the code
— but with
yield bound to something else. We want to establish a
block as with
take, but our
yield will immediately return from the
block once the first value is yielded:
1 2 3 4 5
Note that so far we’ve seen three classes of macros:
- macros that create virtual sequences;
- macros that transform virtual sequences to another virtual sequences;
- and finally,
vfirstis our first example of a macro that produces a result out of a virtual sequence.
Our next logical step is
vreduce. Again, we’ll produce code
yield: this time to a function that replaces
the value of a variable (the accumulator) by result of calling
a function on the accumulator’s old value and the value being yielded.
1 2 3 4 5
We can now build a constructs that executes a virtual sequence and
wraps the results up as a Lisp list, in terms of
1 2 3 4 5
Let’s verify that it works:
1 2 3 4 5
Hey! Did we just manipulate an infinite sequence and got the result in a finite amount of time? And that without explicit support for laziness in our language? How cool is that?!
Anyway, let’s finally define our factorial:
Factorials grow too fast, so for the purpose of benchmarking let’s write a function that adds numbers from 0 below n, in sequence-y style. First using Common Lisp builtins:
And now with our virtual sequences:
Let’s try to time the two versions. On my Mac running Clozure CL 1.7, this gives:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
SUM-BELOW-2 is much faster, causes less page faults and
presumably conses less. (Critics will be quick to point out that we
could idiomatically write it using
which would probably be yet faster, and I agree; yet if we were
reducing by something other than
+ — something that
LOOP has not
built in as a clause — this would not be an option.)
We have seen how snippets of code can be viewed as sequences and how to combine them to produce other virtual sequences. As we are nearing the end of this article, it is perhaps fitting to ask: what are the limitations and drawbacks of this approach?
Clearly, this kind of sequences is less powerful than “ordinary”
sequences such as Clojure’s. The fact that we’ve built them on macros
means that once we escape the world of code transformation by invoking
some macro of the third class, we can’t manipulate them anymore. In
rest are very similar; in virtual
sequences, they are altogether different: they belong to different
world. The same goes for
map (had we defined one) and
But imagine that instead of having just one programming language, we have a high-level language A in which we are writing macros that expand to code in a low-level language B. It is important to point out that the generated code is very low-level. It could almost be assembly: in fact, most of the macros we’ve written don’t even require language B to have composite data-types beyond the type of elements of collections (which could be simple integers)!
Is there a practical side to this? I don’t know: to me it just seems to be something with hack value. Time will tell if I can put it to good use.