If you assemble, link and then execute it normally, typing prog at the DOS command line, it will output the string “ok”. But if you trace through the program in a debugger instead, it will say “wrong”! What’s wrong?
The problem is in lines 10-11 (instructions 3-4). Here’s what happens when you trace through this program in DOS 6.22’s DEBUG.EXE:
Note how in instruction 3 (actually displayed as the second above) we set the word SS:0xFFFC to 100. When about to execute the following instruction, we would expect that word to continue to hold the value 100, because nothing which could have changed that value has happened in between. Instead, the debugger still reports it as 0x0D8A, as if instruction 3 had not been executed at all — and, interestingly, after actually executing this instruction, CX gets yet another value of 0x7302!
Normally, thinking of DOS .COM programs, you assume a 64KB-long chunk of memory that the program has all to itself: the code starts at 0x100, the stack grows from 0xFFFE downwards (at any given time, the region from SP to 0xFFFE contains data currently on the stack), and all memory in between is free for the program to use however it deems fit. It turns out that,
when debugging, it is not the case: the debuggers need to manipulate the region just underneath the program’s stack in order to handle the tracing/breakpoint interrupt traps.
I’ve verified that both DOS’s DEBUG and Borland’s Turbo Debugger 5 do this. The unsafe-to-touch amount of space below SP that they need, however, varies. Manipulating the N constant in the original program, I’ve determined that DEBUG only needs 8 bytes below SP, whereas for TD it is a whopping 18 bytes.
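The clobbering can be reproduced in a toy model. The Python sketch below is not how DEBUG or Turbo Debugger are implemented. It simply assumes that on every single-step trap some number of bytes right below SP get overwritten (the hypothetical `debugger_bytes` parameter), because the CPU pushes FLAGS, CS and IP and the debugger's handler does its own work there. It then checks whether a word the program stored at SP-n survives.

```python
SEGMENT_SIZE = 0x10000  # one 64 KB segment, as seen by a .COM program


def write_word(mem: bytearray, addr: int, value: int) -> None:
    """Store a little-endian 16-bit word, as x86 does."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def read_word(mem: bytearray, addr: int) -> int:
    return mem[addr] | (mem[addr + 1] << 8)


def single_step_trap(mem: bytearray, sp: int, debugger_bytes: int) -> None:
    """Model one trap: overwrite `debugger_bytes` bytes just below SP.

    In real mode the CPU alone pushes FLAGS, CS and IP (6 bytes); a debugger's
    handler may use more. SP is restored afterwards, so the program cannot
    tell anything happened, except that memory below SP has changed.
    """
    for offset in range(1, debugger_bytes + 1):
        mem[sp - offset] = 0xAA  # arbitrary junk standing in for pushed data


def value_survives(n: int, debugger_bytes: int, sp: int = 0xFFFE) -> bool:
    """Store 100 at SS:SP-n, take one trap, and check the word is intact."""
    mem = bytearray(SEGMENT_SIZE)
    addr = sp - n
    write_word(mem, addr, 100)
    single_step_trap(mem, sp, debugger_bytes)
    return read_word(mem, addr) == 100
```

With `debugger_bytes=8` (the figure measured above for DEBUG), `value_survives(n, 8)` is true only from n = 10 on, once both bytes of the word lie below the clobbered region. With 18 bytes for TD, the threshold moves to n = 20.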
The dust has now mostly settled on 2048. Yet, in all the deluge
of variants and clones that has swept through Hacker News, little
has been written about the experience of modifying the game. As I too
have jumped on the 2048-modding bandwagon, it’s time to fill that gap,
because, as we shall see, the code more than deserves a close look.
I’ll start by briefly describing my variant. It’s called
“words oh so great” (a rather miserable attempt at a pun on
“two-oh-four-eight”) and is a consequence of a thought I had, being an
avid Scrabble player, after seeing the 3D and 4D versions:
“what if we mashed 2048 and Scrabble together?” The answer suggested itself.
Letters instead of number tiles, that was obvious. And you use them to
form words. It is unclear how merging tiles should work: merging two
identical tiles, as in the original, just wouldn’t make sense here, so
drop the concept of merging and make the tiles disappear instead when
you form a word. In Scrabble, the minimum length of a word is two, but
allowing two-letter words here would mean too many words formed
accidentally, so make it at least three. And 16 squares sounds like
too tight a space, so increase it to 5x5. And there you have it.
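These rules can be sketched in Python. This is not the code of “words oh so great”, which is a JavaScript fork of 2048. The sketch assumes a 5x5 grid holding single uppercase letters or None, a small hypothetical word set standing in for EOWL, and that words count only when read left to right along rows or top to bottom along columns.

```python
from typing import List, Optional, Set, Tuple

Grid = List[List[Optional[str]]]
Cell = Tuple[int, int]
MIN_WORD_LENGTH = 3  # two-letter words would form too often by accident


def find_words(grid: Grid, words: Set[str]) -> Set[Cell]:
    """Return every cell covered by a dictionary word of length >= 3.

    Candidates are contiguous runs of letters in a row (left to right) or a
    column (top to bottom); an empty cell breaks a run.
    """
    size = len(grid)
    lines: List[List[Cell]] = [[(r, c) for c in range(size)] for r in range(size)]
    lines += [[(r, c) for r in range(size)] for c in range(size)]
    cells: Set[Cell] = set()
    for line in lines:
        for start in range(len(line)):
            for end in range(start + MIN_WORD_LENGTH, len(line) + 1):
                segment = line[start:end]
                letters = [grid[r][c] for r, c in segment]
                if None in letters:
                    break  # a gap also spoils every longer run from this start
                if "".join(letters) in words:
                    cells.update(segment)
    return cells


def clear_words(grid: Grid, words: Set[str]) -> int:
    """Make word-forming tiles disappear (instead of merging); return how many."""
    cells = find_words(grid, words)
    for r, c in cells:
        grid[r][c] = None
    return len(cells)
```

Overlapping words are collected into one set of cells, so a tile shared by a row word and a column word disappears only once.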
I cloned the Git repo, downloaded an English word list
(EOWL), and set out to work. It took me just over three hours
from the initial idea to putting the modified version online and
submitting a link to HN. I think three hours is not bad, considering
that I’ve significantly changed the game mechanics. And, in my opinion,
this is a testimony to the quality of Gabriele Cirulli’s code.
The code follows the MVC pattern, despite not relying on any
frameworks or libraries. The model consists of the Tile and
Grid classes, laying out the universe for the game as well as
some basic rules governing it, and the GameManager that
implements the game mechanics: how tiles move around, when they can
merge together, when the game ends, and so on. It also uses a helper
class called LocalStorageManager to keep the score and save it
in the browser’s local storage.
The view part is called an “actuator” in 2048 parlance. The
HTMLActuator takes the game state and updates the DOM tree
accordingly. It also uses a micro-framework for animations. The
controller takes the form of a KeyboardInputManager, whose job
is to receive keyboard events and translate them into changes of the model.
The GameManager also contains some code to tie it all together — not
really a part of the model as in MVC. Despite this slight
inconsistency, the separation of concerns is very neatly executed in
2048’s code; I would even go so far as to say that it could be used as
a demonstration in teaching MVC to people.
The only gripe I had with the code is that it violates the DRY
principle in several places. Specifically, to change the board size to
5x5, I had to modify as many as three places: the HTML (it contains
the initial definition for the DOM, including 16 empty divs making up
the grid, which is unfortunate — I’d change it to set up the DOM at
runtime during initialization); the model (instantiation of
GameManager); and the .scss file from which the CSS is generated.
While on this topic, let me add that 2048’s usage of SASS is a prime
example of its capabilities. It is very instructive to see how the
sizing and positioning of the grid, and also styling for the tiles
down to the glow, is done programmatically. I was aware of the
existence of SASS before, but never got around to exploring it. Now, I’m
sold on it.
To sum up: 2048 rocks. And it’s fun to modify. Go try it.
I implemented this several months ago, pushed it to GitHub, and
development has pretty much stalled since then. And after seeing
this recent post on HN today, I’ve decided to give Lithium
a little more publicity, in the hope that it will provide a boost
of motivation to me. Because what we have here is pretty similar
to Rustboot: it’s a 16-bit kernel written in Clojure.
Well, sort of.
After writing a basic assembler capable of building bare binaries of
simple x86 real-mode programs, I’ve decided to make it a building
block of a larger entity. So I’ve embarked on a project to implement a
compiler for a toy Lisp-like language following the paper “An Incremental Approach to Compiler Construction”, doing it in
Clojure and making the implemented language similar to Clojure rather
than to Scheme.
(Whether it actually can be called Clojure is debatable. It’s unclear
what the definition of Clojure the language is. Is running on JVM
a part of what makes Clojure Clojure? Or running on any host
platform? Is ClojureScript Clojure? What about ClojureCLR?)
So far I’ve only gotten to step 7 of 24 or so, but that’s already
enough to have a working loop/recur implementation, and it
was trivial to throw in some graphical mode 13h primitives
to be able to implement this effect.
By default I’m running Lithium programs as DOS .COM binaries under
DOSBox, but technically, the code doesn’t depend on DOS in any way
(it doesn’t ever invoke interrupt 21h) and so it can be combined
with a simple bootloader into a kernel runnable on the bare metal.
The obligatory HOWTO on reproducing the effect: install DOSBox
and Leiningen, check out the code, launch a REPL with
lein repl, execute the following forms, and enjoy the slowness
with which individual pixels are painted:
Living in London means that I now have a whole lot of new area to
explore by cycling or walking. I try to take every opportunity to
spend a free day or weekend out. One of the most important things when
on the move is knowing where you are, where to go, and how to get there
— and for that, you need a map. As I soon learned, the maps to use
in the UK are the Ordnance Survey ones (either the Landranger/Explorer
series, or maps by another publisher, such as AA, based on OS
data). However, the Landranger series encompasses over 200 1:50000
maps, costing some £8 each, and when that level of detail is not
enough, there are more than 400 Explorer maps on top of that. Not only
does this get pricey after a while, but also the sheer volume of map
juggling quickly becomes impractical when you cycle a lot outside of London.
So I’ve turned to my old trusty iPhone 3GS as a mapping device
instead, and set out to put together a set of mapping apps that do the
job for me. In this post, I’d like to share my list.
I briefly thought of directly using OS maps on the iPhone via the
Outdoors GPS GB app; it does meet my requirement of being
accessible off-network, but the pricing of individual maps is on par
with the paper version, so I ruled it out.
Instead, I am using this trio now:
The official National Cycle Network app by Sustrans. Besides
being free, it has the advantage of detailing every numbered
national cycle route, as well as most local routes (that often
predate NCN or are not yet integrated into the network). At high
detail, the data seem to be OS-sourced, which is good.
It downloads maps from the Internet on demand, but you can also
save a map portion for future use. The app asks you how much
detail you want, tells you how large the download will be, then
proceeds to get the data. The nuisance here is that you can only
download 40 MB in one go, which corresponds to an area stretching
for approximately 50-60 km at 1:50000 (and correspondingly smaller
at 1:25000), so it takes a lot of tapping and downloading if
you’re planning a longer trip.
The other downsides are that the app is a little shaky at times,
and its GPS marker sometimes shows your position somewhat
off. I mitigate this by using this app in combination with
the next one…
…which is MapsWithMe. The tagline “Offline Mobile Maps” nails
it: it’s just maps, easily downloadable, covering the entire
world, and nothing else. This really does one thing well. The
map data source is OpenStreetMap, so all the maps are available
for free as well; one ‘Download’ tap and you’ve got the whole country
covered, once and for all. It also displays GPS position much more
reliably than NCN. On the other hand, it can’t offer quite the
same level of detail as NCN, and doesn’t know anything about cycle
routes, but it’s still highly convenient.
My typical flow when cycling in the UK is: check my position with
MapsWithMe, then optionally switch to NCN, locate the same
position on the map by hand and see where the route goes. I’ve
also done one continental three-day trip, from Dunkirk in France
to Hoek van Holland in the Netherlands, using just MapsWithMe to
navigate, and it worked out very well.
Unlike the other two, the last app I want to point out,
GPS2OS, is paid. And it’s more than worth its meager price,
despite being next to useless when cycling. But when hiking,
especially in remote mountainous areas, it can literally be a
lifesaver. Here’s the catch: my basic navigation tools in harsh
conditions are a compass and a plain ol’ paper map, and the iPhone
is treated only as a supplementary aid (you never know when the
battery will run out). However, instead of indicating the latitude
and longitude in degrees/minutes/seconds, OS maps use
their own grid. So you cannot use the default Compass app,
which tells you your position in degrees, directly with them,
and you need a tool just like this one to do the coordinate
translation. Works very well; it helped me find my way in dense
mist down from the summit of Ben Macdui during my recent holiday.
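For the curious, the core of such a conversion is the OS National Grid projection. The sketch below is not GPS2OS's code. It follows the Transverse Mercator series formulas that Ordnance Survey publishes for the Airy 1830 ellipsoid, and it assumes the input latitude and longitude are already on the OSGB36 datum. Raw GPS fixes are WGS84, and converting those first needs a datum shift (for example a Helmert transformation), which is omitted here. Feeding GPS coordinates in directly can therefore be off by something on the order of 100 m.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36).
A, B = 6377563.396, 6356256.909
F0 = 0.9996012717                                # central meridian scale factor
LAT0, LON0 = math.radians(49), math.radians(-2)  # true origin
N0, E0 = -100000.0, 400000.0                     # grid coordinates of true origin
E2 = (A * A - B * B) / (A * A)                   # eccentricity squared
N_ELL = (A - B) / (A + B)


def latlon_to_grid(lat_deg: float, lon_deg: float) -> tuple:
    """Project OSGB36 latitude/longitude (degrees) to (easting, northing) in metres."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    s, c, t = math.sin(phi), math.cos(phi), math.tan(phi)
    nu = A * F0 / math.sqrt(1 - E2 * s * s)
    rho = A * F0 * (1 - E2) / (1 - E2 * s * s) ** 1.5
    eta2 = nu / rho - 1
    n = N_ELL
    d, p = phi - LAT0, phi + LAT0
    m = B * F0 * (                               # meridional arc
        (1 + n + 1.25 * n ** 2 + 1.25 * n ** 3) * d
        - (3 * n + 3 * n ** 2 + 21 / 8 * n ** 3) * math.sin(d) * math.cos(p)
        + (15 / 8 * n ** 2 + 15 / 8 * n ** 3) * math.sin(2 * d) * math.cos(2 * p)
        - 35 / 24 * n ** 3 * math.sin(3 * d) * math.cos(3 * p)
    )
    i = m + N0
    ii = nu / 2 * s * c
    iii = nu / 24 * s * c ** 3 * (5 - t ** 2 + 9 * eta2)
    iiia = nu / 720 * s * c ** 5 * (61 - 58 * t ** 2 + t ** 4)
    iv = nu * c
    v = nu / 6 * c ** 3 * (nu / rho - t ** 2)
    vi = nu / 120 * c ** 5 * (5 - 18 * t ** 2 + t ** 4 + 14 * eta2 - 58 * t ** 2 * eta2)
    dl = lam - LON0
    northing = i + ii * dl ** 2 + iii * dl ** 4 + iiia * dl ** 6
    easting = E0 + iv * dl + v * dl ** 3 + vi * dl ** 5
    return easting, northing


def grid_ref(easting: float, northing: float, digits: int = 6) -> str:
    """Format as a two-letter 100 km square plus easting/northing digits."""
    e100k, n100k = int(easting // 100000), int(northing // 100000)
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    if l1 > 7:   # grid letters skip 'I'
        l1 += 1
    if l2 > 7:
        l2 += 1
    half = digits // 2
    e = int(easting % 100000) // 10 ** (5 - half)
    n = int(northing % 100000) // 10 ** (5 - half)
    return f"{chr(ord('A') + l1)}{chr(ord('A') + l2)} {e:0{half}d} {n:0{half}d}"
```

A six-figure reference such as `TG 514 131` pins a point down to a 100 m square, which is what you read off a Landranger or Explorer sheet.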
One final tip: when you want to conserve battery as much as possible,
airplane mode is a real saver. However, GPS doesn’t seem to work
when airplane mode is on. So the next best thing is to remove the
SIM card (you can then reinsert it, just don’t enter the PIN),
so that the phone doesn’t keep trying to connect to cellular networks.
And keep it warm in a pocket beside your body: cold devices discharge faster.
Ah, the golden days of childhood’s hackage. Don’t you have fond
memories of them?
I got my first PC when I was 10. It was a 486DX2/66 with 4 megs of RAM
and a 170 meg HDD; it ran DOS and had lots of things installed on it,
notably Turbo Pascal 6. I hacked a lot in it. These were pre-internet
days when knowledge was hard to come by, especially for someone living
in a small town in Poland; my main sources were the software I
had (TP’s online help was of excellent quality), a couple of books,
and a popular computing magazine that published articles on
programming. From the latter, I learned how to program the VGA: how to
enter mode 13h, draw pixels on screen, wait for vertical retrace,
manipulate the palette and how to combine these things into neat
effects. One of the very first things I discovered was that when you plot
every pixel using the sum of its coordinates modulo 40 as its color, you get a
nice-looking diagonal stripes effect. Because of the initially
incomprehensible inline assembly snippets appearing all over the
place, I eventually learned x86 assembly, too.
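The stripes formula is easy to check outside DOS. This Python sketch is not the original Turbo Pascal code. It builds the 64000-byte framebuffer of VGA mode 13h (320x200 pixels, one palette index per byte) and sets each pixel to (x + y) mod 40. Because the color is constant wherever x + y is constant, the result is diagonal stripes. The actual colors depend on the palette, which the sketch ignores.

```python
WIDTH, HEIGHT = 320, 200  # VGA mode 13h: one byte (palette index) per pixel


def stripes_frame() -> bytearray:
    """Fill a mode 13h-sized framebuffer with color (x + y) mod 40.

    In real mode, pixel (x, y) lives at offset y * 320 + x in segment A000h;
    here the same 64000 bytes are simply built in memory.
    """
    frame = bytearray(WIDTH * HEIGHT)
    for y in range(HEIGHT):
        for x in range(WIDTH):
            frame[y * WIDTH + x] = (x + y) % 40
    return frame
```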
Back to 2012: I’ve long been wanting to hack on something just for
pure fun, a side pet project. Writing code for the bare metal is
fun because it’s just about as close as you can get to wielding
the ultimate power. And yet, since Clojure is so much fun too, I
wanted the project to have something to do with Clojure.
So here’s Lithium, an x86 16-bit assembler written in pure
Clojure and capable of assembling a binary version of the stripes effect.
To try it, clone the git repo to your Linux or OS X machine,
install DOSBox, launch a REPL with Leiningen, change to the
lithium namespace and say:
(Well, this is not really a FAQ since nobody actually asked me any
questions about Lithium yet. This is more in anticipation of questions
that may arise.)
Very incomplete. To even call it pre-pre-alpha would be an
exaggeration. It’s currently little more than the bare minimum required to
assemble stripes.li.clj. Output format wise, it only produces
bare binaries (similar to DOS .COMs), and that’s unlikely to change.
Do you intend to continue developing it?
Absolutely. I will try to make it more complete, add 32- and possibly
64-bit modes, see how to add a macro system (since the input is
s-expressions, it should be easy to produce Clojure macros to write
assembly), write something nontrivial in it, and see how it can be
used as a backend for some higher-level language compiler (I’m not
sure yet which language that will turn out to be).
tl;dr: Don’t do it. If you really have to, use (#'other-library/private-function args).
A private function in Clojure is one that has been defined using the
defn- macro, or equivalently by setting the metadata key :private
to true on the var that holds the function. It is normally not
allowed in Clojure to call such functions from outside of the
namespace where they have been defined. Trying to do so results in an
IllegalStateException stating that the var is not public.
It is possible to circumvent this and call the private function, but
it is not recommended. That the author of the library decided to make
a function private probably means that he considers it to be an
implementation detail, subject to change at any time, and that you
should not rely on it being there. If you think it would be useful to
have this functionality available as part of the public API, your best
bet is to contact the library author and discuss the change, so that
it may be included officially in a future version.
Contacting the author, however, is not always feasible: she may not be
available or you might be in a hurry. In this case, several workarounds
are available. The simplest is to use
(#'other-library/private-function args), which works in Clojure
1.2.1 and 1.3.0 (it probably works in other versions of Clojure as
well, but I haven’t checked that).
Why does this work? When the Clojure compiler encounters a form
(sym args), it invokes analyzeSeq on that form. If its first
element is a symbol, it proceeds to analyze that symbol. One of the first
operations in that analysis is checking if it names an inline function,
by calling isInline. That function looks into the metadata of the
Var named by the symbol in question. If it’s not public, it
throws an exception.
On the other hand, #' is the reader macro for var. So our
workaround is equivalent to
((var other-library/private-function) args).
In this case, the first element of the form is not a symbol, but a form
that evaluates to a var. The compiler is not able to check for this,
so it does not insert a check for privateness. So the code compiles
to calling a Var object.
Here’s the catch: Vars are callable, just like functions. They
implement IFn. When a var is called, it delegates the call
to the IFn object it is holding. This has been recently
discussed on the Clojure group. Since that delegation does not
check for the var’s privateness either, the net effect is that
we are able to call a private function this way.
I moved to London last September. Like many new Londoners, I have
changed accommodation fairly quickly: I have already moved once,
with another move looming in a couple of months; my current flat was
largely unfurnished when I moved in, so I had to buy some basic
homeware. I didn’t want to invest much in it, since it’d be only for a
few months. Luckily, it is not hard to do that cheaply: many people
are moving out and getting rid of their stuff, so quite often you can
search for the desired item on Gumtree and find there’s a cheap
one a short bike ride away.
Except when there isn’t. In this case, it’s worthwhile to check again
within a few days as new items are constantly being posted. Being
lazy, I’ve decided to automate this. A few hours and a hundred lines
of Clojure later, gumtree-scraper was born.
I’ve packaged it using lein uberjar into a standalone jar, which,
when run, produces a gumtree.rss that is included in my Google
Reader subscriptions. This way, whenever something I’m interested in
appears, I get notified within an hour or so.
It’s driven by a Google spreadsheet. I’ve created a sheet that has
three columns: item name, minimum price, maximum price; then I’ve made
it available to anyone who knows the URL. This way I can edit it
pretty much from everywhere without touching the script. Each time the
script is run (by cron), it downloads that spreadsheet as a CSV that
looks like this:
For each row the script queries Gumtree’s category “For Sale” within
London given the price range, gets each result and transforms it to
an RSS entry.
Gumtree has no API, so I’m using screenscraping to retrieve all the
data. Because the structure of the pages is much simpler, I’m actually
scraping the mobile version; a technical twist here is that the
mobile version is only served to actual browsers so I’m supplying a
custom User-Agent, pretending to be Safari. For actual scraping, the
code uses Enlive; it works out nicely.
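A rough Python equivalent of this driving loop might parse the exported CSV and build one search request per row. This is not gumtree-scraper's Clojure code. The endpoint, the query parameter names and the User-Agent string are hypothetical placeholders, and the sketch assumes the CSV has no header row. It only constructs the requests; fetching and scraping are left out.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: the real scraper hardcodes its own URL template.
SEARCH_URL = "https://example.com/search"
# Any browser-like User-Agent; the mobile site is only served to browsers.
USER_AGENT = "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) Safari"


def parse_wishlist(csv_text: str) -> List[Dict]:
    """Parse rows of: item name, minimum price, maximum price (no header)."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip blank lines
        name, low, high = record
        rows.append({"item": name.strip(), "min": int(low), "max": int(high)})
    return rows


def search_request(row: Dict) -> Request:
    """Build, but do not send, the search request for one wishlist row."""
    query = urlencode({"q": row["item"], "min_price": row["min"],
                       "max_price": row["max"]})
    return Request(f"{SEARCH_URL}?{query}", headers={"User-Agent": USER_AGENT})
```

Passing such a request to `urllib.request.urlopen` would fetch the page with the spoofed User-Agent.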
About half of the code is RSS generation — mostly XML emitting. I’d
use clojure.xml/emit but it’s known to produce malformed XML at
times, so I include a variant that should work.
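Emitting well-formed XML mostly comes down to escaping text properly. As a Python illustration (not the scraper's Clojure variant), the standard `xml.etree.ElementTree` module takes care of the escaping. The field names (`title`, `link`) and the feed layout are assumptions about what an entry needs, not the scraper's actual output.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rss(title: str, link: str, items: List[Dict[str, str]]) -> str:
    """Render listings as an RSS 2.0 document.

    ElementTree escapes &, < and > in text, so a listing titled
    "Table & chairs" cannot produce malformed XML.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
        ET.SubElement(entry, "guid").text = item["link"]  # link doubles as ID
    return ET.tostring(rss, encoding="unicode")
```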
In case anyone wants to try it out, be aware that the location and
category are hardcoded in the search URL template; if you want, change
the template line in get-page. The controller spreadsheet URL is
not, however, hardcoded; it’s built up using the spreadsheet.key
system property. Here’s the wrapper script that cron actually runs:
This has somehow escaped me: just over a year ago, the Sixth Civil
Division of the Lublin-West Regional Court in Lublin, Poland,
opened its online branch. It serves the entire territory of
Poland and is competent to hear lawsuits concerning payment
claims. There is basic information available in English. It has
proven immensely popular, having processed about two million cases in
its first year of operation.
And the really cool thing is, they have an API.
It’s SOAP-based and has a publicly available spec. (Due to the
way their web site is constructed, I cannot link to the spec directly;
this last link leads to a collection of files related to the web
service. The spec is called EpuWS_ver.1.14.1.pdf; it’s in Polish
only, but it should be easy to run it through Google Translate.) There
are a couple of XML schemas as well, plus the spec contains links to
a WSDL and some code samples (in C#) at the end.
To actually use the API, you need to get yourself an account of the
appropriate type (there are two types corresponding to two groups of
methods one can use: that of a bailiff and of a mass plaintiff). You
then log on to the system, where you can create an API key that is
later used for authentication. Requests are throttled to 1 req/s
per user to mitigate DoS attacks.
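Because of the 1 request per second limit, a client has to pace itself. Here is a minimal, generic client-side sketch in Python. It is not part of the e-court spec, and the SOAP call itself is left out. The clock and sleep functions are injectable so the pacing logic can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space successive calls at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last = now
```

The intended usage is to call `throttle.wait()` immediately before each API request.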
The methods include FileLawsuits, FileComplaints,
SupplyDocuments, GetCaseHistory and so on (the actual names are in
Polish). To give you an example, the FileLawsuits method returns a
structure that consists of, inter alia, the amount of court
fee to pay, the value of the matter of dispute (both broken down into
individual lawsuits), and a status code with a description.
There are a number of motivations for this post. One stems from my
extensive exposure to Clojure over the past few years: this was, and
still is, my primary programming language for everyday work. Soon, I
realized that much of the power of Clojure comes from a sequence
abstraction being one of its central concepts, and a standard library
that contains many sequence-manipulating functions. It turns out that
by combining them it is possible to solve a wide range of problems in
a concise, high-level way. It pays, then, to think in terms of
whole sequences, rather than individual elements.
Another motivation comes from a classical piece of functional
programming humour, The Evolution of a Haskell Programmer. If you
don’t know it, go check it out: it consists of several Haskell
implementations of factorial, starting out from a straightforward
recursive definition, passing through absolutely hilarious versions
involving category-theoretical concepts, and finally arriving at this
simple version that is considered most idiomatic:
This is very Clojure-like in that it involves a sequence (a list
comprehension). In Clojure, this could be implemented as
(defn fac [n] (reduce * 1 (range 1 (inc n))))
Now, I thought to myself, how would I write factorial in an imperative
language? Say, Pascal?
This is very different from the functional version that works with
sequences. It is much more elaborate, introducing an explicit loop.
On the other hand, it’s memory efficient: it’s clear that its memory
requirements are O(1), whereas a naïve implementation of a sequence
would need O(n) to construct it all in memory and then reduce it down
to a single value.
Or is it really that different? Think of the changing values of i in
that loop. On first iteration it is 1, on second iteration it’s 2, and
so on up to n. Therefore, one can really think of a for loop as a
sequence! I call it a “virtual” sequence, since it is not an actual
data structure; it’s just a snippet of code.
To rephrase it as a definition: a virtual sequence is a snippet of
code that (presumably repeatedly) yields the member values.
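Python generators make the same observation concrete, although through a different mechanism than the macro-based one developed below (a generator is a runtime coroutine, not a code template). Wrapping the loop in a function that yields each value of i turns it into something that can be reduced like a sequence, without ever building a list.

```python
from functools import reduce
from operator import mul


def fac_loop(n: int) -> int:
    """Imperative factorial, as in the Pascal version: O(1) extra memory."""
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


def loop_values(n: int):
    """The same loop, except that it yields i instead of multiplying.

    Nothing is stored, so consuming these values still takes O(1) memory.
    """
    for i in range(1, n + 1):
        yield i


def fac_seq(n: int) -> int:
    """Factorial as a reduction over the loop viewed as a sequence."""
    return reduce(mul, loop_values(n), 1)
```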
Let’s write some code!
To illustrate it, throughout the remainder of this article I will be
using Common Lisp, for the following reasons:
It allows for imperative style, including GOTO-like statements.
This will enable us to generate very low-level code.
Thanks to macros, we will be able to obtain interesting
Okay, so let’s have a look at how to generate a one-element
sequence. Simple enough:
The name VSINGLE stands for “Virtual sequence that just yields a
SINGLE element”. (In general, I will try to define virtual sequences
named and performing similarly to their Clojure counterparts here;
whenever there is a name clash with an already existing CL function,
the name will be prefixed with V.) We will not concern ourselves
with the actual definition of YIELD at the moment; for debugging, we
can define it just as printing the value to the standard output.
We can also convert a Lisp list to a virtual sequence which just
yields each element of the list in turn:
Now let’s try to define RANGE. We could use loop, but for the sake
of example, let’s pretend that it doesn’t exist and write a macro that
expands to low-level GOTO-ridden code. For those of you who are not
familiar with Common Lisp, GO is like GOTO, except it takes a label
that should be established within a TAGBODY container.
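Python has no Common Lisp style macros, so this sketch swaps in a runtime analog instead of reproducing the article's code. A virtual sequence is modeled as a function that takes an `emit` callback, which stands in for YIELD. The names (`vsingle`, `vlist`, `vrange`, `collect`) are illustrative only. Printing each value, as suggested above for debugging, just means passing `print` as `emit`.

```python
from typing import Callable, Iterable, List

# A "virtual sequence": call it with an emit callback and it emits each element.
VSeq = Callable[[Callable[[object], None]], None]


def vsingle(x) -> VSeq:
    """Yield just one element."""
    def run(emit):
        emit(x)
    return run


def vlist(items: Iterable) -> VSeq:
    """Yield each element of a list in turn."""
    items = list(items)

    def run(emit):
        for item in items:
            emit(item)
    return run


def vrange(n: int) -> VSeq:
    """Yield 0 .. n-1 with an explicit counter and test, roughly what a GO loop does."""
    def run(emit):
        i = 0
        while i < n:
            emit(i)
            i += 1
    return run


def collect(vseq: VSeq) -> List:
    """Debugging helper: run the snippet with an emit that appends to a list."""
    out: List = []
    vseq(out.append)
    return out
```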
Infinite virtual sequences are also possible. After all, there’s
nothing preventing us from considering a snippet of code that loops
infinitely, executing YIELD, as a virtual sequence! We will define
the equivalent of Clojure’s iterate: given a function fun and
initial value val, it will repeatedly generate val, (fun val),
(fun (fun val)), etc.
So far, we have defined a number of ways to create virtual sequences.
Now let’s ask ourselves: is there a way, given code for a virtual
sequence, to yield only the elements from the original that satisfy a
certain predicate? In other words, can we define a filter for
virtual sequences? Sure enough. Just replace every occurrence of
yield with code that checks whether the yielded value satisfies the
predicate, and only if it does invokes yield.
First we write a simple code walker that applies some
transformation to every yield occurrence in a given snippet:
It is important to point out that since filter is a macro, the
arguments are passed to it unevaluated, so if vseq is a virtual
sequence definition like (range 10), we need to macroexpand it
before replacing yield.
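In the callback analog used for illustration here (not the article's macros), “replace every yield with a predicate check” becomes “wrap the emit callback”. The substitution then happens at run time rather than by walking code, but the effect on what gets yielded is the same.

```python
def vrange(n):
    """Virtual sequence 0 .. n-1: calls `emit` (standing in for YIELD) per element."""
    def run(emit):
        for i in range(n):
            emit(i)
    return run


def vfilter(pred, vseq):
    """Pass on only the values satisfying `pred`.

    Mirrors rewriting each (yield x) into (when (pred x) (yield x)).
    """
    def run(emit):
        def filtered_emit(x):
            if pred(x):
                emit(x)
        vseq(filtered_emit)
    return run


def collect(vseq):
    out = []
    vseq(out.append)
    return out
```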
We can now verify that (filter #'evenp (range 10)) works. It
macroexpands to something similar to
concat is extremely simple. To produce all elements of vseq1
followed by all elements of vseq2, just execute code corresponding
to vseq1 and then code corresponding to vseq2. Or, for multiple
To define take, we’ll need to wrap the original code in a block
that can be escaped from by means of return-from (which is just
another form of goto). We’ll add a counter that will start from
n and keep decreasing on each yield; once it reaches zero, we
escape the block:
vfirst is another matter. It should return a value instead of
producing a virtual sequence, so we need to actually execute the code
— but with yield bound to something else. We want to establish a
block as with take, but our yield will immediately return from the
block once the first value is yielded:
Note that so far we’ve seen three classes of macros:
macros that create virtual sequences;
macros that transform virtual sequences into other virtual sequences;
and finally, vfirst is our first example of a macro that
produces a result out of a virtual sequence.
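iterate, concat, take and vfirst can also be sketched in the Python callback analog. This is an illustration, not the article's Common Lisp. An exception plays the role of RETURN-FROM, and each vtake or vfirst call raises its own exception instance, so an escape is only caught by the call that started it. Without that, a nested take could swallow an outer one.

```python
class _Escape(Exception):
    """Leaves a running snippet early; one instance per vtake/vfirst call."""
    value = None


def viterate(fun, val):
    """Infinite virtual sequence: val, fun(val), fun(fun(val)), ..."""
    def run(emit):
        x = val
        while True:
            emit(x)
            x = fun(x)
    return run


def vconcat(*vseqs):
    """Run each snippet in turn with the same emit."""
    def run(emit):
        for vseq in vseqs:
            vseq(emit)
    return run


def vtake(n, vseq):
    """Pass on at most n elements, then escape the (possibly infinite) loop."""
    def run(emit):
        if n <= 0:
            return
        token = _Escape()
        remaining = n

        def counting_emit(x):
            nonlocal remaining
            emit(x)
            remaining -= 1
            if remaining == 0:
                raise token  # escape the block, like RETURN-FROM

        try:
            vseq(counting_emit)
        except _Escape as exc:
            if exc is not token:
                raise  # belongs to an enclosing vtake/vfirst
    return run


def vfirst(vseq, default=None):
    """Run the snippet and return its first value (or `default` if none)."""
    token = _Escape()

    def first_emit(x):
        token.value = x
        raise token

    try:
        vseq(first_emit)
    except _Escape as exc:
        if exc is not token:
            raise
        return token.value
    return default


def collect(vseq):
    out = []
    vseq(out.append)
    return out
```

The `default` argument of `vfirst` is this sketch's own choice for snippets that yield nothing.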
Our next logical step is vreduce. Again, we’ll produce code
that rebinds yield: this time to a function that replaces
the value of a variable (the accumulator) by result of calling
a function on the accumulator’s old value and the value being yielded.
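In the Python callback analog (an illustration, not the article's code), vreduce rebinds emit to a closure that folds each yielded value into an accumulator. No intermediate list is ever built.

```python
def vrange(n):
    """Virtual sequence 0 .. n-1."""
    def run(emit):
        for i in range(n):
            emit(i)
    return run


def vreduce(fun, init, vseq):
    """Run the snippet with an emit that replaces acc by fun(acc, x)."""
    acc = init

    def reducing_emit(x):
        nonlocal acc
        acc = fun(acc, x)

    vseq(reducing_emit)
    return acc
```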
As expected, SUM-BELOW-2 is much faster, causes fewer page faults and
presumably conses less. (Critics will be quick to point out that we
could idiomatically write it using LOOP’s SUM/SUMMING clause,
which would probably be yet faster, and I agree; yet if we were
reducing by something other than + — something that LOOP has not
built in as a clause — this would not be an option.)
We have seen how snippets of code can be viewed as sequences and how
to combine them to produce other virtual sequences. As we are nearing
the end of this article, it is perhaps fitting to ask: what are the
limitations and drawbacks of this approach?
Clearly, this kind of sequence is less powerful than “ordinary”
sequences such as Clojure’s. The fact that we’ve built them on macros
means that once we escape the world of code transformation by invoking
some macro of the third class, we can’t manipulate them anymore. In
the Clojure world, first and rest are very similar; in virtual
sequences, they are altogether different: they belong to different
worlds. The same goes for map (had we defined one) and reduce.
But imagine that instead of having just one programming language, we
have a high-level language A in which we are writing macros that
expand to code in a low-level language B. It is important to point out
that the generated code is very low-level. It could almost be
assembly: in fact, most of the macros we’ve written don’t even require
language B to have composite data-types beyond the type of elements of
collections (which could be simple integers)!
Is there a practical side to this? I don’t know: to me it just seems
to be something with hack value. Time will tell if I can put it to use.
This is a slightly edited translation of an article I first
published on my Polish blog on January 19, 2011. It is meant to target
newcomers to Clojure and show how to use Clojure to solve a simple problem.
Some time ago I was asked to prepare a couple of differently-colored
maps of Europe. I got some datasets which mapped countries of Europe
to numerical values: the greater the value, the darker the corresponding
color should be. A sample colored map looked like this:
I began by downloading an easily editable map from Wikipedia
Commons, calculated the required color intensities for the first
dataset, launched Inkscape and started coloring. After half an
hour of tedious clicking, I realized that I would be better off
writing a simple program in Clojure that would generate the map for
me. It turned out to be an easy task: the remainder of this article
will be an attempt to reconstruct my steps.
The format of the source image is SVG. I knew it was an XML-based
vector graphics format, I’d often encountered images in this format on
Wikipedia — but editing it by hand was new to me. Luckily, it turned
out that the image has a simple structure. Each country’s envelope
curve is described with a path element that looks like this:
<path id="pl" class="eu europe" d="a long list of curve node coordinates"/>
An important thing to note here is the id attribute — this is the two-letter
ISO-3166-1-ALPHA2 country code. In fact, there is an informative comment
right at the beginning of the image that explains the naming conventions
used. Having such a splendid input was of great help.
Just like HTML, SVG uses CSS stylesheets to define the look of an
element. All that is needed to color Poland red is to style the element
with a fill attribute:
<path id="pl" style="fill: #ff0000;" class="eu europe" d="a long list of curve node coordinates"/>
Now that we know all this, let’s start coding!
XML in Clojure
The basic way to handle XML in Clojure is to use the clojure.xml
namespace, which contains functions that parse XML (on a DOM basis,
i.e., into an in-memory tree structure) and serialize such
structures back into XML. Let us launch a REPL and start by
reading our map and parsing it:
Hold on! What’s that SocketException doing here? Firefox
displays this map properly, and so does Chrome. WTF?! Shouldn’t everything
work fine in such a great language as Clojure?
Well, the language is as good as its libraries — and when it comes
to Clojure, one can stretch that thought further: Clojure libraries
are as good as the Java libraries they use under the hood. In this
case, we’ve encountered a feature of the standard Java XML parser
(from javax.xml package). It is restrictive and tries to reject
invalid documents (even if they are well-formed). If the file
being parsed contains a DOCTYPE declaration, the Java parser, and hence
clojure.xml/parse, tries to download the DTD schema from the given
address and validate the document against that schema. This is unfortunate
in many respects, especially from the point of view of the World Wide
Web Consortium, since their servers hold the Web standards. One
can easily imagine the volume of network traffic this generates:
W3C has a blog post about it. Many Java programmers have encountered
this problem at some time. There are a few solutions; we will go
the simplest way and just manually remove the offending DOCTYPE declaration.
This time we managed to parse the image. Viewing the structure is
not easy because of its sheer size (as expected: the file weighs in
at over 0.5 MB!), but from the very first characters of the REPL’s
output we can make out that it’s a Clojure map (no pun intended).
Let’s examine its keys:
> (keys m)
(:tag :attrs :content)
So the map contains three entries with descriptive names.
:tag contains the name of the XML element, :attrs is
a map of attributes for this element, and :content is
a vector of its subelements, each in turn being represented
by a similarly structured map (or a string if it’s a text node):
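To see the shape of this structure without scrolling through half a megabyte of output, we can parse a tiny made-up document instead. This is a sketch; the inline SVG is ours, not a piece of the real map:

```clojure
(require '[clojure.xml :as xml])

(def sample
  (xml/parse (java.io.ByteArrayInputStream.
              (.getBytes "<svg width='10'><path id='pl'/>hello</svg>" "UTF-8"))))

(keys sample)                              ; => (:tag :attrs :content)
(:tag sample)                              ; => :svg
(:attrs sample)                            ; => {:width "10"}
(map :tag (filter map? (:content sample))) ; => (:path)
(last (:content sample))                   ; => "hello" (a text node)
```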
Just for the sake of practice, let’s try to serialize the parsed
representation back to XML. The function emit should
be able to do it, but it prints XML to standard output. We can use
the with-out-writer macro from the clojure.contrib.io namespace
to dump the XML to a file:
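clojure.contrib has since been split up and retired, so with-out-writer may not be available in a current setup. Below is a minimal sketch of the same idea that uses only core Clojure and clojure.java.io. save-xml is a name we made up. It rebinds *out* to a file writer around the call to emit:

```clojure
(require '[clojure.xml :as xml]
         '[clojure.java.io :as io])

;; Send everything emit prints into the file at path.
(defn save-xml [element path]
  (with-open [w (io/writer path)]
    (binding [*out* w]
      (xml/emit element))))

(def f (java.io.File/createTempFile "map" ".svg"))
(save-xml {:tag :svg :attrs {:width "10"}
           :content [{:tag :path :attrs {:id "pl"} :content nil}]}
          f)
(print (slurp f))
;; <?xml version='1.0' encoding='UTF-8'?>
;; <svg width='10'>
;; <path id='pl'/>
;; </svg>
```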
Error parsing XML: not well-formed
Row 15, column 44: Updated to reflect dissolution of Serbia & Montenegro: http://commons.wikimedia.org/wiki/User:Zirland
It turns out that using clojure.xml/emit is not recommended, because
it writes text out verbatim, without escaping XML special characters
such as the & in the comment above; we should
use clojure.contrib.lazy-xml instead. For the sake of example, though,
let’s stay with emit and manually remove the offending line once
again (we can safely do it, since that’s just a comment).
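If we would rather not edit the file by hand, one workaround is to escape the text ourselves before calling emit. This is a sketch under our own assumptions: escape-text and escape-element are hypothetical helpers, not clojure.xml functions. Because emit wraps attribute values in single quotes, the apostrophe is escaped too:

```clojure
(require '[clojure.xml :as xml]
         '[clojure.string :as str])

;; Escape the characters that are special in XML text and attributes.
;; & goes first so that we do not re-escape our own entities.
(defn escape-text [s]
  (-> s
      (str/replace "&" "&amp;")
      (str/replace "<" "&lt;")
      (str/replace ">" "&gt;")
      (str/replace "'" "&apos;")   ; emit wraps attribute values in '...'
      (str/replace "\"" "&quot;")))

;; Recursively escape text nodes and attribute values of an element tree.
(defn escape-element [e]
  (if (string? e)
    (escape-text e)
    (-> e
        (update :attrs #(some->> % (map (fn [[k v]] [k (escape-text v)])) (into {})))
        (update :content #(some->> % (mapv escape-element))))))

(def emitted
  (with-out-str
    (xml/emit-element
     (escape-element {:tag :desc :attrs nil :content ["Serbia & Montenegro"]}))))
;; emitted now contains "Serbia &amp; Montenegro" instead of a bare &.
```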
We saw earlier that our main XML node contains 68 subnodes. Let’s see
what they are — tag names will suffice:
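The REPL output for the real map is not shown here, so the sketch below runs the same kinds of queries on a small made-up document. It lists tag names, tallies them with frequencies, and counts the path elements whose id is pl:

```clojure
(require '[clojure.xml :as xml])

;; A stand-in for the real map; the contents are invented for illustration.
(def m
  (xml/parse (java.io.ByteArrayInputStream.
              (.getBytes (str "<svg><title>Europe</title>"
                              "<path id='pl'/><path id='de'/><g id='gb'/></svg>")
                         "UTF-8"))))

;; Tag names of the top-level children.
(map :tag (:content m))               ; => (:title :path :path :g)

;; A tally is easier to read than a list of 68 names.
(frequencies (map :tag (:content m))) ; => {:title 1, :path 2, :g 1}

;; Count top-level path elements whose id attribute is "pl".
(def pl-count
  (count (filter #(and (= :path (:tag %))
                       (= "pl" (get-in % [:attrs :id])))
                 (:content m))))      ; => 1
```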
(This snippet of code filters the list of subnodes of m to pick only
those elements whose tag name is path and whose id attribute is
pl, and returns the length of that list.) Let’s try to add a style
attribute to that element, according to what we said earlier. Because
Clojure data structures are immutable, we have to define a new top-level
element which will be the same as m, except that we will set the style
of the appropriate subnode:
This function is similar to the anonymous one we used above in the
map call, but differs in some respects. It takes two arguments.
As mentioned, the first one is the XML element (destructured
into tag and attrs: you can read more about destructuring in
the appropriate part of Clojure docs), and the second argument
is… a function that should take a two-letter country code and return
an HTML color description (or nil, if that country’s color is not
specified; color-state will cope with this and return the element unchanged).
Now that we have color-state, we can easily write a higher-level function
that processes and writes XML in one step:
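A possible reconstruction of color-state, together with a save-color-map built on it, is sketched below. Both are guesses at the shape of the original code rather than the code itself. In particular, the fill: style value is our assumption about the style attribute discussed earlier, and color-map is a hypothetical helper. Any Clojure map from codes to colors works as the colorizer, since maps are functions of their keys:

```clojure
(require '[clojure.xml :as xml]
         '[clojure.java.io :as io])

;; el is an element map as produced by clojure.xml/parse; colorizer maps
;; a two-letter code to a colour string or nil. Non-path nodes, text
;; nodes and countries without a colour come back unchanged.
(defn color-state [{:keys [tag attrs] :as el} colorizer]
  (let [color (when (= tag :path) (colorizer (:id attrs)))]
    (if color
      (assoc-in el [:attrs :style] (str "fill:" color))
      el)))

;; "Flat" transform: only the direct children of the root are touched.
(defn color-map [svg colorizer]
  (assoc svg :content (mapv #(color-state % colorizer) (:content svg))))

;; Colour the map and write it to path (a file name or java.io.File).
(defn save-color-map [svg colorizer path]
  (with-open [w (io/writer path)]
    (binding [*out* w]
      (xml/emit (color-map svg colorizer)))))

;; Usage on a tiny made-up tree:
(def svg {:tag :svg :attrs nil
          :content [{:tag :path :attrs {:id "pl"} :content nil}
                    {:tag :path :attrs {:id "de"} :content nil}]})

(color-state (first (:content svg)) {"pl" "#ff0000"})
;; => {:tag :path, :attrs {:id "pl", :style "fill:#ff0000"}, :content nil}

(def out-file (java.io.File/createTempFile "colored" ".svg"))
(save-color-map svg {"pl" "#ff0000"} out-file)
```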
Inspired by our success, we try to color different countries.
It mostly works, but the United Kingdom remains gray, regardless
of whether we specify its code as “uk” or “gb”. We turn to the
source of our image, and the comment at its beginning once again proves useful:
Certain countries are further subdivided: the United Kingdom has
gb-gbn for Great Britain and gb-nir for Northern Ireland. Russia is
divided into ru-kgd for the Kaliningrad Oblast and ru-main for the
Main body of Russia. There is the additional grouping #xb for the
“British Islands” (the UK with its Crown Dependencies –
Jersey, Guernsey and the Isle of Man)
Perhaps we have to specify “gb-gbn” and “gb-nir”, instead of just “gb”?
We try that, but still no luck. After some thought: oh yes! Our
initial assumption that all the country definitions are path subnodes
of the top-level svg node is false. We have to fix that.
So far we have been doing a “flat” transform of the SVG tree: we
only changed the subnodes of the top-level node, but no deeper.
We should change all the path elements (and g, if we want to
color groups of paths like the UK), regardless of how deep they
occur in the tree.
We can use a zipper to do a depth-first walk of the SVG tree.
Let us define a function that takes a zipper, a predicate that tells
whether to edit the node in question, and the transformation function
to apply to the node if the predicate returns true:
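Here is a minimal sketch of such a walk using clojure.zip. edit-all and colorize-deep are hypothetical names. zip/xml-zip builds a zipper over exactly the maps that clojure.xml/parse returns. zip/next visits the nodes depth-first, and zip/root rebuilds the edited tree once the walk reaches the end:

```clojure
(require '[clojure.zip :as zip])

;; Depth-first walk: apply f to every node for which (pred node) is
;; truthy, then return the rebuilt root.
(defn edit-all [loc pred f]
  (if (zip/end? loc)
    (zip/root loc)
    (recur (zip/next (if (pred (zip/node loc)) (zip/edit loc f) loc))
           pred
           f)))

;; Style every path or g element, at any depth, whose id the colorizer knows.
(defn colorize-deep [svg colorizer]
  (let [color-of (fn [node] (colorizer (get-in node [:attrs :id])))]
    (edit-all (zip/xml-zip svg)
              (fn [node]
                (and (map? node)
                     (contains? #{:path :g} (:tag node))
                     (color-of node)))
              (fn [node]
                (assoc-in node [:attrs :style] (str "fill:" (color-of node)))))))

;; A made-up tree in which Great Britain and Northern Ireland sit inside a g.
(def svg {:tag :svg :attrs nil
          :content [{:tag :g :attrs {:id "gb"}
                     :content [{:tag :path :attrs {:id "gb-gbn"} :content nil}
                               {:tag :path :attrs {:id "gb-nir"} :content nil}]}
                    {:tag :path :attrs {:id "pl"} :content nil}]})

(def colored (colorize-deep svg {"gb-nir" "#0000ff" "pl" "#ff0000"}))
(get-in colored [:content 0 :content 1 :attrs :style]) ; => "fill:#0000ff"
```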
We have automated the process of styling countries to make them
appear in color, but translating particular numbers to RGB is tedious.
In the last part of this article we will see how to ease this:
we are going to write a colorizer, i.e., a function suitable for
passing to color-state and save-color-map (so far we’ve been
using maps for this).
Let’s start by writing a function that translates a triplet of numbers
into HTML RGB notation, because it will be easier for us to work
with integers than with strings:
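A one-line sketch follows (rgb is a hypothetical name). The arguments are coerced to longs because Java's %x conversion, which format uses, rejects doubles and ratios:

```clojure
;; Integers 0-255 -> "#rrggbb", two lowercase hex digits per component.
(defn rgb [r g b]
  (format "#%02x%02x%02x" (long r) (long g) (long b)))

(rgb 255 0 0) ; => "#ff0000"
(rgb 85 0 0)  ; => "#550000"
```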
Now imagine we have a table with numeric values for states, like this:
We want to have a function that assigns colors to states, such that
the intensity of a color grows linearly with the value assigned
to a given state. To be more general, assume we have two colors,
c1 and c2. For a given state, each of its R, G and B components is
placed between the corresponding components of c1 and c2 in proportion
to how far the state’s value lies between the smallest and the largest
values in the dataset.
This sounds complex, but I hope an example will clear things up.
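In symbols, one way to write the rule just described is as follows. For each component $k$, let $v_{\min}$ and $v_{\max}$ be the smallest and largest values in the dataset, and let $v(s)$ be the value of state $s$:

$$
c_k(s) = c_{1,k} + \frac{v(s) - v_{\min}}{v_{\max} - v_{\min}}\,\bigl(c_{2,k} - c_{1,k}\bigr),
\qquad k \in \{R, G, B\}
$$

For example, let the red component range from 0 to 255, with $v_{\min} = 15$ and $v_{\max} = 30$ (the values discussed below). A state with value 20 then gets

$$
c_R = 0 + \frac{20 - 15}{30 - 15}\,(255 - 0) = \frac{1}{3} \cdot 255 = 85 .
$$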
This is the Clojure implementation of the described algorithm:
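The original listing is not reproduced here, so what follows is a minimal sketch of the algorithm under our own assumptions. linear-colorizer is hypothetical, and so is its argument layout: a map of values, then one [from to] pair per R, G and B component. The sample data is inferred from the discussion that follows (Germany lowest at 15, Poland at 20, the Netherlands highest at 30); the real table may have more rows:

```clojure
(defn rgb [r g b]
  (format "#%02x%02x%02x" (long r) (long g) (long b)))

;; values: map from country code to number.
;; ranges: [[r-from r-to] [g-from g-to] [b-from b-to]].
;; Returns a function code -> colour string (nil for unknown codes), so it
;; can be used wherever a colour map was used before.
(defn linear-colorizer [values ranges]
  (let [lo   (apply min (vals values))
        hi   (apply max (vals values))
        span (- hi lo)]
    (fn [code]
      (when-let [v (get values code)]
        ;; t is the position of v between lo and hi, from 0 to 1
        (let [t (if (zero? span) 0 (/ (- v lo) span))]
          (apply rgb (for [[from to] ranges]
                       (Math/round (double (+ from (* t (- to from))))))))))))

;; Illustrative data only (see above).
(def data {"de" 15, "pl" 20, "nl" 30})
(def red-shades (linear-colorizer data [[0 255] [0 0] [0 0]]))

(red-shades "de") ; => "#000000"
(red-shades "pl") ; => "#550000"
(red-shades "nl") ; => "#ff0000"
```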
The second argument means that the red component is to range between
0 and 255, and the green and blue components are to be fixed at 0.
As we wanted, Germany ends up darkest (because it has the smallest value),
the Netherlands is lightest (because it has the greatest value), and Poland’s
intensity is one third that of the Netherlands (because 20 lies one third of
the way between 15 and 30).
The application we created can be further developed in many ways.
One can, for instance, add a Web interface for it, or write
many different colorizers (e.g., a discrete colorizer with fixed colors
for ranges of input values, or a temperature colorizer transitioning
smoothly from blue through white to red, perhaps by going through
the HSV color space).
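As an illustration of the first idea, a discrete colorizer could look like the sketch below. The name, the threshold format and the colors are all our own choices, not part of the project:

```clojure
;; thresholds: vector of [upper-bound colour] pairs sorted by bound.
;; A value gets the colour of the first bound it does not exceed, or
;; default if it exceeds them all; unknown codes yield nil.
(defn discrete-colorizer [values thresholds default]
  (fn [code]
    (when-let [v (get values code)]
      (or (some (fn [[bound color]] (when (<= v bound) color)) thresholds)
          default))))

(def by-band (discrete-colorizer {"de" 15, "pl" 20, "nl" 30}
                                 [[16 "#fee5d9"] [25 "#fb6a4a"]]
                                 "#a50f15"))

(by-band "de") ; => "#fee5d9"
(by-band "pl") ; => "#fb6a4a"
(by-band "nl") ; => "#a50f15"
```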
What are your ideas for improving it? For those of you who are tired
of pasting snippets of code into the REPL, I’m putting the complete
source code with a Leiningen project on GitHub. Forks are