Tue, 04 May 2010
Keyword arguments
There’s been an ongoing debate about how to pass optional named
arguments to Clojure functions. One way to do this is the defnk
macro from clojure.contrib.def; I hesitate to call it canonical,
since apparently not everyone uses it, but I’ve found it useful a
number of times. Here’s a sample:
user> (use 'clojure.contrib.def)
nil
user> (defnk f [:b 43] (inc b))
#'user/f
user> (f)
44
user> (f :b 100)
101
This is an example of keyword arguments in action. Keyword arguments
are a core feature of some languages, notably Common Lisp and
Objective Caml. Clojure doesn’t have them, but it’s pretty easy to
emulate their basic usage with macros, as defnk does.
But there’s more to Common Lisp’s keyword arguments than defnk
provides. In CL, the default value of a keyword argument can be an
expression referring to other arguments of the same function. For
example:
CL-USER> (defun f (&key (a 1) (b a))
           (+ a b))
F
CL-USER> (f)
2
CL-USER> (f :a 45)
90
CL-USER> (f :b 101)
102
I wish defnk had this feature. Or is there some better way that I
don’t know of?
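For comparison, here is a minimal sketch of the same function written with plain map destructuring of rest arguments. This assumes a Clojure version that supports `& {:keys ...}` (1.2 or later), not the defnk macro discussed above. The default for b, which depends on a, is computed in an explicit let instead of in the :or map, because the order in which :or defaults are bound is not something to rely on.

```clojure
;; Emulates CL's (defun f (&key (a 1) (b a)) (+ a b)).
;; opts is the whole keyword-argument map (nil when no arguments are given).
(defn f [& {:keys [a] :or {a 1} :as opts}]
  ;; b defaults to whatever a ended up being, so it is looked up
  ;; only after a has been bound.
  (let [b (get opts :b a)]
    (+ a b)))
```

Called as (f), (f :a 45) and (f :b 101), it should behave like the Common Lisp version.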
Sun, 18 Apr 2010
Sunflower
The program I’ve been writing about recently has reached a point where I think it can be shown to the wider public. It’s called Sunflower and has its home on GitHub. It’s nowhere near complete, and of alpha quality right now, but even at this stage it might be useful.
Just as sunflower seed kernels come wrapped in hulls, most HTML documents seen in the wild come wrapped in noise that is not really part of the document itself. Take any news site: a document from such a site contains things such as advertisements, a header, a footer, and many links. Now suppose you have many documents grabbed from the same site. Is it possible to somehow automate the extraction of the document “essences”?
Sunflower to the rescue. It relies on the assumption that documents coming from the same source have the same structure. It presents a list of strings to the user, and asks them to pick those that are contained in the text essence. Then it finds the coordinates of the smallest HTML subtree that contains all those strings, and uses those coordinates to extract information from all documents. And it comes with a nice, easily understandable GUI for that.
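The core idea of locating the smallest subtree can be sketched in a few lines. This is not Sunflower’s actual code: it assumes documents are parsed into clojure.xml-style maps with :tag and :content (strings being text leaves), and the names text-of, smallest-subtree-path and node-at are hypothetical. “Coordinates” are represented here as a vector of child indices.

```clojure
(require '[clojure.string :as str])

(defn text-of
  "Concatenated text of a node; strings are leaves, maps have :content."
  [node]
  (if (string? node)
    node
    (apply str (map text-of (:content node)))))

(defn contains-all? [node strings]
  (let [t (text-of node)]
    (every? #(str/includes? t %) strings)))

(defn smallest-subtree-path
  "Path (vector of child indices) to the deepest element whose text
  contains all strings, or nil if node itself does not contain them."
  ([node strings] (smallest-subtree-path node strings []))
  ([node strings path]
   (when (and (map? node) (contains-all? node strings))
     ;; Descend into the first child that still contains everything;
     ;; if no child does, this node is the smallest such subtree.
     (or (first (keep-indexed
                 (fn [i child]
                   (smallest-subtree-path child strings (conj path i)))
                 (:content node)))
         path))))

(defn node-at
  "Follows a path of child indices; nil if the structure does not match."
  [tree path]
  (reduce (fn [node i] (nth (:content node) i nil)) tree path))
```

The path found in one sample document can then be applied with node-at to every other document from the same source. The sketch recomputes text repeatedly and concatenates text across element boundaries, which is fine for illustration but not for large pages.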
This technique works remarkably well for many collections, although not all. An earlier, proof-of-concept implementation (in Common Lisp) has been used to extract many press texts for the National Corpus of Polish.
I’ve given up on the symbol-capturing approach to wizards that I presented in my previous posts. Inspired by the DOM tree in Web apps, where elements can be looked up by their identifiers, I now have a central bag of Swing widgets (implemented as an atom) identified by keywords. This bag contains tidbits of the mutable state of Sunflower. This means that I can write callback functions like this:
#(with-components [strings-model selected-dir]
   (.removeAllElements strings-model)
   (let [p (-> selected-dir htmls first parse)]
     (add-component :parsed p)
     (doseq [x (strings p)]
       (.addElement strings-model x))))
Name and conquer: having parts of state explicitly named means that I can reliably access them from just about anywhere. This reduces confusion and allows for less tangled, more self-contained and understandable code.
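The post does not show how the bag itself is built. One minimal way to do it, assuming only what is described above (an atom holding a map from keywords to components), is sketched below; this is a hypothetical implementation of components, add-component and with-components, not Sunflower’s source.

```clojure
(def components
  "Central bag of named pieces of mutable GUI state."
  (atom {}))

(defn add-component
  "Registers value v under keyword k and returns v."
  [k v]
  (swap! components assoc k v)
  v)

(defmacro with-components
  "Binds each symbol in syms to the component registered under the
  keyword of the same name (nil if absent), then evaluates body."
  [syms & body]
  `(let [~@(mapcat (fn [s] [s `(get @components ~(keyword s))]) syms)]
     ~@body))
```

Since the lookup happens when the body runs, a callback always sees the component that is currently registered, even if it was replaced after the callback was created.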
Mon, 05 Apr 2010
A case for symbol capture
Clojure by default protects macro authors from accidentally capturing a local symbol. Stuart Halloway describes this in more detail, explaining why this is a Good Thing. However, sometimes this kind of symbol capture is called for. I encountered one such case today while hacking on a Swing application.
As I develop the app, I find new ways to express Swing concepts and
interact with Swing objects in a more Clojuresque way, so a library of
GUI macros and functions gets written. One of them is a wizard
macro for easy creation of installer-like wizards, where there is a
sequence of screens that can be navigated with Back and Next
buttons at the bottom of the window.
The API (certainly not finished yet) currently looks like this:
(wizard & components)
where each Swing component corresponding to one wizard screen can be
augmented by a supplementary map, which can contain, inter alia, a
function to execute upon showing the screen in question.
Now, I want those functions to be able to access the Back and Next
buttons in case they need to disable or enable them. I thus
want the API user to be able to use two symbols, back-button and
next-button, in the macro body, and have them bound to the
corresponding buttons.
It is crucial that these bindings be lexical and not dynamic. If they were dynamic, they would only be in effect while the wizard is being defined, but not when my closures are invoked later on. Thus, my implementation looks like this:
(defmacro wizard [& panels]
  `(let [~'back-button (button "< Back")
         ~'next-button (button "Next >")]
     (do-wizard ~'back-button ~'next-button ~(vec panels))))
where do-wizard is a private function implementing the actual wizard
creation, and the ~'foo syntax forces symbol capture.
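The difference between lexical and dynamic bindings for closures that run later can be seen in isolation. The two macros below are hypothetical stand-ins for wizard that only bind a string instead of creating a Swing button, and the ^:dynamic metadata assumes Clojure 1.3 or later.

```clojure
(def ^:dynamic *button* nil)

;; ~'button inserts the bare symbol, deliberately capturing it, so the
;; body sees an ordinary lexical local.
(defmacro with-button-lexical [& body]
  `(let [~'button "Next >"] ~@body))

;; A dynamic binding is only in effect while body is being evaluated.
(defmacro with-button-dynamic [& body]
  `(binding [*button* "Next >"] ~@body))

;; Each macro returns a closure that is called only later.
(def lexical-callback (with-button-lexical (fn [] button)))
(def dynamic-callback (with-button-dynamic (fn [] *button*)))
```

Calling (lexical-callback) still yields "Next >", while (dynamic-callback) sees the root value nil, because the binding form has already exited by the time the closure runs.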
By the way, if all goes well, this blog post should be the first one syndicated to Planet Clojure. Hello, Planet Clojure readers!
Sun, 04 Apr 2010
Hiking in the Apennines
I’ve recently done a week-long hike in the Umbria-Marche region of the Italian Apennines (the vicinity of Monte Catria, near Cantiano, to be more precise), and here are some tips I’d like to share.
- The Umbria-Marche Apennine doesn’t seem to be frequented by a lot of tourists, especially in mid-March. The information offices, although helpful, are often closed (this is not only the case with the mountain region: contrary to information available on the Web, the tourist information at Forli airport was closed on Sunday morning), and most of the Italians we met didn’t speak English.
- The tourist trails in the region are not well marked. Direction marks are nowhere to be found, and there are no signs at junctions. We had to ask the locals when leaving Cantiano for Monte Tenetra (and ended up on M. Alto instead anyway).
- There are a lot of rifugi (mountain huts), but most of them are closed at this time of year. We passed by six or seven, out of which only one was available for sleep: Rifugio Fonte del Faggio (depicted), merely a small bothy with one worm-eaten bunk bed. Another one, Cupa delle Cotaline, with restaurant facilities and situated by a station of a local ski lift, was open in the morning, but closed for the night.

Rifugio Fonte del Faggio
Wed, 31 Mar 2010
The pitfalls of `lein swank`
A couple of weeks ago I finally got around to acquainting myself with Leiningen, one of the most popular build tools for Clojure. The thing that held me back the most was that Leiningen uses Maven under the hood, which seemed a scary beast at first sight; but once I had overcome the initial fear, it turned out to be quite a simple and useful tool.
One feature in particular is very useful for Emacs users like me:
lein swank. You define all dependencies in project.clj as usual,
add a magical line to :dev-dependencies, then say
$ lein swank
and lo and behold, you can M-x slime-connect from your Emacs and
have all the code at your disposal.
There is, however, an issue that you must be aware of when using lein
swank: Leiningen uses a custom class loader (AntClassLoader, to be
more precise) to load the Java classes referenced by the code.
Although this seems irrelevant, a mere implementation detail, it
can bite you in a number of surprising and obscure ways.
Try evaluating the following code in a Leiningen REPL:
(str (.decode
(java.nio.charset.Charset/forName "ISO-8859-2")
(java.nio.ByteBuffer/wrap
(into-array Byte/TYPE (map byte [-79 -26 -22])))))
==> "???"
The same code evaluated in a plain Clojure REPL will give you "ąćę",
which is a string represented in ISO-8859-2 by the three bytes from
the above snippet.
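To find out whether a particular REPL is affected, the same decoding can be wrapped in a predicate that compares the result with the expected string. charset-lookup-ok? is a hypothetical helper, not part of Leiningen or Clojure. It should return true in a plain Clojure REPL and, if the behaviour described above occurs, false in an affected one.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.charset Charset))

(defn charset-lookup-ok?
  "Decodes the ISO-8859-2 bytes for the Polish letters ąćę and checks
  that the expected characters come out."
  []
  (let [bytes   (byte-array (map byte [-79 -26 -22]))
        decoded (str (.decode (Charset/forName "ISO-8859-2")
                              (ByteBuffer/wrap bytes)))]
    ;; Unicode escapes for ą, ć, ę keep the check independent of the
    ;; source file's encoding.
    (= "\u0105\u0107\u0119" decoded)))
```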
Whence the difference? Internally, each charset is represented as a
unique instance of its specific class. These are loaded lazily as
needed by the Charset/forName method. Presumably, the system class
loader is used for that, and somewhere along the way a
SecurityException gets thrown and caught.
Note also that there are parts of the Java API which use the charset
lookup under the hood and are thus vulnerable to the same problem,
for example Reader constructors taking charset names. If you use
clojure.contrib.duck-streams, then rebinding *default-encoding*
will not work from a Leiningen REPL. Jars and überjars produced by
Leiningen should be fine, though.
Tue, 16 Feb 2010
Downcasing strings
I just needed to convert a big (around 200 MB) text file, encoded in
UTF-8 and containing Polish characters, all into lowercase. tr to
the rescue, right? Well, not quite.
$ echo ŻŹŚÓŃŁĘĆĄ | tr A-ZĄĆĘŁŃÓŚŹŻ a-ząćęłńóśźż
żźśóńłęćą
Looks reasonable (apart from the fact that I need to specify an
explicit character mapping — it would be handy to just have a
lcase utility or suchlike); but here’s what happens on another
random string:
$ echo abisyński | tr A-ZĄĆĘŁŃÓŚŹŻ a-ząćęłńóśźż
abisyŅski
I was just about to report this as a bug, when I spotted the following in the manual:
Currently tr fully supports only single-byte characters. Eventually it will support multibyte characters; when it does, the -C option will cause it to complement the set of characters, whereas -c will cause it to complement the set of values.
It turns out some of the basic tools don’t support multibyte encodings.
dd conv=lcase, for instance, doesn’t even pretend to touch non-ASCII
letters, and Perl’s tr operator likewise fails miserably even when
one specifies use utf8.
This is a sad, sad state of affairs. It’s 2010, UTF-8 has been around for seventeen years, and it’s still not supported by some of the core operating system tools, even as other encodings become more and more obsolete. I’m dreaming of the day my system uses it internally for everything.
Fortunately, not everything is broken. Gawk, for example, works:
$ echo koŃ i żÓłw | gawk '{ print tolower($0); }'
koń i żółw
and so does sed.
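For completeness, a JVM-based alternative is sketched below; it is not what I used for the file in question. It assumes the input is UTF-8 text and relies on Java’s String.toLowerCase, which handles the full Unicode range. Locale/ROOT is passed so that the result does not depend on the default locale (in a Turkish locale, for instance, I would become dotless ı). The function name downcase-file is hypothetical.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util Locale))

(defn downcase-file
  "Streams in-path to out-path line by line, lowercasing each line.
  Both files are UTF-8; every output line ends with a newline."
  [in-path out-path]
  (with-open [r (io/reader in-path :encoding "UTF-8")
              w (io/writer out-path :encoding "UTF-8")]
    ;; line-seq is lazy, so a 200 MB file is never held in memory at once.
    (doseq [line (line-seq r)]
      (.write w (.toLowerCase ^String line Locale/ROOT))
      (.write w "\n"))))
```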
Update 2010-04-04: I should have been more specific. The above rant
applies to the GNU tools (tr and dd) as found in most Linux
distributions; other versions can be more featureful. As Alex Ott
points out in an email comment, tr on OS X works as expected for
characters outside of ASCII, and also supports character classes as in
tr '[:upper:]' '[:lower:]'. This is yet another testimony to the
generally high quality of Apple software; in this particular case,
though, it may well be a direct effect of OS X’s BSD heritage. Does
it work on *BSD?
Wed, 10 Feb 2010
Clojure SET
I’ve just taken a short break from work to put on GitHub some code that I wrote in a single night about two months ago. It is an implementation of the Set game in Clojure, using Swing for the GUI.
I do not have time to clean up or comment the code, so I’m leaving it as is for now; however, I hope that even in its current state it can be of interest, especially for Clojure learners.
Some random notes on the code:
- Clojure is concise! The whole thing is just under 250 lines of code, complete with game logic and the GUI. Of these, the logic is about 50 LOC. Despite this it reads clearly and has been a pleasure to write, thanks to Clojure’s support for sets as a data structure (in the vein of the game’s title and theme).
- There are no graphics included. All the drawing is done in the GUI part of the code (I’ve replaced the canonical squiggle shape with a triangle and stripes with gradients, for the sake of easier drawing).
- I’ve toyed around with different Swing layout managers for this game. Back in the days when I wrote in plain Java, I used TableLayout, but it has a non-free license; JGoodies Forms is also nice, but has a slightly more complicated API (and it’s an additional dependency, after all). In the end I settled on the standard GridBagLayout, which is similar in spirit to those two, but requires more boilerplate to set up. As it turned out, simple macrology makes it quite pleasurable to use; see add-gridbag in the code for details.
- Other things of interest might be my function to randomly shuffle seqs, which strikes a nice balance between simplicity/conciseness of implementation and randomness; and a useful debugging macro.
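To illustrate why sets make the game logic short, here is a sketch of the central rule: three cards form a Set when, for each attribute, their values are either all the same or all different. The card representation (maps with :number, :shape, :shading and :color) and the name valid-set? are assumptions made for this example, not taken from the repository.

```clojure
(def attributes
  "Hypothetical card attributes; each card is a map with these keys."
  [:number :shape :shading :color])

(defn valid-set?
  "True if, for every attribute, the cards' values are either all the
  same or all different."
  [cards]
  (every? (fn [attr]
            ;; Count distinct values of this attribute across the cards.
            (let [n (count (set (map attr cards)))]
              (or (= n 1) (= n (count cards)))))
          attributes))
```

Counting distinct values with set turns “all same or all different” into a check that the count is either 1 or the number of cards.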
Comments?
Mon, 18 Jan 2010
Reactivation (and some ramblings on my blogging infrastructure)
This blog has not seen content updates in more than a year. Plenty of things can happen in such a long period, and in fact many aspects of my life have changed significantly over this time. I’m not, however, going to write a lengthy post about all that right now. Instead, I would just like to announce the reactivation of the blog.
You might have noticed that many things have changed. First, the blog has a new address: http://blog.danieljanus.pl; the address of the RSS feed has also changed and is now http://blog.danieljanus.pl/index.rss — please update your readers!
Probably the most important change is that you may now post comments under the entries, even though this blog continues to be just a bunch of static HTML pages. This is possible thanks to the Disqus service. I wonder whether it will encourage people to give feedback: I have received very few email comments since I started blogging. Also, the static calendar at the top of each page is gone, replaced by a bunch of links to archive posts.
I had long been considering switching from Blosxom to something else. The main reason for such a step is that it’s written in Perl, which makes it particularly hard for me to debug when it behaves unexpectedly. The single most irritating thing was that Blosxom would unexpectedly change the date of a post that was edited (which did not let me fix typos and other glitches); I found a patch for this somewhere, but lost it.
On the other hand, I really liked, and still like, Blosxom’s minimalistic approach and the ease of adding posts. (The very idea of installing a monstrosity such as Wordpress, with its gazillion features I don’t need, posts kept in a database and whatnot, makes me feel dizzy.) I toyed for a while with the idea of reimplementing Blosxom in Common Lisp, but that turned out to be a more time-consuming project than it initially seemed. So when I found The Unofficial Blosxom User Group and learned that, contrary to my belief, Blosxom is still actively maintained and has a thriving community, I ended up staying with the original Perl version, refining my installation so that it no longer gets in the way (this FAQ entry did the trick). I also converted all my source text files to Markdown, which made them vastly more readable and easier to edit, updating links and adding short followup notes where appropriate, but otherwise leaving old entries as they were.
I’d like to thank Maciek Pasternacki for inspiring me to finally get around to this. While my plans are not as ambitious as his — I am not courageous enough to publicly prove my perseverance, so my blogging will likely continue to be irregular — I plan to write more (having accumulated many ideas for blog posts) and I hope the periods of silence will be much shorter than hitherto.
I would like to take this opportunity to wish my readers all the best in the New Year!