Sun, 18 Apr 2010
Sunflower
The program I’ve been writing about recently has reached a point where I think it can be shown to the general public. It’s called Sunflower and has its home on GitHub. It’s nowhere near complete and is of alpha quality right now, but even at this stage it might be useful.
Just as sunflower seed kernels come wrapped in hulls, most HTML documents seen in the wild come wrapped in noise that is not really part of the document itself. Take any news site: a document from such a site contains things such as advertisements, header, footer, and many links. Now suppose you have many documents grabbed from the same site. Is it possible to somehow automate the extraction of the document “essences”?
Sunflower to the rescue. It relies on the assumption that documents coming from the same source share the same structure. It presents a list of strings to the user and asks them to pick those that belong to the essential text. Then it finds the coordinates of the smallest HTML subtree that contains all those strings, and uses those coordinates to extract information from all documents. And it comes with a nice, easily understandable GUI for that.
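To make the idea concrete, here is a minimal sketch of the subtree search. It is not Sunflower's actual code: it assumes a document is already parsed into nested vectors of the form [tag & children] with no attribute maps, and the names text-of, contains-all?, smallest-subtree-path and extract are made up for this illustration. The "coordinates" are simply the child indices followed from the root.

```clojure
(require '[clojure.string :as str])

(defn text-of
  "All text under node (a string or a [tag & children] vector), space-joined."
  [node]
  (if (string? node)
    node
    (str/join " " (map text-of (rest node)))))

(defn contains-all?
  "True if the text under node contains every string in strings."
  [node strings]
  (let [text (text-of node)]
    (every? #(str/includes? text %) strings)))

(defn smallest-subtree-path
  "Child indices leading from root to the deepest element whose text
  contains every string, or nil if the root itself does not."
  [root strings]
  (when (contains-all? root strings)
    (loop [node root, path []]
      (let [children (vec (rest node))
            ;; index of the first element child that still holds all strings
            hit (first (keep-indexed
                         (fn [i child]
                           (when (and (vector? child)
                                      (contains-all? child strings))
                             i))
                         children))]
        (if hit
          (recur (nth children hit) (conj path hit))
          path)))))

(defn extract
  "Text of the node found by following path in doc, or nil if absent."
  [doc path]
  (some-> (reduce (fn [node i]
                    (when (vector? node)
                      (nth (vec (rest node)) i nil)))
                  doc path)
          text-of))

;; Two hypothetical documents from the same "site".
(def doc-a
  [:html [:body
          [:div "Site header"]
          [:div [:h1 "Sunflowers bloom"]
                [:p "The fields turned yellow this week."]]
          [:div "Advertisement"]]])

(def doc-b
  [:html [:body
          [:div "Site header"]
          [:div [:h1 "Rain expected"]
                [:p "Clouds are gathering over the city."]]
          [:div "Advertisement"]]])

;; Strings picked by the user in doc-a determine the coordinates...
(def article-path
  (smallest-subtree-path doc-a ["Sunflowers bloom" "turned yellow"]))
;; => [0 1]

;; ...which are then reused on every other document.
(extract doc-b article-path)
;; => "Rain expected Clouds are gathering over the city."
```

The sketch only works as long as the same-structure assumption holds; a document whose article sits at a different position would yield the wrong subtree or nil.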
This technique works remarkably well for many collections, although not all. An earlier, proof-of-concept implementation (in Common Lisp) has been used to extract many press texts for the National Corpus of Polish.
I’ve given up on the symbol-capturing approach to wizards that I presented in my previous posts. Inspired by the DOM tree in Web apps, with a bag of elements with identifiers, I now have a central bag of Swing widgets (implemented as an atom) identified by keywords. This bag contains tidbits of the mutable state of Sunflower. This means that I can write callback functions like this:
#(with-components [strings-model selected-dir]
   (.removeAllElements strings-model)
   (let [p (-> selected-dir htmls first parse)]
     (add-component :parsed p)
     (doseq [x (strings p)]
       (.addElement strings-model x))))
Name and conquer: having parts of state explicitly named means that I can reliably access them from just about anywhere. This reduces confusion and allows for less tangled, more self-contained and understandable code.
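For readers wondering what such a bag might look like, here is one possible sketch. It is an assumption, not Sunflower's real implementation: the var name components and the bodies of add-component and with-components are guesses reconstructed from how they are used in the callback above.

```clojure
(import 'javax.swing.DefaultListModel)

;; The central bag: a map from keywords to widgets or other bits of state.
(def components (atom {}))

(defn add-component
  "Stores v in the bag under keyword k and returns v."
  [k v]
  (swap! components assoc k v)
  v)

(defmacro with-components
  "Binds each symbol in names to the component stored under the keyword
  of the same name. The lookup happens when the body runs, not when the
  callback is defined."
  [names & body]
  (let [bag (gensym "bag")]
    `(let [~bag @components
           ~@(mapcat (fn [n] [n `(get ~bag ~(keyword (name n)))]) names)]
       ~@body)))

;; Hypothetical usage: register two pieces of state...
(add-component :strings-model (DefaultListModel.))
(add-component :selected-dir "/tmp/corpus")

;; ...and refer to them by name from a callback defined anywhere.
(def refresh!
  #(with-components [strings-model selected-dir]
     (.removeAllElements strings-model)
     (.addElement strings-model (str "scanned " selected-dir))
     (.getSize strings-model)))

(refresh!)
```

Because the bag is dereferenced inside the expansion, a callback sees whatever was registered most recently under each keyword, which is what lets one callback publish :parsed for another to pick up later.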
Mon, 05 Apr 2010
A case for symbol capture
Clojure by default protects macro authors from accidentally capturing a local symbol. Stuart Halloway describes this in more detail, explaining why this is a Good Thing. However, sometimes this kind of symbol capture is called for. I’ve encountered one such case today while hacking a Swing application.
As I develop the app, I find new ways to express Swing concepts and
interact with Swing objects in a more Clojuresque way, so a library of
GUI macros and functions gets written. One of them is a wizard
macro for easy creation of installer-like wizards, where there is a
sequence of screens that can be navigated with Back and Next
buttons at the bottom of the window.
The API (certainly not finished yet) currently looks like this:
(wizard & components)
where each Swing component corresponding to one wizard screen can be
augmented by a supplementary map, which can contain, inter alia, a
function to execute upon showing the screen in question.
Now, I want those functions to be able to access the Back and Next
buttons in case they want to disable or enable them as needed. I thus
want the API user to be able to use two symbols, back-button and
next-button, in the macro body, and have them bound to the
corresponding buttons.
It is crucial that these bindings be lexical and not dynamic. If they were dynamic, they would only be in effect during the definition of the wizard, but not when my closures are invoked later on. Thus, my implementation looks like this:
(defmacro wizard [& panels]
  `(let [~'back-button (button "< Back")
         ~'next-button (button "Next >")]
     (do-wizard ~'back-button ~'next-button ~(vec panels))))
where do-wizard is a private function implementing the actual wizard
creation, and the ~'foo syntax forces symbol capture.
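To see the capture at work without any Swing code, here is a self-contained sketch. Everything in it is a stand-in invented for illustration: make-button returns an atom instead of a real button, do-wizard* merely collects its arguments, and mini-wizard mirrors the shape of the macro above. Without the ~' prefix, syntax-quote would namespace-qualify back-button and the let would fail to compile.

```clojure
(defn make-button
  "Hypothetical stand-in for a Swing button: just a bit of mutable state."
  [label]
  (atom {:label label :enabled true}))

(defn do-wizard*
  "Hypothetical stand-in for the real wizard builder."
  [back-btn next-btn screens]
  {:back back-btn :next next-btn :screens screens})

(defmacro mini-wizard
  [& screens]
  ;; ~'sym emits the bare symbol, so user code in screens can refer to it.
  `(let [~'back-button (make-button "< Back")
         ~'next-button (make-button "Next >")]
     (do-wizard* ~'back-button ~'next-button ~(vec screens))))

;; The closures mention back-button and next-button although the caller
;; never defines them; the macro's let supplies the lexical bindings.
(def w
  (mini-wizard
    {:on-show (fn [] (swap! back-button assoc :enabled false))}
    {:on-show (fn [] (swap! next-button assoc :enabled false))}))

;; Invoked long after the let has returned, the closure still sees
;; the same button it captured.
((:on-show (first (:screens w))))

(:enabled @(:back w))
;; => false
```

This is the property the lexical binding buys: each closure keeps its own reference to the buttons, so it keeps working whenever the screen is eventually shown.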
By the way, if all goes well, this blog post should be the first one syndicated to Planet Clojure. Hello, Planet Clojure readers!
Sun, 04 Apr 2010
Hiking in the Apennines
I’ve recently done a week-long hike in the Umbria-Marche region of the Italian Apennines (the vicinity of Monte Catria, near Cantiano, to be more precise), and here are some tips I’d like to share.
- The Umbria-Marche Apennines don’t seem to be frequented by a lot of tourists, especially in mid-March. The information offices, although helpful, are often closed (this is not only the case in the mountain region: contrary to information available on the Web, the tourist information at Forli airport was closed on Sunday morning), and most of the Italians we met didn’t speak English.
- The tourist trails in the region are not well marked. Direction marks are nowhere to be found, nor are there any visible signs at junctions. We had to ask the locals when leaving Cantiano for Monte Tenetra (and ended up on M. Alto instead anyway).
- There are a lot of rifugi (mountain huts), but most of them are closed at this time of year. We passed by six or seven, of which only one was available for sleeping: Rifugio Fonte del Faggio (depicted), merely a small bothy with one worm-eaten bunk bed. Another one, Cupa delle Cotaline, with restaurant facilities and situated by a station of a local ski lift, opened in the morning, but was closed for the night.

Rifugio Fonte del Faggio