musings of a Lispnik

Daniel Janus's blog

Hiking in the Apennines

I’ve recently done a week-long hike in the Umbria-Marche region of the Italian Apennines (the vicinity of Monte Catria, near Cantiano, to be more precise), and here are some tips I’d like to share.

  • The Umbria-Marche Apennines don’t seem to be frequented by many tourists, especially in mid-March. The information offices, although helpful, are often closed (and not only in the mountain region: contrary to information available on the Web, the tourist information at Forli airport was closed on Sunday morning), and most of the Italians we met didn’t speak English.
  • The tourist trails in the region are not well marked. Direction marks are nowhere to be found, and there are no visible signs at junctions. We had to ask the locals when leaving Cantiano for Monte Tenetra (and ended up on M. Alto instead anyway).
  • There are a lot of rifugi (mountain huts), but most of them are closed at this time of year. We passed six or seven, of which only one was available for sleeping: Rifugio Fonte del Faggio (depicted), merely a small bothy with one worm-eaten bunk bed. Another one, Cupa delle Cotaline, with restaurant facilities and situated by a local ski lift station, opened in the morning, but was closed for the night.

The pitfalls of `lein swank`

A couple of weeks ago I finally got around to acquainting myself with Leiningen, one of the most popular build tools for Clojure. What had held me back the most was that Leiningen uses Maven under the hood, which seemed a scary beast at first sight; but once I overcame the initial fear, it turned out to be quite a simple and useful tool.

One feature in particular is very useful for Emacs users like me: lein swank. You define all dependencies in project.clj as usual, add a magical line to :dev-dependencies, then say

$ lein swank

and lo and behold, you can M-x slime-connect from your Emacs and have all the code at your disposal.

There is, however, an issue that you must be aware of when using lein swank: Leiningen uses a custom class loader — AntClassLoader to be more precise — to load the Java classes referenced by the code. Despite being a seemingly irrelevant thing — an implementation detail — this can bite you in a number of most surprising and obscure ways. Try evaluating the following code in a Leiningen REPL:

(str (.decode
       (java.nio.charset.Charset/forName "ISO-8859-2")
       (java.nio.ByteBuffer/wrap
         (into-array Byte/TYPE (map byte [-79 -26 -22])))))
==> "???"

The same code evaluated in a plain Clojure REPL will give you "ąćę", which is a string represented in ISO-8859-2 by the three bytes from the above snippet.

Whence the difference? Internally, each charset is represented as a unique instance of its specific class. These are loaded lazily as needed by the Charset/forName method. Presumably, the system class loader is used for that, and somewhere along the way a SecurityException gets thrown and caught.

Note also that there are parts of the Java API which use the charset lookup under the hood and are thus vulnerable to the same problem, for example Reader constructors taking charset names. If you use clojure.contrib.duck-streams, then rebinding *default-encoding* will not work from a Leiningen REPL. Jars and überjars produced by Leiningen should be fine, though.
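As a cross-check of the expected result outside the JVM, the same three bytes can be decoded from a shell. This is only a sketch: it assumes iconv is installed, and the helper names to_unsigned and decode_latin2 are made up for illustration. Java bytes are signed, so -79, -26 and -22 stand for the unsigned values 177, 230 and 234.

```shell
#!/bin/bash
# Convert a signed Java byte (-128..127) to its unsigned value (0..255).
to_unsigned() {
  echo $(( ($1 + 256) % 256 ))
}

# Decode a list of signed bytes as ISO-8859-2 and print the text as UTF-8.
# Assumption: iconv is available on the system.
decode_latin2() {
  local b
  for b in "$@"; do
    # printf emits the raw byte whose octal value follows the backslash
    printf "\\$(printf '%03o' "$(to_unsigned "$b")")"
  done | iconv -f ISO-8859-2 -t UTF-8
}
```

Calling decode_latin2 -79 -26 -22 should print ąćę, the same string the plain Clojure REPL returns.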

Downcasing strings

I just needed to convert a big (around 200 MB) text file, encoded in UTF-8 and containing Polish characters, all into lowercase. tr to the rescue, right? Well, not quite.

$ echo ŻŹŚÓŃŁĘĆĄ | tr A-ZĄĆĘŁŃÓŚŹŻ a-ząćęłńóśźż
żźśóńłęćą

Looks reasonable (apart from the fact that I need to specify an explicit character mapping; it would be handy to just have an lcase utility or suchlike); but here’s what happens on another random string:

$ echo abisyński | tr A-ZĄĆĘŁŃÓŚŹŻ a-ząćęłńóśźż
abisyŅski

I was just about to report this as a bug, when I spotted the following in the manual:

Currently tr fully supports only single-byte characters. Eventually it will support multibyte characters; when it does, the -C option will cause it to complement the set of characters, whereas -c will cause it to complement the set of values.

Turns out some of the basic tools don’t support multibyte encodings. dd conv=lcase, for instance, doesn’t even pretend to touch non-ASCII letters, and perl’s tr operator likewise fails miserably even when one specifies use utf8.
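The abisyŅski result is easy to explain once the input is viewed as bytes rather than characters. In UTF-8, Ą is the byte pair C4 84 and ą is C4 85, so a byte-oriented tr derives the rule 84 → 85 from that pair; ń happens to be C5 84, which the same rule turns into C5 85, and that is Ņ. Here is a sketch that replays just this one rule (the function name bytewise_rule is made up, and LC_ALL=C is there to force byte-wise behaviour even in implementations that are multibyte-aware):

```shell
#!/bin/bash
# Replay the single byte-level rule that a byte-oriented tr derives from
# the pair Ą -> ą: byte 0x84 (octal 204) becomes 0x85 (octal 205).
bytewise_rule() {
  LC_ALL=C tr '\204' '\205'
}

# ń is the two bytes C5 84 (octal 305 204) in UTF-8.
printf '\305\204' | od -An -tx1                  # bytes before the rule
printf '\305\204' | bytewise_rule | od -An -tx1  # bytes after: c5 85, i.e. Ņ
```

Feeding the whole word through it, as in printf 'abisy\305\204ski' | bytewise_rule, reproduces the garbled output from above.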

This is a sad, sad state of affairs. It’s 2010, UTF-8 has been around for seventeen years, and it’s still not supported by one of the core operating system components, even as other encodings are becoming more and more obsolete. I’m dreaming of the day my system uses it internally for everything.

Fortunately, not everything is broken. Gawk, for example, works:

$ echo koŃ i żÓłw | gawk '{ print tolower($0); }'
koń i żółw

and so does sed.

Update 2010-04-04: I should have been more specific. The above rant applies to the GNU tools (tr and dd) as found in most Linux distributions; other versions can be more featureful. As Alex Ott points out in an email comment, tr on OS X works as expected for characters outside of ASCII, and also supports character classes as in tr '[:upper:]' '[:lower:]'. This is yet another testimony to the generally high quality of Apple software; in this particular case, though, it may well be a direct effect of OS X’s BSD heritage. Does it work on *BSD?

Clojure SET

I’ve just taken a short break from work to put some code on GitHub that I had written over one night some two months ago. It is an implementation of the Set game in Clojure, using Swing for the GUI.

I do not have time to clean up or comment the code, so I’m leaving it as is for now; however, I hope that even in its current state it can be of interest, especially for Clojure learners.

Some random notes on the code:

  • Clojure is concise! The whole thing is just under 250 lines of code, complete with game logic and the GUI. Of these, the logic is about 50 LOC. Despite this it reads clearly and has been a pleasure to write, thanks to Clojure’s support for sets as a data structure (in the vein of the game’s title and theme).
  • There are no graphics included. All the drawing is done in the GUI part of the code (I’ve replaced the canonical squiggle shape by a triangle and stripes by gradients, for the sake of easier drawing).
  • I’ve toyed around with different Swing layout managers for this game. Back in the days when I wrote in plain Java, I used to use TableLayout, but it has a non-free license; JGoodies Forms is also nice, but has a slightly more complicated API (and it’s an additional dependency, after all). In the end I settled on the standard GridBagLayout, which is similar in spirit to those two, but requires more boilerplate to set up. As it turned out, simple macrology makes it quite pleasurable to use; see add-gridbag in the code for details.
  • Other things of interest might be my function to randomly shuffle seqs, which strikes a nice balance between simplicity/conciseness of implementation and randomness; and a useful debugging macro.

Comments?

Reactivation (and some ramblings on my blogging infrastructure)

This blog has not seen content updates in more than a year. Plenty of things can happen in such a long period, and in fact many aspects of my life have seen major changes over this time. I’m not, however, going to write a lengthy post about all that right now. Instead, I would just like to announce the reactivation of the blog.

You might have noticed that many things have changed. First, the blog has a new address: http://blog.danieljanus.pl; the address of the RSS feed has also changed and is now http://blog.danieljanus.pl/index.rss. Please update your readers!

Probably the most important change is that you now may post comments under the entries, even though this blog continues to be just a bunch of static HTML pages. This is possible thanks to the Disqus service. I wonder whether it will encourage people to give feedback: I have received very few email comments since I started blogging. Also, the static calendar at the top of each page is gone, replaced by a bunch of links to archive posts.

I have long been considering replacing Blosxom with something else. The main reason is that it’s written in Perl, which makes it particularly hard to debug when it behaves unexpectedly. The single most irritating thing was that Blosxom would change the date of a post whenever it was edited (which kept me from fixing typos and other glitches); I found a patch for this somewhere, but lost it.

On the other hand, I really liked (and still like) Blosxom’s minimalistic approach and the ease of adding posts. (The very idea of installing a monstrosity such as Wordpress, with its gazillion features I don’t need, posts kept in a database and whatnot, makes me feel dizzy.) I toyed for a while with the thought of reimplementing Blosxom in Common Lisp, but that turned out to be a more time-consuming project than it initially seemed. So when I found The Unofficial Blosxom User Group and learned that, contrary to my belief, Blosxom is still actively maintained and has a thriving community, I ended up staying with the original Perl version, refining my installation so that it no longer gets in the way (this FAQ entry did the trick). I also converted all my source text files to Markdown, which made them vastly more readable and easier to edit, updating links and adding short follow-up notes where appropriate, but otherwise leaving old entries as they were.

I’d like to thank Maciek Pasternacki for inspiring me to finally get around to this. While my plans are not as ambitious as his (I am not courageous enough to publicly prove my perseverance, so my blogging will likely continue to be irregular), I plan to write more (having accumulated many ideas for blog posts) and I hope the periods of silence will be much shorter than hitherto.

I would like to take this opportunity to wish my readers all the best in the New Year!

Google Books

Yesterday, upon a midnight dreary, while I pondered, weak and weary, over a renowned volume of the olden lore (and specifically, upon one of the problems contained in the Polish translation of the first edition), I suddenly felt a need to consult the original version, to check whether there are no mistranslations or unincluded corrections for my copy. So I headed for Google Book Search, and apart from finding what I needed, I followed a link that sounded interesting. Quoth the link, “Groundbreaking Agreement”.

Basically, what it all boils down to is two pieces of news: you guessed it, a good one and a bad one. The good news is that Google have come to an agreement with several major U.S. publishers that will allow them to provide online access to digitized copies of out-of-print but still copyrighted books. Lots of books, and even though the service is not going to be free, that means all this richness will be at our fingertips: no more need to travel half the world to the Library of Congress to get one of the rare copies we’re after. Sounds cool, huh? Well, here comes the bad news: it will only be available to U.S. citizens.

Or will it?

I wonder how they are going to check for this precondition. IP-based geolocation springs to mind. And unless they blacklist some IPs or restrict the credit cards used for payment, all I will need is some proxy on some server physically in the U.S. Say, a shell account on someone’s Linux box. I remember reading a Polish blog post about gaining access to American-exclusive content of some website (last.fm I believe it was) in a similar way. Hmm, hmm. We will see.

So, anyone got a shell account to spare?

anti-procrastination.el

Fighting procrastination has been my major concern these days. I’ve devised a number of experimental tools to help me with that. One of them is called snafu and can generate reports of your activity throughout the whole day of work. It’s in a preliminary state, but works (at least since I’ve found and fixed a long-standing bug in it which would cause it to barf every now and then), and I already have a number of ideas for its further expansion.

Reports alone, however, do not quite muster enough motivation for work. I’m doing most of my editing/programming work in Emacs, so yesterday I grabbed the Emacs Lisp manual and came up with a couple of extra lines at the end of my .emacs.

;;; Written by Daniel Janus, 2008/12/18.
;;; This snippet is placed into the public domain.  Feel free
;;; to use it in any way you wish.  I am not responsible for
;;; any damage resulting from its usage.

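(defvar store-last-modification-time t
  "When nil, buffer changes are not recorded (set while writing the file).")
(defvar last-modification-time nil
  "Time of the last change to a non-system buffer, as returned by `current-time'.")

;; Record the time of every change made in a buffer whose name does not
;; start with a space or an asterisk (i.e. skip internal and special buffers).
(defun mark-last-modification-time (beg end len)
  (let ((b1 (substring (buffer-name (current-buffer)) 0 1)))
    (when (and store-last-modification-time
               (not (string= b1 " "))
               (not (string= b1 "*")))
      (setq last-modification-time (current-time)))))
(add-hook 'after-change-functions 'mark-last-modification-time)

;; Write the first two elements of the timestamp (the high and low 16 bits
;; of the seconds since the epoch) to /tmp/emacs-lmt, one per line.
;; `nth' is used instead of `multiple-value-bind' so that the snippet
;; does not depend on the cl package being loaded.
(defun write-lmt ()
  (setq store-last-modification-time nil)
  (when last-modification-time
    (with-temp-file "/tmp/emacs-lmt"
      (princ (nth 0 last-modification-time) (current-buffer))
      (terpri (current-buffer))
      (princ (nth 1 last-modification-time) (current-buffer))))
  (setq store-last-modification-time t))
(run-at-time nil 1 'write-lmt)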

Every second (to change that to every 10 seconds, change the 1 to 10 in the last line) it creates a file named /tmp/emacs-lmt which contains the time of last modification of any non-system buffer.

That’s all there is to it, at least on the Emacs side. The other part is a simple shell script, which uses MPlayer to display a nag-screen for five seconds, and then gives me some time to start doing something useful before nagging me again:

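#!/bin/bash
# Nag when no non-system Emacs buffer has been modified for TIMEOUT seconds.
TIMEOUT=300
while true; do
   # Skip the check until Emacs has written the file at least once.
   if [ -r /tmp/emacs-lmt ]; then
      # The file holds the high and low 16 bits of the timestamp, one per line.
      { read -r a; read -r b; } < /tmp/emacs-lmt
      now=$(date +%s)
      idle=$(( now - 65536 * a - b ))
      if [ "$idle" -gt "$TIMEOUT" ]; then
         mplayer -fs "$HOME/p.avi"
         sleep 15   # grace period before nagging again
      fi
   fi
   sleep 1
done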
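The script subtracts 65536*a+b from the current time because of how Emacs represents timestamps: current-time returns a list whose first two elements are the high and the low 16 bits of the number of seconds since the epoch, and those are the two numbers stored in /tmp/emacs-lmt. A small sketch of that computation as standalone functions (the names lmt_to_epoch and idle_seconds are mine and not part of the script above):

```shell
#!/bin/bash
# Combine the two integers written by write-lmt (high and low 16 bits
# of the timestamp) into seconds since the epoch.
lmt_to_epoch() {
  echo $(( 65536 * $1 + $2 ))
}

# Seconds elapsed between a stored timestamp and "now".
# The third argument defaults to the current time; passing it explicitly
# makes the function easy to try out by hand.
idle_seconds() {
  local now=${3:-$(date +%s)}
  echo $(( now - $(lmt_to_epoch "$1" "$2") ))
}
```

For example, idle_seconds 1 2 65600 prints 62, since the stored pair (1, 2) stands for second 65538.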

The nag-screen in my case is an animation which I’ve created using MEncoder from a single frame which looks like this. Beware the expletives! (This is one of the few cases I find their usage justified, as the strong message bites the conscience more strongly.)

I’ve only been testing this setup for one day, but so far it’s working flawlessly: I got more done yesterday than in the two previous days combined, and that’s excluding the hour or so that it took me to write these snippets.

If anyone else happens to give it a try, I’d love to hear any comments.

The immensely powerful tool

A pen and a sheet of paper are simple utilities; but there lies vast and sheer power in them that I was not aware of. Up until now. So what can they be used for that one might possibly not realize?

Short answer: serializing the stream of consciousness.

Yes, it’s simple, and you may laugh at me now. I myself am a little amazed that I hadn’t noticed this before. But this answer leads to another question: what good is this serialization, and what exactly do I mean by it, anyway? And the answer to that is a little longer. So here goes.

I’m one of the people who tend to have problems with concentrating when thinking, especially when thinking hard. This is not to say that I am not capable of thinking hard: I am, but doing so requires a level of concentration that is tricky for me to exert for a prolonged period. (Unless, of course, I am in the state of absolute fascination, where this is taken care of subconsciously. But that’s another story.) More often than not, a tough problem requiring a significant amount of work just has to be dealt with. And then things start to distract attention. There is an itch to scratch, thoughts are shreds, each one pertaining to a tiny bit of the problem, but intertwined with hundreds of other bits of other problems, forming a dense, tangled web, hard to navigate over, and jumping fast from one to another, it becomes more and more unclear what’s next.

So what can one do? One way is to grab a writing device and just start writing. Running text is linear in nature, so you end up traversing the thought graph depth-first and writing down each thought as you traverse its node. And what’s more, translating ideas to written language slows you down, which is a Good Thing because it makes you see your way through the graph more consciously. It might take you longer to walk from point A to point B than to drive there by car, but definitely you will see more of the landscape as you go. Arriving at the final destination, or simply putting down the pen because enough thoughts have been collected and serialized (there’s never really any end of the stream), makes you end up with a half-product: an unsmithed lump of ore out of which you can forge ingots.

But why a pen and paper, as opposed to, say, a text editor? I think any writing utensil would work to some extent, but for me this seems to be the best option, for several reasons. First of all, I can type on the keyboard much faster than I can write legibly by hand, so this further slows down the pace (which is a Good Thing as we have observed already).

Second, there is something magical in handwriting which a text editor will never be able to achieve: it’s hard to describe. But the net effect is a very evident focus on Here and Now, the pen moving across the paper, the sheet filling up with more and more lines of script. This environment is naturally single-tasked: no Alt-Tab to press to switch to another terminal, no blinking icon of an instant-messaging program (unless a phone happens to ring). This causes synergy with the concentration caused by serializing thoughts.

If you have never tried this approach, feel free to do so. Although I cannot guarantee it will work for you, it certainly does work for me.

Who said Common Lisp programs cannot be small?

So, how much disk space does your average CL image eat up? A hundred megs? Fifty? Twenty? Five, perhaps, if you’re using LispWorks with a tree-shaker? Well then, how about this?

[nathell@chamsin salza2-2.0.4]$ ./cl-gzip closures.lisp test.gz
[nathell@chamsin salza2-2.0.4]$ gunzip test
[nathell@chamsin salza2-2.0.4]$ diff closures.lisp test 
[nathell@chamsin salza2-2.0.4]$ ls -l cl-gzip 
-rwxr-xr-x 1 nathell nathell 386356 2008-08-09 11:08 cl-gzip

That’s right. A standalone executable of a mini-gzip, written in Common Lisp, taking up under 400K! And it only depends on glibc and GMP, which are available by default on pretty much every Linux installation. (This is on a 32-bit x86 machine, by the way).

I used the most recent version of ECL for compiling this tiny example. The key to the size was configuring ECL with --disable-shared --enable-static CFLAGS="-Os -ffunction-sections -fdata-sections" LDFLAGS="-Wl,-gc-sections". This essentially gives you a poor man’s tree shaker for free at a linker level. And ECL in itself produces comparatively tiny code.

I built this example from Salza2’s source by loading the following code snippet:

(defvar salza
  '("package" "reset" "specials"
    "types" "checksum" "adler32" "crc32" "chains"
    "bitstream" "matches" "compress" "huffman"
    "closures" "compressor" "utilities" "zlib"
    "gzip" "user"))

(defvar salza2
  (mapcar (lambda (x) (format nil "~A.lisp" x))
          salza))

(defvar salza3
  (mapcar (lambda (x) (format nil "~A.o" x))
          salza))

(defun build-cl-gzip ()
  (dolist (x salza2)
          (load x)
          (compile-file x :system-p t))
  (c:build-program
   "cl-gzip"
   :lisp-files salza3
   :epilogue-code
     '(progn
       (in-package :salza2)
       (gzip-file (second (si::command-args))
                  (third (si::command-args))))))

(build-cl-gzip)

(Sadly enough, there’s no ASDF in here. I have yet to figure out how to leverage ASDF to build small binaries in this constrained environment.)

This gave me a standalone executable 1.2 meg in size. I then proceeded to compress it with UPX (with arguments --best --crp-ms=999999) and got the final result. How cool is that?

I am actively looking for a new job. If you happen to like my writings and think I might be just the right man for the team you’re building up, please feel free to consult my résumé or pass it on.

Update 2010-Jan-17: the above paragraph is no longer valid.

cl-morfeusz: A ninety minutes’ hack

Here’s what I came up with today, after no more than 90 minutes of coding (complete with comments and all):

MORFEUSZ> (morfeusz-analyse "zażółć gęślą jaźń")
((0 1 "zażółć" "zażółcić" "impt:sg:sec:perf")
 (1 2 "gęślą" "gęśl" "subst:sg:inst:f")
 (2 3 "jaźń" "jaźń" "subst:sg:nom.acc:f"))

This is cl-morfeusz in action, a Common Lisp interface to Morfeusz, the morphological analyser for Polish.

It’s a single Lisp file, so there’s no ASDF system definition or asdf-installability for now. I’m not putting it under version control, either. Or, should I say, not yet. When I get around to it, I plan to write a simple parser and write a Polish-language version of the text adventure that started it all.

Meanwhile, you may use cl-morfeusz for anything you wish (of course, as long as you comply with Morfeusz’s license). Have fun!

Update 2010-Jan-17: With the advent of UTF-8 support in CFFI, the ugly workarounds in the code are probably no longer necessary; I don’t have time to check it right now, though.