The Dijkstran wheel of fortune: SPSS, Excel, VBA

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

       — Edsger W. Dijkstra, EWD 498

I like to think of myself somewhat egotistically as a counterexample to the above Dijkstra’s statement. Granted, some of my code is definitely of poor quality, and I dare not call myself a good programmer. But, having started with BASIC on a Commodore 64, then proceeding to learn Pascal (of the Turbo/Borland flavour), then C, x86 assembly, OCaml, Smalltalk, Java, C++, Haskell, Common Lisp, Clojure, and a couple of other languages, with a few enlightenments achieved along the way, I do think I managed to regenerate from the mental wounds that BASIC had inflicted upon me. And now I feel a strange sensation, now that the Dijkstran wheel of fortune has made a full spin: I’ve spend the last few days writing BASIC code. I’ve written several Excel macros in Visual Basic for Applications.

Why the strange selection of a language? Well, this was simply the best tool for the job. What I needed to do was postprocess the output of some statistical analyses performed in SPSS running under Windows, altering the way the results were presented. SPSS can export data to HTML, Word, and Excel; of these three, the latter is most convenient, because it preserves the structure of the output tables most thoroughly. (In principle, HTML does too, and in fact my first stab was with Clojure, but I stopped after realizing just how much ad-hoc, throwaway code that parses the SPSS-generated HTML, munges it several times to and fro, and then outputs back HTML I’d have to write). So I went the Excel way, and in this post I’d like to share my mixed feelings from that encounter.

Visual Basic the language is icky. It is certainly a step forward from the BASIC I remember from decades ago, in that I didn’t have to number my lines, and it is possible to structure the code nicely so that it doesn’t contain any GOTOs, GOSUBs or RETURNs. And it has this object-oriented feel to it. But compared to modern languages, programming in it resembles voluntarily putting on handcuffs, and then jumping around to avoid stumbling over the logs it throws under your legs. Not quite so big and scary logs as C++ does, but still. I mean, why on earth does VB have to distinguish between expressions and statements? Many languages do, but in most of them an expression is at least a valid statement. Not so in VB. Also, VB is still line-oriented: whether or not you require an End If in the conditional statement depends on whether it fits in one line or not. But my biggest pain was with the assignments. VB makes a distinction between reference assignments and other assignments, requiring a Set statement in the first case, and disallowing it in the second. So, Set myCell = thatOtherCell but foo = 42. Worse, forgetting the Set in the first case does not result in an error, which makes such bugs very hard to debug. Yurgh.

Also, the IDE built into Excel for developing VB macros is mediocre. There is an editor, which highlights the syntax and automatically reformats the code, inserting spaces as appropriate, which is nice. It slaps me in face with a modal dialog whenever I make a syntax error and move off the line, which is not so nice. There is a REPL of sorts, taking the form of an “Immediate” window, into which you can type statements (not expressions, remember?) and tap Enter to execute them. You can also Debug.Print to them, like to a JavaScript console. It is not reachable by Ctrl-Tab from the editor, so I ended up using mouse much more often than normally. I want my Emacs back!

On the other hand, I find the object-oriented API for actually accessing the spreadsheets quite well-designed and pleasant to use. You just grab the object representing your worksheets from the global Worksheets object (indexable by number or by name), and from there you access your cells. The basic object you work with is the Range object, representing either a single cell or a bunch of them; you can get or set cell values, change the formatting, call Offset to navigate around as if with cursor keys. You also can search for specific content in the sheet. Simple enough, easy to use and pick up; and above all, allows to get the job done without getting in the way much.

As for SPSS itself: it sucks. In fact, it sucks so great and in so many different ways that it merits its own blog entry (which will follow someday). For now, I’ll only note down the things pertaining to Excel interop; hopefully it will save somebody’s time.

Problem is, SPSS 19’s Excel export is buggy. In fact, it’s so unreliable that I’ve wasted more hours struggling with it than actually writing my macros. (We’re talking SPSS 19 here; I’ve also tried version 17, with the same results.) It exports small data chunks fine, but the larger your output, the more likely it is that Excel alerts about unreadable content in your file. Excel then offers to repair the data, which mostly succeeds, but inevitably loses the formatting — which for me was a no-no.

So, after long hours of experimentation and attempting different workarounds, I found that it is much, much more reliable to just copy your data and paste it into Excel directly, without exporting to a temporary file. Just do Edit -> Copy special and select Excel BIFF format, to make sure you’re copying the right data. If Excel complains about not being able to understand the copied content (turn on the Clipboard preview to find out), save your output to .spv, restart SPSS, re-run your syntax and try again. With luck, it will eventually work. At least for me it did.

Hello world, again

I’ve been quiet on the front of blogging in English recently. But that doesn’t mean I’ve given up.

After more than a year, I had become tired of maintaining a Blosxom installation. I greatly admire Blosxom, its minimalism and extensibility, but the default installation is just too minimal for my needs. And the plugins tend to have rough edges. Like the Disqus comments that I’ve enabled at one time on the otherwise static blog pages: the correct number of comments appears in some places but not all; besides, they just don’t feel right.

So I’ve embarked on an experiment with a blogging platform, namely Posterous. I’ve started a blog in Polish there to comment on local affairs in my mother tongue and to popularize Clojure among Polish programmers. And after a few months, I consider this experiment successful. Posterous supports Markdown, which I grew accustomed to while using Blosxom. It automatically syntax-highlights snippets of Clojure code that I post, which is a big win. It is highly customizable, easy to use (blogging via email FTW!), and lets me control my data. It does have its deficiencies, but on the whole it gets in the way less. So I’m switching to Posterous for “Musings of a Lispnik” too.

It is unclear for me how to migrate the old content to new platform, so for now I’ll leave it as is under a temporary address, while posting new things exclusively here.

In the near future, I plan to translate a few articles about Clojure I’d written in Polish and post their English versions here. Stay tuned!

1 of 1
Posterous theme by Cory Watilo