code • words • emotions

Daniel Janus’s blog

The Dijkstran wheel of fortune: SPSS, Excel, VBA

28 March 2011

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

— Edsger W. Dijkstra, EWD 498

I like to think of myself somewhat egotistically as a counterexample to the above Dijkstra’s statement. Granted, some of my code is definitely of poor quality, and I dare not call myself a good programmer. But, having started with BASIC on a Commodore 64, then proceeding to learn Pascal (of the Turbo/Borland flavour), then C, x86 assembly, OCaml, Smalltalk, Java, C++, Haskell, Common Lisp, Clojure, and a couple of other languages, with a few enlightenments achieved along the way, I do think I managed to regenerate from the mental wounds that BASIC had inflicted upon me. And now I feel a strange sensation, now that the Dijkstran wheel of fortune has made a full spin: I’ve spend the last few days writing BASIC code. I’ve written several Excel macros in Visual Basic for Applications.

Why the strange selection of a language? Well, this was simply the best tool for the job. What I needed to do was postprocess the output of some statistical analyses performed in [SPSS][1] running under Windows, altering the way the results were presented. SPSS can export data to HTML, Word, and Excel; of these three, the latter is most convenient, because it preserves the structure of the output tables most thoroughly. (In principle, HTML does too, and in fact my first stab was with Clojure, but I stopped after realizing just how much ad-hoc, throwaway code that parses the SPSS-generated HTML, munges it several times to and fro, and then outputs back HTML I’d have to write). So I went the Excel way, and in this post I’d like to share my mixed feelings from that encounter.

Visual Basic the language is icky. It is certainly a step forward from the BASIC I remember from decades ago, in that I didn’t have to number my lines, and it is possible to structure the code nicely so that it doesn’t contain any GOTOs, GOSUBs or RETURNs. And it has this object-oriented feel to it. But compared to modern languages, programming in it resembles voluntarily putting on handcuffs, and then jumping around to avoid stumbling over the logs it throws under your legs. Not quite so big and scary logs as C++ does, but still. I mean, why on earth does VB have to distinguish between expressions and statements? Many languages do, but in most of them an expression is at least a valid statement. Not so in VB. Also, VB is still line-oriented: whether or not you require an End If in the conditional statement depends on whether it fits in one line or not. But my biggest pain was with the assignments. VB makes a distinction between reference assignments and other assignments, requiring a Set statement in the first case, and disallowing it in the second. So, Set myCell = thatOtherCell but foo = 42. Worse, forgetting the Set in the first case does not result in an error, which makes such bugs very hard to debug. Yurgh.

Also, the IDE built into Excel for developing VB macros is mediocre. There is an editor, which highlights the syntax and automatically reformats the code, inserting spaces as appropriate, which is nice. It slaps me in face with a modal dialog whenever I make a syntax error and move off the line, which is not so nice. There is a REPL of sorts, taking the form of an “Immediate” window, into which you can type statements (not expressions, remember?) and tap Enter to execute them. You can also Debug.Print to them, like to a JavaScript console. It is not reachable by Ctrl-Tab from the editor, so I ended up using mouse much more often than normally. I want my Emacs back!

On the other hand, I find the object-oriented API for actually accessing the spreadsheets quite well-designed and pleasant to use. You just grab the object representing your worksheets from the global Worksheets object (indexable by number or by name), and from there you access your cells. The basic object you work with is the Range object, representing either a single cell or a bunch of them; you can get or set cell values, change the formatting, call Offset to navigate around as if with cursor keys. You also can search for specific content in the sheet. Simple enough, easy to use and pick up; and above all, allows to get the job done without getting in the way much.

As for SPSS itself: it sucks. In fact, it sucks so great and in so many different ways that it merits its own blog entry (which will follow someday). For now, I’ll only note down the things pertaining to Excel interop; hopefully it will save somebody’s time.

Problem is, SPSS 19’s Excel export is buggy. In fact, it’s so unreliable that I’ve wasted more hours struggling with it than actually writing my macros. (We’re talking SPSS 19 here; I’ve also tried version 17, with the same results.) It exports small data chunks fine, but the larger your output, the more likely it is that Excel alerts about unreadable content in your file. Excel then offers to repair the data, which mostly succeeds, but inevitably loses the formatting — which for me was a no-no.

So, after long hours of experimentation and attempting different workarounds, I found that it is much, much more reliable to just copy your data and paste it into Excel directly, without exporting to a temporary file. Just do Edit → Copy special and select Excel BIFF format, to make sure you’re copying the right data. If Excel complains about not being able to understand the copied content (turn on the Clipboard preview to find out), save your output to .spv, restart SPSS, re-run your syntax and try again. With luck, it will eventually work. At least for me it did.