Daniel Janus – programming

On LLMs in programming

2025-12-27T00:00:00Z

’Tis the year’s midnight, and it is the day’s.

Well, it’s two weeks past Lucy’s as I write this, but it is still that time of the year: the time of slowing down, taking a step back from the hustle and bustle of everyday, reflecting on what has happened and what is yet to come. Perhaps no better time to put together my thoughts on LLMs, and more specifically on their use in programming.

This post is not meant to take a stance in the ongoing discussion about the merits and risks of using LLMs. A lot has been said already about things like their moral and legal aspects, their energy footprint, how they impact society at large — I will skip all of this stuff, as I don’t have anything of value to add here. I do have opinions, but I don’t want to convince anyone. Rather, I hope to find some words to express and let out the anxiety that has been brewing in me for the last couple of months.

Where I stand

I might call myself a “conscious LLM-skeptic”. That is, while my attitude towards LLMs is far from enthusiastic, I do use them and have found them genuinely useful on multiple occasions in day-to-day programming. I used to have a Claude Pro subscription (I’ve cancelled it for the moment but it’s not unlikely I’ll renew at some point). I have no doubt that they’re here to stay: they’ve significantly changed the landscape of software development and will continue to do so.

So why the anxiety, instead of, say, enthusiasm?

I guess a large part of it is due to the change itself. Especially given the pace of it. A lot of things that I have grown accustomed to are now different than what they used to be; things that I had taken for granted no longer are. Change naturally breeds anxiety.

But I think there’s more to it.

Conscious excitement

Let me quote first a passage from “A Study in Scarlet”, where Dr. Watson discovers the extent of Sherlock Holmes’ knowledge of the world:

His ignorance was as remarkable as his knowledge. Of contemporary literature, philosophy and politics he appeared to know next to nothing. Upon my quoting Thomas Carlyle, he inquired in the naivest way who he might be and what he had done. My surprise reached a climax, however, when I found incidentally that he was ignorant of the Copernican Theory and of the composition of the Solar System. That any civilized human being in this nineteenth century should not be aware that the earth travelled round the sun appeared to be to me such an extraordinary fact that I could hardly realize it.

“You appear to be astonished,” he said, smiling at my expression of surprise. “Now that I do know it I shall do my best to forget it.”

“To forget it!”

“You see,” he explained, “I consider that a man’s brain originally is like a little empty attic, and you have to stock it with such furniture as you choose. A fool takes in all the lumber of every sort that he comes across, so that the knowledge which might be useful to him gets crowded out, or at best is jumbled up with a lot of other things so that he has a difficulty in laying his hands upon it. Now the skilful workman is very careful indeed as to what he takes into his brain-attic. He will have nothing but the tools which may help him in doing his work, but of these he has a large assortment, and all in the most perfect order. It is a mistake to think that that little room has elastic walls and can distend to any extent. Depend upon it there comes a time when for every addition of knowledge you forget something that you knew before. It is of the highest importance, therefore, not to have useless facts elbowing out the useful ones.”

I remember my astonishment at this thought process when I first read this. I might have been 11 or so at the time. Surely, I thought, the mind isn’t quite as rigid as Holmes paints it here? Surely you can keep throwing new information at it – the more, the better – and it will adapt, processing it, improving the mental model and discarding unuseful things?

But the older I get, the more I keep coming back to this attic analogy. Especially since I learned about my ADHD a few years ago. I now believe that Holmes has a point here: that one needs to be careful and conscious when deciding what to furnish their brain with. Not necessarily with respect to facts, as per Holmes, but certainly with attention.

Because it’s attention that’s the scarcest, most precious resource that my brain has to offer. Even more precious than time (which, too, is scarce). Corollary: I need to choose very consciously what I devote my attention to.

Another thing I discovered is that attention and excitement are very much intertwined. Excitement arises when thinking about something deeply, and it is possible, to a certain extent, to induce excitement. It’s conscious, too. Corollary #2: I need to choose very consciously what to be excited about. It’s just basic fuckonomics.

And I’ve made a choice for those areas not to include LLMs – lest they divert my attention from things I care about.

I care about the fundamentals of my craft. I care about programming languages and their theory. I care about the trade-offs of static vs dynamic typing; of manual vs automatic memory management. I care about algorithms and data structures. I care about clarity when expressing ideas with code. I care about abstractions.

Most importantly, I care about the fun I’ve had from learning, exploring, and applying all these things. I choose to continue to derive the fun from them.

(Incidentally, some of the most productive uses of LLMs I’ve had were when I used them as a glorified rubber duck – having casual conversations that sought to increase my understanding of the problem at hand – rather than asking them to write code.)

Are we all 10x programmers now?

In the paper “A Century of Work and Leisure”, Valerie A. Ramey and Neville Francis look at how people have been using their time throughout the 20th century – in particular, how much time they’ve been spending on domestic chores (or activities that they dub “home production”: planning, buying things, cleaning, housecare, preparing food, laundry, etc). They find that the per-household amount of time spent on this has largely remained the same over time: women now spend significantly less time than they used to, but this is largely offset by the increase in time that men invest in home chores.

Why, given the revolutionary advances in technology over that time and the proliferation of washing machines, microwave ovens and other appliances?

Ulrich Schnabel in “Leisure: The Happiness of Doing Nothing” (a great book that, sadly, has not been translated into English as far as I know) cites Ramey and Francis’ work and argues the answer to be Parkinson’s law:

Parkinson’s law provides the first answer: “A job expands to exactly the extent that time is available for its completion—regardless of the actual amount of work.” […]
Put more kindly, Parkinson’s law can also be interpreted as follows: As technology saves us time, our expectations and demands increase. A hundred years ago, well-groomed clothes, a clean house, and a multi-course meal were still considered luxuries; today, these things have largely become the norm.

Now, with the advent of LLMs, I can’t help feeling that we’re going through this all over again. New technology appears that purports to save us all precious time; and then, some time later, we discover that we have just as little free time as we used to. Parkinson’s law at work again. But unlike with home management, in this case it’s not due to our increased expectations about what is or is not luxurious – it’s that expectations towards us have increased. Deliver more features! Write more code! Review more PRs! More, more, more!

We may have all become 10x programmers, but the reference point has shifted, too, so the 10x doesn’t apply anymore: the factors have to be recomputed. Where is my peace of mind?

(Yes, this is tongue-in-cheek and I can’t quantify it in any way. Remember, this is venting, not a substantive argument.)

Adapt or perish

I keep having these doubts and fears. Will I still have a job in a few years? Is my ability to think deeply about problems still a valuable asset? Will I be forced to use LLMs if I want to continue working as a software engineer?

At the end of the day, I am a human. I know what it’s like to experience beauty – through my senses, in my mind, with all my flesh – and I know that code can be beautiful. At least nothing will strip me of that.

Final words

I don’t have a good conclusion. So, instead, I’ll make a statement: this blog (by which I mean the whole site, not just this article) is mine, a human’s, and does not nor will it ever contain any LLM-written content (except for purposes of commentary when clearly marked as such). I’m not saying that I won’t write here about LLMs ever again, but I don’t think it’s likely to happen anytime soon: I prefer to write about things that I’m excited about.

Here are some links to texts by other people that resonate with me. That’s not to say that I necessarily agree with every single word they say; I just found myself nodding along as I read them.

No, really, you can’t branch Datomic from the past (and what you can do instead)

2025-04-22T00:00:00Z

I have a love-hate relationship with Datomic. Datomic is a Clojure-based database built on a record of immutable facts; this post assumes a passing familiarity with it – if you haven’t tried it yet, I highly recommend checking it out: it’s enlightening even if you end up not using it.

I’ll leave ranting on the “hate” part for some other time; here, I’d like to focus on some of the love – and its limits.

Datomic has this feature called “speculative writes”. It allows you to take an immutable database value, apply some new facts to it (speculatively, i.e., without sending them over to the transactor – this is self-contained within the JVM), and query the resulting database value as if those facts had been transacted for real.

This is incredibly powerful. It lets you “fork” a Datomic connection (with the help of an ingenious library called Datomock), so that you can see all of the data in the source database up to the point of forking, but any new writes happen only in memory. You can develop on top of production data, but without any risk of damaging them! I remember how aghast I was upon first hearing about the concept, but now can’t imagine my life without it. Datomock’s author offers an analogy to Git: it’s like database values being commits, and connections being branches.

Another awesome feature of Datomic is that it lets you travel back in time. You can call as-of on a database value, passing a timestamp, and you get back a db as it was at that point in time – which you can query to your heart’s content. This aids immensely in forensic debugging, and helps answer questions which would have been outright impossible to answer with classical DBMSs.

Now, we’re getting to the crux of this post: as-of and speculative writes don’t compose together. If you try to create a Datomocked connection off of a database value obtained from as-of, you’ll get back a connection to which you can transact new facts, but you’ll never be able to see them. The analogy to Git falls down here: it’s as if Git only let you branch HEAD.

This is a well-known gotcha among Datomic users. From Datomic’s documentation:

as-of Is Not a Branch

Filters are applied to an unfiltered database value obtained from db or with. In particular, the combination of with and as-of means "with followed by as-of", regardless of which API call you make first. with plus as-of lets you see a speculative db with recent datoms filtered out, but it does not let you branch the past.

So it appears that this is an insurmountable obstacle: you can’t fork the past with Datomic.

Or can you?

Reddit user NamelessMason has tried to reimplement as-of on top of d/filter, yielding what seems to be a working approach to “datofork”! Quoting his post:

Datomic supports 4 kinds of filters: as-of, since, history and custom d/filter, where you can filter by arbitrary datom predicate. […]

d/as-of sets a effective upper limit on the T values visible through the Database object. This applies both to existing datoms as well as any datoms you try to add later. But since the tx value for the next transaction is predictable, and custom filters compose just fine, perhaps we could just white-list future transactions?

(defn as-of'' [db t]
  (let [tx-limit (d/t->tx t)
        tx-allow (d/t->tx (d/basis-t db))]
    (d/filter db (fn [_ [e a v tx]] (or (<= tx tx-limit) (> tx tx-allow))))))

[…] Seems to work fine!

Sadly, it doesn’t actually work fine. Here’s a counterexample:

(def conn (let [u "datomic:mem:test"] (d/create-database u) (d/connect u)))

;; Let's add some basic schema
@(d/transact conn [{:db/ident :test/id :db/valueType :db.type/string
                    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}])
(d/basis-t (d/db conn)) ;=> 1000

;; Now let's transact an entity
@(d/transact conn [{:test/id "test", :db/ident ::the-entity}])
(d/basis-t (d/db conn)) ;=> 1001

;; And in another transaction let's change the :test/id of that entity
@(d/transact conn [[:db/add ::the-entity :test/id "test2"]])
(d/basis-t (d/db conn)) ;=> 1003

;; Trying a speculative write, forking from 1001
(def db' (-> (d/db conn)
             (as-of'' 1001)
             (d/with [[:db/add ::the-entity :test/id "test3"]])
             :db-after))
(:test/id (d/entity db' ::the-entity)) ;=> "test" (WRONG! it should be "test3")

To recap what we just did: we transacted version A of an entity, then an updated version B, then tried to fork C off of A, but we’re still seeing A’s version of the data. Can we somehow save the day?

To see what d/filter is doing, we can add a debug println to the filtering function, following NamelessMason’s example (I’m translating tx values to t for easier understanding):

(defn as-of'' [db t]
  (let [tx-limit (d/t->tx t)
        tx-allow (d/t->tx (d/basis-t db))]
    (d/filter db (fn [_ [e a v tx :as datom]]
                   (let [result (or (<= tx tx-limit) (> tx tx-allow))]
                     (printf "%s -> %s\n" (pr-str [e a v (d/tx->t tx)]) result)
                     result)))))

Re-running the above speculative write snippet now yields:

[17592186045418 72 "test" 1003] -> false
[17592186045418 72 "test" 1001] -> true

So d/filter saw that tx 1003 retracts the "test" value for our datom, but it’s rejected because it doesn’t meet the condition (or (<= tx tx-limit) (> tx tx-allow)). And at this point, it never even looks at datoms in the speculative transaction 1004, the one that asserted our "test3". It looks like Datomic’s d/filter does some optimizations where it skips datoms if it determines they cannot apply based on previous ones.

But even if d/filter did do what we want (i.e., include datoms from tx 1001 and 1004 but not 1003), the approach still couldn’t work. Let’s see what datoms our speculative transaction introduces:

(-> (d/db conn)
    (as-of'' 1001)
    (d/with [[:db/add ::the-entity :test/id "test3"]])
    :tx-data
    (->> (mapv (juxt :e :a :v (comp d/tx->t :tx) :added))))
;=> [[13194139534316 50 #inst "2025-04-22T12:48:40.875-00:00" 1004 true]
;=>  [17592186045418 72 "test3" 1004 true]
;=>  [17592186045418 72 "test2" 1004 false]]

It adds the value of "test3" but retracts "test2"! Not "test"! It appears that d/with looks at the unfiltered database value to produce new datoms for the speculative db value (corroborated by the fact that we don’t get any output from the filtering fn at this point; we only do when we actually query db'). Our filter cannot work: transactions 1001 plus 1004 would be “add "test", retract "test2", add "test3"”, which is not internally consistent.
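To make the inconsistency concrete, here is a toy model in plain Clojure, with no Datomic involved. It represents datoms as [e a v t added?] vectors and replays only the ones the filter would let through. The assertion of "test2" in transaction 1003 is inferred from the narrative rather than shown in the printed output, and the names datoms, visible? and current-values are made up for this sketch.

```clojure
;; Toy model only: datoms as [e a v t added?] vectors.
;; The "test2" assertion at t 1003 is an assumption inferred from the post.
(def datoms
  [[:the-entity :test/id "test"  1001 true]
   [:the-entity :test/id "test"  1003 false]
   [:the-entity :test/id "test2" 1003 true]
   [:the-entity :test/id "test2" 1004 false]   ; speculative tx
   [:the-entity :test/id "test3" 1004 true]])  ; speculative tx

(defn visible?
  "Mimics the predicate in as-of'': keep datoms at or before t-limit,
  or strictly after t-allow."
  [t-limit t-allow [_ _ _ t _]]
  (or (<= t t-limit) (> t t-allow)))

(defn current-values
  "Replays assertions and retractions in order and returns the set of
  values still asserted at the end."
  [datoms]
  (reduce (fn [vals [_ _ v _ added?]]
            (if added? (conj vals v) (disj vals v)))
          #{}
          datoms))

;; Fork from 1001 while the real basis is 1003:
(def forked-values
  (current-values (filter #(visible? 1001 1003 %) datoms)))
;; forked-values contains both "test" and "test3"
```

In this model the cardinality-one attribute ends up with two values at once, because the retraction in 1004 targets "test2", which the filter has hidden, while "test" is never retracted.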

So, no, really, you can’t branch Datomic from the past.

Which brings us back to square one: what can we do? What is our use case for branching the past, anyway?

Dunno about you, but to me the allure is integration testing. Rather than having to maintain an elaborate set of fixtures, with artificial entity names peppered with the word “example”, I want to test on data that’s close to production; that feels like production. Ideally, it is production data, isolated and made invincible by forking. At the same time, tests have to behave predictably: I don’t want a test to fail just because someone deleted an entity from production yesterday that the test depends on. Being able to fork the past would have been a wonderful solution if it worked, but… it is what it is.

So now I’m experimenting with a different approach. My observation here is that my app’s Datomic database is (and I’d wager a guess that most real-world DBs are as well) “mostly hierarchical”. That is, while its graph of entities might be a giant strongly-connected blob, it can be subdivided into many small subgraphs by judiciously removing edges.

This makes sense for testing. A test typically focuses on a handful of “top-level entities” that I need to be present in my testing database like they are in production, along with all their dependencies – sub-entities that they point to. Say, if I were developing a UI for the MusicBrainz database and testing the release page, I’d need a release entity, along with its tracks, label, medium, artist, country etc to be present in my testing DB. But just one release is enough; I don’t need all 10K of them.

My workflow is thus:

create an empty in-memory DB
feed it with the same schema that production has
get hold of a production db with a fixed as-of
given a “seed entity”, perform a graph traversal (via EAVT and VAET indexes) starting from that entity to determine reachable entities, judiciously blacklisting attributes (and whitelisting “backward-pointing” ones) to avoid importing too much
copy those entities to my fresh DB
run the test!

This can be done generically. I’ve written some proof-of-concept code that wraps a Datomic db to implement the Loom graph protocol, so that one can use Loom’s graph algorithms to perform a breadth-first entity scan, and a function to walk over those entities and convert them to a transaction applicable on top of a pristine DB. So far I’ve been able to extract meaningful small sub-dbs (on the order of ~10K datoms) from my huge production DB of 17+ billion datoms.

This is a gist for now, but let me know if there’s interest and I can convert it into a proper library.
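For a rough idea of the traversal step, here is a minimal sketch in plain Clojure. It is not the proof-of-concept code mentioned above and uses neither Datomic nor Loom: the database is modelled as a map from entity id to attributes, ref attributes hold sets of entity ids, and only forward references are followed. The names toy-db and reachable, and all the attribute names, are hypothetical.

```clojure
;; Toy stand-in for a database: entity id -> attribute map.
;; Ref attributes hold sets of entity ids.
(def toy-db
  {:release-1 {:release/name   "Album"
               :release/artist #{:artist-1}
               :release/tracks #{:track-1 :track-2}}
   :release-2 {:release/name   "Other album"
               :release/artist #{:artist-1}}
   :track-1   {:track/name "Intro"}
   :track-2   {:track/name "Outro"}
   :artist-1  {:artist/name     "Someone"
               :artist/releases #{:release-1 :release-2}}})

(defn reachable
  "Breadth-first scan from seed, following attributes in ref-attrs
  except those in blacklist. Returns the set of reachable entity ids."
  [db ref-attrs blacklist seed]
  (loop [queue (conj clojure.lang.PersistentQueue/EMPTY seed)
         seen  #{seed}]
    (if-let [e (peek queue)]
      (let [nexts (for [[attr targets] (get db e)
                        :when (and (contains? ref-attrs attr)
                                   (not (contains? blacklist attr)))
                        target targets
                        :when (not (contains? seen target))]
                    target)]
        (recur (into (pop queue) nexts)
               (into seen nexts)))
      seen)))

(def ref-attrs #{:release/artist :release/tracks :artist/releases})

;; Blacklisting :artist/releases keeps the other release out.
(def small-subgraph
  (reachable toy-db ref-attrs #{:artist/releases} :release-1))

;; "Copying" the entities is just a select-keys in this toy model.
(def extracted-db (select-keys toy-db small-subgraph))
```

With an empty blacklist the scan pulls in :release-2 through the artist, which is the "importing too much" problem that blacklisting is meant to avoid.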

Cleaner codebase, happier mind

2025-03-02T00:00:00Z

This is my home-office desk on a typical day. Yuck – look at those mugs, cables and rubbish!

As a person with ADHD, I have a hard time maintaining cleanliness – and a high tolerance for the mess around me. However, being in a cluttered environment does take its toll. Often I find myself frustrated by it, but also so overwhelmed by the tasks at hand that cleaning up feels like an almost insurmountable chore. Yet when I start my workday by physically cleaning things up, I often find that it gives me a dopamine boost that benefits my productivity for the rest of the day.

I’m not alone. There is a known link between office cleanliness and wellbeing; some companies have clean desk policies. If nothing else, keeping the work environment clean has a positive psychological effect on people.

Increasingly often, I find myself wondering: why don’t we apply the same thinking to codebases?

Let me stress that I’m not talking about “clean code” in the Uncle Bob sense; I mean the chores that everyone would like to see done, but nobody apparently has time for doing — the cruft that has accumulated as tech debt. Every codebase has something like this, and you know it when you see it. That flaky test that’s been failing once in 20 times or so, for no apparent reason. That legacy component that could plausibly be implemented with more modern infrastructure. That Jenkins instance you keep around just for CI-ing it. Those three in-house libraries that all do the same thing, but in slightly different ways. Those modules that are only there because you ran an A/B test involving them a year ago, which has since been rolled back. And so on.

Yes, this is tech debt, and has to be managed economically. Sometimes it makes sense to bear with things as they are, because your time is needed elsewhere. Or there’s no clear financial gain to be had from investing effort in cleaning up that stuff.

But I believe there are psychological gains. It can give a sense of accomplishment; it can make the codebase more pleasant to work with; it can reduce frustration. It can make people happier in the long run.

One of the strategies I’ve found useful for home cleaning is allocating regular but short time slots in the calendar. I call them “a quarter for home.” While 15 minutes is not enough for a thorough cleanup of a room, it can still make a night-and-day difference in how it feels to be in that room.

And so, going forward, I’m instituting the same policy for myself, for codebases I work with. A daily quarter for code. Or half an hour every second morning. I might be busy, but it happens very rarely that I don’t have half an hour to spare. Sure, some cleanups might require multi-day refactoring to complete, but so what? There are smaller ones, requiring just one or a few sessions. The fact that I have a limited, dedicated time slot means that I’ll ask myself “how can I use it effectively?” And even larger undertakings can be done in multiple half-an-hour-long sittings: there are no deadlines.

Whether or not this works out remains to be seen (forming habits is another thing that ADHD makes harder). I plan to follow up with a retrospective post in a few months. In the meantime, if you have done or plan to do something similar, I’m keen to hear from you!

Double, double toil and trouble or, Corner-Cases of Comparing Clojure Numbers

2025-02-21T00:00:00Z

Let’s talk about Clojure.

In Clojure, comparing two numbers can throw an exception.

Check this out:

(< 1/4 0.5M)
;=> true        ; as expected

(< 1/3 0.5M)
; Execution error (ArithmeticException) at java.math.BigDecimal/divide (BigDecimal.java:1783).
; Non-terminating decimal expansion; no exact representable decimal result.

But why? Why would comparing two perfectly cromulent numbers throw an ArithmeticException?! Everybody knows that ⅓ < 0.5 – we aren’t dividing by zero or anything like that, are we?

Well, the problem is that we’re comparing a ratio to a BigDecimal (a decimal number of arbitrary precision). Java doesn’t offer a built-in way of comparing these (Clojure’s ratios aren’t part of the Java standard library), so Clojure has to coerce one into the other. It chooses to coerce the ratio into a BigDecimal, so it divides (bigdec 1) by (bigdec 3)…

…and that throws! The decimal representation of ⅓ is infinite, so you can’t keep all the digits in finite memory.
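The failing division can be reproduced in isolation, without any ratios. This is a small sketch; the var names exact-division and rounded-division are made up for illustration.

```clojure
;; Dividing BigDecimals without a math context: the exact quotient has
;; no finite decimal expansion, so an ArithmeticException is thrown.
(def exact-division
  (try
    (/ 1M 3M)
    (catch ArithmeticException _
      :non-terminating)))
;; exact-division is :non-terminating

;; With an explicit precision, the division rounds instead of throwing.
(def rounded-division
  (with-precision 5 (/ 1M 3M)))
;; rounded-division is 0.33333M
```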

You may ask: how exactly does Clojure know what coercions to apply and how to produce the result? Let’s look at the code.

The implementation of clojure.core/< calls the Java method clojure.lang.Numbers.lt, which is implemented like this:

static public boolean lt(Object x, Object y){
	return ops(x).combine(ops(y)).lt((Number)x, (Number)y);
}

What’s ops? It’s an implementation of the Ops interface, which has methods for addition, subtraction, etc.; each number class has its own implementation: there is a LongOps, RatioOps, BigDecimalOps etc.

The combine method can alter the behaviour of an Ops depending on the type of the other argument – for example, RatioOps switches to BigDecimalOps if the other argument is a BigDecimal. It’s like a poor man’s implementation of multiple dispatch, which Java doesn’t have.

BigDecimalOps.lt calls toBigDecimal on both arguments, and it’s that method that performs the failing division:

static BigDecimal toBigDecimal(Object x) {
    // ... other cases ...
    if (x instanceof Ratio) {
        Ratio r = (Ratio)x;
        return (BigDecimal)divide(new BigDecimal(r.numerator), r.denominator);
    }
}

Incidentally, this used to produce the expected result in Clojure up to 1.2.1. At that version, Clojure already used the Ops-based multiple dispatch, but combining RatioOps with BigDecimalOps would yield the former, not the latter.

Is the current behaviour a bug? I’m not sure. It seems so, but maybe 1.3.0’s optimizations warrant this behaviour in the admittedly rare case. There’s an ongoing discussion on the Ask Clojure Q&A.

So, in current Clojure, how do you compare ratios to bigdecs? Simple, you think: just coerce the bigdec to a double!

(< 1/3 (double 0.5M))
;=> true

(> 2/3 (double 0.5M))
;=> true

(= 1/2 (double 0.5M))
;=> false

Wait, WHAT?

Yep. Comparing ratios to doubles for inequality works fine, but a ratio is never equal to a double (nor a bigdec), even if said double is an exact representation of the ratio.

This one is documented, but often forgotten about (and not hinted at by the docstring). From Clojure’s equality guide:

Clojure’s = is true when called with two immutable scalar values, if:
Both arguments are nil, true, false, the same character, or the same string (i.e. the same sequence of characters).
Both arguments are symbols, or both keywords, with equal namespaces and names.
Both arguments are numbers in the same 'category', and numerically the same, where category is one of:
integer or ratio
floating point (float or double)
BigDecimal.

And indeed, the code for Numbers.equal has a check for both operands’ categories before it delves to the Ops business that we’ve seen. Remember also that Clojure has a numbers-only == which doesn’t trigger that category check:

(== 1/2 (double 0.5M))
;=> true ; yay
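The category rule applies to other number types as well. A few more comparisons, collected in a map so they can be inspected at the REPL (the map and its keys are only for illustration):

```clojure
(def comparisons
  {:long-vs-double-=    (= 1 1.0)      ; integer vs floating point
   :long-vs-double-==   (== 1 1.0)
   :long-vs-bigint-=    (= 1 1N)       ; both in the integer category
   :double-vs-bigdec-=  (= 0.5 0.5M)   ; floating point vs BigDecimal
   :double-vs-bigdec-== (== 0.5 0.5M)})
;; = is false across categories and true within one;
;; == is true whenever the values are numerically equal.
```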

Corollary: if you want to compare a ratio to a BigDecimal, you could coerce the bigdec to a double. That can return an incorrect result only in a very narrow range of cases: when the BigDecimal’s value is so close to the ratio that the difference is lost in the double conversion.

For 100% certainty, the only way I’m aware of is to remember to always use == when comparing for equality, and explicitly coerce the bigdec to ratio:

(defn exactly-equals? [ratio bigdec]
  (== (* 1 (clojure.lang.Numbers/toRatio bigdec)) ratio))

(exactly-equals? 1/18446744073709551616 5.42101086242752217003726400434970855712890625E-20M)
;=> true ; correct even in this pathological case!

(Multiplying by 1 forces Clojure to normalize the ratio. Otherwise, converting 0.5M would have yielded 5/10 which doesn’t test == to 1/2. Go figure.)

Lossy CSS compression for fun and loss (or profit)

2024-01-26T00:00:00Z

What

Late last year, I had an idea that’s been steadily brewing in my head. I’ve found myself with some free time recently (it coincided with vacation, go figure), and I’ve hacked together some proof-of-concept code. Whether or not it is actually proving the concept I’m not sure, but the results are somewhat interesting, and I believe the idea is novel (I haven’t found any other implementation in the wild). So it’s at least worthy of a blog post.

I wrote cssfact, a lossy CSS compressor. That is, a program that takes some CSS and outputs back some other CSS that hopefully retains some (most) of the information in the input, but contains fewer rules than the original. Exactly how many rules it produces is configurable, and the loss depends on that number.

The program only works on style rules (which make up the majority of a typical CSS). It leaves the non-style rules unchanged.

Here’s the source. It’s not exactly straightforward to get it running, but it shouldn’t be very hard, either. It’s very simple – the program itself doesn’t contain any fancy logic; the actual decisions on what the output will contain are made by an external program.

If you just want to see some results, here is a sample with my homepage serving as a patient etherized upon a table. Its CSS is quite small – 55 style rules that cssfact can work on – and here’s how the page looks with various settings:

Original: page, CSS, source SASS
1 style rule: page, CSS (93% information loss)
5 style rules: page, CSS (74% information loss)
10 style rules: page, CSS (55% information loss)
20 style rules: page, CSS (31% information loss)
30 style rules: page, CSS (17% information loss)

My homepage and both of my blogs all use the same CSS, so you can try to replace the CSS in your browser’s devtools elsewhere on the site and see how it looks.

How

Three words: binary matrix factorization (BMF, in the Boolean algebra).

I guess I could just stop here, but I’ll elaborate just in case it isn’t clear.

Consider a simple CSS snippet:

h1, h2 {
   padding: 0;
   margin-bottom: 0.5em;
}

h1 {
   font-size: 32px;
   font-weight: bold;
}

h2 {
   font-size: 24px;
   font-weight: bold;
}

The first rule tells you that for all elements that match either the h1 or h2 selectors, the two declarations should apply.

You could visualize this CSS as a 5x2 binary matrix A^T where the m columns correspond to simple selectors (i.e., without commas in them) and the n rows correspond to declarations:

|                        | `h1` | `h2` |
|------------------------|------|------|
| `padding: 0`           | 1    | 1    |
| `margin-bottom: 0.5em` | 1    | 1    |
| `font-size: 32px`      | 1    | 0    |
| `font-size: 24px`      | 0    | 1    |
| `font-weight: bold`    | 1    | 1    |

You could also transpose the matrix, yielding A with m rows denoting selectors and n columns denoting declarations. For my homepage’s CSS, m = 60 and n = 81; for bigger stylesheets, several thousand in either direction is not uncommon.

Now, linear algebra gives us algorithms to find a matrix A′ ≈ A such that there exists a decomposition A′ = B × C, where B has dimensions m × r, C has dimensions r × n, and r is small – typically much smaller than m or n. So this is a way of dimensionality reduction.

In the usual algebra of real numbers, there’s no guarantee that B or C will themselves be binary matrices – in fact, most likely they won’t. But if we operate in Boolean algebra instead (i.e. one where 1 + 1 = 1), then both B and C will be binary. The flip side is that the Boolean BMF problem is NP-hard, so the algorithms found in the wild perform approximate decompositions, not guaranteed to be optimal.

But that’s okay, because lossiness is inherent in what we’re doing anyway, and it turns out that the binary matrices B and C are readily interpretable. Look again at the CSS matrix above: why is there a 1 in the top-left cell? Because at least one of the CSS rules stipulates the declaration padding: 0 for the selector h1.

This is exactly the definition of matrix multiplication in the Boolean algebra. The matrix A′ will have a 1 at coordinates [i, j] iff there is at least one k ∈ {1, …, r} such that B[i, k] = 1 and C[k, j] = 1. So the columns of B and rows of C actually correspond to CSS rules! Every time you write CSS, you’re actually writing out binary matrices – and the browser is multiplying them to get at the actual behaviour.

Well, not really, but it’s one way to think about it. It’s not perfect – it completely glosses over rules overlapping each other and having precedence, and treats them as equally important – but it somewhat works!
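As a small illustration of this view, here is a sketch of the Boolean matrix product applied to the three-rule snippet from earlier. It is not part of cssfact; the function name bool-mmul and the matrices b-matrix and c-matrix are made up for this example, with rows of b-matrix standing for the selectors h1 and h2, and columns of c-matrix for the five declarations in the order of the table.

```clojure
(defn bool-mmul
  "Boolean matrix product: entry [i j] is 1 iff there is some k with
  b[i k] = 1 and c[k j] = 1. Matrices are vectors of row vectors."
  [b c]
  (let [c-cols (apply mapv vector c)]   ; transpose c to get its columns
    (mapv (fn [row]
            (mapv (fn [col]
                    (if (some true? (map #(and (= 1 %1) (= 1 %2)) row col))
                      1
                      0))
                  c-cols))
          b)))

;; Selectors x rules: h1 appears in rules 1 and 2, h2 in rules 1 and 3.
(def b-matrix [[1 1 0]
               [1 0 1]])

;; Rules x declarations, in the order:
;; padding: 0, margin-bottom: 0.5em, font-size: 32px,
;; font-size: 24px, font-weight: bold
(def c-matrix [[1 1 0 0 0]
               [0 0 1 0 1]
               [0 0 0 1 1]])

(def a-matrix (bool-mmul b-matrix c-matrix))
;; a-matrix is [[1 1 1 0 1]
;;              [1 1 0 1 1]]
```

The two rows of the result match the h1 and h2 columns of the table above. Note that font-weight: bold is covered by two rules for the same matrix entry, which is exactly where 1 + 1 = 1 matters.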

You could plug any BMF algorithm into this approach. For cssfact, I’ve picked the code by Barahona and Goncalves 2019 – sadly, I wasn’t able to find the actual paper – not because it performs spectacularly well (it’s actually dog-slow on larger stylesheets), but because I was easily able to make it work and interface with it.

Why

Why not?

The sheer joy of exploration is reason enough, but I believe there are potential practical applications. CSS codebases have the tendency to grow organically and eventually start collapsing under their own weight, and they have to be maintained very thoughtfully to prevent that. In many CSS monstrosities found in the wild, there are much cleaner, leaner, essence-capturing cores struggling to get out.

This tool probably won’t automatically extract them for you – so don’t put it in your CI pipeline – but by perusing the CSS that it produces and cross-checking it with the input, you could encounter hints on what redundancy there is in your styles. Things like “these components are actually very similar, so maybe should be united” may become more apparent.

My mental model of transducers

2023-09-09T00:00:00Z

Intro

I’ve been programming in Clojure for a long time, but I haven’t been using transducers much. I learned to mechanically transform (into [] (map f coll)) to (into [] (map f) coll) for a slight performance gain, but not much beyond that. Recently, however, I’ve found myself refactoring transducers-based code at work, which prompted me to get back up to speed.

I found Eero Helenius’ article “Grokking Clojure transducers” a great help in that. To me, it’s much more approachable than the official documentation – in a large part because it shows you how to build transducers from the ground up, and this method of learning profoundly resonates with me. I highly recommend it. However, it’s also useful to have a visual intuition of how transducers work, a mental model that hints at the big picture without zooming into the details too much. In this post, I’d like to share mine and illustrate it with a REPL session. (Spoiler alert: there’s core.async ahead, but in low quantities.)

Pictures

Imagine data flowing through a conveyor belt. Say, infinitely repeating integers from 1 to 5:

I’m using the abstract term “conveyor belt”, rather than “sequence” or something like this, to avoid associations with any implementation details. Just pieces of data, one after another. These data may be anything; they may flow infinitely or stop at some point; may or may not all exist in memory at the same time. Doesn’t matter. That’s the beauty of transducers: they completely abstract away the implementation of sequentiality.

So, what is a transducer, intuitively? It’s a mechanism for transforming conveyor belts into other conveyor belts.

For example, (map inc) is a transducer that says: “take this conveyor belt and produce one where every number is incremented”. Applying it to the above belt yields this one:

An important thing about transducers is that they’re composable. To understand that, imagine further transforming the above belt by removing all the odd numbers. Intuitively, that’s what (remove odd?) does:

(I’ve left the spacing between boxes the same as before, because it helps me visualise (remove odd?) better. I imagine an invisible gnome sitting above the belt, watching carefully all the boxes that pass below it, and snatching greedily every one that happens to contain an odd number.)

Composability means that Clojure lets you say (comp (map inc) (remove odd?)) to mean the transducer that transforms the first belt to the third one. By putting together two simple building blocks, we produced a more complex one – that is itself reusable and can be used as another building block in an ever more complex data pipeline.

Notice we still haven’t said anything about the actual representation of the data, but are already able to model complex processes. We can then apply them to actual data, whether it’s a simple vector-to-vector transformation within the same JVM, or listening to a topic on a Kafka cluster, summarizing the incoming data and sending them to a data warehouse.
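For readers who think more easily in Rust (the language of later posts here), the belt picture can be loosely mimicked with iterator adapters. This is only an analogy under a big assumption: Rust adapters are tied to the Iterator trait, whereas transducers are independent of where the data comes from. The function names are made up.

```rust
/// Rough counterpart of (map inc): every number on the belt is incremented.
fn map_inc(belt: impl Iterator<Item = i64>) -> impl Iterator<Item = i64> {
    belt.map(|n| n + 1)
}

/// Rough counterpart of (remove odd?): the gnome snatches every odd number.
fn remove_odd(belt: impl Iterator<Item = i64>) -> impl Iterator<Item = i64> {
    belt.filter(|n| n % 2 == 0)
}

/// Rough counterpart of (comp (map inc) (remove odd?)): first belt to third belt.
fn inc_then_remove_odd(belt: impl Iterator<Item = i64>) -> impl Iterator<Item = i64> {
    remove_odd(map_inc(belt))
}

fn main() {
    // The first belt: integers from 1 to 5, repeating forever.
    let belt = (1..=5).cycle();
    // Nothing is computed until we pull values off the end of the belt.
    let first_six: Vec<i64> = inc_then_remove_odd(belt).take(6).collect();
    println!("{:?}", first_six);
}
```

Pulling six values off the composed belt gives 2, 4, 6, 2, 4, 6: the incremented values 3 and 5 are odd and never make it through.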

Code

OK, enough handwaving, time for a demo. Let’s fire up a REPL and load core.async (I’m assuming you’ve added it to your dependencies already). I won’t reproduce here the resulting values of expressions we evaluate (they’re mostly nils anyway), but I will reproduce output from the REPL (as comments).

(require '[clojure.core.async :refer [chan <!! >!! thread close!]])

Why core.async? Because I find it a great way to implement a conveyor belt that you can play with interactively. This can help you understand how the various Clojure-provided transducers work. For the noncognoscenti: core.async is a Clojure library that allows you to implement concurrent processes that communicate over channels. By default, that communication is synchronous, meaning that if a process tries to read from a channel, it blocks until another process writes something to that channel.

As it happens, we can pass a transducer to the function that creates channels, chan. It will put the invisible gnomes to work on values that pass through the channel. So you can view that channel as a conveyor belt!

For easy tinkering, we can do this:

(defn transformed-belt [xf]
  (let [ch (chan 1 xf)]
    (thread
      (loop []
        ;; <!! returns nil once the channel is closed, which ends the loop
        (when-some [value (<!! ch)]
          (println "Value:" (pr-str value))
          (recur))))
    ch))

This fires up a process working at the receiving end of the conveyor belt. It will print out any transformed values as soon as they become available. Typing at the REPL, we will assume the role of producer, putting data on the belt.

Like this:

(def b (transformed-belt (map inc)))
(>!! b 2)
; Value: 3
(>!! b 42)
; Value: 43

It works! We’re putting in numbers, and out come the incremented ones.

When we’re done experimenting with the belt, we need to close! it. This will cause the worker thread to shut down.

(close! b)

We can now experiment with something more complex, like that combined transducer we’ve talked about before:

(def b (transformed-belt (comp (map inc) (remove odd?))))
(>!! b 1)
; Value: 2
(>!! b 2)
(>!! b 3)
; Value: 4

We got the transformed 1 and 3, but the intermediate value for 2 was odd, so it was snatched by the gnome and we never saw it.

There’s even more fun to be had! Let’s try (partition-all 3):

(close! b)
(def b (transformed-belt (partition-all 3)))
(>!! b 1)

Nothing…

(>!! b 2)

Still nothing…

(>!! b 3)
; Value: [1 2 3]

Blammo! Our gnome is now packaging together incoming items into bundles of three, caching them in the interim while the bundle is not complete yet. But if we close the input prematurely, it will acknowledge and produce the incomplete bundle:

(>!! b 4)
(>!! b 5)
(close! b)
; Value: [4 5]

In fact, partition-all is what prompted me to write this post. That code at work I mentioned actually included a transducer composition that had a (net.cgrand.xforms/into []) in it. That transducer (from Christophe Grand’s xforms library) accumulates data until there’s nothing more to accumulate, and then emits all of it as one large vector. By replacing it with partition-all, I altered the downstream processing to handle multiple smaller batches rather than one huge batch, improving the system’s latency.

A small change for a huge win. Clojure continues to amaze me.

Plus, it’s fun to make JS-less animations in SVG. :)

A visual tree iterator in Rust

2023-07-20T00:00:00Z

My adventure with learning Rust continues. As a quick recap from the previous post, I’m writing a tree viewer. I have now completed another major milestone, which is to rewrite the tree-printing function to use an iterator. (Rationale: it makes the code more reusable – I can, for instance, easily implement a tree-drawing view for Cursive with it.)

And, as usual, I’ve fallen into many traps before arriving at a working version. In this post, I’ll reflect on the mistakes I’ve made.

The problem

Let’s start with establishing the problem. Given a Tree struct defined as:

pub struct Tree<T> {
    value: T,
    children: Vec<Tree<T>>,
}

I want it to have a lines() method returning an iterator, so that I can implement print_tree as:

fn print_tree<T: Display>(t: &Tree<T>) {
    for line in t.lines() {
        println!("{}", line);
    }
}

and have the output identical to the previous version.

The algorithm

Before we dive into the iterator sea, let’s have a look at the algorithm. Imagine that we’re printing the tree (in sexp-notation) (root (one (two) (three (four))) (five (six))). This is its dissected visual representation:

Each line consists of three concatenated elements, which I call “parent prefix”, “immediate prefix”, and “node value”. The immediate prefix is always (except for the root node) "└─ " or "├─ ", depending on whether the node in question is the last child of its parent or not. The parent prefix has variable length that depends on the node’s depth, and has the following properties:

For any node, all its subnodes’ parent prefixes start with its parent prefix.
For any node, the parent prefixes of its direct children are obtained by appending "   " or "│  " to its own parent prefix, again depending on whether the node is its parent’s last child or not.

This gives rise to the following algorithm that calls itself recursively:

fn print_tree<T>(t: &Tree<T>,
                 parent_prefix: &str,
                 immediate_prefix: &str,
                 parent_suffix: &str)
    where T: Display
{
    // print the line for node t
    println!("{0}{1}{2}", parent_prefix, immediate_prefix, t.value);

    // print all children of t recursively
    let mut it = t.children.iter().peekable();
    let child_prefix = format!("{0}{1}", parent_prefix, parent_suffix);

    while let Some(child) = it.next() {
        match it.peek() {
            None    => print_tree(child, &child_prefix, "└─ ", "   "),
            Some(_) => print_tree(child, &child_prefix, "├─ ", "│  "),
        }
    }
}

The three extra string arguments start out as empty strings and become populated as the algorithm descends into the tree. The implementation uses a peekable iterator over the children vector to construct the prefixes appropriately.
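To check the algorithm against the example tree, here is a self-contained sketch of the same recursion that collects the lines into a vector instead of printing them. The helpers node, example_tree and collect_lines are hypothetical names introduced for this illustration only.

```rust
use std::fmt::Display;

pub struct Tree<T> {
    value: T,
    children: Vec<Tree<T>>,
}

/// Hypothetical helper for building small trees of strings.
fn node(value: &str, children: Vec<Tree<String>>) -> Tree<String> {
    Tree { value: value.to_string(), children }
}

/// The tree (root (one (two) (three (four))) (five (six))).
fn example_tree() -> Tree<String> {
    node("root", vec![
        node("one", vec![node("two", vec![]), node("three", vec![node("four", vec![])])]),
        node("five", vec![node("six", vec![])]),
    ])
}

/// Same recursion as print_tree, but pushing each line onto `out`.
fn collect_lines<T: Display>(t: &Tree<T>,
                             parent_prefix: &str,
                             immediate_prefix: &str,
                             parent_suffix: &str,
                             out: &mut Vec<String>) {
    out.push(format!("{}{}{}", parent_prefix, immediate_prefix, t.value));

    let child_prefix = format!("{}{}", parent_prefix, parent_suffix);
    let mut it = t.children.iter().peekable();
    while let Some(child) = it.next() {
        // peek() tells us whether another sibling follows this child
        match it.peek() {
            None    => collect_lines(child, &child_prefix, "└─ ", "   ", out),
            Some(_) => collect_lines(child, &child_prefix, "├─ ", "│  ", out),
        }
    }
}

fn main() {
    let mut lines = Vec::new();
    // All three prefixes start out empty for the root.
    collect_lines(&example_tree(), "", "", "", &mut lines);
    for line in lines {
        println!("{}", line);
    }
}
```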

Building an iterator, take 1

So the printing implementation is recursive. How do we write a recursive iterator in Rust? Is it even possible? I initially thought I would have to replace the recursion with an explicit stack stored in the iterator’s mutable state, started to write some code, and promptly got lost.

I then searched for the state-of-the-art on iterating through trees, and found this post by Amos Wenger. You might want to read it first before continuing; my final implementation ended up being an adaptation of one of the techniques described there.

My definition of tree is slightly different than Amos’s (mine has only one value in a node), but it’s easy enough to adapt his final solution to iterate over its values:

impl<T> Tree<T> where T: Display {
    pub fn lines<'a>(&'a self) -> Box<dyn Iterator<Item = String> + 'a> {
        let child_iter = self.children.iter().map(|n| n.lines()).flatten();

        Box::new(
            once(self.value.to_string()).chain(child_iter)
        )
    }
}

(Note the dyn keyword; Rust started requiring it in this context sometime after Amos’s article was published.)

Clever! This sidesteps the issue of writing a custom iterator altogether, by chaining some standard ones, wrapping them in a box and sprinkling some lifetime annotation magic powder to appease the borrow checker. We also make it explicit that the iterator is returning strings, no matter what the type of tree nodes is.

But… while it compiles and produces a sequence of strings, they don’t reflect the structure of the tree: there’s no pretty prefixing going on.

Let’s try to fix that. Clearly, the iterator-returning function will now need to take three additional arguments, just like print_tree – the first one will now be a String because we’ll be building it at runtime, and the other two are string literals so can just be &'static strs. Let’s try:

// changing the name because we now accept extra params
// I want the original lines() to keep its signature
pub fn prefixed_lines<'a>(&'a self,
                          parent_prefix: String,
                          immediate_prefix: &'static str,
                          parent_suffix: &'static str)
                         -> Box<dyn Iterator<Item = String> + 'a>
{
    let value = format!("{0}{1}{2}", parent_prefix, immediate_prefix, self.value);
    let mut peekable = self.children.iter().peekable();
    let child_iter = peekable
        .map(|n| {
            let child_prefix = format!("{0}{1}", parent_prefix, parent_suffix);
            let last = !peekable.peek().is_some();
            let immediate_prefix = if last { "└─ " } else { "├─ " };
            let parent_suffix = if last { "   " } else { "│  " };
            n.prefixed_lines(child_prefix, immediate_prefix, parent_suffix)
        })
        .flatten();

    Box::new(
        once(value).chain(child_iter)
    )
}

And, sure enough, it doesn’t compile. One of the things that Rust complains about is:

error[E0373]: closure may outlive the current function,
    but it borrows `peekable`, which is owned by the current function
  --> src/main.rs:55:18
   |
55 |     .map(|n| {
   |          ^^^ may outlive borrowed value `peekable`
56 |         let child_prefix = format!("{0}{1}"...
57 |         let last = !peekable.peek().is_some();
   |                     -------- `peekable` is borrowed here
   |
note: closure is returned here
  --> src/main.rs:64:9
   |
64 | Box::new(once(value).chain(child_iter))
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `peekable`
      (and any other referenced variables), use the `move` keyword
   |
55 |     .map(move |n| {
   |          ++++

So trying to borrow the iterator from within the closure passed to map() is non-kosher. I’m not sure where the “may outlive the current function” comes from, but I think this is because the iterator returned by map is lazy, and so the closure needs to be able to live for at least as long as the resulting iterator does. The suggestion of using move doesn’t work, because it then invalidates the map call. (Rust complained about borrowing parent_prefix and parent_suffix as well, and move does work for those.)
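The part where move does help can be shown in isolation. Below is a minimal sketch, with a made-up function with_prefix, of a lazy iterator that is returned from a function and therefore needs a closure that owns what it captures.

```rust
/// Returns a lazy iterator that outlives this call.
/// `with_prefix` is a hypothetical name used only for this illustration.
fn with_prefix<'a>(items: &'a [String], prefix: String) -> Box<dyn Iterator<Item = String> + 'a> {
    // The closure runs later, each time the caller pulls a value, i.e. after
    // this function has returned. Without `move` it would only borrow the
    // local `prefix`, and the compiler rejects that with a "may outlive" error.
    Box::new(items.iter().map(move |item| format!("{}{}", prefix, item)))
}

fn main() {
    let items = vec!["one".to_string(), "two".to_string()];
    for line in with_prefix(&items, "├─ ".to_string()) {
        println!("{}", line);
    }
}
```

Here the closure takes ownership of prefix, while items is only borrowed for the lifetime 'a that also appears in the return type.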

Taking a step back

I was not able to find a way out of this conundrum. But after re-reading Amos’s post, I’ve decided to revisit his “bad” approach, with a custom iterator (which I now think is actually not bad at all). It made all the more sense to me when I considered future extensibility: eventually I want to be able to render certain subtrees collapsed, and I want the iterator to know about that.

It took me a while to understand how that custom iterator works. It doesn’t have an explicit stack and doesn’t try to “de-recursivize” the process! Instead, it holds two sub-iterators, one initially iterating over the node values (viter) and the other over children (citer). The next() method just tries viter first; if it returns nothing, then a next subtree is picked from citer, and viter (by now already consumed) is replaced by another instance of the same iterator, but for that subtree.

Meditate on this for a while. There’s a lot going on here.

viter starts out as an iterator over a vector (a std::slice::Iter), and then gets replaced by a tree iterator (Amos’s NodeIter).
This is possible because it’s declared as a Box<Iterator<Item = &'a i32> + 'a>. TIL: in Rust, you can’t use a trait directly as a type for a struct field (because there’s no telling what its size will be), but you can put it into a Box (or, I guess, Rc or Arc). Polymorphism, baby!
Recursion is achieved by having NodeIter contain a member that, at times, is itself another NodeIter; whereas the correct behaviour is obtained by having those NodeIters instantiated at the right moment.

Whoa. Now that’s clever. I probably wouldn’t have thought about this. It’s good to be standing on the shoulders of giants. Thanks, Amos.
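A stripped-down sketch of just the field-swapping trick, without any trees, may help. The struct Switching and the function demo are made up for this illustration and are not Amos’s NodeIter: the boxed field starts out holding one concrete iterator type and is later replaced by a completely different one.

```rust
/// Hypothetical iterator whose boxed field changes its concrete type at runtime.
struct Switching<'a> {
    viter: Box<dyn Iterator<Item = i32> + 'a>,
    replacement: Option<Box<dyn Iterator<Item = i32> + 'a>>,
}

impl<'a> Iterator for Switching<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if let Some(v) = self.viter.next() {
            Some(v)
        } else if let Some(other) = self.replacement.take() {
            // viter is exhausted: swap in the other iterator and try again
            self.viter = other;
            self.next()
        } else {
            None
        }
    }
}

fn demo<'a>(numbers: &'a [i32]) -> Switching<'a> {
    // Two unrelated concrete types behind the same boxed trait object type.
    let first: Box<dyn Iterator<Item = i32> + 'a> = Box::new(numbers.iter().copied());
    let second: Box<dyn Iterator<Item = i32> + 'a> = Box::new((10..13).map(|n| n * 2));
    Switching { viter: first, replacement: Some(second) }
}

fn main() {
    let numbers = vec![1, 2, 3];
    let all: Vec<i32> = demo(&numbers).collect();
    println!("{:?}", all);
}
```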

Anyway, let’s adapt it to our use-case and add the prefixes to the iterator’s state:

pub struct TreeIterator<'a, T> {
    parent_prefix: String,
    immediate_prefix: &'static str,
    parent_suffix: &'static str,
    viter: Box<dyn Iterator<Item = String> + 'a>,
    citer: Box<dyn Iterator<Item = &'a Tree<T>> + 'a>,
}

And our iterator implementation follows Amos’s, except that we handle the prefixes and initialize viter with a Once iterator:

impl<T> Tree<T> where T: Display {
    pub fn prefixed_lines<'a>(&'a self,
                      parent_prefix: String,
                      immediate_prefix: &'static str,
                      parent_suffix: &'static str)
                     -> TreeIterator<'a, T>
    {
        TreeIterator {
            parent_prefix: parent_prefix,
            immediate_prefix: immediate_prefix,
            parent_suffix: parent_suffix,
            viter: Box::new(once(format!("{}", &self.value))),
            citer: Box::new(self.children.iter().peekable()),
        }
    }
}

impl<'a, T> Iterator for TreeIterator<'a, T> where T: Display {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(val) = self.viter.next() {
            Some(format!("{0}{1}{2}", self.parent_prefix, self.immediate_prefix, val))
        } else if let Some(child) = self.citer.next() {
            let last = !self.citer.peek().is_some();
            let immediate_prefix = if last { "└─ " } else { "├─ " };
            let parent_suffix = if last { "   " } else { "│  " };
            let subprefix = format!("{0}{1}", self.parent_prefix, self.parent_suffix);
            self.viter = Box::new(child.prefixed_lines(subprefix, immediate_prefix, parent_suffix));
            self.next()
        } else {
            None
        }
    }
}

Looks sensible, right? Except (you guessed it!) it doesn’t compile:

error[E0599]: no method named `peek` found for struct
    `Box<(dyn Iterator<Item = &'a Tree<T>> + 'a)>` in the current scope
  --> src/main.rs:38:36
   |
38 |     let last = !self.citer.peek().is_some();
   |                            ^^^^ help: there is a method with a
   |                                 similar name: `peekable`

Ah, right. We’ve forgotten to tell Rust that citer contains a Peekable. Let’s fix that:

pub struct TreeIterator<'a, T> {
    // … other fields as before
    citer: Box<Peekable<dyn Iterator<Item = &'a Tree<T>> + 'a>>,
}

Nope, that doesn’t compile either:

error[E0277]: the size for values of type `(dyn Iterator<Item = &'a Tree<T>> + 'a)`
    cannot be known at compilation time
  --> src/main.rs:16:12
   |
16 |     citer: Box<Peekable<dyn Iterator<Item = &'a Tree<T>> + 'a>>,
   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |            doesn't have a size known at compile-time
   |
   = help: the trait `Sized` is not implemented for
           `(dyn Iterator<Item = &'a Tree<T>> + 'a)`
note: required by a bound in `Peekable`

Bummer. We can put a trait of unknown size in a Box, but we can’t put a Peekable in between! Peekable needs to know the size of its contents at compile time. Trying to convince it by sprinkling + Sized in various places doesn’t work.

Fortunately, we know the actual type of citer. It’s an iterator over Vec<Tree<T>>, so it’s a std::slice::Iter<Tree<T>>. Let’s put it in the definition of TreeIterator:

use std::slice::Iter;

pub struct TreeIterator<'a, T> {
    // … other fields as before
    citer: Box<Peekable<Iter<'a, Tree<T>>>>,
}

And it compiles!
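For what it’s worth, there seems to be another way to keep the type erased: swap the nesting, so that Peekable wraps the Box rather than the other way round. A Box<dyn Iterator> has a known size and is itself an iterator, which is all Peekable asks for. The sketch below uses a made-up struct Children to show the idea; it is not what treeviewer does.

```rust
use std::iter::Peekable;

/// Hypothetical holder for a peekable, type-erased iterator over borrowed items.
struct Children<'a, T> {
    citer: Peekable<Box<dyn Iterator<Item = &'a T> + 'a>>,
}

impl<'a, T> Children<'a, T> {
    fn new(items: &'a [T]) -> Self {
        // Box first (erasing the concrete type), then wrap the box in Peekable.
        let boxed: Box<dyn Iterator<Item = &'a T> + 'a> = Box::new(items.iter());
        Children { citer: boxed.peekable() }
    }

    /// Returns the next item together with a flag saying whether it is the last one.
    fn next_with_last(&mut self) -> Option<(&'a T, bool)> {
        let item = self.citer.next()?;
        Some((item, self.citer.peek().is_none()))
    }
}

fn main() {
    let names = vec!["one", "two", "three"];
    let mut children = Children::new(&names);
    while let Some((name, last)) = children.next_with_last() {
        println!("{} (last: {})", name, last);
    }
}
```

The price is an extra dynamic dispatch on every call, so the concrete std::slice::Iter is probably still the better choice when the type is known.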

Removing the root

Here’s what happens when you try to run treeviewer with this implementation on a very simple tree:

$ echo -e 'one\ntwo' | ./target/debug/treeviewer

├─ one
└─ two

Seems good, but that empty line is worrying. That’s because treeviewer takes slash-separated paths as input, and because the paths can begin with anything, it puts everything under a pre-existing root node with an empty value. We don’t want the output to contain that root node.

Simple, right? We just need to initialize viter with an empty iterator when the immediate prefix is empty, which only happens for the root:

pub fn prefixed_lines<'a>(&'a self,
                          parent_prefix: String,
                          immediate_prefix: &'static str,
                          parent_suffix: &'static str)
                         -> TreeIterator<'a, T>
{
    TreeIterator {
        // … other fields as before
        viter: Box::new(if immediate_prefix.is_empty() {
                           empty()
                        } else {
                           once(format!("{}", &self.value))
                        }),
    }
}

And (this is becoming obvious by now) we’re rewarded by yet another interesting error message:

error[E0308]: `if` and `else` have incompatible types
  --> src/main.rs:49:32
   |
46 |   viter: Box::new(if immediate_prefix.is_empty() {
   |  _________________-
47 | |                    empty()
   | |                    ------- expected because of this
48 | |                 } else {
49 | |                    once(format!("{}", &self.value))
   | |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   | |                      expected `Empty<_>`, found `Once<String>`
50 | |                 }),
   | |_________________- `if` and `else` have incompatible types
   |
   = note: expected struct `std::iter::Empty<_>`
              found struct `std::iter::Once<String>`

Ahhh. Even though both branches of the if expression have types that meet the trait requirement (Iterator), these are different types. Apparently, if insists on both branches being the same type.

What we can do is lift the if upwards:

pub fn prefixed_lines<'a>(&'a self,
                          parent_prefix: String,
                          immediate_prefix: &'static str,
                          parent_suffix: &'static str)
                         -> TreeIterator<'a, T>
{
    if immediate_prefix.is_empty() {
        TreeIterator {
            // … other fields as before
            viter: Box::new(empty()),
        }
    } else {
        TreeIterator {
            // … other fields as before, repeated
            viter: Box::new(once(format!("{}", &self.value))),
        }
    }
}

Yuck. We needed to duplicate most of the instantiation details of TreeIterator. But at least it compiles and works – the root is gone!

$ echo -e 'one\ntwo' | ./target/debug/treeviewer
├─ one
└─ two
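The duplication can probably be avoided by boxing each branch separately, so that both branches end up with the same boxed trait object type. A minimal sketch, with a made-up helper initial_viter:

```rust
use std::iter::{empty, once};

/// Hypothetical helper: chooses the initial value iterator for a node.
/// Both branches produce the same type, a boxed trait object, even though
/// the iterators inside the boxes have different concrete types.
fn initial_viter(immediate_prefix: &str, value: &str) -> Box<dyn Iterator<Item = String>> {
    if immediate_prefix.is_empty() {
        // root node: emit nothing
        Box::new(empty::<String>())
    } else {
        Box::new(once(value.to_string()))
    }
}

fn main() {
    println!("{}", initial_viter("", "root").count());
    for line in initial_viter("├─ ", "one") {
        println!("{}", line);
    }
}
```

The declared return type tells the compiler what each branch should be converted to, so the two Box::new calls no longer clash.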

Fixing a bug

Or does it? Let’s try the original tree from our illustration:

$ echo -e 'one/two\none/three/four\nfive/six' | ./target/debug/treeviewer
├─ one
├─ │  ├─ two
├─ │  └─ three
├─ │  └─ │     └─ four
└─ five
└─    └─ six

Uh oh. It’s totally garbled. Time to go back to the drawing board.

It took me quite a few println!() debugging statements to figure out what was going on. Remember, the TreeIterator for the whole tree will contain a nested TreeIterator in its viter field, which in turn may contain another nested TreeIterator, and so on. Each of these nested iterators eventually passes its value to the “parent” iterator… decorating it with prefixes, again and again!

To fix this, we need to differentiate between two cases:

We’re producing the value for the node we’re holding (that’s when we need the prefixes);
We’re propagating up the value returned by viter that holds a nested TreeIterator (in this case we need to return it unchanged).

We’ll add two more fields to TreeIterator: a boolean indicating whether we’ve already emitted the value at the node in question, and a reference to that value itself.

pub struct TreeIterator<'a, T> {
    // … other fields as before
    emitted: bool,
    value: &'a T,
}

And we initialize them as follows:

pub fn prefixed_lines<'a>(&'a self,
                          parent_prefix: String,
                          immediate_prefix: &'static str,
                          parent_suffix: &'static str)
                         -> TreeIterator<'a, T>
{
    TreeIterator {
        emitted: immediate_prefix.is_empty(),
        value: &self.value,
        viter: Box::new(empty()),
        // … other fields as before
    }
}

Note that the logic of skipping emitting the root has been moved to the initialization of emitted. This lets us kill the duplication! We now initialize viter to empty() – it no longer matters; this initial value will be unused and eventually replaced by child TreeIterators.

Finally, we need to amend the implementation of next():

fn next(&mut self) -> Option<Self::Item> {
    if !self.emitted {
        self.emitted = true;
        // decorate value with prefixes
        Some(format!("{0}{1}{2}", self.parent_prefix, self.immediate_prefix, self.value))
    } else if let Some(val) = self.viter.next() {
        Some(val) // propagate unchanged
    } else if let Some(child) = self.citer.next() {
        // … this part doesn’t change
    } else {
        None
    }
}

And this version, finally, compiles and works as expected:

$ echo -e 'one/two\none/three/four\nfive/six' | ./target/debug/treeviewer
├─ one
│  ├─ two
│  └─ three
│     └─ four
└─ five
   └─ six
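For reference, here are the pieces from this post assembled into one self-contained sketch. The helpers node and example_tree, and the lines() wrapper that starts with empty prefixes, are filled in here as assumptions; the real treeviewer builds its tree from slash-separated paths on standard input.

```rust
use std::fmt::Display;
use std::iter::{empty, Peekable};
use std::slice::Iter;

pub struct Tree<T> {
    value: T,
    children: Vec<Tree<T>>,
}

pub struct TreeIterator<'a, T> {
    parent_prefix: String,
    immediate_prefix: &'static str,
    parent_suffix: &'static str,
    emitted: bool,
    value: &'a T,
    viter: Box<dyn Iterator<Item = String> + 'a>,
    citer: Box<Peekable<Iter<'a, Tree<T>>>>,
}

impl<T> Tree<T> where T: Display {
    /// Assumed entry point: the root gets empty prefixes and is not emitted.
    pub fn lines(&self) -> TreeIterator<'_, T> {
        self.prefixed_lines(String::new(), "", "")
    }

    pub fn prefixed_lines<'a>(&'a self,
                              parent_prefix: String,
                              immediate_prefix: &'static str,
                              parent_suffix: &'static str)
                             -> TreeIterator<'a, T>
    {
        TreeIterator {
            parent_prefix,
            immediate_prefix,
            parent_suffix,
            emitted: immediate_prefix.is_empty(),
            value: &self.value,
            viter: Box::new(empty::<String>()),
            citer: Box::new(self.children.iter().peekable()),
        }
    }
}

impl<'a, T> Iterator for TreeIterator<'a, T> where T: Display {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        if !self.emitted {
            self.emitted = true;
            // decorate this node's own value with prefixes
            Some(format!("{0}{1}{2}", self.parent_prefix, self.immediate_prefix, self.value))
        } else if let Some(val) = self.viter.next() {
            Some(val) // line from a nested iterator: propagate unchanged
        } else if let Some(child) = self.citer.next() {
            let last = self.citer.peek().is_none();
            let immediate_prefix = if last { "└─ " } else { "├─ " };
            let parent_suffix = if last { "   " } else { "│  " };
            let subprefix = format!("{0}{1}", self.parent_prefix, self.parent_suffix);
            self.viter = Box::new(child.prefixed_lines(subprefix, immediate_prefix, parent_suffix));
            self.next()
        } else {
            None
        }
    }
}

/// Hypothetical helper for building small trees of strings.
fn node(value: &str, children: Vec<Tree<String>>) -> Tree<String> {
    Tree { value: value.to_string(), children }
}

/// The example tree, under a root node with an empty value.
fn example_tree() -> Tree<String> {
    node("", vec![
        node("one", vec![node("two", vec![]), node("three", vec![node("four", vec![])])]),
        node("five", vec![node("six", vec![])]),
    ])
}

fn main() {
    for line in example_tree().lines() {
        println!("{}", line);
    }
}
```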

Takeaways

There are quite a few things I learned about Rust in the process, and then there are meta-learnings. Let’s recap the Rust-specific ones first.

You can’t put a trait in a struct directly, but you can put a Box of traits.
But not a Box of Foo of traits, where Foo expects its parameter to be Sized.
If you’re map()ping a closure over an iterator, you can’t access that iterator itself from within the closure.
Closures by default borrow stuff that they close over, but you can move that stuff to the closure instead with the move keyword. If I understand correctly, it’s an all-or-nothing move; no mix and match.
In an if expression, all branch expressions must be of the same type; conforming to the same trait is not enough.

And now the general ones.

First off, Rust is hard. (The least wonder in the world.) Most of the traps I’ve fallen into are accidental complexity, not inherent in the simple problem. I guess that it’s really a matter of the initial steepness of Rust’s learning curve, and that things become easier once you’re past the initial hurdles – you train your instincts to avoid these tarpits and keep the compiler happy.

I’m still very much a newcomer to Rust, so I’m pretty sure I ended up taking a suboptimal approach. A seasoned Rustacean would probably write this code in an altogether different way. If you have suggestions how to improve my code, or how to attack the problem from different angles, tell me!

As an experiment in learning, I’ve decided to reflect on my mistakes more frequently. I elaborate on it in my previous post, which also discusses changes I’ve made to my workflow to make learning easier.

Writing the present post showed me how much time it takes. It took me just over an hour to fall into all the traps described in this post and find a way out. A few hours, if you count reading Amos’s post and contemplating the problem. In contrast, this write-up took about two days, plus some yak shaving it led me to. Part of the reason is that the actual road that I went through was much more bumpy than described here. While writing this, I had to go through no fewer than fifty-six compilation attempts. Here are some of them, with one-line descriptions and a tick or cross to indicate whether the compilation attempt was successful:

Yet I think it’s worth it. Some of the errors I’ve fixed groping in the dark, kind of randomly: I have now revisited them and I feel I have a much more solid understanding of what’s going on.

And finally: if you’re into Rust, Amos’s blog (fasterthanli.me) is an excellent resource. Go sponsor him on GitHub if these articles are of value to you.

Learning to learn Rust

2023-07-06T00:00:00Z

I’m enjoying a two-month sabbatical this summer. It’s been great so far! I’ve used almost half of the time to cycle through the whole of Great Britain and let my body work physically and my mind rest (usually, the opposite is true). And now that I’m back, I’ve switched focus to a few personal projects that I have really wanted to work on for a while but never found time for.

One of these projects is to learn Rust. Clojure has made me lazy and it’s really high time for me to flex the language-learning muscles. But while the title says “Rust,” there is nothing Rust-specific about the tip I’m about to share: it can be applied to many programming languages.

I learn best by doing, so after reading the first few chapters of the Rust book, I set off to write a simple but non-trivial program: a console-based tree viewer. The idea is to have a TUI that you could feed with a set of slash-separated paths:

one/two
one/three/four
five/six

and have it render the tree visually:

├─ one
│  ├─ two
│  └─ three
│     └─ four
└─ five
   └─ six

allowing you to scroll it, search it and (un)fold individual subtrees. The paths may come from the filesystem (e.g. you could pipe find . -type f into it), but not necessarily: they might be S3 object paths, hierarchical names of RocksDB keys (my actual use case), or represent any other tree.

Today I hit a major milestone: I wrote a function, append_path, that, given a tree of strings and a slash-separated path, creates new nodes as needed and adds them to the tree. Needless to say, I didn’t get it right on the first attempt. I fought with the compiler and its borrow checker a lot.

I guess that’s a typical ordeal that a Rust newbie goes through. But alongside treeviewer’s code, I keep an org-mode file called LEARN where I jot down things that I might want to remember for the future. So after getting append_path right, I wanted to pause and look back at the failed attempts and the corresponding compiler errors, to try to make sense of them, armed with my new knowledge.

But… which versions of the code caused which errors? I had no idea! And the Emacs undo tree is really hard to dive into.

An obvious way out is to commit early and often. But this (1) requires a discipline that I don’t have at the moment, and (2) pollutes the Git history. So, instead, I automated it.

I’ve added a Makefile to my repo. Instead of cargo run, I will now be compiling and executing the code via make run. In addition to Cargo, this runs a script that:

Commits everything that’s uncommitted yet
Creates an annotated tag with that commit, named build-$TIMESTAMP, that serves as a snapshot of the code that was built
Reverts the working tree to the state it was in (whatever was staged stays staged, whatever was unstaged remains unstaged)

This workflow change has the nice property of being unintrusive. I can hack on the code, compile, commit and rebase to my heart’s delight. But when I need to look back at the most recent compilation attempts, all I need to do is git tag and from there I can meditate on individual mistakes I made.
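The snapshotting step could be sketched roughly as follows, staying in Rust for consistency with the rest of the project. This is not the script from my repo, and it takes a different route to the same end: it assumes git stash create (which builds a commit from the tracked changes without touching the working tree or the index) instead of committing and reverting, so untracked files are not captured. The function names and the exact tag format are made up.

```rust
use std::process::Command;
use std::time::{SystemTime, UNIX_EPOCH};

/// Name of the snapshot tag for a Unix timestamp (the format is an assumption).
fn build_tag_name(timestamp_secs: u64) -> String {
    format!("build-{}", timestamp_secs)
}

/// Runs `git` with the given arguments and returns its trimmed stdout.
fn git(args: &[&str]) -> Result<String, String> {
    let out = Command::new("git")
        .args(args)
        .output()
        .map_err(|e| format!("could not run git: {}", e))?;
    if out.status.success() {
        Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
    } else {
        Err(String::from_utf8_lossy(&out.stderr).trim().to_string())
    }
}

/// Tags a snapshot of the current tracked changes and returns the tag name.
fn snapshot() -> Result<String, String> {
    // `git stash create` prints a commit id, or nothing if there are no changes.
    let mut commit = git(&["stash", "create"])?;
    if commit.is_empty() {
        commit = git(&["rev-parse", "HEAD"])?;
    }
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|e| e.to_string())?
        .as_secs();
    let tag = build_tag_name(secs);
    git(&["tag", "-a", tag.as_str(), commit.as_str(), "-m", "Snapshot of a build attempt"])?;
    Ok(tag)
}

fn main() {
    // Only touch the repository when explicitly asked to.
    if std::env::args().any(|a| a == "--snapshot") {
        match snapshot() {
            Ok(tag) => println!("created {}", tag),
            Err(e) => eprintln!("snapshot failed: {}", e),
        }
    } else {
        println!("pass --snapshot to tag the current state of the repository");
    }
}
```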

Why tags and not branches, one might ask? I guess this is a matter of personal preference. I opted for tags because I want to minimise the chance of accidentally pushing the branch. The resulting tags are technically dangling, which I don’t see as an issue: the older the build tag, the less likely I am to need it in the future, so I see myself cleaning up old builds every now and then.

When working with a language I’m proficient in, I don’t need this. But as a learning aid, I already see the idea as indispensable. Feel free to reuse it!

Testing a compiler that can’t even print stuff out

2021-09-25T00:00:00Z

I’m enjoying a week-long vacation. In addition to other vacationy things (a trip to Prague, yay!), I wanted to do some off-work programming Just For Fun™ and revisit one of my dormant pet projects, to see if I can make some progress.

I opted for Lithium, my toy x86 assembler and Lisp compiler that hasn’t seen new development since 2014. But before that, I had blogged about it and even talked about it at EuroClojure one time.

Over the week, I’ve re-read the paper that I’ve been loosely following while developing Lithium. In it, Abdulaziz Ghuloum advocates having a testing infrastructure from day one, so that one can ensure that the compiler continues to work after each small modification. I’d cut corners on it before, but today, I’ve finally added one.

What’s the big deal? And why not earlier?

One of the original goals that I set myself for Lithium is that it have no runtime dependencies. Not even a C library; not even an OS. It produces raw x86 binaries targeting real mode – non-relocatable blobs of raw machine code. I’m running them in DOSBox, because it’s convenient, but the point is it’s not necessary.

(Some day, I’ll write a mission statement to explain why. But that’s a story for another day.)

And because the setup is so minimalistic, the setup suggested by Ghuloum becomes unfeasible. Ghuloum presupposes the existence of a host C compiler and linker; I have no such privilege. By itself, Lithium can barely output stuff to screen. There’s a write-char primitive that emits one character, but nothing more than that. And there’s as yet no library to add things to, because there’s no defn and not much of a global environment.

So what to do? I thought about the invariant in Ghuloum’s design, one that Lithium inherits as well:

Every expression is compiled to machine code that puts its value in the AX register.

If I could somehow obtain the values that the CPU registers have at the end of executing a Lithium-compiled program, then I could compare them to the expected value in a test. But how to grab those registers?

That turned out to be easier than expected. Instead of extending Lithium to support printing decimal or hexadecimal numbers, I just grabbed some pre-existing assembly code to affix to the program as an epilog. (It does depend on DOS’s interrupt 21h, but hey, it doesn’t hurt to have it for debugging/testing only.) Surprise: the snippet failed to compile, because Lithium’s assembler is woefully incomplete! But it was easy enough to extend it until it worked.

So this gave me a way to view the program’s results.

But there’s another problem: these results are printed within DOSBox. In the emulated DOS machine. I needed a way to transfer them back to the host. Can you guess how?

Yes, you’re right: the simplest thing (DOS redirection to a file, as in PROG.COM >REG.TXT) works. And you’ll laugh at me: it didn’t occur to me until now, when I’m writing up the commit that’s already out in the wild. More proof that it pays to write documentation.

My original idea was… SCREEN CAPTURE!

I scoured Google for a DOS screen grabber that can produce text files and is not a TSR, found one, bundled it with Lithium, and wrote some duct-tape code that invokes the compiled program and the screen grabber in turn and then parses the output. With that, I can finally have tests that check whether (+ 3 4) is really 7.
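To illustrate the parsing step, here is a minimal sketch of turning a captured register dump into a map and reading AX from it. The dump format (AX=0007 and so on), the function names and the stub runner are all assumptions made for this example, not Lithium’s actual code; the value in AX may also be encoded differently than a plain integer.

```clojure
(require '[clojure.string :as str])

(defn parse-registers
  "Parses a register dump such as \"AX=0007 BX=0000\" into a map like
  {:ax 7, :bx 0}. The dump format is an assumption made for this sketch."
  [dump]
  (into {}
        (for [[_ reg hex] (re-seq #"([A-Z]{2})=([0-9A-Fa-f]{4})" dump)]
          [(keyword (str/lower-case reg)) (Integer/parseInt hex 16)])))

(defn ax-after
  "Calls `run-program`, a hypothetical function that compiles `expr`,
  runs it in the emulator and returns the captured text, then extracts AX."
  [run-program expr]
  (:ax (parse-registers (run-program expr))))

;; A stub stands in for the real compile-and-run step.
(def fake-run (constantly "AX=0007 BX=0000 CX=00FF"))

(ax-after fake-run '(+ 3 4))
```

Keeping the runner as a parameter means the parsing can be tested without an emulator at hand.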

And now let me go refactor it…

Things I wish Git had: Commit groups

2021-07-01T00:00:00Z

Intro

Everyone ¹ and their dog ² loves Git. I know I do. It works, it’s efficient, it has a brilliant data model, and it sports every feature under the sun. In 13 years of using it, I’ve never found myself needing a feature it didn’t have. Until recently.

But before I tell you about it, let’s talk about GitHub.

There are three groups of GitHub users, distinguished by how they prefer to merge pull requests:

Merge commit, squash, or rebase? There’s no single best answer to that question. A number of factors are at play in choosing the merge strategy: the type of the project, the size, workflow and preferences of the team, business considerations, and so on. You probably have your own preference if you’ve used GitHub to collaborate with a team.

I’ll talk for a while about the pros and cons of each approach. But first, let’s establish a setting. Imagine that your project has a main branch, from which a feature branch was created at one point. Since then, both branches have seen developments, and now, after feature has undergone reviews and testing, it’s ready to be merged back to main:

Create a merge commit

Merge commits are the original answer that Git has to combining changes. A merge commit has two or more parents and brings in all the changes from them and their ancestors:

In this example, Git has created a new commit, number 9, that merges commits 6 and 8. The branch main now points to that new commit, and so contains all changes in the range 1–8.

Merge commits are extremely versatile and scale well, especially for complicated workflows with multiple maintainers, each responsible for a different part of the code; for example, they’re pervasively used by the Linux kernel developers. However, for small, agile teams (especially in the business context), they can be overkill and pose potential problems.

In such a team, you typically have one eternal branch, from which production releases are made, and to which people merge changes from short-lived feature branches. In such a setting, it’s hard to tell how the history of a project has progressed. GitFlow, a popular way of working with Git, advocates merge commits everywhere, and people are struggling with it.

I’ll refer you to the visual argument from that last post:

Setting aside the fact that this history is littered with merge commits, the author makes a point that with this kind of an entangled graph, it’s practically impossible to find anything in it. Whether that’s true or not I’ll leave for you to decide, but there’s definitely a case for linear history there.

There’s another, oft-overlooked quirk here. Quick: look again at the second image above, the one with merge commit number 9. Can you tell, from the image alone, which commit was the tip of main before the merge happened? Surely it must be 8, because it’s on the gray line, right?

Yeah: on the image. But when you look at the merge commit itself, it’s not that obvious. Under the hood, all the commit really says is:

Merge: 8 6

So it tells you that these two parents have been merged together, but it doesn’t tell you which one used to be main. You might guess 8, because it’s the leftmost one, but you don’t know for sure. (Remember, branches in Git are just pointers to commits.) The only way (that I know of) to be sure is to use the reflog, but that is ephemeral: Git occasionally prunes old entries from reflogs.

So this prevents you from being able to confidently answer questions such as: “which features were released over the given time period?”, or “what was the state of main as of a given date?”.

That’s also why you can’t git revert a merge commit—that is, unless you tell Git which of the parent commits you want to keep and which to discard.

Squash and merge

In the merge commit-based approach, we don’t rewrite history: once a commit is made, it stays; the repository only grows by accretion. In contrast, the other two approaches use Git’s facilities for rewriting history. As we’ll see, the fundamentals are the same: where they differ is commit granularity.

Coming back to our example: when squashing, we mash together the changes introduced by commits 4, 5, and 6 into a single commit (“S”), and then replay that commit on top of main.

The feature branch is still there, but I didn’t include it on this picture because it’s no longer relevant—it typically gets deleted upon merge (which, as we will see, might not actually be a good idea).

There’s a lot to like about this approach, and some teams advocate for it. The biggest and most obvious benefit is likely that the history becomes very legible. It’s linear and there’s a one-to-one correspondence between commits on main and pull requests (and, mostly, either features or bugfixes). Such a history can be of great help in project management: it becomes very easy to answer the questions which were nigh impossible to answer in the merge-commit approach.

Rebase and merge

This situation is similar to the previous one, except that we don’t squash commits 4–6 together. Instead, we directly replay them on top of main.

Let me start with a long digression. You might guess, from the GitHub screenshot at the top of this post, that I’m in this camp, and you’d be right. In fact, I used to squash and merge feature branches, but I switched to the rebase-and-merge approach after introducing probably the single biggest improvement to the quality of my work over recent years:

I started writing meaningful commit messages.

In the not-too-distant past, my commit messages used to be one-liners, as evidenced, for example, in the history of Skyscraper. These first lines haven’t changed much, but now I strive to augment them with an explanation of why the change is being made. When it fixes a bug, I explain what was causing it and how the change makes the bug go away; when it implements a feature, I highlight the specifics of the implementation. I might not write more code these days, but I certainly write more prose: it’s not uncommon for me to write two or three paragraphs about a +1/−1 change.

So my commit messages now look like this (I’m taking a recent random example from the Fy! app’s repo):

app/tests: allow to mock config

Tests expected the code-push events to fire, but now that I’ve
disabled CP in dev, and the tests are built with the dev aero profile,
they’d fail.

This could have been fixed by building them with AERO_PROFILE=staging
in CI, but it doesn’t feel right: I think tests shouldn’t depend on
varying configuration. If a test requires a given bit of configuration
to be present, it’s better to configure it that way explicitly.

Hence this commit. It adds a wrap-config mock and a corresponding
:extra-config fixture, which, when present (and it is by default),
will merge the value onto generated-config.

I’m very conscious about having a clean history. I’m aiming for each commit to be small (with the threshold at approximately +20/−20 LOCs) and introduce a coherent, logical change.

That’s not to say I always develop that way, of course. If you looked at a git log of my work-in-progress branch, chances are you’d see something like this:

5d64b71 wip
392b1e0 wip
0a3ad89 more wip
3db02d3 wip

But before declaring the PR ready to review, I’ll throw this history away (by git reset --mixed $(git merge-base feature main)) and re-commit the changes, dividing them into logical units and writing the rationales, bit by bit.

The net result of rigorously applying this practice is that

you can do git annotate anywhere, and learn about why any line of code in the codebase is the way it is.

I can’t emphasize enough what a huge, huge impact this has on the developer’s wellbeing. These commit messages, when I read them back weeks or months later, working on something different but related, almost read as little love letters from me-in-the-past to me-now. They reduce the all-important WTFs/minute metric to zero.

They’re also an aid in reviewing code. My PR notes usually say “please read each commit in isolation.” I’ve found it easier to follow a PR when it tries to tell a story, and each commit is a milestone down that road.

Ending the digression: can you see why I prefer rebase-and-merge over squash-and-merge? Because, all the benefits notwithstanding, squashing irrevocably loses context.

Now, instead of each line being a result of a small, +20/−20 change, you can only tell that it’s part of a set of such changes — maybe ten of them, maybe fifty. You don’t know. Sure you can go look in the original branch, but it’s an overhead, and what if it’s been deleted?

So yeah. Having those love letters all in place, each carefully placed and not glued to others, is just too much of a boon to let go. But it’s not to say that rebasing-and-merging is without downsides.

For example, it’s again hard to tell how many features were deployed over a given period of time. More troublesomely, it’s harder to revert changes: typically you want to operate on a feature level there. With squash-and-merge, it takes one git revert to revert a buggy feature. With rebase-and-merge, you need to know the range.

Worse yet: it’s more likely for a squashed-and-merged commit to be cleanly undone (or cherry-picked) than for a series of small commits. (I sometimes deliberately commit wrong or half-baked approaches that are changed in subsequent commits, just to tell the story more convincingly, and it’s possible that each of these changes individually causes trouble but that they cancel each other in squash.)

So I’m not completely happy with any of the three approaches. Which finally brings me to my preferred fourth approach, one that Git (yet?) doesn’t allow for:

Rebase, group and merge

You know the “group” facility of vector graphics programs? You draw a couple of shapes, you group them together, and then you can apply transformations to the entire group at once, operating on it as if it were an atomic thing. But when the need arises, you can “ungroup” it and look deeper.

That’s because sometimes there’s a need to have a “high-level” view of things, and sometimes you need to delve deeper. Each of these needs is valid. Each is prompted by different circumstances that we all encounter.

I’d love to see that same idea applied to Git commits. In Git, a commit group might just be a named and annotated range of commits: feature-a might be the same as 5d64b71..3db02d3. Every Git command that currently accepts commit ranges could accept group names. I envision groups having descriptions, so that git log, git blame, etc. could take --grouped or --ungrouped options and act appropriately.
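As a thought experiment, the idea can be modelled as plain data. The sketch below is not a Git feature and not a proposal for its internals: the history, the hashes, the group shape (:from and :to, both inclusive, unlike Git’s A..B) and the function names are all made up for illustration.

```clojure
(def history
  "A linear history, oldest first. All hashes and subjects are invented."
  [{:sha "1111111" :subject "main: bump dependencies"}
   {:sha "2222222" :subject "tests: add config mock"}
   {:sha "3333333" :subject "tests: use the mock in fixtures"}
   {:sha "4444444" :subject "main: fix typo in README"}])

(def groups
  "A group is a named, annotated range of commits."
  [{:name "feature-a"
    :description "Allow tests to mock config"
    :from "2222222"
    :to "3333333"}])

(defn group-members
  "Returns the commits between the group's :from and :to, inclusive."
  [history {:keys [from to]}]
  (let [shas (mapv :sha history)]
    (subvec (vec history) (.indexOf shas from) (inc (.indexOf shas to)))))

(defn grouped-log
  "Collapses each group into a single entry; other commits pass through."
  [history groups]
  (let [member->group (into {}
                            (for [g groups
                                  c (group-members history g)]
                              [(:sha c) g]))]
    (->> history
         ;; Ungrouped commits are keyed by their own hash, so each one
         ;; ends up in a partition of its own.
         (partition-by #(:name (member->group (:sha %)) (:sha %)))
         (map (fn [commits]
                (if-let [g (member->group (:sha (first commits)))]
                  {:group (:name g)
                   :description (:description g)
                   :commits (mapv :sha commits)}
                  (first commits)))))))

(grouped-log history groups)
```

The “ungrouped” view is simply the original history, so both levels of detail stay available.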

Obviously, details would need to be fleshed out (can groups overlap? can groups be part of other groups?), and I’m not familiar enough with Git innards to say with confidence that it’s doable. But the more I think about it, the more sound the idea seems to me.

I think creating a group when doing a rebase-and-merge could bring together the best of all three worlds, so that we can have all our cakes and eat them too.

¹ Well, almost everyone.

² It’s Dog Day here in Poland as I write these words. Happy Dog Day!

I made a website to guess tomorrow’s number of COVID-19 cases, and here’s what happened

2020-11-08T00:00:00Z

Before

It seems so obvious in hindsight. Here in Poland, people have been guessing it ever since the pandemic broke out: in private conversations, in random threads on social media, in comments under governmental information outlets. It seemed a matter of time before someone came up with something like this. In fact, on one Sunday evening in October, I found myself flabbergasted that apparently no one had yet.

I shelled out $4 for a domain, koronalotek.pl (can be translated as “coronalotto” or “coronalottery” – occurrences of the name on Twitter date back at least as far as April), and fired up a REPL. A few hours and 250 Clojure LOCs later, the site was up.

I wanted it to be as simple as possible. A form with two fields: “your name” and “how many cases tomorrow?” A top-ten list of today’s winners, sorted by the absolute difference between the guess and the actual number of cases, as reported daily on Twitter by the Polish Ministry of Health. The official number, prominently displayed. And that’s all.
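The ranking rule is simple enough to sketch in a few lines. The shape of a guess (a map with :name and :guess) and the function name are assumptions for this example, not the site’s actual code.

```clojure
(defn winners
  "Returns up to `n` guesses closest to `actual`, nearest first.
  sort-by is stable, so ties keep their submission order."
  [guesses actual n]
  (->> guesses
       (sort-by (fn [{:keys [guess]}]
                  (Math/abs (long (- guess actual)))))
       (take n)))

(def sample-guesses
  [{:name "Ala"  :guess 24000}
   {:name "Olek" :guess 25000}
   {:name "Ewa"  :guess 30000}])

(map :name (winners sample-guesses 24785 2))
;; => ("Olek" "Ala")
```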

On 17 October, I posted the link on my Facebook and Twitter feeds, and waited. The stream of guesses started to trickle in.

After

It never grew to be more than a stream, but it hasn’t gone completely unnoticed either.

The above plot shows the daily number of accepted guesses (i.e., those that were used to generate the next day’s winners) over time – a metric of popularity. Each day’s number means guesses cast in the 24 hours up until 10:30 (Warsaw time) on that day, which is when the official numbers are published by the Ministry of Health.

I’ve been filtering out automated submissions, as well as excess manual submissions by the same IP that seemed to skew the results too much – I’ve arbitrarily set the “excess” threshold at 10. The missing datapoint for 19 October is not a zero, but an N/A: I’ve lost that datapoint due to a glitch. More on this below.

The interest peaked on October 23, with more than a thousand guesses for that day (I think it was reposted by someone with a significant reach back then), and has been slowly declining since.

I have privately received some feedback. One person has pointed out that they found the site distasteful and that making fun of pandemic tragedies made them uncomfortable. (I empathise; for me it’s not so much making fun as it is a coping mechanism—a way to put distance between my thoughts and the difficult times we’re in and to keep fears at bay.) Some people, however, have thanked me for making them smile when they guessed more or less correctly.

Back to data. Being a data junkie, I looked at what I had been collecting. First things first: how accurate is the collective predictive power of the guessers?

Quite accurate, in fact! Data for this plot has only been slightly preprocessed, by filtering out “unreasonable” guesses that don’t fall within the range [100; 50000].

People have over- and underguesstimated the number of new cases, but not by much. There were only a few occasions where the actual case count didn’t fall within one standard deviation of the mean of guesses (represented by the whiskers around blue bars on the plot). Granted, the daily standard deviation tends to be large (on the order of a few thousand), but still, I’m impressed. A paper on estimating the growth of the pandemic based on coronalottery results coming soon to a journal near you! ;-)
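For concreteness, here is a minimal sketch of that check: drop the unreasonable guesses, then test whether the actual count lies within one standard deviation of the mean. The function names are made up, and using the population (rather than sample) standard deviation is an assumption; the original analysis may have differed.

```clojure
(defn reasonable?
  "True for guesses within the range [100; 50000]."
  [guess]
  (<= 100 guess 50000))

(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn std-dev
  "Population standard deviation of xs."
  [xs]
  (let [m (mean xs)]
    (Math/sqrt (mean (map #(let [d (- % m)] (* d d)) xs)))))

(defn within-one-sd?
  "Is `actual` within one standard deviation of the mean of the
  reasonable guesses?"
  [guesses actual]
  (let [xs (filter reasonable? guesses)
        m  (mean xs)
        sd (std-dev xs)]
    (<= (- m sd) actual (+ m sd))))

;; 7 and 9999999 are filtered out; the rest average to 25000.
(within-one-sd? [20000 24000 26000 30000 7 9999999] 24785)
;; => true
```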

Just for the heck of it, I’ve also been looking at individual votes. Specifically, names. Here’s a snapshot of unique guessers’ names sorted by decreasing length, on 23 October. (NSFW warning: expletives ahead!)

Let me translate a few of these for those of you who don’t speak Polish:

#1 is “Sasin has fucked over 70 million zlotys for elections that didn’t take place and was never held responsible.” This alludes to the ghost election in Poland from May. This news had gone memetic, going so far as Minister Sasin’s name being ironically used as a dimensionless unit of 70 million (think Avogadro’s number). You’ll discover the same theme in #2, #3, #5, and others.

#6 is “CT {Constitutional Tribunal}, you focking botch, stop repressing my abortion”. Just a day before, the Polish constitutional court (whose current legality is disputed at best) had decreed a ban on almost all legal abortion in Poland, giving rise to the biggest street protests in decades.

Not all is political: #4 is “Why study for the exam if we’re not gonna survive until November anyway?”. I hope whoever wrote this is alive and well.

Corollary? Give people a text field, and they’ll use it to express themselves: politically or otherwise.

In fact, I have taken the liberty of chiming in. Shortly after, I altered the thank-you page (which used to just say “thanks for guessing”) to proudly display one of the emblems of the Women’s Strike, along with a link to a crowdfunding campaign for an NGO that supports women needing abortion.

Inside out

I’m not much of a DevOps person, so I deployed it the quick and dirty way, not caring about scalability or performance. The maxim “make it as simple as possible” permeates the setup.

I just started a REPL within a screen session on the tiny Scaleway C1 server that also hosts this blog and some of my other personal stuff. I launched a Jetty server within it, and set up an nginx proxy. And that’s pretty much it. I liberally tinker with the app’s state in “production,” evaluating all kinds of expressions when I feel like it.

Code changes are deployed by git pulling new developments and doing (require 'koronalotek.core :reload) in the REPL.

Someone tried a SQL injection attack. This is doomed to fail because there’s no SQL involved. In fact, there’s no database at all. The entire state is kept in an in-memory atom and periodically synced out to an EDN file. In addition, state is reset and archived daily at the time of announcing winners. (I’ve added the archiving after forgetting it on one occasion – hence the lack of data for 19 October.)
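A minimal sketch of this kind of persistence, assuming a state map with a single :guesses vector. The names, the state shape and the file naming are hypothetical; the periodic sync itself (for example, a scheduled call to sync-state!) is left out.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(def state
  "Hypothetical shape of the in-memory state."
  (atom {:guesses []}))

(defn add-guess! [who guess]
  (swap! state update :guesses conj {:name who :guess guess}))

(defn sync-state!
  "Writes the current state to `file` as EDN."
  [file]
  (spit file (pr-str @state)))

(defn load-state!
  "Restores the state from an EDN file written by sync-state!."
  [file]
  (reset! state (edn/read-string (slurp file))))

(defn archive-and-reset!
  "Saves the day's state under a dated file name in `dir`, then starts
  afresh. Returns the archive file."
  [dir date]
  (let [file (io/file dir (str "guesses-" date ".edn"))]
    (spit file (pr-str @state))
    (reset! state {:guesses []})
    file))
```

Because the state is only ever data, reading it back with clojure.edn is enough; nothing in the file gets evaluated.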

I also don’t yet have a mechanism for automatically pulling in the Ministry of Health’s data. Every morning, I spend two minutes checking whether there are excess automated votes, removing them if any, and then filling in the blanks:

(new-data! #inst "2020-11-08T10:30+01:00" 24785)

For all the violations of good practices in this setup, it has worked out surprisingly well so far. I’ve resorted to removing automated votes a handful of times, and blacklisting IPs of voting bots in the nginx setup twice, but otherwise it’s been a low-maintenance toy. People seem to be willing to have fun, and I’m just not interfering.

Takeaways

You should call on your country’s authorities to exert pressure on the Polish government to respect women’s choices and stop actively repressing them.
Give people a text field, and they’ll use it to express themselves.
Release early, release often.

Making of “Clojure as a dependency”

2020-05-08T00:00:00Z

In my previous post, “Clojure as a dependency”, I presented the results of some toy research on Clojure version numbers seen in the wild. I’m a big believer in reproducible research, so I’m making available a Git repo that contains code you can run yourself to reproduce these results. This post is an experience report from writing that code.

There are two main components to this project: acquisition and analysis of data (implemented in the namespaces versions.scrape and versions.analyze, respectively). Let’s look at each of these in turn.

Data acquisition

This step uses the GitHub API v3 to:

retrieve the 1000 most popular Clojure repositories (using the Search repositories endpoint and going through all pages of the paginated result);
for each of these repositories, look at its file list (in the master branch) and pick up any files named project.clj or deps.edn in the root directory (using the Contents endpoint);
parse each of these files and extract the list of dependencies.

As hinted by the namespace, I’ve opted to use Skyscraper to orchestrate the process. It would arguably have been simpler to use GitHub’s GraphQL v4 API, but I wanted to showcase Skyscraper’s custom parsing facilities.

There’s no actual HTML scraping going on (all processors use either JSON or Clojure parsers), but Skyscraper is still able to “restructure” the result – traverse the graph of endpoints in a manner similar to that of GraphQL – with very little effort. The same would have been possible with any other RESTful API. Plus, we get goodies like caching or tree pruning for free.

Most of the code is straightforward, but parsing of project.clj merits some explanation. Some of my initial assumptions proved incorrect, and it’s fun to see how. I initially tried to use clojure.edn, but Leiningen project definitions are not actually EDN – they are Clojure code, which is a superset of EDN. So I had to resort to read-string from core – with *read-eval* bound to nil (otherwise the code would have a Clojure injection vulnerability – think Bobby Tables). Needless to say, some project.cljs turned out to depend on read-eval.

Some projects (I’m looking at you, Closh, Babashka and sci) keep the version number outside of project.clj, in a text file (typically in resources/), and slurp it back into project.clj with a read-eval’d expression:

(defproject closh-sci
  #=(clojure.string/trim
     #=(slurp "resources/CLOSH_VERSION"))
  …)

A trick employed by one project, Metabase, is to dynamically generate JVM options containing a port number at parse time, so that test suites running at the same time don’t clash with each other:

#=(eval (format "-Dmb.jetty.port=%d" (+ 3001 (rand-int 500))))
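To see the trade-off in isolation, here is a small sketch comparing the EDN reader with the Clojure reader when read-eval is disabled. The helper names and the sample strings are made up for this example; binding *read-eval* to false has the same effect as nil for this purpose.

```clojure
(require '[clojure.edn :as edn])

(defn read-clojure-safely
  "Reads the first form in `s` as Clojure code, with #= disabled so
  that nothing in the input gets evaluated."
  [s]
  (binding [*read-eval* false]
    (read-string s)))

(defn try-read
  "Returns the form read by `read-fn`, or ::unreadable if reading throws."
  [read-fn s]
  (try
    (read-fn s)
    (catch Exception _
      ::unreadable)))

;; A regex literal is valid Clojure, but not valid EDN.
(def with-regex "(defproject foo \"0.1.0\" :aot [#\"foo.*\"])")

;; A read-eval form is rejected while *read-eval* is false.
(def with-read-eval "(defproject foo #=(str \"0.1\" \".0\"))")

(try-read edn/read-string with-regex)          ; the EDN reader gives up
(first (try-read read-clojure-safely with-regex)) ; the Clojure reader copes
(try-read read-clojure-safely with-read-eval)  ; refused, as intended
```

Files like the two quoted above fall into the last category: they can only be read by a reader that is willing to evaluate code.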

Finally, it turned out that defproject is not always the first form in project.clj. Some projects, like bridge, only contain a placeholder project.clj with no forms; others, like aleph, first define some constants, and then refer to them in a defproject form. If those constants contain parts of the dependencies list, then those dependencies won’t be processed correctly. Fortunately, not a lot of projects do this, so it doesn’t skew the results much.
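One way to cope with this is to read every top-level form and pick out the defproject one. The following is a sketch under that assumption, with made-up function names; it is not necessarily what the code in the repo does, and it doesn’t resolve constants referenced from inside defproject.

```clojure
(defn read-all-forms
  "Reads all top-level forms from the string `s`, with #= disabled."
  [s]
  (binding [*read-eval* false]
    (let [rdr (java.io.PushbackReader. (java.io.StringReader. s))
          eof (Object.)]
      ;; doall forces the reads while the binding is still in effect.
      (doall (take-while #(not (identical? eof %))
                         (repeatedly #(read rdr false eof)))))))

(defn find-defproject
  "Returns the first (defproject ...) form in `s`, or nil if there is none."
  [s]
  (->> (read-all-forms s)
       (filter #(and (seq? %) (= 'defproject (first %))))
       first))

(find-defproject "(def netty-version \"4.1\")\n(defproject foo \"0.1.0\")")
;; => (defproject foo "0.1.0")
```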

Anyway, the end result of the acquisition phase is a sequence of maps describing project definitions. They look like this:

{:name "clojure-koans",
 :full-name "functional-koans/clojure-koans",
 :deps-type :leiningen,
 :page 1,
 :deps {org.clojure/clojure #:mvn{:version "1.10.0"},
        koan-engine #:mvn{:version "0.2.5"}},
 :profile-deps {:dev {lein-koan #:mvn{:version "0.1.5"}}}}

Homogeneity is important: every dependency description has been converted to the cli-tools format, even if it comes from a project.clj.
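A minimal sketch of such a conversion, turning Leiningen dependency vectors into a cli-tools-style map. The function names are hypothetical, and carrying over extra options (such as :scope) unchanged is an assumption of this example. Note that #:mvn{:version "1.10.0"} is just the reader’s shorthand for {:mvn/version "1.10.0"}.

```clojure
(defn lein-dep->deps-entry
  "Turns a Leiningen dependency vector such as
  [org.clojure/clojure \"1.10.0\" :scope \"provided\"] into a
  [lib coordinate-map] pair in the deps.edn style."
  [[lib version & {:as opts}]]
  [lib (merge {:mvn/version version} opts)])

(defn lein-deps->deps-map
  "Converts a sequence of Leiningen dependency vectors to a deps map."
  [deps]
  (into {} (map lein-dep->deps-entry) deps))

(lein-deps->deps-map '[[org.clojure/clojure "1.10.0"]
                       [koan-engine "0.2.5"]])
;; => {org.clojure/clojure {:mvn/version "1.10.0"},
;;     koan-engine {:mvn/version "0.2.5"}}
```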

Data analysis

I’ve long been searching for a way to do exploratory programming in Clojure without turning the code into a tangled mess, portable only along with my computer.

Exploratory (or research) programming is very different from “normal” programming. In the latter, you typically focus on a coherent project – a program or a library. In the former, you spend a lot of time in the REPL, trying all sorts of different things and defing new values derived from already computed ones.

This is very convenient, but it’s extremely easy to get carried away in the REPL and get lost in a sea of defs. If you want to redo your computations from scratch, just about your only option is to take your REPL transcript and re-evaluate the expressions one by one, in the correct order. Cleaning up the code (e.g. deglobalizing) as you go is very difficult.

I’ve found an answer: Plumatic Graph, part of the plumbing library. There are a plethora of uses for it: for example, at Fy, my current workplace, we’re using it to define our test fixtures. But as it turns out, it makes exploratory programming enjoyable.

The bulk of code in versions.analyze consists of a big definition of a graph, with nodes representing computations – things that I’d normally have def’d in a REPL. Consequently, most of these definitions are short and to the point. I also gave the nodes verbose, descriptive, explicit names. Name and conquer. raw-repos is the output from data acquisition, repos is an all-important node containing those raw-repos that were successfully parsed, and most other things depend on it.
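To give a feel for what such a definition looks like, here is a toy graph, assuming the plumbing library is on the classpath. Only the node names raw-repos and repos come from the text above; the node bodies, the other nodes and the sample input are invented for this sketch and are not the code from versions.analyze.

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

(def analysis-graph
  "Each node is a keyword function naming the nodes it depends on."
  {:repos            (fnk [raw-repos]
                       ;; Stand-in for 'successfully parsed': has :deps.
                       (filter :deps raw-repos))
   :clojure-versions (fnk [repos]
                       (keep #(get-in % [:deps 'org.clojure/clojure :mvn/version])
                             repos))
   :version-counts   (fnk [clojure-versions]
                       (frequencies clojure-versions))})

;; lazy-compile yields a function whose result computes nodes on demand.
(def analyze (graph/lazy-compile analysis-graph))

(def result
  (analyze {:raw-repos [{:name "a" :deps {'org.clojure/clojure {:mvn/version "1.10.0"}}}
                        {:name "b"}
                        {:name "c" :deps {'org.clojure/clojure {:mvn/version "1.10.0"}}}]}))

(:version-counts result)
```

Asking for :version-counts pulls in only the nodes it transitively depends on, which is what makes it cheap to keep speculative nodes around.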

It also doesn’t get in the way of the normal REPL research flow. My normal workflow with REPL and Graph is something along the lines of:

1. (def result (main))
2. evaluate something using inputs from result
3. nah, it leads nowhere
4. evaluate something else
5. hey, that’s interesting!
6. add a new node to the graph definition
7. GOTO 1

Thanks to Graph’s lazy compiler, I can re-evaluate anything at need and have it evaluate only the things needed, and nothing else. Also, because the graph is explicit, it’s fairly easy to visualize it. (Click the image to open it in full-size in another tab.)

Because it’s lazy, it doesn’t hurt to put extra things in there just in case, even when you’re not going to report them. For example, I was curious what things besides a version number people put in dependencies. :exclusions, for sure, but what else? This is the :what-other-things-besides-versions node.

Imagine my surprise when I found :exlusions (sic) in there, which turned out to be a typo in shadow-cljs’ project.clj! I submitted a PR, and Thomas Heller merged it a few days after.

My only gripe with Graph is that it runs somewhat contrary to the current trends in the Clojure community: for example, it doesn’t support namespaced keywords (although there’s an open ticket for that). But on the whole, I’m sold. I’ll definitely be using it in the next piece of research in Clojure, and I’m on the lookout for something similar in pure R. If you know something, do tell me!

Some words on plotting

The plot from the previous post has been generated in pure R, using ggplot2 (an extremely versatile API). Clojure generates a CSV with munged data, and then R reads that CSV as a data frame and generates the plot in a few lines.

I’ve briefly played around with clojisr, a bridge between Clojure and R. It was an enlightening experiment, and it would let me avoid the intermediate CSV, but I decided to ditch it for a few reasons:

It pulls in quite a few dependencies (I wanted to keep them down to a minimum), and requires some previous setup on the R side.
I’d much rather write my R as R, since I’m comfortable with it, rather than spend time wondering how it maps to Clojure. This is similar to the SQL story: these days I prefer HugSQL over Korma, unless I have good reasons to choose otherwise.
clojisr opens up a child R process just by requireing a namespace. I’m not a fan of that.

But it’s definitely very promising! I applaud the effort and I’ll keep a close eye on it.

Key takeaways

Skyscraper makes data acquisition bearable, if not fun.
Plumatic Graph makes writing research code in Clojure fun.
ggplot makes plotting data fun.
Clojure makes programming fun. (But you knew that already.)

Clojure as a dependency

2020-05-02T00:00:00Z

I have a shameful confession to make: I have long neglected an open-source library that I maintain, clj-tagsoup.

This would be less of an issue if it weren’t my second-most-starred project on GitHub. Granted, I don’t feel a need for it anymore, but apparently people do. I wish I had spent some time reviewing and merging the incoming PRs.

Anyway, I’ve recently been prompted to revive it, and I’m preparing a new release. While at it, I’ve been updating dependencies to their latest versions, and upon seeing a dependency on [org.clojure/clojure "1.2.0"] in project.clj (yes, it’s been neglected for that long), I started wondering: which Clojure to depend on? Actually, should Clojure itself be a dependency at all?

I’ve googled around for best practices, but with no conclusive answer. So I set out to do some research.

TLDR: with Leiningen, add it with :scope "provided"; with cli-tools, you don’t have to, unless you want to be explicit.

Is it possible for a Clojure project to declare no dependency on Clojure at all?

Quite possible, as it turns out. But the details depend on the build tool.

Obviously, this only makes sense for libraries. Or, more broadly, for projects that are not meant to be used standalone, but rather included in other projects (which will have a Clojure dependency of their own).

Leiningen

If you try to create a Leiningen project that has no dependencies:

(defproject foo "0.1.0"
  :dependencies [])

then Leiningen (as of version 2.9.3, but I’d guess older versions behave similarly) won’t allow you to launch a REPL:

$ lein repl
Error: Could not find or load main class clojure.main
Caused by: java.lang.ClassNotFoundException: clojure.main
Subprocess failed (exit code: 1)

But all is not lost: lein jar works just fine (as long as you don’t AOT-compile any namespaces), as does lein install. The resulting library will happily function as a dependency of other projects.

The upside of depending on no particular Clojure version is that you don’t impose it on your consumers. If a library depends on Clojure 1.9.0, but a project that uses it depends on Clojure 1.10.1, then Leiningen will fetch 1.9.0’s pom.xml (it’s smart enough to figure out that the jar itself won’t be needed, as the conflict will always be resolved in favour of the direct dependency), and lein deps :tree will report “possibly confusing dependencies”.

It’s not very useful to have a library that you can’t launch a REPL against, though. So what some people do is declare a dependency on Clojure not in the main :dependencies, but in a profile.

(defproject foo "0.1.0"
  :dependencies []
  :profiles {:dev {:dependencies [[org.clojure/clojure "1.10.1"]]}})

This avoids conflicts and brings back the possibility to launch a REPL. Sometimes, people create multiple profiles for different Clojure versions; Leiningen’s documentation mentions this possibility.

Unfortunately, with this approach it’s still not possible to AOT-compile things or create uberjars with Leiningen. (Putting Clojure in the :provided profile causes building the uberjar to succeed, but the resulting -standalone jar doesn’t actually contain Clojure.)

Another option is to add Clojure to the main :dependencies, but with :scope "provided". Per the Maven documentation, this means:

This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.

The key is in the last words: “not transitive.” If project A depends on a library B that declares a “provided” dependency C, then C won’t be automatically put in A’s dependencies, and A is expected to explicitly declare its own C.

This means that it’s adequate for both libraries and standalone projects when it comes to declaring a Clojure dependency. It doesn’t break anything, doesn’t cause any ephemeral conflicts, and can be combined with the profiles approach when multiple configurations are called for.
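Spelled out, a project definition using this approach might look as follows. The project name and versions simply mirror the earlier examples.

```clojure
(defproject foo "0.1.0"
  ;; "provided": on the classpath for compiling, testing and the REPL,
  ;; but not passed on transitively to projects that depend on foo.
  :dependencies [[org.clojure/clojure "1.10.1" :scope "provided"]])
```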

cli-tools

cli-tools will accept a deps.edn as simple as {}. Even passing -Srepro to clojure or clj (which excludes the Clojure dependency that you probably have in your ~/.clojure/deps.edn) doesn’t break anything: cli-tools will just use 1.10.1 (at least as of version 1.10.1.536).

With cli-tools, as a library author you probably don’t have to declare a Clojure dependency at all. But things are less uniform in this land than they are in Leiningen (for example, there are quite a few uberjarrers to choose from), so it’s reasonable to check with your tooling first.

Boot

I’m no longer a Boot user, so I can’t tell. But from what I know, it uses Aether just like Leiningen and Maven do, so I’d wager a guess the same caveats apply as for Leiningen. Haven’t checked, though.

So what do the existing projects do?

I figured it would be a fun piece of research to examine how the popular projects depend (or don’t depend) on Clojure. I queried GitHub’s API for the 1000 most starred Clojure projects, fetched and parsed their project.cljs and/or deps.edns, and tallied things up.
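The tallying step can be sketched roughly as follows. This is not the code I actually ran: the function names are made up for illustration, the sample inputs are invented, and it assumes that each project.clj is simple enough to be read as EDN (many real ones are not).

```clojure
(require '[clojure.edn :as edn])

(defn declared-clojure-version
  "Given the text of a project.clj, returns the Clojure version found in
  its main :dependencies, or :n/a when there is none."
  [project-clj-text]
  (let [[_defproject _name _version & options] (edn/read-string project-clj-text)
        deps (:dependencies (apply hash-map options))]
    (or (some (fn [[artifact version]]
                (when (= artifact 'org.clojure/clojure)
                  version))
              deps)
        :n/a)))

(defn tally-versions
  "Counts how many project.clj texts declare each Clojure version."
  [project-clj-texts]
  (frequencies (map declared-clojure-version project-clj-texts)))

;; Invented sample data, not taken from the real survey.
(def sample-projects
  ["(defproject a \"0.1.0\" :dependencies [[org.clojure/clojure \"1.10.1\"]])"
   "(defproject b \"0.1.0\" :dependencies [[org.clojure/clojure \"1.10.1\"] [ring \"1.8.0\"]])"
   "(defproject c \"0.1.0\" :dependencies [])"])

(tally-versions sample-projects)
```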

I’ll write a separate “making of” post, because it turned out to be an even more fun weekend project than I had anticipated. But for now, let me share the conclusions.

I ended up with 968 project definition files that I was able to successfully parse: 140 deps.edns and 828 project.cljs. Here’s a breakdown of Clojure version declared as a “main” dependency (i.e., not in a profile or alias):

N/A means that there’s no dependency on Clojure declared, and “other” is an umbrella for the zoo of alphas, betas and snapshots.

As expected, not depending on Clojure is comparatively more popular in the cli-tools land: almost half (48.6%) of cli-tools projects don’t declare a Clojure dependency, versus 21.0% (174 projects) for Leiningen.

That Leiningen number still seemed quite high to me, so I dug a little deeper. Out of those 174 projects, 100 have Clojure somewhere in their :profiles. The remaining 74 are something of outliers:

some, like Ring or Pedestal, are umbrella projects composed of sub-projects (with the lein-sub plugin) that have actual dependencies themselves;
some, like Klipse or Reagent, are essentially ClojureScript-only;
some, like Overtone, use the lein-tools-deps plugin to store their dependencies in deps.edn while using Leiningen for other tasks.

Finally, the popularity of :scope "provided" is much lower. Only 68 Leiningen projects specify it (8.9% of those that declare any dependencies), and only two deps.edn files do so (re-frame and fulcro – note that re-frame actually has both a project.clj and a deps.edn).

Indenting cond forms

2020-02-10T00:00:00Z

Indentation matters when reading Clojure code. It is the primary visual cue that helps the reader discern the code structure. Most Clojure code seen in the wild conforms to either the community style guide or the proposed simplified rules; the existing editors make it easy to reformat code to match them.

I find both these rulesets to be helpful when reading code. But there’s one corner-case that’s been irking me: cond forms.

cond takes an even number of arguments: alternating test-expression pairs. They are commonly put next to each other, two forms per line.

(cond
  test expr-1
  another-test expr-2
  :else expr-3)

Sometimes, people align the expressions under one another, in a tabular fashion:

(cond
  test         expr-1
  another-test expr-2
  :else        expr-3)

But things get out of hand when either tests or exprs get longer and call for multiple lines themselves. There are several options here, all of them less than ideal.

Tests and expressions next to each other

In other words, keep the above rule. Because we’ll have multiple lines in a form, this tends to make the resulting code axe-shaped:

(cond
  (= (some-function something) expected-value) (do
                                                 (do-this)
                                                 (and-also-do-that))
  (another-predicate something-else) (try
                                       (do-another-thing)
                                       (catch Exception _
                                         (println "Whoops!"))))

This yields code that is indented abnormally far to the right, forcing the reader’s eyeballs to move in two dimensions – even more so if the tabular feel is desired. If both the test and the expression are multi-line, it just looks plain weird.

Stack all forms vertically, no extra spacing

(cond
  (= (some-function something) expected-value)
  (do
    (do-this)
    (and-also-do-that))
  (another-predicate something-else)
  (try
    (do-another-thing)
    (catch Exception _
      (println "Whoops!"))))

This gets rid of the long lines, but introduces another problem: it’s hard to tell at a glance

where a given test or expression starts or ends;
which tests are paired with which expression;
whether a given line corresponds to a test or an expression, and which one.

Stack all forms vertically, blank lines between test/expr pairs

(cond
  (= (some-function something) expected-value)
  (do
    (do-this)
    (and-also-do-that))

  (another-predicate something-else)
  (try
    (do-another-thing)
    (catch Exception _
      (println "Whoops!"))))

The Style Guide says that this is an “ok-ish” thing to do.

With the added blank lines, the logical structure of the code is much more apparent. However, it breaks another assumption that I make when reading the code: functions contain no blank lines. The Style Guide even mentions it, saying that cond forms are an acceptable exception.

It is now harder to tell at a glance where the enclosing function starts or ends. And once this assumption has been broken, the brain expects it to be broken again, disrupting reading across the entire file.

Forms one under another, extra indentation for expressions only

(cond
  (= (some-function something) expected-value)
    (do
      (do-this)
      (and-also-do-that))
  (another-predicate something-else)
    (try
      (do-another-thing)
      (catch Exception _
        (println "Whoops!"))))

I resorted to this several times. The lines are not too long; the visual cues are there; it’s obvious what is the test, what is the expression, and what goes with what.

Except… it’s against the rules. List items stacked vertically should be aligned one under the other. I have to actively fight my Emacs to enforce this formatting, and it will be lost next time I press C-M-q on this form. No good.

Forms one under another, expressions prefixed by `#_=>`

(cond
  (= (some-function something) expected-value)
  #_=> (do
         (do-this)
         (and-also-do-that))
  (another-predicate something-else)
  #_=> (try
         (do-another-thing)
         (catch Exception _
           (println "Whoops!"))))

This one is my own invention: I haven’t seen it anywhere else. But I think it manages to avoid most problems.

#_ is a reader macro that causes the next form to be elided and not seen by the compiler. => is a valid form. Thus, #_=> is effectively whitespace as far as the compiler is concerned, and the indentation rules treat it as yet another symbol (although it technically isn’t one). No tooling is broken, no assumptions are broken, and the #_=> tends to be syntax-highlighted unintrusively so it doesn’t stand out. I tend to read it aloud as “then.”
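A quick way to convince yourself of this is to read (not evaluate) the same form with and without the markers and compare the results. The symbols in the strings are placeholders:

```clojure
;; Read the same cond form with and without the #_=> markers.
(def with-markers
  (read-string "(cond test-1 #_=> expr-1 :else #_=> expr-2)"))

(def without-markers
  (read-string "(cond test-1 expr-1 :else expr-2)"))

;; The reader discards each => form, so both data structures are equal.
(= with-markers without-markers)
```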

Meanwhile, in another galaxy

Other Lisps (Scheme and CL) wrap each test/expression pair in an extra pair of parens, thereby avoiding the blending of conditions and expressions when indented one under the other. But I’m still happy Clojure went with fewer parens. As I say, this is a corner case where an additional pair of parens would help somewhat, but most of the time I find them less aesthetic and visually cluttering.

Careful with that middleware, Eugene

2020-01-21T00:00:00Z

Prologue

I’ll be releasing version 0.3 of Skyscraper, my Clojure framework for scraping entire sites, in a few days.

More than three years have passed since its last release. During that time, I’ve made a number of attempts at redesigning it to be more robust, more usable, and faster; the last one, resulting in an almost complete rewrite, is now almost ready for public use as I’m ironing out the rough edges, documenting it, and adding tests.

It’s been a long journey and I’ll blog about it someday; but today, I’d like to tell another story: one of a nasty bug I had encountered.

Part One: Wrap, wrap, wrap, wrap

While updating the code of one of my old scrapers to use the API of Skyscraper 0.3, I noticed an odd thing: some of the output records contained scrambled text. Apparently, the character encoding was not recognised properly.

“Weird,” I thought. Skyscraper should be extra careful about honoring the encoding of pages being scraped (declared either in the headers, or in a <meta> tag). In fact, I remembered having seen it working. What was wrong?

For every page that it downloads, Skyscraper 0.3 caches the HTTP response body along with the headers so that it doesn’t have to be downloaded again; the headers are needed to ensure proper encoding when parsing a cached page. The headers are lower-cased, so that Skyscraper can then call (get all-headers "content-type") to get the encoding declared in headers. If this step is missed, and the server returns the encoding in a header named Content-Type, it won’t be matched. Kaboom!

I looked at the cache, and sure enough, the header names in the cache were not lower-cased, even though they should be. But why?

Maybe I was mistaken, and I had forgotten the lower-casing after all? A glance at the code: no. The lower-casing was there, right around the call to the download function.

Digression: Skyscraper uses clj-http to download pages. clj-http, in turn, uses the middleware pattern: there’s a “bare” request function, and then there are wrapper functions that implement things like redirects, OAuth, exception handling, and what have you. I say “wrapper” because they literally wrap the bare function: (wrap-something request) returns another function that acts just like request, but with added functionality. And that other function can in turn be wrapped with yet another one, and so on.

There’s a default set of middleware wrappers defined by clj-http, and it also provides a macro, with-additional-middleware, which allows you to specify additional wrappers. One such wrapper is wrap-lower-case-headers, which, as the name suggests, causes the response’s header keys to be returned in lower case.
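To make the pattern concrete, here is a toy version of it. None of this is clj-http’s code: bare-request returns a canned response, and both wrappers are hypothetical stand-ins written for this illustration.

```clojure
(require '[clojure.string :as str])

(defn bare-request
  "Hypothetical stand-in for a bare HTTP function: returns a canned response."
  [req]
  {:status  200
   :headers {"Content-Type" "text/html"}
   :body    (str "GET " (:url req))})

(defn wrap-lower-case-header-names
  "Middleware acting on the response: lower-cases the header names."
  [request-fn]
  (fn [req]
    (update (request-fn req) :headers
            (fn [headers]
              (into {} (map (fn [[k v]] [(str/lower-case k) v])) headers)))))

(defn wrap-default-url
  "Middleware acting on the request: supplies a :url if none is given."
  [request-fn]
  (fn [req]
    (request-fn (merge {:url "http://example.com/"} req))))

;; Wrap the bare function with each middleware in turn.
(def request
  (reduce (fn [f wrapper] (wrapper f))
          bare-request
          [wrap-lower-case-header-names wrap-default-url]))

(request {})
```

Each wrapper takes a request function and returns a new one, which is why they compose so freely.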

Back to Skyscraper. We’re ready to look at the code now. Can you spot the problem?

(let [request-fn (or (:request-fn options)
                     http/request)]
  (http/with-additional-middleware [http/wrap-lower-case-headers]
    (request-fn req
                success-fn
                error-fn)))

I stared at it for several minutes, did some dirty experiments in the REPL, perused the code of clj-http, until it dawned on me.

See that request-fn? Even though Skyscraper uses http/request by default, you can override it in the options to supply your own way of doing HTTP. (Some of the tests use it to mock calls to an HTTP server.) In this particular case, it was not overridden, though: the usual http/request was used. So things looked good: within the body of http/with-additional-middleware, headers should be lower-cased because request-fn is http/request.

Or is it?

Let me show you how with-additional-middleware is implemented. It expands to another macro, with-middleware, which is defined as follows (docstring redacted):

(defmacro with-middleware
  [middleware & body]
  `(let [m# ~middleware]
     (binding [*current-middleware* m#
               clj-http.client/request (reduce #(%2 %1)
                                               clj-http.core/request
                                               m#)]
       ~@body)))

That’s right: with-middleware works by dynamically rebinding http/request. Which means the request-fn I was calling is not actually the wrapped version, but the one captured by the outer let, the one that wasn’t rebound, the one without the additional middleware!

After this light-bulb moment, I moved with-additional-middleware outside of the let:

(http/with-additional-middleware [http/wrap-lower-case-headers]
  (let [request-fn (or (:request-fn options)
                       http/request)]
    (request-fn req
                success-fn
                error-fn)))

And, sure enough, it worked.
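The effect can be reproduced without clj-http at all. In the sketch below, request and with-wrapping are made-up names standing in for http/request and with-additional-middleware; the only thing that matters is where the let captures the function relative to the binding.

```clojure
;; A dynamic var holding the "unwrapped" function.
(defn ^:dynamic request [req]
  (assoc req :wrapped? false))

;; Rebinds request to a "wrapped" version for the duration of body.
(defmacro with-wrapping [& body]
  `(binding [request (fn [req#] (assoc req# :wrapped? true))]
     ~@body))

;; let outside the rebinding: request-fn is the old, unwrapped function.
(def captured-outside
  (let [request-fn request]
    (with-wrapping
      (request-fn {}))))

;; let inside the rebinding: request-fn is the wrapped function.
(def captured-inside
  (with-wrapping
    (let [request-fn request]
      (request-fn {}))))

[captured-outside captured-inside]
```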

Part Two: The tests are screaming loud

Is it the end of the story? I’m guessing you’re thinking it is. I thought so too. But I wanted to add one last thing: a regression test, so I’d never run into the same problem in the future.

I whipped up a test in which one ISO-8859-2-encoded page was scraped, and a check for the correct string was made. I ran it against the fixed code. It was green. I ran it against the previous, broken version…

It was green, too.

At this point, I knew I had to get to the bottom of this.

Back to experimenting. After a while, I found out that extracting encoding from a freshly-downloaded page actually worked fine! It only failed when parsing headers fetched from a cache. But the map was the same in both cases! In both cases, the code was effectively doing

(get {"Content-Type" "text/html; charset=ISO-8859-2"}
     "content-type")

This lookup shouldn’t succeed: in map lookup, string comparison is case-sensitive. And yet, for freshly-downloaded headers, it did succeed!

I checked the type of both maps. One of them was a clojure.lang.PersistentHashMap, as expected. The other one was not. It was actually a clj_http.headers.HeaderMap.

I’ll let the comment of that one speak for itself:

a map implementation that stores both the original (or canonical) key and value for each key/value pair, but performs lookups and other operations using the normalized – this allows a value to be looked up by many similar keys, and not just the exact precise key it was originally stored with.

And so it turned out that the library authors had actually foreseen the need for looking up headers irrespective of case, and provided a helpful means for that. The whole lowercasing business was not needed, after all!
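For contrast, here is what a case-insensitive lookup on a plain map could look like. This is only a sketch with a made-up function name; it does a linear scan and is not how HeaderMap is implemented.

```clojure
(defn header-value
  "Looks up header-name in a plain map of string keys, ignoring case.
  Returns nil when no key matches."
  [headers header-name]
  (some (fn [[k v]]
          (when (.equalsIgnoreCase ^String k header-name)
            v))
        headers))

(def cached-headers
  {"Content-Type" "text/html; charset=ISO-8859-2"})

;; A plain get is case-sensitive and finds nothing...
(get cached-headers "content-type")

;; ...whereas the case-insensitive lookup finds the value.
(header-value cached-headers "content-type")
```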

I stripped out the with-additional-middleware altogether, added some code elsewhere to ensure that the header map is a HeaderMap regardless of whether it comes from the cache or not, and they lived happily ever after.

Epilogue

Moral of the story? It’s twofold.

Dynamic rebinding can be dangerous. Having a public API that is implemented in terms of dynamic rebinding, even more so. I’d prefer if clj-http just allowed the custom middleware to be explicitly specified as an argument, thusly:

(http/request req
              :additional-middleware [http/wrap-lower-case-headers])

Know your dependencies. If you have a problem that might be generically addressed by the library you’re using, look deeper. It might be there already.

Thanks to 3Jane for proofreading this article.

Word Champions

2020-01-03T00:00:00Z

This story begins on August 9, 2017, when a friend messaged me on Facebook: “Hey, I’m going to be on a TV talent show this weekend. They’ll be giving me this kind of problems. Any ideas how to prepare?”

He attached a link to this video:

Now, we’re both avid Scrabble players, so we explored some ideas about extracting helpful data out of the Official Polish Scrabble Player’s Dictionary. I launched a Clojure REPL and wrote some throwaway code to generate sample training problems for Krzysztof. The code used a brute-force algorithm, so it was dog slow, but it was a start. It was Wednesday.

I woke up next morning with the problem still in my head. Clearly, I had found myself in a nerd sniping situation.

There was only one obvious way out—to write a full-blown training app so that Krzysztof could practice as if he were in the studio. The clock was ticking: we had two days left.

After work, I started a fresh re-frame project. (I was a recent re-frame convert those days, so I wanted to see how well it could cope with the task at hand.) Late that night, or rather early next morning, the prototype was ready.

It had very messy code. It only worked on Chrome. It failed miserably on mobile. It took ages to load. It had native JS dependencies, notably Material-UI and react-dnd, and for some reason it would not compile with ClojureScript’s advanced optimization turned on; so it weighed in at more than 6 MB, slurping in more than 300 JS files on load.

But it worked.

Krzysztof didn’t win his episode against the other contestants, ending up third, but he completed his challenge successfully. It took him 3 minutes and 42 seconds, out of 5 minutes allotted. The episode aired on 24 October.

Krzysztof said that the problem he ended up solving on the show was way easier than the ones generated by the app: had they been more difficult, the wow factor might have been higher.

Several months later, we met at a Scrabble tournament, and I received a present. I wish I had photographed that bottle of wine, so I could show it here, but I hadn’t.

Meanwhile, the code remained messy and low-priority. But I kept returning to it when I felt like it, fixing up things one at a time. I’ve added difficulty levels, so you can have only one diagram, or three. I’ve made it work on Firefox. I’ve done a major rewrite, restructuring the code in a sane way and removing the JS dependencies other than React. I’ve made advanced compilation work, getting the JS down to 400K. I’ve made it work on mobile devices. I’ve written a puzzle generator in C, which ended up several orders of magnitude faster than the prototype Clojure version (it’s still brute-force, but uses some dirty C tricks to speed things up; I hope to rewrite it in Rust someday).

And now, 2½ years later, I’ve added an English version, with an accompanying set of puzzles (generated from a wordlist taken from this repo), for the English-speaking world to enjoy.

Play Word Champions now!

The code is on GitHub if you’d like to check it out or try hacking on it. It’s small, less than 1KLOC in total, so I think it can be a learning tool for re-frame or ClojureScript.

(This game as featured on the TV shows is called Gridlock. The name “Word Champions” was inspired by the title of Krzysztof’s video on YouTube, literally meaning “Lord of the Words”. There is no pun in the Polish title.)

Re-framing text-mode apps

2019-02-05T00:00:00Z

Intro

“But, you know, many explorers liked to go to places that are unusual. And, it’s only for the fun of it.” – Richard P. Feynman

A couple of nights ago, I hacked together a small Clojure program.

All it does is display a terminal window with a red rectangle in it. You can use your cursor keys to move it around the window, and space bar to change its colour. It’s fun, but it doesn’t sound very useful, does it?

In this post, I’ll try to convince you that there’s more to this little toy than might at first sight appear. You may want to check out the repo as you go along.

In which an unexpected appearance is made

(I’ve always envied Phil Hagelberg this kind of headlines.)

As you might have guessed from this article’s title, clj-tvision (a working name for the program) is a re-frame app.

For those of you who haven’t heard of re-frame, a word of explanation: it’s a ClojureScripty way of writing React apps, with Redux-like management of application state. If you do know re-frame (shameless plug: we at WorksHub do, and use it a lot: it powers the site you’re looking at right now!), you’ll instantly find yourself at home. However, a few moments later, a thought might dawn upon you, and you might start to feel a little uneasy…

Because I’ve mentioned React and ClojureScript, and yet I’d said earlier that we’re talking a text-mode application here. And I’ve mentioned that it’s written in Clojure. It is, in fact, not using React at all, and it has nothing to do whatsoever with ClojureScript, JavaScript, or the browser.

How is that even possible?

Here’s the catch: re-frame is implemented in .cljc files. So while it’s mostly used in the ClojureScript frontend, it can be used from Clojure. You may know this if you’re testing your events or subscriptions on the JVM.

While it’s mostly – if not hitherto exclusively – used for just that, I wanted to explore whether it could be used to manage state in an actual, non-web app. Text-mode is a great playground for this kind of exploration. Rather than picking a GUI toolkit and concerning myself with its intricacies, I chose to just put things on a rectangular sheet of text characters.

(But if you are interested in pursuing a React-ish approach for GUIs, check out what Bodil Stokke’s been doing in vgtk.)

Living without the DOM

The building blocks of a re-frame app are subscriptions, events, and views. While the first two work in Clojureland pretty much the same way they do in the browser (although there are differences, of which more anon), views are a different beast.

re-frame’s documentation says that views are “data in, Hiccup out. Hiccup is ClojureScript data structures which represent DOM.” But outside of the browser realm, there’s no DOM. So let’s rephrase that more generally: re-frame views should produce data structures which declaratively describe the component’s appearance to the user. In web apps, those structures correspond to the DOM. What they will look like outside is up to us. We’ll be growing our own DOM-like model, piecemeal, as needs arise.

For clj-tvision, I’ve opted for a very simple thing. Let’s start with a concrete example. Here’s a view:

(defn view []
  [{:type :rectangle, :x1 10, :y1 5, :x2 20, :y2 10, :color :red}])

Unlike in the DOM, in this model the UI state isn’t a tree. It’s a flat sequence of maps that each represent individual “primitive elements”. We could come up with a fancy buzzword-compliant name and call it Component List Model, or CLiM for short, in homage to the venerable GUI toolkit.

Like normal re-frame views, CLiM views can include subviews. An example follows:

(defn square [left top size color]
  [{:type :rectangle,
    :x1 left,
    :y1 top,
    :x2 (+ left size -1),
    :y2 (+ top size -1),
    :color color}])

(defn view []
  [[square 1 1 5 :red]
   [square 9 9 5 :blue]])

How to render a view? Simple. First, flatten the list, performing funcalls on subviews so that you get a sequence containing only primitives. Then, draw each of them in order. (If there is an overlap, the trailers will obscure the leaders. Almost biblical.)
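The flattening step might be sketched like this. The name flatten-view is made up for this illustration and the code is not lifted from the repo; it assumes that primitives are maps and that a subview is a vector whose first element is a function.

```clojure
(defn square [left top size color]
  [{:type :rectangle,
    :x1 left,
    :y1 top,
    :x2 (+ left size -1),
    :y2 (+ top size -1),
    :color color}])

(defn flatten-view
  "Expands a view (a sequence of primitives and subviews) into a flat
  sequence of primitive maps, preserving order."
  [view]
  (mapcat (fn [item]
            (cond
              ;; A primitive: keep as is.
              (map? item) [item]
              ;; A subview: call the function on its arguments, then recurse.
              (and (vector? item) (fn? (first item)))
              (flatten-view (apply (first item) (rest item)))
              ;; Anything else is assumed to be a nested sequence of items.
              :else (flatten-view item)))
          view))

(flatten-view [[square 1 1 5 :red]
               [square 9 9 5 :blue]])
```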

I’ve defined a multimethod, render-primitive, dispatching on :type. Its methods draw the corresponding primitive to a Lanterna screen.

Oh, didn’t I mention Lanterna? It’s a Java library for terminals. Either real ones or emulated in Swing (easier to work with when you’re in a CIDER REPL). Plus, it sports virtual screens which can be blitted to a real terminal. This gives us a rough poor man’s equivalent of React’s VDOM. And it has a Clojure wrapper!
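To show the shape of such a multimethod without pulling in Lanterna, the sketch below draws onto a plain map from [x y] positions to colours instead of a real screen. That “screen” representation, and the render helper, are assumptions made for this example only.

```clojure
(defmulti render-primitive
  "Draws one primitive onto the screen and returns the updated screen."
  (fn [_screen primitive] (:type primitive)))

(defmethod render-primitive :rectangle
  [screen {:keys [x1 y1 x2 y2 color]}]
  ;; Fill every cell of the rectangle (bounds inclusive) with its colour.
  (reduce (fn [scr position] (assoc scr position color))
          screen
          (for [y (range y1 (inc y2))
                x (range x1 (inc x2))]
            [x y])))

(defn render
  "Draws primitives in order onto an empty screen; later ones win on overlap."
  [primitives]
  (reduce render-primitive {} primitives))

(def screen
  (render [{:type :rectangle, :x1 0, :y1 0, :x2 1, :y2 1, :color :red}
           {:type :rectangle, :x1 1, :y1 1, :x2 2, :y2 2, :color :blue}]))
```

Because primitives are drawn in order, the cell at [1 1] ends up blue: the trailer obscures the leader.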

Events at eventide

So now we know how to draw our UI. But an app isn’t made up of just drawing. It has a main loop: it listens to events, which cause the app state to change and the corresponding components to redraw.

re-frame does provide an event mechanism, but it doesn’t define any events per se. So we need to ask ourselves: who calls dispatch? How do events originate? How to write the main loop?

clj-tvision is a proof-of-concept, so it doesn’t concern itself with mouse support. There’s only one way a user can interact with the app: via the keyboard. So keystrokes will be the only “source events”, as it were, for the app; and so writing the event loop should be simple. Sketching pseudocode:

(loop []
  (render-app)
  (let [keystroke (wait-for-key)] ;; blocking!
    (dispatch [:key-pressed keystroke])
    (recur)))

Simple as that, should work, right?

Wrong.

If you actually try that, it’ll somewhat work. Hit right arrow to move the rectangle, nothing happens! Hit right arrow again, it moves. Hit left, it moves right. Hit right, it moves left. Not what you want.

You see, there’s a complication stemming from the fact that re-frame’s events are asynchronous by default. (Hence the dispatch vs. dispatch-sync dichotomy.) They don’t get dispatched immediately; rather, re-frame places them on a queue and processes them asynchronously, so that they don’t hog the browser. The Clojure version of re-frame handles that using a single-threaded executor with a dedicated thread.
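A toy model shows why the naive loop lags one step behind. This is not re-frame’s implementation: the queue here is a plain atom that has to be drained by hand, and all names are invented for the illustration.

```clojure
(def app-db (atom {:x 10}))
(def event-queue (atom clojure.lang.PersistentQueue/EMPTY))

(defn dispatch
  "Queues the event and returns immediately, without touching app-db."
  [event]
  (swap! event-queue conj event)
  nil)

(defn process-queue!
  "Applies all queued [:move dx] events to app-db and empties the queue."
  []
  (let [[events _] (reset-vals! event-queue clojure.lang.PersistentQueue/EMPTY)]
    (doseq [[_ dx] events]
      (swap! app-db update :x + dx))))

(dispatch [:move 1])

;; A loop that renders right after dispatch still sees the old state.
(def x-seen-right-after-dispatch (:x @app-db))

(process-queue!)

;; Only once the queue has been processed does the state change.
(def x-after-processing (:x @app-db))
```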

We could almost use dispatch-sync everywhere, but for re-frame that’s a no-no: once within a dispatch-sync handler, you cannot dispatch other events. If you try anyway, re-frame will detect it and politely point its dragon-scaly head at you, explaining it doesn’t like it. (It is a benevolent dragon, you know.)

So we need to hook into that “next-tick” machinery of re-frame’s somehow. There are probably better ways of doing this, but I opted to blatantly redefine re-frame.interop/next-tick to tell the main loop: “hey, events have been handled and we have a new state, dispatch an event so we can redraw.” This is one of the rare cases where monkey-patching third-party code with alter-var-root saves you the hassle of forking that entire codebase.
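The general shape of such a patch is shown below on a local stand-in function rather than on re-frame itself. The names next-tick and ticks are defined here purely for illustration; this is not the actual patch from the repo.

```clojure
;; Stand-in for a third-party function we cannot edit.
(defn next-tick [f]
  (f))

(def ticks (atom 0))

;; Replace the var's root value with a function that does some extra
;; work (here: counting calls) and then delegates to the original.
(alter-var-root #'next-tick
                (fn [original]
                  (fn [f]
                    (swap! ticks inc)
                    (original f))))

(next-tick (fn [] :done))
```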

So now we have two sources of events: keystrokes, and next-tick. To multiplex them, I’ve whipped up a channel with core.async. Feels hacky, but it leaves room for adding mouse support in the future. Or time-based events that fire periodically.
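The idea of funnelling several event sources into one place can be sketched without any dependencies. Note the swap: this uses a java.util.concurrent.LinkedBlockingQueue instead of a core.async channel, and the names emit! and next-event! are invented for the example.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; One queue shared by every event source.
(def ^LinkedBlockingQueue events (LinkedBlockingQueue.))

(defn emit!
  "Called by any source (keyboard reader, next-tick hook, ...) to queue an event."
  [source payload]
  (.put events {:source source, :payload payload}))

(defn next-event!
  "Called by the main loop; blocks until some source has emitted an event."
  []
  (.take events))

(emit! :keyboard :right)
(emit! :next-tick nil)
```

The main loop then only ever waits on next-event!, regardless of how many sources there are.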

For completeness, I should also add that Clojure-side re-frame doesn’t have the luxury of having reactive atoms provided by Reagent. Its ratoms are ordinary Clojure atoms. Unlike in ClojureScript, any time the app state changes, every subscription in the signal graph will be recomputed. It may well be possible to port Reagent’s ratoms to Clojure, but it is a far more advanced exercise. For simple apps, what re-frame provides on its own might just be enough.

And with that final bit, we can sweep all that hackitude under the carpet… or, should I say, tuck it into an internal ns that hopefully no-one will ever look into. And we’re left with shiny, declarative, re-framey, beautiful UI code on the surface. Just look.

Closing thoughts

“Within C++, there is a much smaller and cleaner language struggling to get out.” – Bjarne Stroustrup

If you’ve ever encountered legacy C++ code, this will ring true. Come to think of it, Stroustrup’s words are true of every system that has grown organically over its lifetime, with features being added to it but hardly ever removed.

And modern webapps may well be the epitome of that kind of system. We now have desktop apps that are fully self-contained on a single machine, yet use an overwhelmingly complex and vast machinery grown out of a simple system originally devised to view static documents over the Internet.

For all that complexity, we continue to use it. Partly owing to its ubiquity, partly for convenience. In my experience, the abstractions provided by re-frame allow you to wrap your head around large apps and reason about them much more easily than, say, object-oriented approaches. It just feels right. Conversely, writing an app in, say, GTK+ would now feel like a setback by some twenty years.

So this toy, this movable rectangle on a black screen, is not so much an app as it is a philosophical exercise. It is what my typing fingers produced while I pondered, weak and weary: “can we throw away most of that cruft, while still enjoying the abstractions that make life so much easier?”

Can we?

This post was originally published on Functional Works.

Happy Programmers’ Day!

2014-09-13T00:00:00Z

Happy Programmers’ Day, everyone!

A feast isn’t a feast, though, until it has a proper way of celebrating it. The Pi Day, for instance, has one: you eat a pie (preferably exactly at 1:59:26.535am), but I haven’t heard of any way of celebrating the Programmers’ Day, so I had to invent one. An obvious way would be to write a program, preferably a non-trivial one, but that requires time and dedication, which not everyone is able to readily spare.

So here’s my idea: on Programmers’ Day, dust off a program that you wrote some time ago — something that is just lying around in some far corner of your hard disk, that you haven’t looked at in years, but that you had fun writing — and put it on GitHub for all the world to see, to share the joy of programming.

Let me initialize the new tradition by doing this myself. Here’s HAZE, the Haskellish Abominable Z-machine Emulator. It was my final assignment for a course in Advanced Functional Programming, in my fourth year at the Uni, way back in 2004. It is an emulator for an ancient kind of virtual machine, the Z-machine, written from scratch in Haskell. It allows you to play text adventure games, such as Zork, much in the vein of Frotz. It’s not very complete, and supports versions of the Z-machine up to 3 only, so newer games won’t run on it as it stands, but Zork is playable.

It probably won’t even compile in modern Haskell systems: it was originally written for GHC version 6.2.1, and extensively uses the FiniteMap data type, which was obsoleted shortly after and is no longer found in modern systems. I should have Linux and Windows binaries lying around (yes, I had compiled it under Windows, using MinGW/PDCurses); I’ll put them on GitHub once I find them.

My mind now wanders ten years back in time, to the days when I was writing it. It took me about three summer weeks to write HAZE from scratch, most of that time on a slow laptop where it took quite a lot of seconds to get GHC to compile even a simple thing. I would do some of it differently if I were doing it now — for one, the state of a ZMachine is a central datatype to HAZE, and you’ll find a lot of functions that take and return ZMachines, so a state monad is an obvious choice; I didn’t understand monads well enough back then. But I still remember how I had the framework in place already and I was adding implementations of Z-code opcodes, one by one, to ZMachine/ZCode/Impl.hs, recompiling, rerunning, getting messages about unimplemented opcodes, when all of a sudden I got the familiar message about a white house and a small mailbox. Freude!

I hope you enjoy looking at it at least half as much as I had enjoyed writing it.

You already use Lisp syntax

2014-05-20T00:00:00Z

Unix Developer: I’m not going to touch Lisp. It’s horrible!

Me: Why so?

UD: The syntax! This illegible prefix-RPN syntax that nobody else uses. And just look at all these parens!

Me: Well, many people find it perfectly legible, although most agree that it takes some time to get accustomed to. But I think you’re mistaken. Lots of people are using Lisp syntax on a daily basis…

UD: I happen to know no one doing this.

Me: …without actually realizing this. In fact, I think you yourself are using it.

UD: Wait, what?!

Me: And the particular variant of Lisp syntax you’re using is called Bourne shell.

UD: Now I don’t understand. What on earth does the shell have to do with Lisp?

Me: Just look: in the shell, you put the name of the program first, followed by the arguments, separated by spaces. In Lisp it’s exactly the same, except that you put an opening paren at the beginning and a closing paren at the end.

Shell: run-something arg1 arg2 arg3

Lisp: (run-something arg1 arg2 arg3)

UD: I still don’t get the analogy.

Me: Then you need a mechanism for expression composition — putting the output of one expression as an input to another. In Lisp, you just nest the lists. And in the shell?

UD: Backticks.

Me: That’s right. Or $(), which has the advantage of being more easily nestable. Let’s try arithmetic. How do you do arithmetic in the shell?

UD: expr. Or the Bash builtin let. For example,

$ let x='2*((10+4)/7)'; echo $x
4

Me: Now wouldn’t it be in line with the spirit of Unix — to have programs do just one thing — if we had one program to do addition, and another to do subtraction, and yet another to do multiplication and division?

It’s trivial to write it in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
  int mode = -1, cnt = argc - 1, val, i;
  char **args = argv + 1;
  switch (argv[0][strlen(argv[0]) - 1]) {
    case '+': mode = 0; break;
    case '-': mode = 1; break;
    case 'x': mode = 2; break;
    case 'd': mode = 3; break;
  }
  if (mode == -1) {
    fprintf(stderr, "invalid math operation\n");
    return 1;
  }
  if ((mode == 1 || mode == 3) && !cnt) {
    fprintf(stderr, "%s requires at least one arg\n", argv[0]);
    return 1;
  }
  switch (mode) {
    case 0: val = 0; break;
    case 2: val = 1; break;
    default: val = atoi(*args++); cnt--; break;
  }
  while (cnt--) {
    int arg = atoi(*args++);
    switch (mode) {
      case 0: val += arg; break;
      case 1: val -= arg; break;
      case 2: val *= arg; break;
      case 3:
        if (arg == 0) {  /* avoid crashing on division by zero */
          fprintf(stderr, "division by zero\n");
          return 1;
        }
        val /= arg;
        break;
    }
  }
  printf("%d\n", val);
  return 0;
}

This dispatches on the last character of its name, so it can be symlinked to +, -, x and d (I picked unusual names for multiplication and division to make them legal and avoid escaping).

Now behold:

$ x 2 $(d $(+ 10 4) 7)
4

UD: Wow, this sure looks a lot like Lisp!

Me: And yet it’s the shell. Our two basic rules — program-name-first and $()-for-composition — allowed us to explicitly specify the order of evaluation, so there was no need to do any fancy parsing beyond what the shell already provides.

UD: So is the shell a Lisp?

Me: Not really. The shell is stringly typed: a program takes textual parameters and produces textual output. To qualify as a Lisp, it would have to have a composite type: a list or a cons cell to build lists on top of. Then, you’d be able to represent code as this data structure, and write programs to transform code to other code.

But the Tao of Lisp lingers in the shell syntax.

I know I’ve glossed over many details here, like the shell syntax for redirection, globbing, subprocesses, the fact that programs have standard input in addition to command-line arguments, pipes, etc. — all these make the analogy rather weak. But I think it’s an interesting way to teach Lisp syntax to people.
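As a footnote, the behaviour of the little arithmetic tool and the nesting trick can be mimicked in Python. This is only a sketch: the function name run and the OPS table are made up for illustration, and each call stands in for one invocation of the C program above. Arguments and results are strings, to mirror the shell's stringly typed nature, and nesting calls plays the role of $().

```python
from functools import reduce


def c_div(a, b):
    """Integer division that truncates toward zero, as C's / does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


# Operation and starting value; None means "start from the first argument".
OPS = {
    "+": (lambda a, b: a + b, 0),
    "-": (lambda a, b: a - b, None),
    "x": (lambda a, b: a * b, 1),
    "d": (c_div, None),
}


def run(name, *args):
    """Stand-in for one invocation of the tool: text in, text out."""
    op, start = OPS[name]
    nums = [int(a) for a in args]
    if start is None:
        if not nums:
            raise ValueError(f"{name} requires at least one arg")
        start, nums = nums[0], nums[1:]
    return str(reduce(op, nums, start))


# Equivalent of: x 2 $(d $(+ 10 4) 7)
print(run("x", "2", run("d", run("+", "10", "4"), "7")))
```

The order of evaluation is spelled out entirely by the nesting, just as in the shell version.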

DOS debugging quirk

2014-04-06T00:00:00Z

While hacking on Lithium, I’ve noticed an interesting thing. Here’s a sample DOS program in assembly (TASM syntax):

.model tiny
.code
  org 100h

N equ 2

start:
  mov bp,sp
  mov ax,100
  mov [bp-N],ax
  mov cx,[bp-N]
  cmp cx,ax
  jne wrong
  mov dx,offset msg
  jmp disp
wrong:
  mov dx,offset msg2
disp:
  mov ah,9
  int 21h
  mov ax,4c00h
  int 21h

msg db "ok$"
msg2 db "wrong$"
end start

If you assemble, link and then execute it normally, typing prog in the DOS command line, it will output the string “ok”. But if you trace through the program in a debugger instead, it will say “wrong”! What’s wrong?

The problem is in lines 10-11 (instructions 3-4). Here’s what happens when you trace through this program in DOS 6.22’s DEBUG.EXE:

Note how in instruction 3 (actually displayed as the second above) we set the word SS:0xFFFC to 100. When about to execute the following instruction, we would expect that word to continue to hold the value 100, because nothing which could have changed that value has happened in between. Instead, the debugger still reports it as 0x0D8A, as if instruction 3 had not been executed at all — and, interestingly, after actually executing this instruction, CX gets yet another value of 0x7302!

Normally, thinking of DOS .COM programs, you assume a 64KB-long chunk of memory that the program has all to itself: the code starts at 0x100, the stack grows from 0xFFFE downwards (at any given time, the region from SP to 0xFFFE contains data currently on the stack), and all memory in between is free for the program to use however it deems fit. It turns out that, when debugging, it is not the case: the debuggers need to manipulate the region just underneath the program’s stack in order to handle the tracing/breakpoint interrupt traps.

I’ve verified that both DOS’s DEBUG and Borland’s Turbo Debugger 5 do this. The unsafe-to-touch amount of space below SP that they need, however, varies. Manipulating the N constant in the original program, I’ve determined that DEBUG only needs 8 bytes below SP, whereas for TD it is a whopping 18 bytes.

2048: A close look at the source

2014-04-02T00:00:00Z

The dust has now mostly settled on 2048. Yet, in all the deluge of variants and clones that has swept through Hacker News, little has been written about the experience of modifying the game. As I too have jumped on the 2048-modding bandwagon, it’s time to fill that gap, because, as we shall see, the code more than deserves a close look.

I’ll start by briefly describing my variant. It’s called “words oh so great” (a rather miserable attempt at a pun on “two-oh-four-eight”) and is a consequence of a thought I had, being an avid Scrabble player, after seeing the 3D and 4D versions: “what if we mashed 2048 and Scrabble together?” The answer suggested itself almost automatically.

Letters instead of number tiles, that was obvious. And you use them to form words. It is unclear how merging tiles should work: merging two identical tiles, as in the original, just wouldn’t make sense here, so drop the concept of merging and make the tiles disappear instead when you form a word. In Scrabble, the minimum length of a word is two, but allowing two-letter words here would mean too many words formed accidentally, so make it at least three. And 16 squares sounds like too tight a space, so increase it to 5x5. And there you have the modified rules.

I cloned the Git repo, downloaded an English word list (EOWL), and set out to work. It took me just over three hours from the initial idea to putting the modified version online and submitting a link to HN. I think three hours is not bad, considering that I’ve significantly changed the game mechanics. And, in my opinion, this is a testimony to the quality of Gabriele Cirulli’s code.

The code follows the MVC pattern, despite not relying on any frameworks or libraries. The model consists of the Tile and Grid classes, laying out the universe for the game as well as some basic rules governing it, and the GameManager that implements the game mechanics: how tiles move around, when they can merge together, when the game ends, and so on. It also uses a helper class called LocalStorageManager to keep the score and save it in the browser’s local storage.

The view part is called an “actuator” in 2048 parlance. The HTMLActuator takes the game state and updates the DOM tree accordingly. It also uses a micro-framework for animations. The controller takes the form of a KeyboardInputManager, whose job is to receive keyboard events and translate them to changes of the model.

The GameManager also contains some code to tie it all together — not really a part of the model as in MVC. Despite this slight inconsistency, the separation of concerns is very neatly executed in 2048’s code; I would even go so far as to say that it could be used as a demonstration in teaching MVC to people.

The only gripe I had with the code is that it violates the DRY principle in several places. Specifically, to change the board size to 5x5, I had to modify as many as three places: the HTML (it contains the initial definition for the DOM, including 16 empty divs making up the grid, which is unfortunate — I’d change it to set up the DOM at runtime during initialization); the model (instantiation of GameManager); and the .scss file from which the CSS is generated.

While on this topic, let me add that 2048’s usage of SASS is a prime example of its capabilities. It is very instructive to see how the sizing and positioning of the grid, and also styling for the tiles down to the glow, is done programmatically. I was aware of the existence of SASS before, but never got around to exploring it. Now, I’m sold on it.

To sum up: 2048 rocks. And it’s fun to modify. Go try it.

Lithium revisited: A 16-bit kernel (well, sort of) written in Clojure (well, sort of)

2013-05-26T00:00:00Z

Remember Lithium? The x86 assembler written in Clojure, and a simple stripes effect written in it? Well, here’s another take on that effect:

And here is the source code:

(do (init-graph)
    (loop [x 0 y 0]
      (put-pixel x y (let [z (mod (+ (- 319 x) y) 32)]
                       (if (< z 16) (+ 16 z) (+ 16 (- 31 z)))))
      (if (= y 200)
        nil
        (if (= x 319)
          (recur 0 (inc y))
          (recur (inc x) y)))))
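To make the colour arithmetic in the listing easier to follow, here is the same computation restated as a small Python function. This is a sketch, not part of Lithium: the name stripe_color is made up for illustration, and the comment about grey shades assumes the default VGA palette, where entries 16 to 31 form a grey ramp.

```python
WIDTH, HEIGHT = 320, 200   # mode 13h resolution


def stripe_color(x, y):
    """Palette index for pixel (x, y): a triangle wave over the diagonals."""
    z = (WIDTH - 1 - x + y) % 32
    # Ramp up through 16..31 for the first half, back down for the second.
    return 16 + z if z < 16 else 16 + (31 - z)


# One full frame as rows of palette indices (assumed grey shades, see above).
frame = [[stripe_color(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]
```

Every value lands in the range 16 to 31, so the stripes fade smoothly in and out instead of jumping back to the darkest shade.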

I implemented this several months ago and pushed it to GitHub, and development has pretty much stalled since then. After seeing this recent post on HN today, I’ve decided to give Lithium a little more publicity, in the hope that it will provide a boost of motivation to me. Because what we have here is pretty similar to Rustboot: it’s a 16-bit kernel written in Clojure.

Well, sort of.

After writing a basic assembler capable of building bare binaries of simple x86 real-mode programs, I’ve decided to make it a building block of a larger entity. So I’ve embarked on a project to implement a compiler for a toy Lisp-like language following the paper “An Incremental Approach to Compiler Construction”, doing it in Clojure and making the implemented language similar to Clojure rather than to Scheme.

(Whether it actually can be called Clojure is debatable. It’s unclear what the definition of Clojure the language is. Is running on the JVM a part of what makes Clojure Clojure? Or running on any host platform? Is ClojureScript Clojure? What about ClojureCLR, or clojure-py?)

So far I’ve only gotten to step 7 of 24 or so, but that’s already enough to have a working loop/recur implementation, and it was trivial to throw in some graphical mode 13h primitives to be able to implement this effect.

By default I’m running Lithium programs as DOS .COM binaries under DOSBox, but technically, the code doesn’t depend on DOS in any way (it doesn’t ever invoke interrupt 21h) and so it can be combined with a simple bootloader into a kernel runnable on the bare metal.

The obligatory HOWTO on reproducing the effect: install DOSBox and Leiningen, check out the code, launch a REPL with lein repl, execute the following forms, and enjoy the slowness with which individual pixels are painted:

(require 'lithium.compiler)
(in-ns 'lithium.compiler)
(run! (compile-program "/path/to/lithium/examples/stripes-grey.clj"))

Lithium: an x86 assembler for Clojure

2012-05-14T00:00:00Z

Ah, the golden days of childhood’s hackage. Don’t you have fond memories of them?

I got my first PC when I was 10. It was a 486DX2/66 with 4 megs of RAM and a 170 meg HDD; it ran DOS and had lots of things installed on it, notably Turbo Pascal 6. I hacked a lot in it. These were pre-internet days when knowledge was hard to come by, especially for someone living in a small town in Poland; my main sources were the software I had (TP’s online help was of excellent quality), a couple of books, and a popular computing magazine that published articles on programming. From the latter, I learned how to program the VGA: how to enter mode 13h, draw pixels on screen, wait for vertical retrace, manipulate the palette and how to combine these things into neat effects. One of the very first things I discovered was that when you plot every pixel using the sum of its coordinates modulo 40 as its color, you get a nice-looking diagonal stripes effect. Because of the initially incomprehensible inline assembly snippets appearing all over the place, I eventually learned x86 assembly, too.

Back to 2012: I’ve long been wanting to hack on something just for pure fun, a side pet project. Writing code for the bare metal is fun because it’s just about as close as you can get to wielding the ultimate power. And yet, since Clojure is so much fun too, I wanted the project to have something to do with Clojure.

So here’s Lithium, an x86 16-bit assembler written in pure Clojure and capable of assembling a binary version of the stripes effect.

To try it, clone the git repo to your Linux or OS X machine, install DOSBox, launch a REPL with Leiningen, change to the lithium namespace and say:

(run! "/home/you/lithium/src/stripes.li.clj")

FAQ

(Well, this is not really a FAQ since nobody actually asked me any questions about Lithium yet. This is more in anticipation of questions that may arise.)

What’s the importance of this?

None whatsoever. It’s just for fun.

How complete is it?

Very incomplete. To even call it pre-pre-alpha would be an exaggeration. It’s currently little more than the bare minimum required to assemble stripes.li.clj. Output format wise, it only produces bare binaries (similar to DOS .COMs), and that’s unlikely to change anytime soon.

Do you intend to continue developing it?

Absolutely. I will try to make it more complete, add 32- and possibly 64-bit modes, see how to add a macro system (since the input is s-expressions, it should be easy to produce Clojure macros to write assembly), write something nontrivial in it, and see how it can be used as a backend for some higher-level language compiler (I’m not sure yet which language that will turn out to be).

How to call a private function in Clojure

2012-04-25T00:00:00Z

tl;dr: Don’t do it. If you really have to, use (#'other-library/private-function args).

A private function in Clojure is one that has been defined using the defn- macro, or equivalently by setting the metadata key :private to true on the var that holds the function. It is normally not allowed in Clojure to call such functions from outside of the namespace where they have been defined. Trying to do so results in an IllegalStateException stating that the var is not public.

It is possible to circumvent this and call the private function, but it is not recommended. That the author of the library decided to make a function private probably means that he considers it to be an implementation detail, subject to change at any time, and that you should not rely on it being there. If you think it would be useful to have this functionality available as part of the public API, your best bet is to contact the library author and discuss the change, so that it may be included officially in a future version.

Contacting the author, however, is not always feasible: she may not be available or you might be in haste. In this case, several workarounds are available. The simplest is to use (#'other-library/private-function args), which works in Clojure 1.2.1 and 1.3.0 (it probably works in other versions of Clojure as well, but I haven’t checked that).

Why does this work? When the Clojure compiler encounters a form (sym args), it invokes analyzeSeq on that form. If its first element is a symbol, it proceeds to analyze that symbol. One of the first operations in that analysis is checking if it names an inline function, by calling isInline. That function looks into the metadata of the Var named by the symbol in question. If it’s not public, it throws an exception.

On the other hand, #' is the reader macro for var. So our workaround is equivalent to ((var other-library/private-function) args). In this case, the first element of the form is not a symbol, but a form that evaluates to a var. The compiler is not able to check for this so it does not insert a check for privateness. So the code compiles to calling a Var object.

Here’s the catch: Vars are callable, just like functions. They implement IFn. When a var is called, it delegates the call to the IFn object it is holding. This has been recently discussed on the Clojure group. Since that delegation does not check for the var’s privateness either, the net effect is that we are able to call a private function this way.
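The mechanism can be illustrated with a toy model in Python. This is emphatically not how the Clojure compiler is implemented; it is a sketch of the two paths described above, with made-up names (Var, call_symbol, var, OTHER_LIBRARY) and a made-up private function that doubles its argument. Going through the symbol runs the privacy check, while calling the Var object directly only delegates.

```python
class Var:
    """Toy stand-in for a Clojure Var: holds a function plus a privacy flag."""

    def __init__(self, fn, private=False):
        self.fn = fn
        self.private = private

    def __call__(self, *args):
        # Delegation only: no privacy check happens at call time.
        return self.fn(*args)


# Hypothetical namespace with one private function.
OTHER_LIBRARY = {"private-function": Var(lambda x: x * 2, private=True)}


def call_symbol(name, *args):
    """Models (private-function args): the symbol is resolved and checked."""
    v = OTHER_LIBRARY[name]
    if v.private:
        raise PermissionError(f"var: {name} is not public")
    return v(*args)


def var(name):
    """Models (var other-library/private-function): returns the Var itself."""
    return OTHER_LIBRARY[name]


print(var("private-function")(21))   # calling the Var object directly
```

In this model, call_symbol("private-function", 21) raises an error, whereas the direct call through var goes straight to the underlying function.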

Lifehacking: How to get cheap home equipment using Clojure

2012-04-12T00:00:00Z

I moved to London last September. Like many new Londoners, I have changed accommodation fairly quickly, having already moved once and with another move looming in a couple of months; my current flat was largely unfurnished when I moved in, so I had to buy some basic homeware. I didn’t want to invest much in it, since it’d be only for a few months. Luckily, it is not hard to do that cheaply: many people are moving out and getting rid of their stuff, so quite often you can search for the desired item on Gumtree and find there’s a cheap one a short bike ride away.

Except when there isn’t. In this case, it’s worthwhile to check again within a few days as new items are constantly being posted. Being lazy, I’ve decided to automate this. A few hours and a hundred lines of Clojure later, gumtree-scraper was born.

I’ve packaged it using lein uberjar into a standalone jar, which, when run, produces a gumtree.rss that is included in my Google Reader subscriptions. This way, whenever something I’m interested in appears, I get notified within an hour or so.

It’s driven by a Google spreadsheet. I’ve created a sheet that has three columns: item name, minimum price, maximum price; then I’ve made it available to anyone who knows the URL. This way I can edit it pretty much from everywhere without touching the script. Each time the script is run (by cron), it downloads that spreadsheet as a CSV that looks like this:

hand blender,,5
bike rack,,15

For each row, the script queries Gumtree’s “For Sale” category within London for the given price range, fetches each result and transforms it into an RSS entry.
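The actual script is written in Clojure, but the shape of the CSV handling is easy to sketch in Python. The names parse_wishlist and in_range are invented for this illustration; an empty cell is taken to mean "no limit", which is an assumption based on the sample rows above. The real script passes the range to Gumtree's search rather than filtering locally, so in_range is only there to show what the two price columns mean.

```python
import csv
import io


def parse_wishlist(text):
    """Turn 'name,min,max' rows into (name, min_price, max_price) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:                      # skip blank lines
            continue
        name, lo, hi = row
        rows.append((name,
                     float(lo) if lo else None,   # empty cell: no lower bound
                     float(hi) if hi else None))  # empty cell: no upper bound
    return rows


def in_range(price, lo, hi):
    """True if price falls within the (possibly open-ended) range."""
    return (lo is None or price >= lo) and (hi is None or price <= hi)


wishlist = parse_wishlist("hand blender,,5\nbike rack,,15\n")
```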

Gumtree has no API, so I’m using screenscraping to retrieve all the data. Because the structure of the pages is much simpler, I’m actually scraping the mobile version; a technical twist here is that the mobile version is only served to actual browsers so I’m supplying a custom User-Agent, pretending to be Safari. For actual scraping, the code uses Enlive; it works out nicely.

About half of the code is RSS generation — mostly XML emitting. I’d use clojure.xml/emit but it’s known to produce malformed XML at times, so I include a variant that should work.
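Emitting XML is mostly about getting the escaping right. As a rough illustration in Python rather than the Clojure of the actual script, here is a sketch that builds a minimal RSS 2.0 document with the standard library's xml.etree.ElementTree. The function name rss_feed and the item fields are assumptions for illustration, not what gumtree-scraper uses.

```python
import xml.etree.ElementTree as ET


def rss_feed(title, items):
    """Build a minimal RSS 2.0 document; items are dicts with title and link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
    # The serializer escapes characters such as & and < in text nodes.
    return ET.tostring(rss, encoding="unicode")


feed = rss_feed("Gumtree", [{"title": "Blender & whisk",
                             "link": "http://example.com/1"}])
```

Building a tree and serializing it, instead of concatenating strings, keeps the output well-formed even when an advert title contains an ampersand.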

In case anyone wants to try it out, be aware that the location and category are hardcoded in the search URL template; if you want, change the template line in get-page. The controller spreadsheet URL is not, however, hardcoded; it’s built up using the spreadsheet.key system property. Here’s the wrapper script I use that is actually run by cron:

#!/bin/bash
# Bail out if a previous run is still in progress.
if [ "$(ps ax | grep java | grep gumtree)" ]; then
  echo "already running, exiting"
  exit 0
fi
cd "$(dirname "$0")"
java -Dspreadsheet.key=MY_SECRET_KEY -jar "$HOME/gumtree/gumtree.jar"
# Publish the generated feed.
cp "$HOME/gumtree/gumtree.rss" "$HOME/public_html"

Now let me remove that entry for a blender: I bought one yesterday for £4…

Ever wanted to programmatically file a lawsuit? In Poland, you can.

2012-03-21T00:00:00Z

This has somehow escaped me: just over a year ago, the Sixth Civil Division of the Lublin-West Regional Court in Lublin, Poland, opened its online branch. It serves the entire territory of Poland and is competent to hear lawsuits concerning payment claims. There is basic information available in English. It has proven immensely popular, having processed about two million cases in its first year of operation.

And the really cool thing is, they have an API.

It’s SOAP-based and has a publicly available spec. (Due to the way their web site is constructed, I cannot link to the spec directly; this last link leads to a collection of files related to the web service. The spec is called EpuWS_ver.1.14.1.pdf; it’s in Polish only, but it should be easy to run it through Google Translate.) There are a couple of XML schemas as well, plus the spec contains links to a WSDL and some code samples (in C#) at the end.

To actually use the API, you need to get yourself an account of the appropriate type (there are two types corresponding to two groups of methods one can use: that of a bailiff and of a mass plaintiff). You then log on to the system, where you can create an API key that is later used for authentication. They throttle the speed down to 1 req/s per user to mitigate DoS attacks.

The methods include FileLawsuits, FileComplaints, SupplyDocuments, GetCaseHistory and so on (the actual names are in Polish). To give you an example, the FileLawsuits method returns a structure that consists of, inter alia, the amount of court fee to pay, the value of the matter of dispute (both broken down into individual lawsuits), and a status code with a description.

iOS app, anyone?

Combining virtual sequences
or, Sequential Fun with Macros
or, How to Implement Clojure-Like Pseudo-Sequences with Poor Man’s Laziness in a Predominantly Imperative Language

2011-12-09T00:00:00Z

Sequences and iteration

There are a number of motivations for this post. One stems from my extensive exposure to Clojure over the past few years: this was, and still is, my primary programming language for everyday work. Soon, I realized that much of the power of Clojure comes from a sequence abstraction being one of its central concepts, and a standard library that contains many sequence-manipulating functions. It turns out that by combining them it is possible to solve a wide range of problems in a concise, high-level way. In short, it pays to think in terms of whole sequences, rather than individual elements.

Another motivation comes from a classical piece of functional programming humour, The Evolution of a Haskell Programmer. If you don’t know it, go check it out: it consists of several Haskell implementations of factorial, starting out from a straightforward recursive definition, passing through absolutely hilarious versions involving category-theoretical concepts, and finally arriving at this simple version that is considered most idiomatic:

fac n = product [1..n]

This is very Clojure-like in that it involves a sequence (the range [1..n]). In Clojure, this could be implemented as

(defn fac [n]
  (reduce * 1 (range 1 (inc n))))

Now, I thought to myself, how would I write factorial in an imperative language? Say, Pascal?

function fac(n : integer) : integer;
var
  i, res : integer;
begin
  res := 1;
  for i := 1 to n do
    res := res * i;
  fac := res;
end;

This is very different from the functional version that works with sequences. It is much more elaborate, introducing an explicit loop. On the other hand, it’s memory efficient: it’s clear that its memory requirements are O(1), whereas a naïve implementation of a sequence would need O(n) to construct it all in memory and then reduce it down to a single value.

Or is it really that different? Think of the changing values of i in that loop. On first iteration it is 1, on second iteration it’s 2, and so on up to n. Therefore, one can really think of a for loop as a sequence! I call it a “virtual” sequence, since it is not an actual data structure; it’s just a snippet of code.

To rephrase it as a definition: a virtual sequence is a snippet of code that (presumably repeatedly) yields the member values.

Let’s write some code!

To illustrate it, throughout the remainder of this article I will be using Common Lisp, for the following reasons:

It allows for imperative style, including GOTO-like statements. This will enable us to generate very low-level code.
Thanks to macros, we will be able to obtain interesting transformations.

Okay, so let’s have a look at how to generate a one-element sequence. Simple enough:

(defmacro vsingle (x)
 `(yield ,x))

The name VSINGLE stands for “Virtual sequence that just yields a SINGLE element”. (In general, I will try to define virtual sequences named and performing similarly to their Clojure counterparts here; whenever there is a name clash with an already existing CL function, the name will be prefixed with V.) We will not concern ourselves with the actual definition of YIELD at the moment; for debugging, we can define it just as printing the value to the standard output.

(defun yield (x)
  (format t "~A~%" x))

We can also convert a Lisp list to a virtual sequence which just yields each element of the list in turn:

(defmacro vseq (list)
  `(loop for x in ,list do (yield x)))

(defmacro vlist (&rest elems)
  `(vseq (list ,@elems)))

Now let’s try to define RANGE. We could use loop, but for the sake of example, let’s pretend that it doesn’t exist and write a macro that expands to low-level GOTO-ridden code. For those of you who are not familiar with Common Lisp, GO is like GOTO, except it takes a label that should be established within a TAGBODY container.

(defmacro range (start &optional end (step 1))
  (unless end
    (setf end start start 0))
  (let ((fv (gensym)))
    `(let ((,fv ,start))
       (tagbody
        loop
          (when (>= ,fv ,end)
            (go out))
          (yield ,fv)
          (incf ,fv ,step)
          (go loop)
       out))))
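For readers who don't speak Common Lisp, the idea can be approximated in Python. This is only a sketch under one assumption of mine: a virtual sequence is modelled as a function that takes an emit callback (standing in for YIELD) and calls it once per element. The names vsingle, vrange and emit are illustrative, and unlike the macros, nothing is transformed at compile time.

```python
def vsingle(x):
    """Virtual sequence that emits just one element."""
    def seq(emit):
        emit(x)
    return seq


def vrange(start, end=None, step=1):
    """Virtual sequence of start, start+step, ... below end."""
    if end is None:          # one-argument form: count from 0
        start, end = 0, start

    def seq(emit):
        i = start
        while i < end:       # explicit loop, no list is ever built
            emit(i)
            i += step
    return seq


# "Running" a sequence means supplying a concrete emit, such as list.append.
out = []
vrange(2, 10, 3)(out.append)
print(out)
```

As with the debugging definition of YIELD, passing print as the callback simply prints each element in turn.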

Infinite virtual sequences are also possible. After all, there’s nothing preventing us from considering a snippet of code that loops infinitely, executing YIELD, as a virtual sequence! We will define the equivalent of Clojure’s iterate: given a function fun and initial value val, it will repeatedly generate val, (fun val), (fun (fun val)), etc.

(defmacro iterate (fun val)
  (let ((fv (gensym)))
    `(let ((,fv ,val))
       (tagbody loop
          (yield ,fv)
          (setf ,fv (funcall ,fun ,fv))
          (go loop)))))

So far, we have defined a number of ways to create virtual sequences. Now let’s ask ourselves: is there a way, given code for a virtual sequence, to yield only the elements from the original that satisfy a certain predicate? In other words, can we define a filter for virtual sequences? Sure enough. Just replace every occurrence of yield with code that checks whether the yielded value satisfies the predicate, and only if it does invokes yield.

First we write a simple code walker that applies some transformation to every yield occurrence in a given snippet:

(defun replace-yield (tree replace)
  (if (consp tree)
      (if (eql (car tree) 'yield)
          (funcall replace (cadr tree))
          (loop for x in tree collect (replace-yield x replace)))
      tree))

We can now write filter like this:

(defmacro filter (pred vseq &environment env)
  (replace-yield (macroexpand vseq env)
                 (lambda (x) `(when (funcall ,pred ,x) (yield ,x)))))

It is important to point out that since filter is a macro, the arguments are passed to it unevaluated, so if vseq is a virtual sequence definition like (range 10), we need to macroexpand it before replacing yield.

We can now verify that (filter #'evenp (range 10)) works. It macroexpands to something similar to

(LET ((#:G70192 0))
  (TAGBODY
    LOOP (IF (>= #:G70192 10)
           (PROGN (GO OUT)))
         (IF (FUNCALL #'EVENP #:G70192)
           (PROGN (YIELD #:G70192)))
         (SETQ #:G70192 (+ #:G70192 1))
         (GO LOOP)
    OUT))
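The same trick has a rough analogue in Python if a virtual sequence is modelled as a function that takes an emit callback in place of YIELD (an assumption of this sketch, as are the names vrange and vfilter). Where the macro rewrites every yield form, the sketch hands the original sequence a guarded callback instead.

```python
def vrange(n):
    """Emit 0, 1, ..., n-1 through the supplied callback."""
    def seq(emit):
        for i in range(n):
            emit(i)
    return seq


def vfilter(pred, seq):
    """Wrap seq so that only elements satisfying pred reach the real emit."""
    def filtered(emit):
        def guarded_emit(x):      # plays the role of the rewritten (yield x)
            if pred(x):
                emit(x)
        seq(guarded_emit)
    return filtered


evens = []
vfilter(lambda x: x % 2 == 0, vrange(10))(evens.append)
```

The difference from the macro version is that the check happens through a function call at run time, whereas the macro splices it directly into the loop body.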

concat is extremely simple. To produce all elements of vseq1 followed by all elements of vseq2, just execute code corresponding to vseq1 and then code corresponding to vseq2. Or, for multiple sequences:

(defmacro concat (&rest vseqs)
  `(progn ,@vseqs))

To define take, we’ll need to wrap the original code in a block that can be escaped from by means of return-from (which is just another form of goto). We’ll add a counter that will start from n and keep decreasing on each yield; once it reaches zero, we escape the block:

(defmacro take (n vseq &environment env)
  (let ((x (gensym))
        (b (gensym)))
    `(let ((,x ,n))
       (block ,b
         ,(replace-yield (macroexpand vseq env)
                         (lambda (y) `(progn (yield ,y)
                                             (decf ,x)
                                             (when (zerop ,x)
                                               (return-from ,b)))))))))
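A callback-based Python sketch of the same escape looks like this. It assumes, as a modelling choice, that a virtual sequence is a function taking an emit callback; an exception stands in for return-from, and a fresh exception object per run plays the part of the gensym'd block name, so that nested uses don't catch each other's escapes. The names iterate, take and _Done are illustrative. The sketch also returns immediately for n of zero or less, which is a choice of mine rather than something the macro does.

```python
class _Done(Exception):
    """Raised to leave a producing loop early (stands in for RETURN-FROM)."""


def iterate(fun, val):
    """Endless virtual sequence: val, fun(val), fun(fun(val)), ..."""
    def seq(emit):
        x = val
        while True:
            emit(x)
            x = fun(x)
    return seq


def take(n, seq):
    """Pass at most n elements of seq on to emit, then stop the producer."""
    def taken(emit):
        if n <= 0:
            return
        done = _Done()            # unique per run, like the gensym'd block name
        remaining = n

        def counting_emit(x):
            nonlocal remaining
            emit(x)
            remaining -= 1
            if remaining == 0:
                raise done

        try:
            seq(counting_emit)
        except _Done as exc:
            if exc is not done:   # someone else's escape: let it propagate
                raise
    return taken


first_five = []
take(5, iterate(lambda x: x + 1, 0))(first_five.append)
```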

rest (or, rather, vrest, as that name is taken) can be defined similarly:

(defmacro vrest (vseq &environment env)
  (let ((skipped (gensym)))
    (replace-yield
     `(let ((,skipped nil)) ,(macroexpand vseq env))
     (lambda (x) `(if ,skipped (yield ,x) (setf ,skipped t))))))

vfirst is another matter. It should return a value instead of producing a virtual sequence, so we need to actually execute the code — but with yield bound to something else. We want to establish a block as with take, but our yield will immediately return from the block once the first value is yielded:

(defmacro vfirst (vseq)
  (let ((block-name (gensym)))
   `(block ,block-name
      (flet ((yield (x) (return-from ,block-name x)))
        ,vseq))))

Note that so far we’ve seen three classes of macros:

macros that create virtual sequences;
macros that transform virtual sequences into other virtual sequences;
and finally, vfirst is our first example of a macro that produces a result out of a virtual sequence.

Our next logical step is vreduce. Again, we’ll produce code that rebinds yield: this time to a function that replaces the value of a variable (the accumulator) by the result of calling a function on the accumulator’s old value and the value being yielded.

(defmacro vreduce (f val vseq)
  `(let ((accu ,val))
     (flet ((yield (x) (setf accu (funcall ,f accu x))))
       ,vseq
       accu)))
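In a callback-based Python model of virtual sequences (my assumption for this sketch: a sequence is a function that calls an emit callback once per element), rebinding yield amounts to passing in a callback that updates an accumulator. The names vrange and vreduce are illustrative.

```python
import operator


def vrange(start, end):
    """Emit start, start+1, ..., end-1 through the supplied callback."""
    def seq(emit):
        for i in range(start, end):
            emit(i)
    return seq


def vreduce(f, val, seq):
    """Fold seq into a single value without building a list."""
    acc = val

    def accumulate(x):        # this is what "yield" is rebound to
        nonlocal acc
        acc = f(acc, x)

    seq(accumulate)
    return acc


total = vreduce(operator.add, 0, vrange(0, 10))
product = vreduce(operator.mul, 1, vrange(1, 6))
```

Only the accumulator is kept between elements, so memory use stays constant however long the sequence is.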

We can now build a construct that executes a virtual sequence and wraps the results up as a Lisp list, in terms of vreduce.

(defun conj (x y)
  (cons y x))

(defmacro realize (vseq)
 `(nreverse (vreduce #'conj nil ,vseq)))

Let’s verify that it works:

CL-USER> (realize (range 10))
(0 1 2 3 4 5 6 7 8 9)

CL-USER> (realize (take 5 (filter #'oddp (iterate #'1+ 0))))
(1 3 5 7 9)

Hey! Did we just manipulate an infinite sequence and get the result in a finite amount of time? And that without explicit support for laziness in our language? How cool is that?!

Anyway, let’s finally define our factorial:

(defun fac (n)
  (vreduce #'* 1 (range 1 (1+ n))))

Benchmarking

Factorials grow too fast, so for the purpose of benchmarking let’s write a function that adds numbers from 0 below n, in sequence-y style. First using Common Lisp builtins:

(defun sum-below (n)
  (reduce #'+ (loop for i from 0 below n collect i) :initial-value 0))

And now with our virtual sequences:

(defun sum-below-2 (n)
  (vreduce #'+ 0 (range n)))

Let’s try to time the two versions. On my Mac running Clozure CL 1.7, this gives:

CL-USER> (time (sum-below 10000000))
(SUM-BELOW 10000000) took 8,545,512 microseconds (8.545512 seconds) to run
                    with 2 available CPU cores.
During that period, 2,367,207 microseconds (2.367207 seconds) were spent in user mode
                    270,481 microseconds (0.270481 seconds) were spent in system mode
5,906,274 microseconds (5.906274 seconds) was spent in GC.
 160,000,016 bytes of memory allocated.
 39,479 minor page faults, 1,359 major page faults, 0 swaps.
49999995000000

CL-USER> (time (sum-below-2 10000000))
(SUM-BELOW-2 10000000) took 123,081 microseconds (0.123081 seconds) to run
                    with 2 available CPU cores.
During that period, 127,632 microseconds (0.127632 seconds) were spent in user mode
                    666 microseconds (0.000666 seconds) were spent in system mode
 4 minor page faults, 0 major page faults, 0 swaps.
49999995000000

As expected, SUM-BELOW-2 is much faster, causes fewer page faults and presumably conses less. (Critics will be quick to point out that we could idiomatically write it using LOOP’s SUM/SUMMING clause, which would probably be yet faster, and I agree; yet if we were reducing by something other than +, something that LOOP does not have built in as a clause, this would not be an option.)

Conclusion

We have seen how snippets of code can be viewed as sequences and how to combine them to produce other virtual sequences. As we are nearing the end of this article, it is perhaps fitting to ask: what are the limitations and drawbacks of this approach?

Clearly, this kind of sequence is less powerful than “ordinary” sequences such as Clojure’s. The fact that we’ve built them on macros means that once we escape the world of code transformation by invoking some macro of the third class, we can’t manipulate them anymore. In Clojure world, first and rest are very similar; in virtual sequences, they are altogether different: they belong to different worlds. The same goes for map (had we defined one) and reduce.

But imagine that instead of having just one programming language, we have a high-level language A in which we are writing macros that expand to code in a low-level language B. It is important to point out that the generated code is very low-level. It could almost be assembly: in fact, most of the macros we’ve written don’t even require language B to have composite data-types beyond the type of elements of collections (which could be simple integers)!

Is there a practical side to this? I don’t know: to me it just seems to be something with hack value. Time will tell if I can put it to good use.

Color your own Europe with Clojure!

2011-07-11T00:00:00Z

This is a slightly edited translation of an article I first published on my Polish blog on January 19, 2011. It is meant to target newcomers to Clojure and show how to use Clojure to solve a simple real-life problem.

The problem

Some time ago I was asked to prepare a couple of differently-colored maps of Europe. I got some datasets which mapped countries of Europe to numerical values: the greater the value, the darker the corresponding color should be. A sample colored map looked like this:

I began by downloading an easily editable map from Wikimedia Commons, calculated the required color intensities for the first dataset, launched Inkscape and started coloring. After half an hour of tedious clicking, I realized that I would be better off writing a simple program in Clojure that would generate the map for me. It turned out to be an easy task: the remainder of this article will be an attempt to reconstruct my steps.

SVG

The format of the source image is SVG. I knew it was an XML-based vector graphics format and had often encountered images in this format on Wikipedia, but editing it by hand was new to me. Luckily, it turned out that the image has a simple structure. Each country’s envelope curve is described with a path element that looks like this:

<path
   id="pl"
   class="eu europe"
   d="a long list of curve node coordinates" />

An important thing to note here is the id attribute: this is the two-letter ISO 3166-1 alpha-2 country code. In fact, there is an informative comment right at the beginning of the image that explains the naming conventions used. Having such a splendid input was of great help.

Just like HTML, SVG uses CSS to define the look of an element. All that is needed to color Poland red is to give the element a style attribute that sets the fill property:

<path
   id="pl"
   style="fill: #ff0000;"
   class="eu europe"
   d="a long list of curve node coordinates" />

Now that we know all this, let’s start coding!

XML in Clojure

The basic way to handle XML in Clojure is to use the clojure.xml namespace, which contains functions that parse XML (on a DOM basis, i.e., into an in-memory tree structure) and serialize such structures back into XML. Let us launch a REPL and start by reading our map and parsing it:

> (use 'clojure.xml)
nil
> (def m (parse "/home/nathell/eur/Blank_map_of_Europe.svg"))
[...a long while...]
Unexpected end of file from server
  [Thrown class java.net.SocketException]

Hold on in there! What’s that SocketException doing here? Firefox displays this map properly, so does Chrome, WTF?! Shouldn’t everything work fine in such a great language as Clojure?

Well, the language is as good as its libraries — and when it comes to Clojure, one can stretch that thought further: Clojure libraries are as good as the Java libraries they use under the hood. In this case, we’ve encountered a feature of the standard Java XML parser (from javax.xml package). It is restrictive and tries to reject invalid documents (even if they are well-formed). If the file being parsed contains a DOCTYPE declaration, the Java parser, and hence clojure.xml/parse, tries to download the DTD schema from the given address and validate the document against that schema. This is unfortunate in many aspects, especially from the point of view of the World Wide Web Consortium, since their servers hold the Web standards. One can easily imagine the volume of network traffic this generates: W3C has a blog post about it. Many Java programmers have encountered this problem at some time. There are a few solutions; we will go the simplest way and just manually remove the offending DOCTYPE declaration.

> (def m (parse "/home/nathell/eur/bm.svg"))
#'user/m
> m
[...many screenfuls of numbers...]

This time we managed to parse the image. Viewing the structure is not easy because of its sheer size (as expected: the file weighs in at over 0.5 MB!), but from the very first characters of the REPL’s output we can make out that it’s a Clojure map (no pun intended). Let’s examine its keys:

> (keys m)
(:tag :attrs :content)

So the map contains three entries with descriptive names. :tag contains the name of the XML element, :attrs is a map of attributes for this element, and :content is a vector of its subelements, each in turn being represented by a similarly structured map (or a string if it’s a text node):

> (:tag m)
:svg
> (:attrs m)
{:xmlns "http://www.w3.org/2000/svg", :width "680", :height "520", :viewBox "1754 161 9938 7945", :version "1.0", :id "svg2"}
> (count (:content m))
68

Just for the sake of practice, let’s try to serialize the parsed structure back to XML. The function emit should be able to do it, but it prints XML to standard output. We can use the with-out-writer macro from the namespace clojure.contrib.io to dump the XML to a file:

> (use 'clojure.contrib.io)
nil
> (with-out-writer "/tmp/a.svg" (emit m))
nil

We try to view a.svg in Firefox and…

Error parsing XML: not well-formed
Area: file:///tmp/a.xml
Row 15, column 44: Updated to reflect dissolution of Serbia & Montenegro: http://commons.wikimedia.org/wiki/User:Zirland
                 -------------------------------------------^

It turns out that using clojure.xml/emit is not recommended, because it does not handle XML entities in comments correctly; we should use clojure.contrib.lazy-xml instead. For the sake of example, though, let’s stay with emit and manually remove the offending line once again (we can safely do it, since that’s just a comment).

Coloring Poland

We saw earlier that our main XML node contains 68 subnodes. Let’s see what they are — tag names will suffice:

> (map :tag (:content m))
(:title :desc :defs :rect :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :path :g :path :path :g :path :path :path)

So far, so good. Seems that all country descriptions are contained directly in the main node. Let us try to find Poland:

> (count (filter #(and (= (:tag %) :path)
                       (= ((:attrs %) :id) "pl"))
                 (:content m)))
1

(This snippet of code filters the list of subnodes of m to pick only those elements whose tag name is path and value of attribute id is pl, and returns the length of such list.) Let’s try to add a style attribute to that element, according to what we said earlier. Because Clojure data structures are immutable, we have to define a new top-level element which will be the same as m, except that we will set the style of the appropriate subnode:

> (def m2 (assoc m
                :content
                (map #(if (and (= (:tag %) :path)
                               (= ((:attrs %) :id) "pl"))
                        (assoc % :attrs (assoc (:attrs %) :style "fill: #ff0000;"))
                        %)
                     (:content m))))
#'user/m2
> (with-out-writer "/tmp/a.svg" (emit m2))
nil

We open the created file and see a map with Poland colored red. Yay!

Generalization

We will generalize our code a bit. Let us write a function that colors a single state, taking a path element (subnode of svg) as an argument:

(defn color-state
  [{:keys [tag attrs] :as element} colorize-fn]
  (let [state (:id attrs)]
    (if-let [color (colorize-fn state)]
      (assoc element :attrs (assoc attrs :style (str "fill:" color)))
      element)))

This function is similar to the anonymous one we used above in the map call, but differs in some respects. It takes two arguments. As mentioned, the first one is the XML element (destructured into tag and attrs: you can read more about destructuring in the appropriate part of Clojure docs), and the second argument is… a function that should take a two-letter country code and return an HTML color description (or nil, if that country’s color is not specified; color-state will cope with this and return the element unchanged).

Now that we have color-state, we can easily write a higher-level function that processes and writes XML in one step:

(defn save-color-map
  [svg colorize-fn outfile]
  (let [colored-map (assoc svg :content (map #(color-state % colorize-fn) (:content svg)))]
    (with-out-writer outfile
      (emit colored-map))))

Let’s test it:

> (save-color-map m {"pl" "#00ff00"} "/tmp/a.svg")
nil

This time Poland is green (we used a country→color map as an argument to color-state, since Clojure maps are callable like functions). Let’s try to add blue Germany:

> (save-color-map m {"pl" "#00ff00", "de" "#0000ff"} "/tmp/a.svg")
nil

It works!

Problem with the UK

Inspired by our success, we try to color different countries. It mostly works, but the United Kingdom remains gray, regardless of whether we specify its code as “uk” or “gb”. We resort to the source of our image, and the beginning comment once again proves helpful:

Certain countries are further subdivided the United Kingdom has gb-gbn for Great Britain and gb-nir for Northern Ireland. Russia is divided into ru-kgd for the Kaliningrad Oblast and ru-main for the Main body of Russia. There is the additional grouping #xb for the “British Islands” (the UK with its Crown Dependencies – Jersey, Guernsey and the Isle of Man)

Perhaps we have to specify “gb-gbn” and “gb-nir”, instead of just “gb”? We try that, but still no luck. After a while of thought: oh yes! Our initial assumption that all the country definitions are path subnodes of the toplevel svg node is false. We have to fix that.

So far we have been doing a “flat” transform of the SVG tree: we only changed the subnodes of the toplevel node, but no deeper. We should change all the path elements (and g, if we want to color groups of paths like the UK), regardless of how deep they occur in the tree.

We can use a zipper (from the clojure.zip namespace) to do a depth-first walk of the SVG tree. Let us define a function that takes the transformation function to apply to a node, a predicate that tells whether to edit the node in question, and a zipper:

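(require '[clojure.zip :as zip])

(defn map-zipper [f pred z]
  (if (zip/end? z)
    (zip/root z)
    ;; edit the current node if pred matches, then step to the next node
    (recur f pred (-> z (zip/edit #(if (pred %) (f %) %)) zip/next))))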

Now we rewrite save-color-map as:

(defn save-color-map
  [svg colorize-fn outfile]
  (let [colored-map (map-zipper #(color-state % colorize-fn) (fn [x] (#{:g :path} (:tag x))) (zip/xml-zip svg))]
    (with-out-writer outfile
      (emit colored-map))))

This time the UK can be colored.

Colorizers

We have automated the process of styling countries to make them appear in color, but translating particular numbers to RGB is tedious. In the last part of this article we will see how to ease this: we are going to write a colorizer, i.e., a function suitable for passing to color-state and save-color-map (so far we’ve been using maps for this).

Let’s start by writing a function that translates a triplet of numbers into HTML RGB notation, because it will be easier for us to work with integers than with strings:

(defn htmlize-color
  [[r g b]]
  (format "#%02x%02x%02x" r g b))

Now we insert a call to htmlize-color into the appropriate place in color-state:

(defn color-state
  [{:keys [tag attrs] :as element} colorize-fn]
  (let [state (:id attrs)]
    (if-let [color (colorize-fn state)]
      (assoc element :attrs (assoc attrs :style (str "fill:" (htmlize-color color))))
      element)))

Now imagine we have a table with numeric values for states, like this:

State	Value
Poland	20
Germany	15
Netherlands	30

We want to have a function that assigns colors to states, such that the intensity of a color should be proportional to the value assigned to a given state. To be more general, assume we have two colors, c1 and c2, and for a given state, for each of the R, G, B components we assign a value proportional to the difference between the state’s value and the smallest value in the dataset, normalized to lie between c1 and c2.

This sounds complex, but I hope an example will clear things up. This is the Clojure implementation of the described algorithm:

(defn make-colorizer
  [dataset ranges]
  (let [minv (apply min (vals dataset))
        maxv (apply max (vals dataset))
        progress (map (fn [[min-col max-col]] (/ (- max-col min-col) (- maxv minv))) ranges)]
    (into {}
          (map (fn [[k v]] [(.toLowerCase k) (map (fn [progress [min-color _]] (int (+ min-color (* (- v minv) progress)))) progress ranges)])
               dataset))))

Let us see how it works on our sample data:

> (make-colorizer {"pl" 20, "de" 15, "nl" 30} [[0 255] [0 0] [0 0]])
{"pl" (85 0 0), "de" (0 0 0), "nl" (255 0 0)}

The second argument means that the red component is to range between 0 and 255, and the green and blue components are to be fixed at 0.

As we wanted, Germany ends up darkest (because it has the least value), the Netherlands is lightest (because it has the greatest value), and Poland’s intensity is one third that of the Netherlands (because 20 is one third of the way between 15 and 30).

Wrapping up

The application we created can be further developed in many ways. One can, for instance, add a Web interface for it, or write many different colorizers (e.g., a discrete colorizer with fixed colors for ranges of input values, or a temperature colorizer transitioning smoothly from blue through white to red; to do the latter we would have to pass through the HSV color space).

What is your idea to improve on it? For those of you who are tired of pasting snippets of code into the REPL, I’m putting the complete source code with a Leiningen project on GitHub. Forks are welcome.

Meet my little friend createTree

2011-07-08T00:00:00Z

I’ve recently been developing an iPhone application in my spare time. I’m not going to tell you what it is just yet (I will post a separate entry once I manage to get it into the App Store); for now, let me just say that I’m writing it in JavaScript and HTML5, using PhoneGap and jQTouch to give it a native touch.

After having written some code, I began testing it on a real device and encountered a nasty issue. It turned out that some of the screens of my app, containing dynamically generated content, sometimes would not show up. I tried to chase the problem down, but it seemed totally random. Finally, I googled up this blog post that gave me a clue.

My code was using jQuery’s .html() method (and hence innerHTML under the hood) to display the dynamic content. It turns out that, on Mobile Safari, using innerHTML is highly unreliable (at least on iOS 4.3, but this seems to be a long-standing bug). Sometimes, the change just does not happen. I changed one of my screens to build and insert DOM objects explicitly, and sure enough, it started to work predictably well.

So I had to remove all usages of .html() from my app. The downside to it was that explicit DOM-building code is much more verbose than the version that constructs HTML and then sets it up. It’s tedious to write and contains much boilerplate.

To not be forced to change code, the above-quoted article advocates using a pure-JavaScript HTML parser outputting DOM to replace jQuery’s .html() method. I considered this for a while, but finally decided against it — I didn’t want to include another big, complex dependency that potentially could misbehave at times (writing HTML parsers is hard).

Instead, I came up with this:

function createTree(tree) {
    // Strings and numbers become text nodes.
    if (typeof tree === 'string' || typeof tree === 'number')
        return document.createTextNode(tree);
    var tag = tree[0], attrs = tree[1], res = document.createElement(tag);
    for (var attr in attrs) {
        var val = attrs[attr];  // declared locally to avoid leaking a global
        if (attr === 'class')
            res.className = val;
        else
            $(res).attr(attr, val);
    }
    // Everything after the tag name and attributes is a child subtree.
    for (var i = 2; i < tree.length; i++)
        res.appendChild(createTree(tree[i]));
    return res;
}

This is very similar in spirit to .html(), except that instead of passing HTML, you give it a data structure representing the DOM tree to construct. It can either be a string or a number (which yields a text node), or an array consisting of the HTML tag name, an object mapping attributes to their values, and zero or more subtrees of the same form. Compare:

Using .html():

var html = '<p>This is an <span class="red">example.</span></p>';
$('#myDiv').html(html);

Using createTree:

var tree = ['p', {},
            'This is an ',
            ['span', {'class': 'red'}, 'example.']];
$('#myDiv').empty().append(createTree(tree));

A side benefit is that it is just as easy to build up a tree dynamically as it is to create HTML, and the code often gets clearer. Note how the createTree version above does not mix single and double quotes, which is easy to mess up in the .html() version.
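To illustrate the point about building a tree dynamically, here is a minimal sketch. The helper name listTree and the sample items are made up for this example and are not part of the app; the function only produces the nested-array structure that createTree expects, so it can be tried without a DOM.

```javascript
// Hypothetical helper (an assumption for illustration): turns an array of
// strings into the nested-array form that createTree accepts.
function listTree(items) {
    var tree = ['ul', {'class': 'items'}];
    for (var i = 0; i < items.length; i++)
        tree.push(['li', {}, items[i]]);
    return tree;
}

var tree = listTree(['apples', 'pears']);
// tree is now:
// ['ul', {'class': 'items'}, ['li', {}, 'apples'], ['li', {}, 'pears']]
// In the browser one would then call:
// $('#myDiv').empty().append(createTree(tree));
```

Because the tree is ordinary data, it can be assembled with loops and conditionals before anything touches the document.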

A quirk with JavaScript closures

2011-05-15T00:00:00Z

I keep running into this obstacle every now and then. Consider this example:

> q = []
[]
> for (var i = 0; i < 3; i++)
    q.push(function() { console.log(i); });
> q[0]()
3

I wanted an array of three closures, each printing a different number to the console when called. Instead, each prints 3 (or, rather, whatever the value of the variable i happens to be).

This happens because var is function-scoped: there is only one variable i, and the i in each lambda refers to that variable itself, not to the value it was bound to when the function was created. By the time any of the closures is called, the loop has finished and i is 3.

One solution is to enforce the bindings explicitly on each iteration, like this:

for (var i = 0; i < 3; i++)
  (function(v) {
    q.push(function() { console.log(v); });
  })(i);

Or use Underscore.js, which is what I actually do:

_([1,2,3]).each(function(i) {
  q.push(function() { console.log(i); });
});
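Another option, assuming a JavaScript engine with ES2015 support (an assumption; the snippets above do not rely on it), is to declare the loop variable with let, which gives every iteration its own binding. The sketch below puts both behaviours side by side; the array names shared and fresh are made up for this example, and the closures return their number instead of logging it so that the results are easy to compare.

```javascript
// With var there is a single function-scoped i shared by all closures.
var shared = [];
for (var i = 0; i < 3; i++)
    shared.push(function() { return i; });

// With let (ES2015) each iteration gets a fresh binding of j.
var fresh = [];
for (let j = 0; j < 3; j++)
    fresh.push(function() { return j; });

var sharedResults = shared.map(function(f) { return f(); }); // [3, 3, 3]
var freshResults = fresh.map(function(f) { return f(); });   // [0, 1, 2]
console.log(sharedResults, freshResults);
```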

The Dijkstran wheel of fortune: SPSS, Excel, VBA

2011-03-28T00:00:00Z

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.
— Edsger W. Dijkstra, EWD 498

I like to think of myself somewhat egotistically as a counterexample to Dijkstra’s statement above. Granted, some of my code is definitely of poor quality, and I dare not call myself a good programmer. But, having started with BASIC on a Commodore 64, then proceeding to learn Pascal (of the Turbo/Borland flavour), then C, x86 assembly, OCaml, Smalltalk, Java, C++, Haskell, Common Lisp, Clojure, and a couple of other languages, with a few enlightenments achieved along the way, I do think I managed to regenerate from the mental wounds that BASIC had inflicted upon me. And now I feel a strange sensation, now that the Dijkstran wheel of fortune has made a full spin: I’ve spent the last few days writing BASIC code. I’ve written several Excel macros in Visual Basic for Applications.

Why the strange selection of a language? Well, this was simply the best tool for the job. What I needed to do was postprocess the output of some statistical analyses performed in SPSS running under Windows, altering the way the results were presented. SPSS can export data to HTML, Word, and Excel; of these three, the last is the most convenient, because it preserves the structure of the output tables most thoroughly. (In principle, HTML does too, and in fact my first stab was with Clojure, but I stopped after realizing just how much ad-hoc, throwaway code that parses the SPSS-generated HTML, munges it several times to and fro, and then outputs back HTML I’d have to write.) So I went the Excel way, and in this post I’d like to share my mixed feelings from that encounter.

Visual Basic the language is icky. It is certainly a step forward from the BASIC I remember from decades ago, in that I didn’t have to number my lines, and it is possible to structure the code nicely so that it doesn’t contain any GOTOs, GOSUBs or RETURNs. And it has this object-oriented feel to it. But compared to modern languages, programming in it resembles voluntarily putting on handcuffs, and then jumping around to avoid stumbling over the logs it throws under your legs. Not quite so big and scary logs as C++ does, but still. I mean, why on earth does VB have to distinguish between expressions and statements? Many languages do, but in most of them an expression is at least a valid statement. Not so in VB. Also, VB is still line-oriented: whether or not you require an End If in the conditional statement depends on whether it fits in one line or not. But my biggest pain was with the assignments. VB makes a distinction between reference assignments and other assignments, requiring a Set statement in the first case, and disallowing it in the second. So, Set myCell = thatOtherCell but foo = 42. Worse, forgetting the Set in the first case does not result in an error, which makes such bugs very hard to debug. Yurgh.

Also, the IDE built into Excel for developing VB macros is mediocre. There is an editor, which highlights the syntax and automatically reformats the code, inserting spaces as appropriate, which is nice. It slaps me in the face with a modal dialog whenever I make a syntax error and move off the line, which is not so nice. There is a REPL of sorts, taking the form of an “Immediate” window, into which you can type statements (not expressions, remember?) and tap Enter to execute them. You can also Debug.Print to it, like to a JavaScript console. It is not reachable by Ctrl-Tab from the editor, so I ended up using the mouse much more often than normally. I want my Emacs back!

On the other hand, I find the object-oriented API for actually accessing the spreadsheets quite well-designed and pleasant to use. You just grab the object representing your worksheets from the global Worksheets object (indexable by number or by name), and from there you access your cells. The basic object you work with is the Range object, representing either a single cell or a bunch of them; you can get or set cell values, change the formatting, call Offset to navigate around as if with cursor keys. You also can search for specific content in the sheet. Simple enough, easy to use and pick up; and above all, it lets you get the job done without getting in the way much.

As for SPSS itself: it sucks. In fact, it sucks so great and in so many different ways that it merits its own blog entry (which will follow someday). For now, I’ll only note down the things pertaining to Excel interop; hopefully it will save somebody’s time.

Problem is, SPSS 19’s Excel export is buggy. In fact, it’s so unreliable that I’ve wasted more hours struggling with it than actually writing my macros. (We’re talking SPSS 19 here; I’ve also tried version 17, with the same results.) It exports small data chunks fine, but the larger your output, the more likely it is that Excel alerts about unreadable content in your file. Excel then offers to repair the data, which mostly succeeds, but inevitably loses the formatting — which for me was a no-no.

So, after long hours of experimentation and attempting different workarounds, I found that it is much, much more reliable to just copy your data and paste it into Excel directly, without exporting to a temporary file. Just do Edit → Copy special and select Excel BIFF format, to make sure you’re copying the right data. If Excel complains about not being able to understand the copied content (turn on the Clipboard preview to find out), save your output to .spv, restart SPSS, re-run your syntax and try again. With luck, it will eventually work. At least for me it did.

Keyword arguments

2010-05-04T00:00:00Z

There’s been an ongoing debate about how to pass optional named arguments to Clojure functions. One way to do this is the defnk macro from clojure.contrib.def; I hesitate to call it canonical, since apparently not everyone uses it, but I’ve found it useful a number of times. Here’s a sample:

user> (use 'clojure.contrib.def)
nil
user> (defnk f [:b 43] (inc b))
#'user/f
user> (f)
44
user> (f :b 100)
101

This is an example of keyword arguments in action. Keyword arguments are a core feature of some languages, notably Common Lisp and Objective Caml. Clojure doesn’t have them, but it’s pretty easy to emulate their basic usage with macros, as defnk does.

But there’s more to Common Lisp’s keyword arguments than defnk provides. In CL, the default value of a keyword argument can be an expression referring to other arguments of the same function. For example:

CL-USER> (defun f (&key (a 1) (b a))
           (+ a b))
F
CL-USER> (f)
2
CL-USER> (f :a 45)
90
CL-USER> (f :b 101)
102

I wish defnk had this feature. Or is there some better way that I don’t know of?

Sunflower

2010-04-18T00:00:00Z

The program I’ve been writing about recently has come to a point where I think it can be shown to the wider public. It’s called Sunflower and has its home on GitHub. It’s nowhere near being completed, and of alpha quality right now, but even at this stage it might be useful.

Just as sunflower seed kernels come wrapped in hulls, most HTML documents seen in the wild come wrapped in noise that is not really part of the document itself. Take any news site: a document from such a site contains things such as advertisements, header, footer, and many links. Now suppose you have many documents grabbed from the same site. Is it possible to somehow automate the extraction of the document “essences”?

Sunflower to the rescue. It relies on the assumption that documents coming from the same source have the same structure. It presents a list of strings to the user, and asks them to pick those that are contained in the text essence. Then it finds the coordinates of the smallest HTML subtree that contains all those strings, and uses those coordinates to extract information from all documents. And it comes with a nice, easily understandable GUI for that.

This technique works remarkably well for many collections, although not all. An earlier, proof-of-concept implementation (in Common Lisp) has been used to extract many press texts for the National Corpus of Polish.
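The subtree-coordinates idea described above can be sketched in a few lines. This is not Sunflower’s actual code (Sunflower is written in Clojure); it is a minimal JavaScript illustration under some loudly stated assumptions: a document is a nested array of the form [tag, attrs, child, ...] with strings as text nodes, “coordinates” are a list of child indices leading down from the root, and every selected string occurs somewhere in the first document. All function names and sample documents are made up for the example.

```javascript
// Path (array of child indices) to the first text node containing needle,
// or null if no text node contains it.
function findPath(node, needle) {
    if (typeof node === 'string')
        return node.indexOf(needle) >= 0 ? [] : null;
    for (var i = 2; i < node.length; i++) {
        var sub = findPath(node[i], needle);
        if (sub !== null) return [i - 2].concat(sub);
    }
    return null;
}

// Longest common prefix of the paths: the coordinates of the smallest
// subtree containing every needle. Assumes no path is null.
function commonPath(paths) {
    var prefix = paths[0];
    for (var i = 1; i < paths.length; i++) {
        var j = 0;
        while (j < prefix.length && j < paths[i].length &&
               prefix[j] === paths[i][j]) j++;
        prefix = prefix.slice(0, j);
    }
    return prefix;
}

// Follows a path of child indices down from the root.
function subtreeAt(node, path) {
    for (var i = 0; i < path.length; i++)
        node = node[path[i] + 2];
    return node;
}

// Concatenates all text below a node.
function textOf(node) {
    if (typeof node === 'string') return node;
    return node.slice(2).map(textOf).join('');
}

// Two made-up documents sharing the same structure.
var doc1 = ['body', {},
            ['div', {'class': 'ads'}, 'Buy now!'],
            ['div', {'class': 'article'},
             ['h1', {}, 'Headline one'],
             ['p', {}, 'First body text.']],
            ['div', {'class': 'footer'}, 'Copyright']];
var doc2 = ['body', {},
            ['div', {'class': 'ads'}, 'Cheap stuff!'],
            ['div', {'class': 'article'},
             ['h1', {}, 'Headline two'],
             ['p', {}, 'Second body text.']],
            ['div', {'class': 'footer'}, 'Copyright']];

// The user picks two strings from doc1; the coordinates found there are
// then reused to extract the essence of doc2.
var coords = commonPath([findPath(doc1, 'Headline'),
                         findPath(doc1, 'First body')]);
var essence = textOf(subtreeAt(doc2, coords));
```

Here coords points at the article div, so essence contains only the headline and body text of the second document, without the advertisement or footer.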

I’ve given up on the symbol-capturing approach to wizards I’ve presented in my previous posts. Inspired by the DOM tree in Web apps, with a bag of elements with identifiers, I now have a central bag of Swing widgets (implemented as an atom) identified by keywords. This bag contains tidbits of the mutable state of Sunflower. This means that I can write callback functions like this:

#(with-components [strings-model selected-dir]
   (.removeAllElements strings-model)
   (let [p (-> selected-dir htmls first parse)]
     (add-component :parsed p)
     (doseq [x (strings p)]
       (.addElement strings-model x))))

Name and conquer: having parts of state explicitly named means that I can reliably access them from just about anywhere. This reduces confusion and allows for less tangled, more self-contained and understandable code.

A case for symbol capture

2010-04-05T00:00:00Z

Clojure by default protects macro authors from accidentally capturing a local symbol. Stuart Halloway describes this in more detail, explaining why this is a Good Thing. However, sometimes this kind of symbol capture is called for. I’ve encountered one such case today while hacking a Swing application.

As I develop the app, I find new ways to express Swing concepts and interact with Swing objects in a more Clojuresque way, so a library of GUI macros and functions gets written. One of them is a wizard macro for easy creation of installer-like wizards, where there is a sequence of screens that can be navigated with Back and Next buttons at the bottom of the window.

The API (certainly not finished yet) currently looks like this:

(wizard & components)

where each Swing component corresponding to one wizard screen can be augmented by a supplementary map, which can contain, inter alia, a function to execute upon showing the screen in question.

Now, I want those functions to be able to access the Back and Next buttons in case they want to disable or enable them at need. I thus want the API user to be able to use two symbols, back-button and next-button, in the macro body, and have them bound to the corresponding buttons.

It is crucial that these bindings be lexical and not dynamic. If they were dynamic, they would be only effective during the definition of the wizard, but not when my closures are invoked later on. Thus, my implementation looks like this:

(defmacro wizard [& panels]
  `(let [~'back-button (button "< Back")
         ~'next-button (button "Next >")]
   (do-wizard ~'back-button ~'next-button ~(vec panels))))

where do-wizard is a private function implementing the actual wizard creation, and the ~'foo syntax forces symbol capture.

By the way, if all goes well, this blog post should be the first one syndicated to Planet Clojure. Hello, Planet Clojure readers!

The pitfalls of `lein swank`

2010-03-31T00:00:00Z

A couple of weeks ago I finally got around to acquainting myself with Leiningen, one of the most popular build tools for Clojure. The thing that stopped me the most was that Leiningen uses Maven under the hood, which seemed a scary beast at first sight, but once I overcame the initial fear, it turned out to be quite a simple and useful tool.

One feature in particular is very useful for Emacs users like me: lein swank. You define all dependencies in project.clj as usual, add a magical line to :dev-dependencies, then say

$ lein swank

and lo and behold, you can M-x slime-connect from your Emacs and have all the code at your disposal.

There is, however, an issue that you must be aware of when using lein swank: Leiningen uses a custom class loader (AntClassLoader, to be more precise) to load the Java classes referenced by the code. Despite being a seemingly irrelevant implementation detail, this can bite you in a number of most surprising and obscure ways. Try evaluating the following code in a Leiningen REPL:

(str (.decode
       (java.nio.charset.Charset/forName "ISO-8859-2")
       (java.nio.ByteBuffer/wrap
         (into-array Byte/TYPE (map byte [-79 -26 -22])))))
;=> "???"

The same code evaluated in a plain Clojure REPL will give you "ąćę", which is a string represented in ISO-8859-2 by the three bytes from the above snippet.

Whence the difference? Internally, each charset is represented as a unique instance of its specific class. These are loaded lazily as needed by the Charset/forName method. Presumably, the system class loader is used for that, and somewhere along the way a SecurityException gets thrown and caught.

Note also that there are parts of Java API which use the charset lookup under the hood and are thus vulnerable to the same problem, for example Reader constructors taking charset names. If you use clojure.contrib.duck-streams, then rebinding *default-encoding* will not work from a Leiningen REPL. Jars and überjars produced by Leiningen should be fine, though.

Clojure SET

2010-02-10T00:00:00Z

I’ve just taken a short breath off work to put some code on GitHub that I had written over one night some two months ago. It is an implementation of the Set game in Clojure, using Swing for GUI.

I do not have time to clean up or comment the code, so I’m leaving it as is for now; however, I hope that even in its current state it can be of interest, especially for Clojure learners.

Some random notes on the code:

Clojure is concise! The whole thing is just under 250 lines of code, complete with game logic and the GUI. Of these, the logic is about 50 LOC. Despite this it reads clearly and has been a pleasure to write, thanks to Clojure’s support for sets as a data structure (in the vein of the game’s title and theme).
There are no graphics included. All the drawing is done in the GUI part of code (I’ve replaced the canonical squiggle shape by a triangle and stripes by gradients, for the sake of easier drawing).
I’ve toyed around with different Swing layout managers for this game. Back in the days when I wrote in plain Java, I used to use TableLayout, but it has a non-free license; JGoodies Forms is also nice, but has a slightly more complicated API (and it’s an additional dependency, after all). In the end I’ve settled on the standard GridBagLayout, which is similar in spirit to those two, but requires more boilerplate to set up. As it turned out, simple macrology makes it quite pleasurable to use; see add-gridbag in the code for details.
Other things of interest might be my function to randomly shuffle seqs, which strikes a nice balance between simplicity/conciseness of implementation and randomness; and a useful debugging macro.

Comments?

anti-procrastination.el

2008-12-18T00:00:00Z

Fighting procrastination has been my major concern these days. I’ve devised a number of experimental tools to help me with that. One of them is called snafu and can generate reports of your activity throughout the whole day of work. It’s in a preliminary state, but works (at least since I’ve found and fixed a long-standing bug in it which would cause it to barf every now and then), and I already have a number of ideas for its further expansion.

Reports alone, however, do not quite muster enough motivation for work. I’m doing most of my editing/programming work in Emacs, so yesterday I grabbed the Emacs Lisp manual and came up with a couple of extra lines at the end of my .emacs.

;;; Written by Daniel Janus, 2008/12/18.
;;; This snippet is placed into the public domain.  Feel free
;;; to use it in any way you wish.  I am not responsible for
;;; any damage resulting from its usage.

(defvar store-last-modification-time t)
(defvar last-modification-time nil)
(defun mark-last-modification-time (beg end len)
  (let ((b1 (substring (buffer-name (current-buffer)) 0 1)))
    (when (and store-last-modification-time
               (not (string= b1 " "))
               (not (string= b1 "*")))
      (setq last-modification-time (current-time)))))
(add-hook 'after-change-functions 'mark-last-modification-time)
(defun write-lmt ()
  (setq store-last-modification-time nil)
  (when last-modification-time
    (with-temp-file "/tmp/emacs-lmt"
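      ;; (current-time) returns (HIGH LOW ...); seconds = HIGH * 65536 + LOW.
      ;; Plain `nth' avoids depending on the `cl' package.
      (let ((a (nth 0 last-modification-time))
            (b (nth 1 last-modification-time)))
        (princ a (current-buffer))
        (terpri (current-buffer))
        (princ b (current-buffer)))))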
  (setq store-last-modification-time t))
(run-at-time nil 1 'write-lmt)

Every second (to change that to every 10 seconds, change the 1 to 10 in the last line) it creates a file named /tmp/emacs-lmt which contains the time of last modification of any non-system buffer.

That’s all there is to it, at least on the Emacs side. The other part is a simple shell script, which uses MPlayer to display a nag-screen for five seconds, and then gives me some time to start doing anything useful before nagging me again:

#!/bin/bash
TIMEOUT=300
while true; do
   cat /tmp/emacs-lmt | (
      read a; read b;
      c="`date +%s`";
      let x=c-65536*a-b;
      if test $x -gt $TIMEOUT;
          then mplayer -fs $HOME/p.avi;
               sleep 15;
      fi)
   sleep 1
done
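The arithmetic in the script above relies on the (HIGH LOW ...) timestamp format that Emacs writes out, where the number of seconds since the epoch is HIGH × 65536 + LOW. Here is a minimal sketch of that calculation pulled out into a function of its own; the name idle_seconds and its optional second argument are made up for illustration and are not part of the original script:

```shell
#!/bin/bash
# Print how many seconds have passed since the timestamp stored in a
# two-line file: first line HIGH, second line LOW (as written by write-lmt).
# Usage: idle_seconds FILE [NOW]   (NOW defaults to the current time)
# The function name and the NOW argument are hypothetical.
idle_seconds() {
  local file=$1 now=${2:-$(date +%s)}
  local high low
  # The file may lack a trailing newline, so read's exit status is ignored.
  { read -r high; read -r low; } < "$file"
  echo $(( now - (65536 * high + low) ))
}
```

With this in place, the test in the loop could be written as something like `[ "$(idle_seconds /tmp/emacs-lmt)" -gt "$TIMEOUT" ]`, and the optional second argument makes the calculation easy to check against a fixed point in time.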

The nag-screen in my case is an animation which I’ve created using MEncoder from a single frame which looks like this. Beware the expletives! (This is one of the few cases I find their usage justified, as the strong message bites the conscience more strongly.)

I’ve only been testing this setup for one day, but so far it’s working flawlessly: I got more done yesterday than in the two previous days combined, and that’s excluding the hour or so that it took me to write these snippets.

If anyone else happens to give it a try, I’d love to hear any comments.

Who said Common Lisp programs cannot be small?

2008-08-09T00:00:00Z

So, how much disk space does your average CL image eat up? A hundred megs? Fifty? Twenty? Five, perhaps, if you’re using LispWorks with a tree-shaker? Well then, how about this?

[nathell@chamsin salza2-2.0.4]$ ./cl-gzip closures.lisp test.gz
[nathell@chamsin salza2-2.0.4]$ gunzip test
[nathell@chamsin salza2-2.0.4]$ diff closures.lisp test
[nathell@chamsin salza2-2.0.4]$ ls -l cl-gzip
-rwxr-xr-x 1 nathell nathell 386356 2008-08-09 11:08 cl-gzip

That’s right. A standalone executable of a mini-gzip, written in Common Lisp, taking up under 400K! And it only depends on glibc and GMP, which are available by default on pretty much every Linux installation. (This is on a 32-bit x86 machine, by the way.)

I used the most recent version of ECL for compiling this tiny example. The key to the size was configuring ECL with --disable-shared --enable-static CFLAGS="-Os -ffunction-sections -fdata-sections" LDFLAGS="-Wl,-gc-sections". This essentially gives you a poor man’s tree shaker for free at a linker level. And ECL in itself produces comparatively tiny code.

I built this example from Salza2’s source by loading the following code snippet:

(defvar salza
  '("package" "reset" "specials"
    "types" "checksum" "adler32" "crc32" "chains"
    "bitstream" "matches" "compress" "huffman"
    "closures" "compressor" "utilities" "zlib"
    "gzip" "user"))

(defvar salza2
  (mapcar (lambda (x) (format nil "~A.lisp" x))
          salza))

(defvar salza3
  (mapcar (lambda (x) (format nil "~A.o" x))
          salza))

(defun build-cl-gzip ()
  (dolist (x salza2)
          (load x)
          (compile-file x :system-p t))
  (c:build-program
   "cl-gzip"
   :lisp-files salza3
   :epilogue-code
     '(progn
       (in-package :salza2)
       (gzip-file (second (si::command-args))
                  (third (si::command-args))))))

(build-cl-gzip)

(Sadly enough, there’s no ASDF in here. I have yet to figure out how to leverage ASDF to build small binaries in this constrained environment.)

This gave me a standalone executable 1.2 meg in size. I then proceeded to compress it with UPX (with arguments --best --crp-ms=999999) and got the final result. How cool is that?

I am actively looking for a new job. If you happen to like my writings and think I might be just the right man for the team you’re building up, please feel free to consult my résumé or pass it on.

Update 2010-Jan-17: the above paragraph is no longer valid.

cl-morfeusz: A ninety minutes’ hack

2008-06-23T00:00:00Z

Here’s what I came up with today, after no more than 90 minutes of coding (complete with comments and all):

MORFEUSZ> (morfeusz-analyse "zażółć gęślą jaźń")
((0 1 "zażółć" "zażółcić" "impt:sg:sec:perf")
 (1 2 "gęślą" "gęśl" "subst:sg:inst:f")
 (2 3 "jaźń" "jaźń" "subst:sg:nom.acc:f"))

This is cl-morfeusz in action, a Common Lisp interface to Morfeusz, the morphological analyser for Polish.

It’s a single Lisp file, so there’s no ASDF system definition or asdf-installability for now. I’m not putting it under version control, either. Or, should I say, not yet. When I get around to it, I plan to write a simple parser and write a Polish-language version of the text adventure that started it all.

Meanwhile, you may use cl-morfeusz for anything you wish (of course, as long as you comply with Morfeusz’s license). Have fun!

Update 2010-Jan-17: With the advent of UTF-8 support in CFFI, the ugly workarounds in the code are probably no longer necessary; I don’t have time to check it right now, though.

cl-netstrings

2008-04-30T00:00:00Z

I’ve just packaged up the Common Lisp netstring handling code that I wrote a week ago into a neat library. Unsurprisingly enough, it is called cl-netstrings and has its own home on the Web. It’s even asdf-installable! I wonder whether this one turns out to be useful for anybody besides me…

The other thing I’ve been working on is a new build system for Poliqarp. But that’s the story for another post — most probably I will write about it when it gets out of a state of constant flux.

Update 2010-Jan-17: cl-netstrings is now hosted on GitHub; I’ve updated the link.

Hacking away with JSON-RPC

2008-04-24T00:00:00Z

Let’s try:

(let ((s (socket-stream
          (socket-connect "localhost" 10081
                          :element-type '(unsigned-byte 8)))))
  (write-netstring "{\"method\":\"ping\",\"params\":[],\"id\":1}" s)
  (finish-output s)
  (princ (read-netstring s))
  (close s))
; { "result": "pong" }
; --> T

Yay! This is Common Lisp talking to a JSON-RPC server written in C. This means that I now have the foundations for rewriting Poliqarp on top of JSON-RPC (according to the protocol spec I have recently posted) up and running, and all that remains is to fill in the rest.

Well, to be honest, this is not exactly JSON-RPC. First off, as you might have noticed, the above snippet of code sends JSON-RPC requests as netstrings. This is actually intentional, and the reasons for adopting this encoding have been described in detail in the spec (it basically boils down to the fact that it greatly simplifies reading from and writing to the network, especially in C). I wrote some crude code to handle netstrings in CL, and it has now occurred to me that it might actually be worthwhile to polish it up a little, write some documentation and put it on CLiki as an asdf-installable library. I’ll probably get on to this quite soon.
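For readers unfamiliar with the format: a netstring is the payload's length in decimal, a colon, the payload itself, and a closing comma. The following is a rough shell sketch of that framing, not the CL code mentioned above; the function names are made up, and it assumes ASCII payloads so that the character count equals the byte count:

```shell
#!/bin/bash
# Netstring framing: "<decimal length>:<payload>,".
# Assumes ASCII payloads, so that character count equals byte count.
# Function names are hypothetical; this is not the author's CL code.

netstring_encode() {
  local payload=$1
  printf '%d:%s,' "${#payload}" "$payload"
}

netstring_decode() {
  local input=$1 len rest
  local re='^(0|[1-9][0-9]*)$'
  len=${input%%:*}
  # Require a colon and a decimal length without leading zeros.
  [[ $input == *:* && $len =~ $re ]] || return 1
  rest=${input#*:}
  # The character right after the payload must be the closing comma.
  [[ ${rest:len:1} == "," ]] || return 1
  printf '%s' "${rest:0:len}"
}

request='{"method":"ping","params":[],"id":1}'
framed=$(netstring_encode "$request")
netstring_decode "$framed"   # prints the original request
echo
```

Because the length comes first, a reader knows exactly how many bytes to expect before it sees any of the payload, which is what makes the format convenient to handle in C.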

Second, the resulting JSON object does not have all the necessary stuff. It contains the result, but not the error or id (as mandated by the JSON-RPC spec). This is actually a deficiency of the JSON-RPC C library I’m currently using. It places the burden of constructing objects that are proper JSON-RPC responses on the programmer, instead of doing that itself. This will be easy to sort out, however, because the library is small and adheres to the KISS principle. More of a problem is that the licensing of that library is unclear; I have emailed the maintainers asking them to clarify the status.
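To make the missing pieces concrete, here is a sketch of what a complete successful response would look like, assuming the JSON-RPC 1.0 rules (all three members present, with error set to null on success). The helper name is hypothetical and has nothing to do with the C library in question:

```shell
#!/bin/bash
# Build a JSON-RPC 1.0 style success response with all three members.
# Both arguments must already be valid JSON text; nothing is escaped here.
# The function name is hypothetical, not part of any library.
jsonrpc_success() {
  local id=$1 result=$2
  printf '{"result": %s, "error": null, "id": %s}\n' "$result" "$id"
}

jsonrpc_success 1 '"pong"'
# prints: {"result": "pong", "error": null, "id": 1}
```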

Poliqarp’s new protocol

2008-04-16T00:00:00Z

The first version of the document I wrote about a couple of days ago is now ready for public review. I’ll be making an initial attempt at the implementation once I return from the European Common Lisp Meeting ’08 and write a report.

I’m not playing this stupid game anymore

2008-04-14T00:00:00Z

Not until the next tournament, that is. My achievements in the 12th Scrabble Championship of Warsaw can be described as “mediocre” at best; four won, one drawn and seven lost games mean that my general rating will drop down by two points or so. Oh well. Everybody knows it’s a stupid game. ;-) At least I’ve managed to get a decent small score, with an average of 377 points per game.

Random resolutions for the indefinite future:

Get a final draft of the C++09 standard when it’s ready and acquaint myself with it as closely as possible. I strongly dislike C++ (and I’m not alone in this — see the Frequently Questioned Answers about C++ for very detailed criticisms); however, I’ve long wanted to learn that language better just to know all the strengths and weaknesses of the enemy. The ideal moment for this will be when the new standard is out; this will give me the advantage of not having to unlearn the things changed by the standard, while staying on a cutting and competitive edge.
Get a copy of Federico García Lorca’s poems translated into Polish by Jerzy Ficowski. I have only a very vague knowledge of Lorca (just his Romance of the Spanish Civil Guard (Romance de la Guardia Civil Española)), but I very much like what little I know.

