<p>code · words · emotions: Daniel Janus’s blog · Daniel Janus · http://danieljanus.pl · dj@danieljanus.pl · 2024-01-26</p><h1>Lossy CSS compression for fun and loss (or profit)</h1><p>2024-01-26</p><div><h2 id="what">What</h2><p>Late last year, I had an idea that’s been steadily brewing in my head. I’ve found myself with some free time recently (it coincided with vacation, go figure), and I’ve hacked together some proof-of-concept code. Whether or not it is actually proving the concept I’m not sure, but the results are somewhat interesting, and I believe the idea is novel (I haven’t found any other implementation in the wild). So it’s at least worthy of a blog post.</p><p>I wrote <code>cssfact</code>, a lossy CSS compressor. That is, a program that takes some CSS and outputs back some other CSS that hopefully retains some (most) of the information in the input, but contains fewer rules than the original. Exactly how many rules it produces is configurable, and the loss depends on that number.</p><p>The program only works on style rules (which make up the majority of a typical CSS). It leaves the non-style rules unchanged.</p><p><a href="https://github.com/nathell/cssfact">Here’s the source</a>. It’s not exactly straightforward to get it running, but it shouldn’t be very hard, either. It’s very simple – the program itself doesn’t contain any fancy logic; the actual decisions on what the output will contain are made by an external program.</p><p>If you just want to see some results, here is a sample with <a href="https://danieljanus.pl">my homepage</a> serving as a patient etherized upon a table. 
Its CSS is quite small – 55 style rules that cssfact can work on – and here’s how the page looks with various settings:</p><ul><li><span>Original: <a href="https://danieljanus.pl">page</a>, <a href="https://danieljanus.pl/css/nhp.css">CSS</a>, <a href="https://github.com/nathell/nhp/blob/master/src/sass/nhp.sass">source SASS</a></span></li><li><span>1 style rule: <a href="https://danieljanus.pl/index1.html">page</a>, <a href="https://danieljanus.pl/css/nhp1.css">CSS</a> (93% information loss)</span></li><li><span>5 style rules: <a href="https://danieljanus.pl/index5.html">page</a>, <a href="https://danieljanus.pl/css/nhp5.css">CSS</a> (74% information loss)</span></li><li><span>10 style rules: <a href="https://danieljanus.pl/index10.html">page</a>, <a href="https://danieljanus.pl/css/nhp10.css">CSS</a> (55% information loss)</span></li><li><span>20 style rules: <a href="https://danieljanus.pl/index20.html">page</a>, <a href="https://danieljanus.pl/css/nhp20.css">CSS</a> (31% information loss)</span></li><li><span>30 style rules: <a href="https://danieljanus.pl/index30.html">page</a>, <a href="https://danieljanus.pl/css/nhp30.css">CSS</a> (17% information loss)</span></li></ul><p>My homepage and both of my blogs all use the same CSS, so you can try to replace the CSS in your browser’s devtools elsewhere on the site and see how it looks.</p><h2 id="how">How</h2><p>Three words: <a href="https://cs.uef.fi/~pauli/bmf_tutorial/material.html">binary matrix factorization</a> (BMF, in the Boolean algebra).</p><p>I guess I could just stop here, but I’ll elaborate just in case it isn’t clear.</p><p>Consider a simple CSS snippet:</p><pre><code class="hljs css"><span class="hljs-selector-tag">h1</span>, <span class="hljs-selector-tag">h2</span> {
  <span class="hljs-attribute">padding</span>: <span class="hljs-number">0</span>;
  <span class="hljs-attribute">margin-bottom</span>: <span class="hljs-number">0.5em</span>;
}
<span class="hljs-selector-tag">h1</span> {
  <span class="hljs-attribute">font-size</span>: <span class="hljs-number">32px</span>;
  <span class="hljs-attribute">font-weight</span>: bold;
}
<span class="hljs-selector-tag">h2</span> {
  <span class="hljs-attribute">font-size</span>: <span class="hljs-number">24px</span>;
  <span class="hljs-attribute">font-weight</span>: bold;
}
</code></pre><p>The first rule tells you that for all elements that match either the <code>h1</code> or <code>h2</code> selectors, the two declarations should apply.</p><p>You could visualize this CSS as a 5×2 binary matrix <i>A<sup>T</sup></i> where the <em>m</em> columns correspond to simple selectors (i.e., without commas in them) and the <em>n</em> rows correspond to declarations:</p><style>
.css-table th { text-align: right; }
.css-table td { text-align: center; }
</style>
<table class="css-table">
<tbody>
<tr><th></th><th><code>h1</code></th><th><code>h2</code></th></tr>
<tr><th><code>padding: 0</code></th><td>1</td><td>1</td></tr>
<tr><th><code>margin-bottom: 0.5em</code></th><td>1</td><td>1</td></tr>
<tr><th><code>font-size: 32px</code></th><td>1</td><td>0</td></tr>
<tr><th><code>font-size: 24px</code></th><td>0</td><td>1</td></tr>
<tr><th><code>font-weight: bold</code></th><td>1</td><td>1</td></tr>
</tbody>
</table>
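<p>One way to see where the 1s in the table above come from is to compute them straight from the rules. Here’s a minimal Rust sketch of that, not code from cssfact: the <code>Rule</code> struct and the <code>example_rules</code>, <code>cell</code> and <code>matrix</code> functions are hypothetical names made up for illustration. A cell is 1 when at least one rule mentions both the declaration and the selector.</p>

```rust
/// One CSS style rule: a set of selectors sharing a set of declarations.
/// (Hypothetical type for illustration; not how cssfact represents rules.)
struct Rule {
    selectors: Vec<&'static str>,
    declarations: Vec<&'static str>,
}

/// The three rules from the CSS snippet above.
fn example_rules() -> Vec<Rule> {
    vec![
        Rule {
            selectors: vec!["h1", "h2"],
            declarations: vec!["padding: 0", "margin-bottom: 0.5em"],
        },
        Rule {
            selectors: vec!["h1"],
            declarations: vec!["font-size: 32px", "font-weight: bold"],
        },
        Rule {
            selectors: vec!["h2"],
            declarations: vec!["font-size: 24px", "font-weight: bold"],
        },
    ]
}

/// A cell is 1 iff at least one rule contains both the declaration and the selector.
fn cell(rules: &[Rule], declaration: &str, selector: &str) -> u8 {
    rules
        .iter()
        .any(|r| r.declarations.contains(&declaration) && r.selectors.contains(&selector)) as u8
}

/// Builds the declarations-by-selectors matrix, one row per declaration.
fn matrix(rules: &[Rule], declarations: &[&str], selectors: &[&str]) -> Vec<Vec<u8>> {
    declarations
        .iter()
        .map(|d| selectors.iter().map(|s| cell(rules, d, s)).collect())
        .collect()
}

fn main() {
    let rules = example_rules();
    let declarations = [
        "padding: 0",
        "margin-bottom: 0.5em",
        "font-size: 32px",
        "font-size: 24px",
        "font-weight: bold",
    ];
    let selectors = ["h1", "h2"];
    // Print each declaration next to its row of 0s and 1s.
    for (d, row) in declarations.iter().zip(matrix(&rules, &declarations, &selectors)) {
        println!("{:<22} {:?}", d, row);
    }
}
```

<p>Running it prints one row per declaration, in the same order as the table.</p>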
<p>You could also transpose the matrix, yielding <em>A</em> with <em>m</em> rows denoting selectors and <em>n</em> columns denoting declarations. For my homepage’s CSS, <em>m</em> = 60 and <em>n</em> = 81; for bigger stylesheets, several thousand in either direction is not uncommon.</p><p>Now, linear algebra gives us algorithms to find a matrix <em>A′ ≈ A</em> such that there exists a decomposition <em>A′ = B × C</em>, where <em>B</em> has dimensions <em>m × r</em>, <em>C</em> has dimensions <em>r × n</em>, and <em>r</em> is small – typically much smaller than <em>m</em> or <em>n</em>. So this is a form of dimensionality reduction.</p><p>In the usual algebra of real numbers, there’s no guarantee that <em>B</em> or <em>C</em> will themselves be binary matrices – in fact, most likely they won’t. But if we operate in Boolean algebra instead (i.e. one where 1 + 1 = 1), then both <em>B</em> and <em>C</em> will be binary. The flip side is that the Boolean BMF problem is NP-hard, so the algorithms found in the wild perform approximate decompositions, not guaranteed to be optimal.</p><p>But that’s okay, because lossiness is inherent in what we’re doing anyway, and it turns out that the binary matrices <em>B</em> and <em>C</em> are readily interpretable. Look again at the CSS matrix above: why is there a 1 in the top-left cell? Because at least one of the CSS rules stipulates the declaration <code>padding: 0</code> for the selector <code>h1</code>.</p><p>This is exactly the definition of matrix multiplication in the Boolean algebra. The matrix <em>A′</em> will have a 1 at coordinates [<em>i, j</em>] iff there is at least one <em>k</em> ∈ {1, …, <em>r</em>} such that <em>B</em>[<em>i</em>, <em>k</em>] = 1 and <em>C</em>[<em>k</em>, <em>j</em>] = 1. So the columns of <em>B</em> and rows of <em>C</em> actually correspond to CSS rules! 
Every time you write CSS, you’re actually writing out binary matrices – and the browser is multiplying them to get at the actual behaviour.</p><p>Well, not really, but it’s one way to think about it. It’s not perfect – it completely glosses over rules overlapping each other and having precedence, and treats them as equally important – but it somewhat works!</p><p>You could plug any BMF algorithm into this approach. For cssfact, I’ve picked the code by <a href="https://github.com/IBM/binary-matrix-factorization/">Barahona and Goncalves 2019</a> – sadly, I wasn’t able to find the actual paper – not because it performs spectacularly well (it’s actually dog-slow on larger stylesheets), but because I was easily able to make it work and interface with it.</p><h2 id="why">Why</h2><p>Why not?</p><p>The sheer joy of exploration is reason enough, but I believe there are potential practical applications. CSS codebases have the tendency to grow organically and eventually start collapsing under their own weight, and they have to be maintained very thoughtfully to prevent that. In many CSS monstrosities found in the wild, there are much cleaner, leaner, essence-capturing cores struggling to get out.</p><p>This tool probably won’t automatically extract them for you – so don’t put it in your CI pipeline – but by perusing the CSS that it produces and cross-checking it with the input, you could encounter hints on what redundancy there is in your styles. Things like “these components are actually very similar, so maybe should be united” may become more apparent.</p></div><h1>My mental model of transducers</h1><p>2023-09-09</p><div><h2 id="intro">Intro</h2><p>I’ve been programming in Clojure for a long time, but I haven’t been using transducers much. I learned to mechanically transform <code>(into [] (map f coll))</code> to <code>(into [] (map f) coll)</code> for a slight performance gain, but not much beyond that. 
Recently, however, I’ve found myself refactoring transducers-based code at work, which prompted me to get back up to speed.</p><p>I found Eero Helenius’ article <a href="https://dev.solita.fi/2021/10/14/grokking-clojure-transducers.html">“Grokking Clojure transducers”</a> a great help in that. To me, it’s much more approachable than the <a href="https://clojure.org/reference/transducers">official documentation</a> – in large part because it shows you how to build transducers from the ground up, and this method of learning profoundly resonates with me. I highly recommend it. However, it’s also useful to have a visual intuition of how transducers work, a mental model that hints at the big picture without zooming into the details too much. In this post, I’d like to share mine and illustrate it with a REPL session. (Spoiler alert: there’s <a href="https://github.com/clojure/core.async">core.async</a> ahead, but in low quantities.)</p><h2 id="pictures">Pictures</h2><p>Imagine data flowing through a conveyor belt. Say, infinitely repeating integers from 1 to 5:</p><img src="/img/blog/conveyor-belt.svg" alt="Conveyor belt">
<p>I’m using the abstract term “conveyor belt”, rather than “sequence” or something like this, to avoid associations with any implementation details. Just pieces of data, one after another. These data may be anything; they may flow infinitely or stop at some point; may or may not all exist in memory at the same time. Doesn’t matter. That’s the beauty of transducers: they completely abstract away the implementation of sequentiality.</p><p>So, what is a transducer, intuitively? It’s a mechanism for <em>transforming conveyor belts into other conveyor belts</em>.</p><p>For example, <code>(map inc)</code> is a transducer that says: “take this conveyor belt and produce one where every number is incremented”. Applying it to the above belt yields this one:</p><img src="/img/blog/conveyor-belt-2.svg" alt="Conveyor belt, transformed">
<p>An important thing about transducers is that they’re <em>composable</em>. To understand that, imagine further transforming the above belt by removing all the odd numbers. Intuitively, that’s what <code>(remove odd?)</code> does:</p><img src="/img/blog/conveyor-belt-3.svg" alt="Conveyor belt, transformed again">
<p>(I’ve left the spacing between boxes the same as before, because it helps me visualise <code>(remove odd?)</code> better. I imagine an invisible gnome sitting above the belt, watching carefully all the boxes that pass below it, and snatching greedily every one that happens to contain an odd number.)</p><p>Composability means that Clojure lets you say <code>(comp (map inc) (remove odd?))</code> to mean the transducer that transforms the first belt to the third one. By putting together two simple building blocks, we produced a more complex one – that is itself reusable and can be used as another building block in an ever more complex data pipeline.</p><p>Notice we <em>still</em> haven’t said anything about the actual representation of the data, but are already able to model complex processes. We can then apply them to actual data, whether it’s a simple vector-to-vector transformation within the same JVM, or listening to a topic on a Kafka cluster, summarizing the incoming data and sending them to a data warehouse.</p><h2 id="code">Code</h2><p>OK, enough handwaving, time for a demo. Let’s fire up a REPL and load core.async (I’m assuming you’ve added it to your dependencies already). I won’t reproduce here the resulting values of expressions we evaluate (they’re mostly <code>nil</code>s anyway), but I will reproduce output from the REPL (as comments).</p><pre><code class="hljs clojure">(<span class="hljs-name">require</span> '[clojure.core.async <span class="hljs-symbol">:refer</span> [chan <!! >!! thread close!]])
</code></pre><p>Why core.async? Because I find it a great way to implement a conveyor belt that you can play with interactively. This can help you understand how the various Clojure-provided transducers work. For the noncognoscenti: core.async is a Clojure library that allows you to implement concurrent processes that communicate over <em>channels</em>. By default, that communication is synchronous, meaning that if a process tries to read from a channel, it blocks until another process writes something to that channel.</p><p>As it happens, we can pass a transducer to the function that creates channels, <code>chan</code>. It will put the invisible gnomes to work on values that pass through the channel. So you can view that channel as a conveyor belt!</p><p>For easy tinkering, we can do this:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">transformed-belt</span> [xf]
  (<span class="hljs-name"><span class="hljs-built_in">let</span></span> [ch (<span class="hljs-name">chan</span> <span class="hljs-number">1</span> xf)]
    (<span class="hljs-name">thread</span>
      (<span class="hljs-name"><span class="hljs-built_in">loop</span></span> []
        (<span class="hljs-name">when-some</span> [value (<span class="hljs-name"><!!</span> ch)]
          (<span class="hljs-name">println</span> <span class="hljs-string">"Value:"</span> (<span class="hljs-name">pr-str</span> value))
          (<span class="hljs-name"><span class="hljs-built_in">recur</span></span>))))
    ch))
</code></pre><p>This fires up a process working at the receiving end of the conveyor belt. It will print out any transformed values as soon as they become available. Typing at the REPL, we will assume the role of producer, putting data on the belt.</p><p>Like this:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">def</span> <span class="hljs-title">b</span> (<span class="hljs-name">transformed-belt</span> (<span class="hljs-name"><span class="hljs-built_in">map</span></span> inc)))
(<span class="hljs-name">>!!</span> b <span class="hljs-number">2</span>)
<span class="hljs-comment">; Value: 3</span>
(<span class="hljs-name">>!!</span> b <span class="hljs-number">42</span>)
<span class="hljs-comment">; Value: 43</span>
</code></pre><p>It works! We’re putting in numbers, and out come the incremented ones.</p><p>When we’re done experimenting with the belt, we need to <code>close!</code> it. This will cause the worker thread to shut down.</p><pre><code class="hljs clojure">(<span class="hljs-name">close!</span> b)
</code></pre><p>We can now experiment with something more complex, like that combined transducer we’ve talked about before:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">def</span> <span class="hljs-title">b</span> (<span class="hljs-name">transformed-belt</span> (<span class="hljs-name"><span class="hljs-built_in">comp</span></span> (<span class="hljs-name"><span class="hljs-built_in">map</span></span> inc) (<span class="hljs-name"><span class="hljs-built_in">remove</span></span> odd?))))
(<span class="hljs-name">>!!</span> b <span class="hljs-number">1</span>)
<span class="hljs-comment">; Value: 2</span>
(<span class="hljs-name">>!!</span> b <span class="hljs-number">2</span>)
(<span class="hljs-name">>!!</span> b <span class="hljs-number">3</span>)
<span class="hljs-comment">; Value: 4</span>
</code></pre><p>We got the transformed 1 and 3, but the intermediate value for 2 was odd, so it was snatched by the gnome and we never saw it.</p><p>There’s even more fun to be had! Let’s try <code>(partition-all 3)</code>:</p><pre><code class="hljs clojure">(<span class="hljs-name">close!</span> b)
(<span class="hljs-keyword">def</span> <span class="hljs-title">b</span> (<span class="hljs-name">transformed-belt</span> (<span class="hljs-name">partition-all</span> <span class="hljs-number">3</span>)))
(<span class="hljs-name">>!!</span> b <span class="hljs-number">1</span>)
</code></pre><p>Nothing…</p><pre><code class="hljs clojure">(<span class="hljs-name">>!!</span> b <span class="hljs-number">2</span>)
</code></pre><p>Still nothing…</p><pre><code class="hljs clojure">(<span class="hljs-name">>!!</span> b <span class="hljs-number">3</span>)
<span class="hljs-comment">; Value: [1 2 3]</span>
</code></pre><p>Blammo! Our gnome is now packaging together incoming items into bundles of three, caching them in the interim while the bundle is not complete yet. But if we close the input prematurely, it will acknowledge and produce the incomplete bundle:</p><pre><code class="hljs clojure">(<span class="hljs-name">>!!</span> b <span class="hljs-number">4</span>)
(<span class="hljs-name">>!!</span> b <span class="hljs-number">5</span>)
(<span class="hljs-name">close!</span> b)
<span class="hljs-comment">; Value: [4 5]</span>
</code></pre><p>In fact, <code>partition-all</code> is what prompted me to write this post. That code at work I mentioned actually included a transducer composition that had a <code>(net.cgrand.xforms/into [])</code> in it. That transducer (from Christophe Grand’s <a href="https://github.com/cgrand/xforms/">xforms</a> library) accumulates data until there’s nothing more to accumulate, and then emits all of it as one large vector. By replacing it with <code>partition-all</code>, I altered the downstream processing to handle multiple smaller batches rather than one huge batch, improving the system’s latency.</p><p>A small change for a huge win. Clojure continues to amaze me.</p><p>Plus, it’s fun to make JS-less animations in SVG. :)</p></div><h1>A visual tree iterator in Rust</h1><p>2023-07-20</p><div><p>My <a href="/2023/07/06/learning-to-learn-rust/">adventure with learning Rust</a> continues. As a quick recap from the previous post, I’m writing a <a href="https://github.com/nathell/treeviewer">tree viewer</a>. I have now completed another major milestone, which is to rewrite the tree-printing function to use an iterator. (Rationale: it makes the code more reusable – I can, for instance, easily implement a tree-drawing view for <a href="https://github.com/gyscos/cursive">Cursive</a> with it.)</p><p>And, as usual, I’ve fallen into many traps before arriving at a working version. In this post, I’ll reflect on the mistakes I’ve made.</p><h2 id="the-problem">The problem</h2><p>Let’s start with establishing the problem. Given a <code>Tree</code> struct defined as:</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">struct</span> <span class="hljs-title class_">Tree</span><T> {
    value: T,
    children: <span class="hljs-type">Vec</span><Tree<T>>,
}
</code></pre><p>I want it to have a <code>lines()</code> method returning an iterator, so that I can implement <code>print_tree</code> as:</p><pre><code class="hljs rust"><span class="hljs-keyword">fn</span> <span class="hljs-title function_">print_tree</span><T: Display>(t: &Tree<T>) {
    <span class="hljs-keyword">for</span> <span class="hljs-variable">line</span> <span class="hljs-keyword">in</span> t.<span class="hljs-title function_ invoke__">lines</span>() {
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"{}"</span>, line);
    }
}
</code></pre><p>and have the output identical to the previous version.</p><h2 id="the-algorithm">The algorithm</h2><p>Before we dive into the iterator sea, let’s have a look at the algorithm. Imagine that we’re printing the tree (in sexp-notation) <code>(root (one (two) (three (four))) (five (six)))</code>. This is its dissected visual representation:</p><img src="/img/blog/tree-anatomy.png" alt="Anatomy of a tree">
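<p>To make the example concrete, here’s a minimal sketch of how that tree could be built from the <code>Tree</code> struct above. The <code>node</code>, <code>example_tree</code> and <code>to_sexp</code> helpers are hypothetical, introduced only for this illustration; <code>to_sexp</code> renders the tree back into sexp-notation as a sanity check.</p>

```rust
use std::fmt::Display;

pub struct Tree<T> {
    value: T,
    children: Vec<Tree<T>>,
}

/// Hypothetical helper: builds a node from a value and its children.
fn node<T>(value: T, children: Vec<Tree<T>>) -> Tree<T> {
    Tree { value, children }
}

/// The example tree from the text: (root (one (two) (three (four))) (five (six))).
fn example_tree() -> Tree<&'static str> {
    node("root", vec![
        node("one", vec![
            node("two", vec![]),
            node("three", vec![node("four", vec![])]),
        ]),
        node("five", vec![node("six", vec![])]),
    ])
}

/// Renders a tree back into sexp-notation, depth first.
fn to_sexp<T: Display>(t: &Tree<T>) -> String {
    let mut out = format!("({}", t.value);
    for child in &t.children {
        out.push(' ');
        out.push_str(&to_sexp(child));
    }
    out.push(')');
    out
}

fn main() {
    println!("{}", to_sexp(&example_tree()));
}
```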
<p>Each line of the rendering consists of three concatenated elements, which I call “parent prefix”, “immediate prefix”, and “node value”. The immediate prefix is always (except for the root node) <code>"└─ "</code> or <code>"├─ "</code>, depending on whether the node in question is the last child of its parent or not. The parent prefix has variable length that depends on the node’s depth, and has the following properties:</p><ul><li><span>For any node, all its subnodes’ parent prefixes start with its parent prefix.</span></li><li><span>For any node, the parent prefixes of its direct children are obtained by appending <code>"   "</code> (three spaces) or <code>"│  "</code> (a bar and two spaces) to its own parent prefix, again depending on whether the node is its parent’s last child or not.</span></li></ul><p>This gives rise to the following algorithm that calls itself recursively:</p><pre><code class="hljs rust"><span class="hljs-keyword">fn</span> <span class="hljs-title function_">print_tree</span><T>(t: &Tree<T>,
                 parent_prefix: &<span class="hljs-type">str</span>,
                 immediate_prefix: &<span class="hljs-type">str</span>,
                 parent_suffix: &<span class="hljs-type">str</span>)
    <span class="hljs-keyword">where</span> T: Display
{
    <span class="hljs-comment">// print the line for node t</span>
    <span class="hljs-built_in">println!</span>(<span class="hljs-string">"{0}{1}{2}"</span>, parent_prefix, immediate_prefix, t.value);
    <span class="hljs-comment">// print all children of t recursively</span>
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut </span><span class="hljs-variable">it</span> = t.children.<span class="hljs-title function_ invoke__">iter</span>().<span class="hljs-title function_ invoke__">peekable</span>();
    <span class="hljs-keyword">let</span> <span class="hljs-variable">child_prefix</span> = <span class="hljs-built_in">format!</span>(<span class="hljs-string">"{0}{1}"</span>, parent_prefix, parent_suffix);
    <span class="hljs-keyword">while</span> <span class="hljs-keyword">let</span> <span class="hljs-variable">Some</span>(child) = it.<span class="hljs-title function_ invoke__">next</span>() {
        <span class="hljs-keyword">match</span> it.<span class="hljs-title function_ invoke__">peek</span>() {
            <span class="hljs-literal">None</span> => <span class="hljs-title function_ invoke__">print_tree</span>(child, &child_prefix, <span class="hljs-string">"└─ "</span>, <span class="hljs-string">"   "</span>),
            <span class="hljs-title function_ invoke__">Some</span>(_) => <span class="hljs-title function_ invoke__">print_tree</span>(child, &child_prefix, <span class="hljs-string">"├─ "</span>, <span class="hljs-string">"│  "</span>),
        }
    }
}
</code></pre><p>The three extra string arguments start out as empty strings and become populated as the algorithm descends into the tree. The implementation uses a <a href="https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html">peekable</a> iterator over the <code>children</code> vector to construct the prefixes appropriately.</p><h2 id="building-an-iterator,-take-1">Building an iterator, take 1</h2><p>So the printing implementation is recursive. How do we write a recursive iterator in Rust? Is it even possible? I initially thought I would have to replace the recursion with an explicit stack stored in the iterator’s mutable state, started to write some code, and promptly got lost.</p><p>I then searched for the state-of-the-art on iterating through trees, and found <a href="https://fasterthanli.me/articles/recursive-iterators-rust">this post</a> by Amos Wenger. You might want to read it first before continuing; my final implementation ended up being an adaptation of one of the techniques described there.</p><p>My definition of tree is slightly different than Amos’s (mine has only one value in a node), but it’s easy enough to adapt his final solution to iterate over its values:</p><pre><code class="hljs rust"><span class="hljs-keyword">impl</span><T> Tree<T> <span class="hljs-keyword">where</span> T: Display {
    <span class="hljs-keyword">pub</span> <span class="hljs-keyword">fn</span> <span class="hljs-title function_">lines</span><<span class="hljs-symbol">'a</span>>(&<span class="hljs-symbol">'a</span> <span class="hljs-keyword">self</span>) <span class="hljs-punctuation">-></span> <span class="hljs-type">Box</span><<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = <span class="hljs-type">String</span>> + <span class="hljs-symbol">'a</span>> {
        <span class="hljs-keyword">let</span> <span class="hljs-variable">child_iter</span> = <span class="hljs-keyword">self</span>.children.<span class="hljs-title function_ invoke__">iter</span>().<span class="hljs-title function_ invoke__">map</span>(|n| n.<span class="hljs-title function_ invoke__">lines</span>()).<span class="hljs-title function_ invoke__">flatten</span>();
        <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(
            <span class="hljs-title function_ invoke__">once</span>(<span class="hljs-keyword">self</span>.value.<span class="hljs-title function_ invoke__">to_string</span>()).<span class="hljs-title function_ invoke__">chain</span>(child_iter)
        )
    }
}
</code></pre><p>(Note the <code>dyn</code> keyword; Rust started requiring it in this context sometime after Amos’s article was published.)</p><p>Clever! This sidesteps the issue of writing a custom iterator altogether, by chaining some standard ones, wrapping them in a box and sprinkling some lifetime annotation magic powder to appease the borrow checker. We also make it explicit that the iterator is returning strings, no matter what the type of tree nodes is.</p><p><em>But…</em> while it compiles and produces a sequence of strings, they don’t reflect the structure of the tree: there’s no pretty prefixing going on.</p><p>Let’s try to fix that. Clearly, the iterator-returning function will now need to take three additional arguments, just like <code>print_tree</code> – the first one will now be a <code>String</code> because we’ll be building it at runtime, and the other two are string literals so can just be <code>&'static str</code>s. Let’s try:</p><pre><code class="hljs rust"><span class="hljs-comment">// changing the name because we now accept extra params</span>
<span class="hljs-comment">// I want the original lines() to keep its signature</span>
<span class="hljs-keyword">pub</span> <span class="hljs-keyword">fn</span> <span class="hljs-title function_">prefixed_lines</span><<span class="hljs-symbol">'a</span>>(&<span class="hljs-symbol">'a</span> <span class="hljs-keyword">self</span>,
                          parent_prefix: <span class="hljs-type">String</span>,
                          immediate_prefix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
                          parent_suffix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>)
                          <span class="hljs-punctuation">-></span> <span class="hljs-type">Box</span><<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = <span class="hljs-type">String</span>> + <span class="hljs-symbol">'a</span>>
{
    <span class="hljs-keyword">let</span> <span class="hljs-variable">value</span> = <span class="hljs-built_in">format!</span>(<span class="hljs-string">"{0}{1}{2}"</span>, parent_prefix, immediate_prefix, <span class="hljs-keyword">self</span>.value);
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut </span><span class="hljs-variable">peekable</span> = <span class="hljs-keyword">self</span>.children.<span class="hljs-title function_ invoke__">iter</span>().<span class="hljs-title function_ invoke__">peekable</span>();
    <span class="hljs-keyword">let</span> <span class="hljs-variable">child_iter</span> = peekable
        .<span class="hljs-title function_ invoke__">map</span>(|n| {
            <span class="hljs-keyword">let</span> <span class="hljs-variable">child_prefix</span> = <span class="hljs-built_in">format!</span>(<span class="hljs-string">"{0}{1}"</span>, parent_prefix, parent_suffix);
            <span class="hljs-keyword">let</span> <span class="hljs-variable">last</span> = !peekable.<span class="hljs-title function_ invoke__">peek</span>().<span class="hljs-title function_ invoke__">is_some</span>();
            <span class="hljs-keyword">let</span> <span class="hljs-variable">immediate_prefix</span> = <span class="hljs-keyword">if</span> last { <span class="hljs-string">"└─ "</span> } <span class="hljs-keyword">else</span> { <span class="hljs-string">"├─ "</span> };
            <span class="hljs-keyword">let</span> <span class="hljs-variable">parent_suffix</span> = <span class="hljs-keyword">if</span> last { <span class="hljs-string">"   "</span> } <span class="hljs-keyword">else</span> { <span class="hljs-string">"│  "</span> };
            n.<span class="hljs-title function_ invoke__">prefixed_lines</span>(child_prefix, immediate_prefix, parent_suffix)
        })
        .<span class="hljs-title function_ invoke__">flatten</span>();
    <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(
        <span class="hljs-title function_ invoke__">once</span>(value).<span class="hljs-title function_ invoke__">chain</span>(child_iter)
    )
}
</code></pre><p>And, sure enough, it doesn’t compile. One of the things that Rust complains about is:</p><pre><code>error[E0373]: closure may outlive the current function,
but it borrows `peekable`, which is owned by the current function
  --> src/main.rs:55:18
   |
55 |             .map(|n| {
   |                  ^^^ may outlive borrowed value `peekable`
56 |                 let child_prefix = format!("{0}{1}"...
57 |                 let last = !peekable.peek().is_some();
   |                             -------- `peekable` is borrowed here
   |
note: closure is returned here
  --> src/main.rs:64:9
   |
64 |         Box::new(once(value).chain(child_iter))
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `peekable`
(and any other referenced variables), use the `move` keyword
   |
55 |             .map(move |n| {
   |                  ++++
</code></pre><p>So trying to borrow the iterator from within the closure passed to <code>map()</code> is non-kosher. I’m not sure where the “may outlive the current function” comes from, but I think this is because <a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map">the iterator returned by <code>map</code> is lazy</a>, and so the closure needs to be able to live for at least as long as the resulting iterator does. The suggestion of using <code>move</code> doesn’t work, because it then invalidates the <code>map</code> call. (Rust complained about borrowing <code>parent_prefix</code> and <code>parent_suffix</code> as well, and <code>move</code> does work for those.)</p><h2 id="taking-a-step-back">Taking a step back</h2><p>I was not able to find a way out of this conundrum. But after re-reading Amos’s post, I’ve decided to revisit his “bad” approach, with a custom iterator (which I now think is actually not bad at all). It made all the more sense to me when I considered future extensibility: eventually I want to be able to render certain subtrees collapsed, and I want the iterator to know about that.</p><p>It took me a while to understand how that <a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c2cf6a965c3637553edd95eecc1993cd">custom iterator</a> works. It doesn’t have an explicit stack and doesn’t try to “de-recursivize” the process! Instead, it holds two sub-iterators, one initially iterating over the node values (<code>viter</code>) and the other over children (<code>citer</code>). The <code>next()</code> method just tries <code>viter</code> first; if it returns nothing, then a next subtree is picked from <code>citer</code>, and <code>viter</code> (by now already consumed) <em>is replaced by another instance of the same iterator, but for that subtree</em>.</p><p>Meditate on this for a while. 
There’s a lot going on here.</p><ul><li><span><code>viter</code> starts out as an iterator over a vector (a <code>std::slice::Iter</code>), and then gets replaced by a tree iterator (Amos’s <code>NodeIter</code>).</span></li><li><span>This is possible because it’s declared as a <code>Box<Iterator<Item = &'a i32> + 'a></code>. TIL: in Rust, you can’t use a trait directly as a type for a struct field (because there’s no telling what its size will be), but you <em>can</em> put it into a <code>Box</code> (or, I guess, <code>Rc</code> or <code>Arc</code>). Polymorphism, baby!</span></li><li><span>Recursion is achieved by having <code>NodeIter</code> contain a member that, at times, is itself another <code>NodeIter</code>; whereas the correct behaviour is obtained by having those <code>NodeIters</code> instantiated at the right moment.</span></li></ul><p>Whoa. Now <em>that’s</em> clever. I probably wouldn’t have thought about this. It’s good to be standing on the shoulders of giants. Thanks, Amos.</p><p>Anyway, let’s adapt it to our use-case and add the prefixes to the iterator’s state:</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">struct</span> <span class="hljs-title class_">TreeIterator</span><<span class="hljs-symbol">'a</span>, T> {
    parent_prefix: <span class="hljs-type">String</span>,
    immediate_prefix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
    parent_suffix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
    viter: <span class="hljs-type">Box</span><<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = <span class="hljs-type">String</span>> + <span class="hljs-symbol">'a</span>>,
    citer: <span class="hljs-type">Box</span><<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = &<span class="hljs-symbol">'a</span> Tree<T>> + <span class="hljs-symbol">'a</span>>,
}
</code></pre><p>And our iterator implementation follows Amos’s, except that we handle the prefixes and initialize <code>viter</code> with a <a href="https://doc.rust-lang.org/std/iter/struct.Once.html"><code>Once</code></a> iterator:</p><pre><code class="hljs rust"><span class="hljs-keyword">impl</span><T> Tree<T> <span class="hljs-keyword">where</span> T: Display {
    <span class="hljs-keyword">pub</span> <span class="hljs-keyword">fn</span> <span class="hljs-title function_">prefixed_lines</span><<span class="hljs-symbol">'a</span>>(&<span class="hljs-symbol">'a</span> <span class="hljs-keyword">self</span>,
                              parent_prefix: <span class="hljs-type">String</span>,
                              immediate_prefix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
                              parent_suffix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>)
                              <span class="hljs-punctuation">-></span> TreeIterator<<span class="hljs-symbol">'a</span>, T>
    {
        TreeIterator {
            parent_prefix: parent_prefix,
            immediate_prefix: immediate_prefix,
            parent_suffix: parent_suffix,
            viter: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-title function_ invoke__">once</span>(<span class="hljs-built_in">format!</span>(<span class="hljs-string">"{}"</span>, &<span class="hljs-keyword">self</span>.value))),
            citer: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-keyword">self</span>.children.<span class="hljs-title function_ invoke__">iter</span>().<span class="hljs-title function_ invoke__">peekable</span>()),
        }
    }
}
<span class="hljs-keyword">impl</span><<span class="hljs-symbol">'a</span>, T> <span class="hljs-built_in">Iterator</span> <span class="hljs-keyword">for</span> <span class="hljs-title class_">TreeIterator</span><<span class="hljs-symbol">'a</span>, T> <span class="hljs-keyword">where</span> T: Display {
    <span class="hljs-keyword">type</span> <span class="hljs-title class_">Item</span> = <span class="hljs-type">String</span>;
    <span class="hljs-keyword">fn</span> <span class="hljs-title function_">next</span>(&<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>) <span class="hljs-punctuation">-></span> <span class="hljs-type">Option</span><<span class="hljs-keyword">Self</span>::Item> {
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-variable">Some</span>(val) = <span class="hljs-keyword">self</span>.viter.<span class="hljs-title function_ invoke__">next</span>() {
            <span class="hljs-title function_ invoke__">Some</span>(<span class="hljs-built_in">format!</span>(<span class="hljs-string">"{0}{1}{2}"</span>, <span class="hljs-keyword">self</span>.parent_prefix, <span class="hljs-keyword">self</span>.immediate_prefix, val))
        } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-variable">Some</span>(child) = <span class="hljs-keyword">self</span>.citer.<span class="hljs-title function_ invoke__">next</span>() {
            <span class="hljs-keyword">let</span> <span class="hljs-variable">last</span> = !<span class="hljs-keyword">self</span>.citer.<span class="hljs-title function_ invoke__">peek</span>().<span class="hljs-title function_ invoke__">is_some</span>();
            <span class="hljs-keyword">let</span> <span class="hljs-variable">immediate_prefix</span> = <span class="hljs-keyword">if</span> last { <span class="hljs-string">"└─ "</span> } <span class="hljs-keyword">else</span> { <span class="hljs-string">"├─ "</span> };
            <span class="hljs-keyword">let</span> <span class="hljs-variable">parent_suffix</span> = <span class="hljs-keyword">if</span> last { <span class="hljs-string">"   "</span> } <span class="hljs-keyword">else</span> { <span class="hljs-string">"│  "</span> };
            <span class="hljs-keyword">let</span> <span class="hljs-variable">subprefix</span> = <span class="hljs-built_in">format!</span>(<span class="hljs-string">"{0}{1}"</span>, <span class="hljs-keyword">self</span>.parent_prefix, <span class="hljs-keyword">self</span>.parent_suffix);
            <span class="hljs-keyword">self</span>.viter = <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(child.<span class="hljs-title function_ invoke__">prefixed_lines</span>(subprefix, immediate_prefix, parent_suffix));
            <span class="hljs-keyword">self</span>.<span class="hljs-title function_ invoke__">next</span>()
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-literal">None</span>
        }
    }
}
</code></pre><p>Looks sensible, right? Except (you guessed it!) it doesn’t compile:</p><pre><code class="hljs rust">error[E0599]: no method named `peek` found <span class="hljs-keyword">for</span> <span class="hljs-title class_">struct</span>
`<span class="hljs-type">Box</span><(<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = &<span class="hljs-symbol">'a</span> Tree<T>> + <span class="hljs-symbol">'a</span>)>` <span class="hljs-keyword">in</span> the current scope
  -<span class="hljs-punctuation">-></span> src/main.rs:<span class="hljs-number">38</span>:<span class="hljs-number">36</span>
   |
<span class="hljs-number">38</span> |             <span class="hljs-keyword">let</span> <span class="hljs-variable">last</span> = !<span class="hljs-keyword">self</span>.citer.<span class="hljs-title function_ invoke__">peek</span>().<span class="hljs-title function_ invoke__">is_some</span>();
   |                                    ^^^^ help: there is a method with a
   |                                         similar name: `peekable`
</code></pre><p>Ah, right. We’ve forgotten to tell Rust that <code>citer</code> contains a <code>Peekable</code>. Let’s fix that:</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">struct</span> <span class="hljs-title class_">TreeIterator</span><<span class="hljs-symbol">'a</span>, T> {
    <span class="hljs-comment">// … other fields as before</span>
    citer: <span class="hljs-type">Box</span><Peekable<<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = &<span class="hljs-symbol">'a</span> Tree<T>> + <span class="hljs-symbol">'a</span>>>,
}
</code></pre><p>Nope, that doesn’t compile either:</p><pre><code class="hljs rust">error[E0277]: the size <span class="hljs-keyword">for</span> <span class="hljs-title class_">values</span> of <span class="hljs-keyword">type</span> `(<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = &<span class="hljs-symbol">'a</span> Tree<T>> + <span class="hljs-symbol">'a</span>)`
cannot be known at compilation time
  -<span class="hljs-punctuation">-></span> src/main.rs:<span class="hljs-number">16</span>:<span class="hljs-number">12</span>
   |
<span class="hljs-number">16</span> |     citer: <span class="hljs-type">Box</span><Peekable<<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = &<span class="hljs-symbol">'a</span> Tree<T>> + <span class="hljs-symbol">'a</span>>>,
   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |            doesn<span class="hljs-symbol">'t</span> have a size known at compile-time
   |
   = help: the <span class="hljs-keyword">trait</span> `<span class="hljs-built_in">Sized</span>` is not implemented <span class="hljs-keyword">for</span>
`(<span class="hljs-keyword">dyn</span> <span class="hljs-built_in">Iterator</span><Item = &<span class="hljs-symbol">'a</span> Tree<T>> + <span class="hljs-symbol">'a</span>)`
note: required by a bound <span class="hljs-keyword">in</span> `Peekable`
</code></pre><p>Bummer. We can put a trait of unknown size in a <code>Box</code>, but we can’t put a <code>Peekable</code> in between! <code>Peekable</code> needs to know the size of its contents at compile time. Trying to convince it by sprinkling <code>+ Sized</code> in various places doesn’t work.</p><p>Fortunately, we know the <em>actual</em> type of <code>citer</code>. It’s an iterator over <code>Vec<Tree<T>></code>, so it’s a <code>std::slice::Iter<Tree<T>></code>. Let’s put it in the definition of <code>TreeIterator</code>:</p><pre><code class="hljs rust"><span class="hljs-keyword">use</span> std::slice::Iter;
<span class="hljs-keyword">pub</span> <span class="hljs-keyword">struct</span> <span class="hljs-title class_">TreeIterator</span><<span class="hljs-symbol">'a</span>, T> {
    <span class="hljs-comment">// … other fields as before</span>
    citer: <span class="hljs-type">Box</span><Peekable<Iter<<span class="hljs-symbol">'a</span>, Tree<T>>>>,
}
</code></pre><p>And it compiles!</p><h2 id="removing-the-root">Removing the root</h2><p>Here’s what happens when you try to run treeviewer with this implementation on a very simple tree:</p><pre><code class="hljs bash">$ <span class="hljs-built_in">echo</span> -e <span class="hljs-string">'one\ntwo'</span> | ./target/debug/treeviewer

├─ one
└─ two
</code></pre><p>Seems good, but that empty line is worrying. That’s because treeviewer takes slash-separated paths as input, and because the paths can begin with anything, it puts everything under a pre-existing root node with an empty <code>value</code>. We don’t want the output to contain that root node.</p><p>Simple, right? We just need to initialize <code>viter</code> with an empty iterator if one of the prefixes is also empty:</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">fn</span> <span class="hljs-title function_">prefixed_lines</span><<span class="hljs-symbol">'a</span>>(&<span class="hljs-symbol">'a</span> <span class="hljs-keyword">self</span>,
                          parent_prefix: <span class="hljs-type">String</span>,
                          immediate_prefix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
                          parent_suffix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>)
                          <span class="hljs-punctuation">-></span> TreeIterator<<span class="hljs-symbol">'a</span>, T>
{
    TreeIterator {
        <span class="hljs-comment">// … other fields as before</span>
        viter: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-keyword">if</span> immediate_prefix.<span class="hljs-title function_ invoke__">is_empty</span>() {
            <span class="hljs-title function_ invoke__">empty</span>()
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-title function_ invoke__">once</span>(<span class="hljs-built_in">format!</span>(<span class="hljs-string">"{}"</span>, &<span class="hljs-keyword">self</span>.value))
        }),
    }
}
</code></pre><p>And (this is becoming obvious by now) we’re rewarded by yet another interesting error message:</p><pre><code class="hljs rust">error[E0308]: `<span class="hljs-keyword">if</span>` and `<span class="hljs-keyword">else</span>` have incompatible types
-<span class="hljs-punctuation">-></span> src/main.rs:<span class="hljs-number">49</span>:<span class="hljs-number">32</span>
|
<span class="hljs-number">46</span> | viter: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-keyword">if</span> immediate_prefix.<span class="hljs-title function_ invoke__">is_empty</span>() {
| _________________-
<span class="hljs-number">47</span> | | <span class="hljs-title function_ invoke__">empty</span>()
| | ------- expected because of this
<span class="hljs-number">48</span> | | } <span class="hljs-keyword">else</span> {
<span class="hljs-number">49</span> | | <span class="hljs-title function_ invoke__">once</span>(<span class="hljs-built_in">format!</span>(<span class="hljs-string">"{}"</span>, &<span class="hljs-keyword">self</span>.value))
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| | expected `Empty<_>`, found `Once<<span class="hljs-type">String</span>>`
<span class="hljs-number">50</span> | | }),
| |_________________- `<span class="hljs-keyword">if</span>` and `<span class="hljs-keyword">else</span>` have incompatible types
|
= note: expected <span class="hljs-keyword">struct</span> `std::iter::Empty<_>`
found <span class="hljs-keyword">struct</span> `std::iter::Once<<span class="hljs-type">String</span>>`
</code></pre><p>Ahhh. Even though both branches of the <code>if</code> expression have types that meet the trait requirement (<code>Iterator<Item = String></code>), these are <em>different types</em>. Apparently, <code>if</code> insists on both branches being the same type.</p><p>What we can do is lift the <code>if</code> upwards:</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">fn</span> <span class="hljs-title function_">prefixed_lines</span><<span class="hljs-symbol">'a</span>>(&<span class="hljs-symbol">'a</span> <span class="hljs-keyword">self</span>,
                          parent_prefix: <span class="hljs-type">String</span>,
                          immediate_prefix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
                          parent_suffix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>)
                          <span class="hljs-punctuation">-></span> TreeIterator<<span class="hljs-symbol">'a</span>, T>
{
    <span class="hljs-keyword">if</span> immediate_prefix.<span class="hljs-title function_ invoke__">is_empty</span>() {
        TreeIterator {
            <span class="hljs-comment">// … other fields as before</span>
            viter: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-title function_ invoke__">empty</span>()),
        }
    } <span class="hljs-keyword">else</span> {
        TreeIterator {
            <span class="hljs-comment">// … other fields as before, repeated</span>
            viter: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-title function_ invoke__">once</span>(<span class="hljs-built_in">format!</span>(<span class="hljs-string">"{}"</span>, &<span class="hljs-keyword">self</span>.value))),
        }
    }
}
</code></pre><p>Yuck. We needed to duplicate most of the instantiation details of <code>TreeIterator</code>. But at least it compiles and works – the root is gone!</p><pre><code class="hljs bash">$ <span class="hljs-built_in">echo</span> -e <span class="hljs-string">'one\ntwo'</span> | ./target/debug/treeviewer
├─ one
└─ two
</code></pre><h2 id="fixing-a-bug">Fixing a bug</h2><p>Or does it? Let’s try the original tree from our illustration:</p><pre><code class="hljs bash">$ <span class="hljs-built_in">echo</span> -e <span class="hljs-string">'one/two\none/three/four\nfive/six'</span> | ./target/debug/treeviewer
├─ one
├─ │  ├─ two
├─ │  └─ three
├─ │  └─ │     └─ four
└─ five
└─    └─ six
</code></pre><p>Uh oh. It’s totally garbled. Time to go back to the drawing board.</p><p>It took me quite a few <code>println!()</code> debugging statements to figure out what was going on. Remember, the <code>TreeIterator</code> for the whole tree will contain a nested <code>TreeIterator</code> in its <code>viter</code> field, which in turn may contain another nested <code>TreeIterator</code>, and so on. Each of these nested iterators eventually passes its value to the “parent” iterator… decorating it with prefixes, again and again!</p><p>To fix this, we need to differentiate between two cases:</p><ol><li><span>We’re producing the value for the node we’re holding (that’s when we need the prefixes);</span></li><li><span>We’re propagating up the value returned by <code>viter</code> that holds a nested <code>TreeIterator</code> (in this case we need to return it unchanged).</span></li></ol><p>We’ll add two more fields to <code>TreeIterator</code>: a boolean indicating whether we’ve already <code>emitted</code> the value at the node in question, and a reference to that <code>value</code> itself.</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">struct</span> <span class="hljs-title class_">TreeIterator</span><<span class="hljs-symbol">'a</span>, T> {
    <span class="hljs-comment">// … other fields as before</span>
    emitted: <span class="hljs-type">bool</span>,
    value: &<span class="hljs-symbol">'a</span> T,
}
</code></pre><p>And we initialize them as follows:</p><pre><code class="hljs rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">fn</span> <span class="hljs-title function_">prefixed_lines</span><<span class="hljs-symbol">'a</span>>(&<span class="hljs-symbol">'a</span> <span class="hljs-keyword">self</span>,
                          parent_prefix: <span class="hljs-type">String</span>,
                          immediate_prefix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>,
                          parent_suffix: &<span class="hljs-symbol">'static</span> <span class="hljs-type">str</span>)
                          <span class="hljs-punctuation">-></span> TreeIterator<<span class="hljs-symbol">'a</span>, T>
{
    TreeIterator {
        emitted: immediate_prefix.<span class="hljs-title function_ invoke__">is_empty</span>(),
        value: &<span class="hljs-keyword">self</span>.value,
        viter: <span class="hljs-type">Box</span>::<span class="hljs-title function_ invoke__">new</span>(<span class="hljs-title function_ invoke__">empty</span>()),
        <span class="hljs-comment">// … other fields as before</span>
    }
}
</code></pre><p>Note that the logic of skipping emitting the root has been moved to the initialization of <code>emitted</code>. This lets us kill the duplication! We now initialize <code>viter</code> to <code>empty()</code> – it no longer matters; this initial value will be unused and eventually replaced by child <code>TreeIterator</code>s.</p><p>Finally, we need to amend the implementation of <code>next()</code>:</p><pre><code class="hljs rust"><span class="hljs-keyword">fn</span> <span class="hljs-title function_">next</span>(&<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>) <span class="hljs-punctuation">-></span> <span class="hljs-type">Option</span><<span class="hljs-keyword">Self</span>::Item> {
    <span class="hljs-keyword">if</span> !<span class="hljs-keyword">self</span>.emitted {
        <span class="hljs-keyword">self</span>.emitted = <span class="hljs-literal">true</span>;
        <span class="hljs-comment">// decorate value with prefixes</span>
        <span class="hljs-title function_ invoke__">Some</span>(<span class="hljs-built_in">format!</span>(<span class="hljs-string">"{0}{1}{2}"</span>, <span class="hljs-keyword">self</span>.parent_prefix, <span class="hljs-keyword">self</span>.immediate_prefix, <span class="hljs-keyword">self</span>.value))
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-variable">Some</span>(val) = <span class="hljs-keyword">self</span>.viter.<span class="hljs-title function_ invoke__">next</span>() {
        <span class="hljs-title function_ invoke__">Some</span>(val) <span class="hljs-comment">// propagate unchanged</span>
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-variable">Some</span>(child) = <span class="hljs-keyword">self</span>.citer.<span class="hljs-title function_ invoke__">next</span>() {
        <span class="hljs-comment">// … this part doesn’t change</span>
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-literal">None</span>
    }
}
</code></pre><p>And <em>this</em> version, finally, compiles and works as expected:</p><pre><code class="hljs bash">$ <span class="hljs-built_in">echo</span> -e <span class="hljs-string">'one/two\none/three/four\nfive/six'</span> | ./target/debug/treeviewer
├─ one
│  ├─ two
│  └─ three
│     └─ four
└─ five
   └─ six
</code></pre><h2 id="takeaways">Takeaways</h2><p>There are quite a few things I learned about Rust in the process, and then there are meta-learnings. Let’s recap the Rust-specific ones first.</p><ul><li><span>You can’t put a trait in a struct directly, but you can put a <code>Box</code> of a trait object.</span></li><li><span>But not a <code>Box</code> of <code>Foo</code> of a trait object, where <code>Foo</code> expects its parameter to be <code>Sized</code>.</span></li><li><span>If you’re <code>map()</code>ping a closure over an iterator, you can’t access that iterator itself from within the closure.</span></li><li><span>Closures by default borrow stuff that they close over, but you can move that stuff to the closure instead with the <code>move</code> keyword. If I understand correctly, it’s an all-or-nothing move; no mix and match.</span></li><li><span>In an <code>if</code> expression, all branch expressions must be of the same type; conforming to the same trait is not enough.</span></li></ul><p>And now the general ones.</p><p>First off, Rust is <em>hard</em>. (The least wonder in the world.) Most of the traps I’ve fallen into are accidental complexity, not inherent in the simple problem. I guess that it’s really a matter of the initial steepness of Rust’s learning curve, and that things become easier once you’re past the initial hurdles – you train your instincts to avoid these tarpits and keep the compiler happy.</p><p>I’m still very much a newcomer to Rust, so I’m pretty sure I ended up taking a suboptimal approach. A seasoned Rustacean would probably write this code in an altogether different way. If you have suggestions on how to improve my code, or how to attack the problem from different angles, tell me!</p><p>As an experiment in learning, I’ve decided to reflect on my mistakes more frequently. 
I elaborate on it in my <a href="/2023/07/06/learning-to-learn-rust/">previous post</a>, which also discusses changes I’ve made to my workflow to make learning easier.</p><p>Writing the present post showed me how much time it takes. It took me just over an hour to fall into all the traps described in this post and find a way out. A few hours, if you count reading Amos’s post and contemplating the problem. In contrast, this write-up took about two days, plus some <a href="https://mastodon.social/@nathell/110725780205595986">yak shaving</a> it led me to. Part of the reason is that the <em>actual</em> road that I went through was much more bumpy than described here. While writing this, I had to go through no fewer than fifty-six compilation attempts. Here are some of them, with one-line descriptions and a tick or cross to indicate whether the compilation attempt was successful:</p><img src="/img/blog/rust-compilation-attempts.png" alt="Some compilation attempts">
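<p>For reference, here is a sketch of where all those attempts ended up, with the fragments from this post assembled into one listing. The <code>Tree</code> definition and the <code>node</code> and <code>sample_tree</code> helpers are assumptions added for illustration (the snippets above never show them), and the three-column prefixes are a guess based on the expected output; the rest follows the code above as closely as I could manage.</p>

```rust
use std::fmt::Display;
use std::iter::{empty, Peekable};
use std::slice::Iter;

// Assumed shape of the tree: the post only shows that a node has a
// `value` and a vector of `children`.
pub struct Tree<T> {
    value: T,
    children: Vec<Tree<T>>,
}

pub struct TreeIterator<'a, T> {
    parent_prefix: String,
    immediate_prefix: &'static str,
    parent_suffix: &'static str,
    emitted: bool,
    value: &'a T,
    // Either an empty iterator or the TreeIterator of the current child.
    viter: Box<dyn Iterator<Item = String> + 'a>,
    // Kept boxed as in the post; with a concrete type the Box is optional.
    citer: Box<Peekable<Iter<'a, Tree<T>>>>,
}

impl<T: Display> Tree<T> {
    pub fn prefixed_lines<'a>(
        &'a self,
        parent_prefix: String,
        immediate_prefix: &'static str,
        parent_suffix: &'static str,
    ) -> TreeIterator<'a, T> {
        TreeIterator {
            parent_prefix,
            immediate_prefix,
            parent_suffix,
            // An empty immediate prefix marks the root, which is never printed.
            emitted: immediate_prefix.is_empty(),
            value: &self.value,
            viter: Box::new(empty()),
            citer: Box::new(self.children.iter().peekable()),
        }
    }
}

impl<'a, T: Display> Iterator for TreeIterator<'a, T> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if !self.emitted {
            // Our own node: decorate the value with the prefixes.
            self.emitted = true;
            Some(format!("{}{}{}", self.parent_prefix, self.immediate_prefix, self.value))
        } else if let Some(line) = self.viter.next() {
            // A line from a nested iterator: already decorated, pass it on.
            Some(line)
        } else if let Some(child) = self.citer.next() {
            let last = self.citer.peek().is_none();
            let immediate_prefix = if last { "└─ " } else { "├─ " };
            let parent_suffix = if last { "   " } else { "│  " };
            let subprefix = format!("{}{}", self.parent_prefix, self.parent_suffix);
            self.viter = Box::new(child.prefixed_lines(subprefix, immediate_prefix, parent_suffix));
            self.next()
        } else {
            None
        }
    }
}

// Hypothetical helper: builds a node from a value and its children.
fn node<T>(value: T, children: Vec<Tree<T>>) -> Tree<T> {
    Tree { value, children }
}

// Hypothetical helper: the example tree from the post, under an empty root.
pub fn sample_tree() -> Tree<&'static str> {
    node("", vec![
        node("one", vec![node("two", vec![]), node("three", vec![node("four", vec![])])]),
        node("five", vec![node("six", vec![])]),
    ])
}
```

<p>Binding the result of <code>sample_tree()</code> to a variable and collecting <code>tree.prefixed_lines(String::new(), "", "")</code> should give the same six lines as the final treeviewer run above.</p>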
<p>Yet I think it’s worth it. Some of the errors I’ve fixed groping in the dark, kind of randomly: I have now revisited them and I feel I have a much more solid understanding of what’s going on.</p><p>And finally: if you’re into Rust, Amos’s blog (<a href="https://fasterthanli.me">fasterthanli.me</a>) is an excellent resource. Go sponsor him on GitHub if these articles are of value to you.</p></div>tag:blog.danieljanus.pl,2023-07-06:post:learning-to-learn-rustLearning to learn Rust2023-07-06T00:00:00Z<div><p>I’m enjoying a two-month sabbatical this summer. It’s been great so far! I’ve used almost half of the time to <a href="https://danieljanus.substack.com/about">cycle through the whole of Great Britain</a> and let my body work physically and my mind rest (usually, the opposite is true). And now that I’m back, I’ve switched focus to a few personal projects that I have really wanted to work on for a while but never found time.</p><p>One of these projects is to learn Rust. Clojure has made me lazy and it’s really high time for me to flex the language-learning muscles. But while the title says “Rust,” there is nothing Rust-specific about the tip I’m about to share: it can be applied to many programming languages.</p><p>I learn best by doing, so after working through the first few chapters of <a href="https://doc.rust-lang.org/book/">the Rust book</a>, I set off to write a simple but non-trivial program: a console-based tree viewer. The idea is to have a TUI that you could feed with a set of slash-separated paths:</p><pre><code>one/two
one/three/four
five/six
</code></pre><p>and have it render the tree visually:</p><pre><code>├─ one
│  ├─ two
│  └─ three
│     └─ four
└─ five
   └─ six
</code></pre><p>allowing you to scroll it, search it and (un)fold individual subtrees. The paths may come from the filesystem (e.g. you could pipe <code>find . -type f</code> into it), but not necessarily: they might be S3 object paths, hierarchical names of RocksDB keys (my actual use case), or represent any other tree.</p><p>Today I hit a major milestone: I <a href="https://github.com/nathell/treeviewer/commit/fb1332aa5bd0f695604522492ccd893dac28066a">wrote a function</a>, <code>append_path</code>, that, given a tree of strings and a slash-separated path, creates new nodes as needed and adds them to the tree. Needless to say, I didn’t get it right on the first attempt. I fought with the compiler and its borrow checker <em>a lot</em>.</p><p>I guess that’s a typical ordeal that a Rust newbie goes through. But alongside treeviewer’s code, I keep an org-mode file called <code>LEARN</code> where I jot down things that I might want to remember for the future. So after getting <code>append_path</code> right, I wanted to pause and look back at the failed attempts and the corresponding compiler errors, to try to make sense of them, armed with my new knowledge.</p><p>But… <em>which</em> versions of the code caused <em>which</em> errors? I had no idea! And the Emacs undo tree is really hard to dive into.</p><p>An obvious way out is to commit early and often. But this (1) requires a discipline that I don’t have at the moment, and (2) pollutes the Git history. So, instead, I automated it.</p><p>I’ve added a Makefile to my repo. Instead of <code>cargo run</code>, I will now be compiling and executing the code via <code>make run</code>. 
In addition to Cargo, this runs <a href="https://github.com/nathell/treeviewer/blob/main/scripts/record.sh">a script</a> that:</p><ul><li><span>Commits everything that’s not yet committed</span></li><li><span>Creates an annotated tag for that commit, named <code>build-$TIMESTAMP</code>, that serves as a snapshot of the code that was built</span></li><li><span>Reverts the working tree to the state it was in (whatever was staged stays staged, whatever was unstaged remains unstaged)</span></li></ul><p>This workflow change has the nice property of being unobtrusive. I can hack on the code, compile, commit and rebase to my heart’s delight. But when I need to look back at the most recent compilation attempts, all I need to do is <code>git tag</code> and from there I can meditate on individual mistakes I made.</p><p>Why tags and not branches, one might ask? I guess this is a matter of personal preference. I opted for tags because I want to minimise the chance of accidentally pushing the branch. The resulting tags are technically dangling, which I don’t see as an issue: the older the build tag, the less likely I am to need it in the future, so I see myself cleaning up old builds every now and then.</p><p>When working with a language I’m proficient in, I don’t need this. But as a learning aid, I already see the idea as indispensable. Feel free to reuse it!</p></div>tag:blog.danieljanus.pl,2022-11-07:post:dcd-22Dutch Clojure Days 20222022-11-07T00:00:00Z<div><p>It’s <a href="/2008/04/22/eclm-2008/">a tradition of this blog</a> that I write down impressions on my way back from Amsterdam conferences (<em>addendum a week later</em>: unfortunately I took a flight this time, too short to complete this entry, and it had to wait until I caught up). This time, it was <a href="https://clojuredays.org/">Dutch Clojure Days 2022</a>, my first post-COVID full-size conference and the first DCD I’ve ever been to. And, hopefully, not the last. 
I know I want to come back.</p><p>This is in no small part thanks to Carlo Sciolla and the whole organising team of DCD. Y’all absolutely rock! I’d like to extend my <code>(bit-shift-left 1 20)</code> thank yous.</p><p>I also loved the friendly, informal, meetup-y, no-ceremony vibe of the event. I felt right at home. The venue resonated with that vibe as well. Cloud Pirates’ space might not be the largest or the fanciest conference room ever, but it felt welcoming: one step from the street and you’re there.</p><p>And you listen to the talks!</p><h2 id="nikita-prokopov:-clojure-+-ui-=-❤️">Nikita Prokopov: <em>Clojure + UI = ❤️</em></h2><p>(Did you ever try italicising emoji?)</p><p>I’ve been keeping an eye on Nikita’s <a href="https://github.com/HumbleUI/HumbleUI">HumbleUI</a> ever since it was publicly announced, and this talk makes me eager to try it out even more. I do have a use-case in mind (Spleen, my Scrabble engine that predates Leiningen by a few days); I’ve been using <a href="https://github.com/cljfx/cljfx">cljfx</a> to experiment with a UI so far, but I guess I’ll try HumbleUI as well and see how it fares.</p><p>HumbleUI may be in pre-alpha, but it’s already practical: Nikita used it to write a presentation engine for his talk!</p><h2 id="paula-gearon:-a-library-reckoning">Paula Gearon: <em>A Library Reckoning</em></h2><p>Did you know that Paula is the person we owe a cross-platform <code>clojure.math</code> to? I had no idea! And I greatly enjoyed this highly technical, low-level talk. I learned more than I probably wanted to know about IEEE-754 and the technicalities of floating-point number crunching in JavaScript. And because of Paula’s hard work, dedication, attention to detail, and working closely with the core CLJS team, the whole community gets to benefit! This is open source at its finest. 
I’m left with an immense sense of gratitude.</p><p>I recall <a href="https://www.youtube.com/watch?v=xvk-Gnydn54&t=342s">Carin Meier’s keynote from EuroClojure 2016</a>, where she introduces (following David Mumford) four tribes of programmers: explorers, alchemists, wrestlers, and detectives. I think both Paula and I share the trait of being detectives: people who find enjoyment in diving into deep, detailed aspects of programming.</p><h2 id="lunch">Lunch</h2><p>It merits separate attention, as it was one of the best conference lunches I ever had. If you’re in Amsterdam, do treat yourself to some great food at <a href="https://www.mediamatic.net/en/ETEN">Mediamatic</a>. They’re a lovely, vegan-only, quiet place at the waterside, allowing an escape from the hustle and bustle of the city. They grow their own produce, and the resident cat makes sure that everyone feels comfortable!</p><h2 id="lightning-talks">Lightning talks</h2><h3 id="me:-golfing-clojure:-check-checker-in-<280-characters-of-clojure">Me: <em>Golfing Clojure: Check checker in <280 characters of Clojure</em></h3><p>I won’t assess my own lightning talk. (You can check out the <a href="https://danieljanus.pl/talks/2022-clojuredays/">slides</a> if you want.) But I did manage to make the audience laugh, and I’m happy.</p><h3 id="brendon-walsh:-sorry-for-the-convenience:-the-importance-of-progressive-enhancement">Brendon Walsh: <em>Sorry For The Convenience: The Importance of Progressive Enhancement</em></h3><p>I’ll be honest: I was winding down after my own, so didn’t pay much attention to this one. But it did reiterate a few points from Rich’s spec-ulation talk, and this is always worthwhile.</p><h3 id="adrien-siegfried:-tagfl,-task-analysis-generated-from-lisp">Adrien Siegfried: <em>tagfl, task analysis generated from lisp</em></h3><p>Another winding-down talk for me. The live demo, however, did catch my eye. 
If I ever find myself needing to generate a task graph, I’ll be back.</p><h3 id="adam-helins:-clojupedia,-linking-the-clojure-ecosystem">Adam Helins: <em>Clojupedia, linking the Clojure ecosystem</em></h3><p>Adam has some great ideas about how to make the Clojure library ecosystem more discoverable and annotable. I will keep fingers crossed for <a href="https://clojupedia.org/#/page/Clojupedia.org">Clojupedia</a>, and want to contribute.</p><h2 id="sung-shik-jongmans:-automated-correctness-analysis-for-core.async">Sung-Shik Jongmans: <em>Automated Correctness Analysis for core.async</em></h2><p>A reprise from this year’s <a href="https://clojured.de/">:clojureD</a>, which I unfortunately missed. But I’m so glad I had a second chance to listen to this talk live. Core.async is notoriously hard to use correctly, which I experienced first-hand while developing <a href="https://github.com/nathell/skyscraper">Skyscraper</a>. (I ended up abstracting away all message-passing and process construction into a <a href="https://github.com/nathell/skyscraper/blob/master/src/skyscraper/traverse.clj">higher-level construct</a>, and then using that to implement the functionality.) But I’ve had my share of debugging deadlocks, and <a href="https://github.com/discourje/development">Discourje</a> would have been so much help had I known about it earlier! I’m gonna try it out anyway.</p><p>On top of the usefulness, Sung-Shik presented it in a very fun and entertaining way.</p><h2 id="jordan-miller:-got-a-guru?">Jordan Miller: <em>Got a Guru?</em></h2><p>Whoah. I liked a lot of talks at DCD, but if I were to pick up <em>the</em> one highlight of the day, it’d probably be this one. 
Being a soft talk, it was certainly the most welcome surprise.</p><p>I won’t try to summarize it (wait for the recording), but I’ll just say that in addition to having a guru it touched on being a glue person, note-taking, multi-dimensional self-awareness progression, and ASSES (which doesn’t quite mean what you think it does). Lambduh (the number of h’s varies) is either a natural-born presenter or had put in extremely high effort to deliver a show like this. Or both. In any case, I’m in awe.</p><h2 id="michiel-borkent:-clojurescript-reimagined">Michiel Borkent: <em>ClojureScript reimagined</em></h2><p>I’m not sure how Borkdude does it, but he’s a relentless deliverer. He wrote and actively maintains I-don’t-know-how-many alternative Clojure runtimes, in addition to <a href="https://github.com/clj-kondo/clj-kondo">clj-kondo</a> and many other projects. This is Fabrice Bellard-level productivity, and I don’t say that lightly.</p><p>Anyway, those runtimes together cover a wide range of usecases. With this talk, Michiel adds two for an even wider coverage: <a href="https://github.com/squint-cljs/cherry">Cherry</a> (compiling ClojureScript to ES <code>.mjs</code> modules), and <a href="https://github.com/squint-cljs/squint">Squint</a> (“a way to write JavaScript with familiar syntax that sort of looks like cljs if you squint”). Clojure is coming to your kettle Real Soon Now!</p><h2 id="drinks">Drinks</h2><p>Great. And wonderful people, too. Party like you’re in Amsterdam.</p><h2 id="the-bad">The bad</h2><p>I struggle to find <em>anything</em> that I might have disliked! I forgot my water bottle, but I can only blame myself for that. :)</p></div>tag:blog.danieljanus.pl,2022-09-24:post:paying-for-booksHow to pay for books2022-09-24T00:00:00Z<div><p><em>This post was originally <a href="https://plblog.danieljanus.pl/2021/10/10/jak-placic-za-ksiazki/">published in Polish</a>. 
This translation has been slightly edited to explain some details that are likely to be obscure for people outside Poland.</em></p><h2 id="fortuna-imperatrix-mundi">Fortuna imperatrix mundi</h2><p>I wouldn’t make a good emperor of the universe.</p><p>Sometimes I wonder what I would change if I had the power to shape the world any way I could, and always I come to the same conclusion: <em>I don’t know</em>. I see many issues with the status quo, but all the solutions that I can come up with have their own problems. And so it rolls.</p><p>But certain ideas seem sensible to me. For instance, I have a pretty clear vision of how paying for books (including ebooks) works in my perfect world. Before I explain it, though, let me say a few words about what I dislike about the current reality.</p><h2 id="the-way-things-stand">The way things stand</h2><p>Let us establish right at the beginning that the need to incentivise the authors, as well as other people whose work is needed to create books, is obvious.</p><p>All the deficiencies of capitalism notwithstanding, as a society we suffer from a kind of doublethink. On the one hand, we praise libraries as temples of culture and knowledge. Their social and culture-making role is hard to overstate. On the other hand, we rightfully cringe when someone illegally downloads an ebook from the Net: it violates a social agreement.</p><p>Meanwhile, from an author’s point of view, in both situations their profit is usually lower than if the reader had bought the book. In particular, it can be zero in both cases unless the country implements some form of <a href="https://en.wikipedia.org/wiki/Public_Lending_Right">Public Lending Right</a>.</p><p>Digressing for a while: in 2012, <a href="https://en.wikipedia.org/wiki/Kazik_Staszewski">Kazik Staszewski</a> called people who downloaded the <a href="https://www.youtube.com/playlist?list=PLfo7rU6KgPU90FLmMyxhGQ_w6B90Quy8T">then-new Kult album</a> “paltry b*****es”. 
In response, a mock page popped up, called <a href="https://web.archive.org/web/20210308100646/https://stratakazika.pl/">“Kazik’s Loss”</a><sup class="sidenote-ref" data-label="footnote1">1</sup>.</p><p>The premise was simple: make a copy of Kazik’s album to incur a loss of profit, then increment the Grand Total on the page. Actually, you can use any other album. Or, indeed, make many copies. Use your local disk to make it faster. Delete old copies if you run out of space. Go wild! Oh, and you’d better not share them online, lest men in black knock on your door.</p><p>(See what I did two paragraphs ago? I linked to the album on YouTube! It was put there by the copyright holder, but now you’ll just listen to that and not pay Kazik any money. I guess I’m gonna go bump the amount on the site. Or not, because it’s no longer up. But in 2021, last time the Wayback Machine successfully crawled it, the Grand Total was approaching 300 million dollars.)</p><p>I bring that example up because it illustrates pretty clearly that merely making a copy of digital content (whether legally or not) is not a particularly meaningful act in and of itself. A USB stick filled with thirty thousand ebooks is not automatically worth $100,000. If I just read four of them, then only those four will present any value to me. The book brings value for the reader not when bought, but <em>in the process of reading</em>.</p><p>Thus, I think it would make sense to tie the payment for the book (or more precisely, for its content) to that very process.</p><h2 id="how-i-imagine-it">How I imagine it</h2><p>Hence, the following idea. 
This is a sketch; details would need to be fleshed out.</p><ol><li><span>Nothing changes in the model of distribution of paper books: you can buy one or borrow it from a library.</span></li><li><span>All ebooks can be downloaded from the Internet for free, in unlimited amounts.</span></li><li><span>When buying a book in the store, you pay for a physical item, not for the content.</span></li><li><span>Every book and ebook includes a bank account number (or a link to Stripe, or whatever) that lets you pay for the content you’ve read. This money is then distributed between people who contributed to the book (the author, obviously, but also people responsible for editing, proofreading, typesetting, illustrations, cover, etc.)</span></li><li><span>There’s a strong <strong>social expectation</strong> to pay for every book you read, as long as you can afford it. This holds for all books, no matter how you acquire them or whether you hold on to them. In particular, this means that you pay twice for books that you buy to own (once in the store and once after reading); and that you also need to pay for books that you borrow from the library, from a friend, or download.</span></li></ol><p>The more payers in the system, the better its chances to work. But obviously this cannot be enforced legally. Even if it were technically possible to devise a Readership Control Office, the very thought makes me shiver. For the idea to take off, a societal mentality change would be needed: a widespread belief that evading readership fees is just as unethical as not paying for the bus ticket. Hence the phrase “social expectation”.</p><p>While it’d be a strong expectation, I also think it’d be important that it be soft and non-exclusive. As a child, I used to spend a quarter of my life in public libraries. I vividly remember the gratifying feeling of interacting with an immense wealth you can wallow in completely for free. 
I wouldn’t want to take that feeling away from that young me, just because I was low on pocket money. Nor do I think that depriving people of library access if they can’t afford it would be a good idea. That’s why I say “you pay <em>as long as you can afford it</em>”. In my perfect world, it’s the reader who decides the support amount, based on what they can give and how much value they drew from the book.</p><p>Just how much would that be in practice? I have no idea, but my rough guess is typically a few dollars. A breakdown of the retail price of a typical book looks like this:</p><figure><img src="/img/blog/book-price.svg" alt="Breakdown of a book price"><figcaption>What makes up the price of a book? (Image translated from <a href="https://www.granice.pl/news/skad-sie-bierze-cena-ksiazki/5868">here</a>, based on Polish data; I wasn’t able to find similarly detailed information for the English-language market, but <a href="https://www.davidderrico.com/cost-breakdowns-e-books-vs-printed-books/">this article</a> suggests it’s not far off.)</figcaption></figure>
<p>I imagine the fee would cover all publisher costs that <em>don’t</em> involve creating the book as a physical item, or some 30% of its typical retail price. Because the marginal cost of producing a new copy doesn’t include the same elements (and the marginal cost of producing a new copy of an ebook is zero), I guess the final price in the store might be roughly 70% of what it’s now; I also imagine ebooks could be downloaded for free or for a tiny fee to cover the costs of on-line distribution sites.</p><p>Another approach to determining the fee is to ask yourself two questions:</p><ol><li><span>How many books do I read per month?</span></li><li><span>How much money can I spend monthly to support authors?</span></li></ol><p>Just divide #2 by #1 and you’ll know what your limits are.</p><h2 id="swallows-make-summer">Swallows make summer</h2><p>There’s one more reason why I like this vision: it’s not an all-or-nothing proposition. It can—and I believe it should—be implemented piecemeal, today, on the grassroots level. Indeed, it incorporates ideas that are already functioning in different places.</p><p>This article has been brewing for a long time, inspired in no small part by Matthew Butterick’s online-only book <a href="https://practicaltypography.com/">“Practical Typography”</a>. The author states bluntly that the book is not free:</p><blockquote><p>This book’s only source of revenue is readers like you. If you don’t pay, the book dies.</p></blockquote><p>And some people pay. Butterick’s income is underwhelming, and the ratio of paying to non-paying readers <a href="https://practicaltypography.com/effluents-influence-affluence.html">even less so</a> (the mentality shift has yet to happen) — but he admits that it’s an experiment in online publishing. 
And there’s no printed version.</p><p>Some modern authors publish their novels under free licenses, so they can be freely copied and shared: Goodreads lists <a href="https://www.goodreads.com/list/show/9437.Free_Creative_Commons_Novels">42 novels available under Creative Commons licenses</a>. All of them can also be bought as paperbacks in the usual way. Seven of these are by Cory Doctorow. <a href="https://wiki.creativecommons.org/wiki/Case_Studies/Cory_Doctorow">In his own words</a>:</p><blockquote><p>Not only does making my books available for free increase the number of sales that I get, but I also came to understand it artistically as a Science Fiction writer that if I was making work that wasn't intended to be copied, then I was really making contemporary work.</p></blockquote><p>An example from the IT world: the full text of Peter Seibel’s book “Practical Common Lisp” is <a href="https://gigamonkeys.com/book/">available online</a> – and this also doesn’t discourage people from paying for the paper version.</p><p>I’ve taken the term “social expectation” from Marijn Haverbeke, who thus expresses <a href="https://marijnhaverbeke.nl/fund/">his ideas</a> on funding the software he writes.</p><p>Paying for the content after reading is a form of micro-patronage: it forms a bond between the reader and the book creators. So, I’d like to point out some other initiatives that also contribute to forming such bonds, albeit in different ways. 
I mean <a href="https://www.humblebundle.com/books">Humble Book Bundle</a> and its workalikes like the Polish <a href="https://artrage.pl/bookrage">BookRage</a>, where you can choose your own price for a set of books and how to distribute it between publishers, the platform, and a charity; and sites like <a href="https://www.patreon.com/">Patreon</a>, where you can support creators with regular payments.</p><h2 id="be-the-change-you-wish-to-see-in-the-world">Be the change you wish to see in the world</h2><p>Thus spake Mahatma Gandhi, and I want to follow.</p><p>I’ll be frank: while I had read “Practical Typography” years ago (although not paying much attention), I hadn’t paid for it so far. If it weren’t for the experiment, I probably wouldn’t have remembered it.</p><p>But Butterick points out that we also pay for books in a third way: with our own time. A precious, non-renewable resource. Quoting him again:</p><blockquote><p>Every great book is underpriced; no bad book is cheap enough.</p></blockquote><p>Those words! Those words have been sitting in the back of my mind ever since I’d read them. With those words, Mr Butterick, I had incurred debt to you; and with the publishing of this article, I’m hereby paying this debt off. This shows that sometimes it takes a long time for the reader to make up his mind about supporting the author.</p><img src="/img/blog/splata-dlugu.png" alt="Payment confirmation">
<p>From my correspondence with <a href="https://www.bjornlarssen.com/">Bjørn Larssen</a> (go read <a href="https://www.bjornlarssen.com/books/">“Storytellers”</a> if you haven’t yet, it’s good; I hope Bjørn won’t mind me sharing this snippet):</p><p>Me:</p><blockquote><p>I’m adding three more coffees to your ko-fi. Not just because I want the second book when it comes out (I do!), but because I dream of a world where people who can afford it support creators they read/listen to/etc. In my perfect world, there’s a social expectation to do that regardless of whether you bought or borrowed or pirated the book, and there’s a link to your ko-fi right on the last page of “Storytellers.” :)</p></blockquote><p>Bjørn:</p><blockquote><p>For a while now, I’ve been toying with an idea to put a tip jar on <a href="http://bjornlarssen.com">bjornlarssen.com</a>: a separate ko-fi, meant only for readers who wish to encourage and support me, or for those who pirated the book and now feel remorse. You’ve just proved to me that it’s not a bad idea after all and maybe I should just do it.</p></blockquote><h2 id="what-you-should-do">What you should do</h2><p>Buy the books by authors you like. Support them on Patreon and elsewhere. Email them (<a href="https://fuse-pl.translate.goog/beton/hello-i-love-you.html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=pl">if only to say thank you</a>) and ask them to set up a Patreon account, share their PayPal, or set up some other way to support. Or invite them for a cup of coffee, if you can.</p><hr /><div class="footnote"><p><sup class="footnote-ref" data-ref="0">1</sup> In Polish, “Strata Kazika”, a pun on <a href="https://www.youtube.com/playlist?list=PLfo7rU6KgPU_ka63NCXDVZR5lb8LxRdoB">another Kult album title</a>. (Oops, I did it again. 
And I tried hard not to link to Britney.)</p></div></div>tag:blog.danieljanus.pl,2022-08-18:post:i-love-my-gpd-micro-pcI love my GPD Micro PC2022-08-18T00:00:00Z<div><p>I bought two computers this year: a beefy Macbook Pro with M1 Pro and a GPD Micro PC.</p><p>The MBP is meant to be my mobile workstation, to satisfy all my needs whenever I need to work outside of my home office (at home, I’m still mostly using a two-year-old Intel-based Mac Mini). The GPD was a caprice. Perhaps I’m at the stage in life when well-off men buy themselves Ferraris to fend off a mid-life crisis; or perhaps I just have a separate heart for small computers.</p><p>Let’s compare the two machines:</p><table class="center">
<tr class="header"><th></th><th>Macbook Pro</th><th>GPD Micro PC</th></tr>
<tr><td>CPU</td><td>Apple M1 Pro<br><small>(8 cores @ 2 – 3.2 GHz)</small></td><td>Intel Celeron N4100<br><small>(4 cores @ 1.1 GHz)</small></td></tr>
<tr><td>RAM</td><td>32 GB</td><td>8 GB</td></tr>
<tr><td>SSD</td><td>512 GB</td><td>128 GB</td></tr>
<tr><td>Display</td><td>14.2″</td><td>6″</td></tr>
<tr><td>Performance<br><small>(Cinebench R15 multi-threaded)</small></td><td>1309</td><td>238</td></tr>
<tr><td>Price paid (EUR)</td><td>2550</td><td>300</td></tr>
</table>
<p>Guess which of these two I find myself using more? That’s right, the GPD Micro. Granted, I’ve only had it for a month, so it may be a novelty effect that’ll wane over time, but still: I’m impressed. And, yes, I’ve installed Ubuntu MATE (a semi-official distro that has dedicated builds for this hardware) and Emacs on it, and I program on it.</p><figure><img src="/img/blog/karypel.jpg" alt="The GPD Micro PC running Emacs on Ubuntu MATE and a credit-size card for scale"><figcaption>The GPD Micro PC running Emacs on Ubuntu MATE and a credit-size card for scale</figcaption></figure>
<p>So what do I like about it? Why would I reach more often for a device that has sub-par performance, a screen that you have to squint to notice anything on, and an uncomfortable keyboard?</p><p><strong>It’s ultra-portable.</strong> It resides permanently in my waist bag (a.k.a. fanny pack for my American readers) alongside my wallet and phone, and I carry it around everywhere when I’m out and about. It’s super lightweight for a laptop (I hardly feel the extra grams), and reaching for it only takes a second or so, as does putting it away.</p><p><strong>It’s cheap and sturdy.</strong> I’m very vigilant and still a bit freaked out when I carry the Macbook around. Careful in tight spaces! Better not hop on a city bike with it in my backpack, ’cause what if I fall?</p><p>In contrast, the Micro doesn’t mind being worn or battered. In many ways, it reminds me of the Eee PC 1000HE that used to be my main driver years ago. (Did I mention having a separate heart for small computers?) If it breaks, it breaks; but who knows! I once accidentally dropped the Eee from ~1 metre of height, chipping off some of the chassis plastic, but the computer continued to work. I wouldn’t be surprised if the Micro turns out to be just as resilient.</p><p><strong>Having a hacking environment at all times feels very empowering.</strong> This is something I hadn’t anticipated at all. I like long, solitary walks, letting my mind wander; when I feel like thinking about code, I will sometimes stop for a coffee and experiment. And if I don’t, the very thought of having a dev environment always within reach makes me feel inspired.</p><blockquote><p><em>I walk, I lift up, I lift up heart, eyes</em><br></p><p style="text-align: right;">— Gerard Manley Hopkins, <em>Hurrahing in Harvest</em></p>
</blockquote><blockquote><p><em>My hands in my pockets, and my pockets like an ocean,</em><br> <em>I slowly walk and look around</em><br></p><p style="text-align: right;">— Sławomir Wolski / Mariusz Lubomski, <a href="https://www.youtube.com/watch?v=iuy6IxgaM6Q"><em>Walkology</em></a></p>
</blockquote><p>Similarly on the underground: rather than mindlessly reaching for the phone and scrolling through news, I choose to pull out the Micro and read some code. I very deliberately am not logged into any social media there, do not do any work on it (just hacking on personal projects for fun), and remain offline unless I really need something from the Net.</p><p><strong>It dual-boots Windows and Linux</strong> and is my only x86-64 computer. I like many things about macOS and the Apple ecosystem, but the walled garden of Apple still irks me. It’s good to have an escape hatch to the more open world of Linux (I’m hesitant to try out Asahi on the Mac) and be able to play an occasional Windows game.</p><p>Finally, <strong>this little thing has a soul.</strong> Like the Eee; like the 8-bit micros of yore. I don’t know how better to put it. The Macbook Pro is a very capable workhorse, but I think of it as just a tool. In contrast, the GPD just wants to be used, interacted with, tinkered with. Someday I’ll find a use for its RS-232 serial port!</p></div>tag:blog.danieljanus.pl,2021-09-25:post:testing-lithiumTesting a compiler that can’t even print stuff out2021-09-25T00:00:00Z<div><p>I’m enjoying a week-long vacation. In addition to other vacationy things (a trip to Prague, yay!), I wanted to do some off-work programming Just For Fun™ and revisit one of my dormant pet projects, to see if I can make some progress.</p><p>I opted for Lithium, my toy x86 assembler and Lisp compiler that hasn’t seen new development since 2014. But before that, I had <a href="/2012/05/14/lithium/">blogged</a> <a href="/2013/05/26/lithium-revisited/">about it</a> and even <a href="https://danieljanus.pl/talks/reveal.js/2013-euroclojure.html#/">talked about it</a> at EuroClojure one time.</p><p>Over the week, I’ve re-read the <a href="http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf">paper</a> that I’ve been loosely following while developing Lithium. 
In it, Abdulaziz Ghuloum advocates having a testing infrastructure from day one, so that one can ensure that the compiler continues to work after each small modification. I’d cut corners on it before, but today, I’ve finally added one.</p><p>What’s the big deal? And why not earlier?</p><p>One of the original goals that I set myself for Lithium is that it have no runtime dependencies. Not even a C library; not even an OS. It produces raw x86 binaries targeting real mode – non-relocatable blobs of raw machine code. I’m running them in DOSBox, because it’s convenient, but the point is it’s not necessary.</p><p>(Some day, I’ll write a mission statement to explain why. But that’s a story for another day.)</p><p>And because the setup is so minimalistic, the setup suggested by Ghuloum becomes unfeasible. Ghuloum presupposes the existence of a host C compiler and linker; I have no such privilege. By itself, Lithium can barely output stuff to screen. There’s a <code>write-char</code> primitive that emits one character, but nothing more than that. And there’s as yet no library to add things to, because there’s no <code>defn</code> and not much of a global environment.</p><p>So what to do? I thought about the invariant in Ghuloum’s design, one that Lithium inherits as well:</p><p><em>Every expression is compiled to machine code that puts its value in the <code>AX</code> register.</em></p><p>If I could somehow obtain the values that the CPU registers have at the end of executing a Lithium-compiled program, then I could compare them to the expected value in a test. But how to grab those registers?</p><p>That turned out to be easier than expected. Instead of extending Lithium to support printing decimal or hexadecimal numbers, I just grabbed <a href="http://www.fysnet.net/yourhelp.htm">some pre-existing assembly code</a> to affix to the program as an epilog. (It does depend on DOS’s interrupt <code>21h</code>, but hey, it doesn’t hurt to have it for debugging/testing only.) 
Surprise: the snippet failed to compile, because Lithium’s assembler is woefully incomplete! But it was easy enough to extend it until it worked.</p><p>So this gave me a way to view the program’s results.</p><img src="/img/blog/lithium-testing.png">
<p>But there’s another problem: these results are printed within DOSBox. In the emulated DOS machine. I needed a way to transfer them back to the host. Can you guess how?</p><p>Yes, you’re right: the simplest thing (DOS redirection to a file, as in <code>PROG.COM >REG.TXT</code>) works. And you’ll laugh at me that it hasn’t occurred to me until now, when I’m writing up the <a href="https://github.com/nathell/lithium/commit/27563b3c5b92f32b24f750d98248d013f924a700">commit</a> that’s already out in the wild. Another proof that it pays to write documentation.</p><p>My original idea was… SCREEN CAPTURE!</p><p>I’ve scavenged Google for a DOS screen grabber that can produce text files and is not a TSR, <a href="http://www.pc-tools.net/dos/dosutils/">found one</a>, bundled it with Lithium, and wrote <a href="https://github.com/nathell/lithium/blob/27563b3c5b92f32b24f750d98248d013f924a700/src/lithium/driver.clj#L23-L36">some duct-tape code</a> that invokes the compiled program and the screen grabber in turn and then parses the output. With that, I can finally have <a href="https://github.com/nathell/lithium/blob/27563b3c5b92f32b24f750d98248d013f924a700/test/lithium/compiler_test.clj">tests</a> that check whether <code>(+ 3 4)</code> is really <code>7</code>.</p><p>And now let me go refactor it…</p></div>tag:blog.danieljanus.pl,2021-07-01:post:commit-groupsThings I wish Git had: Commit groups2021-07-01T00:00:00Z<div><h2 id="intro">Intro</h2><p>Everyone <sup class="sidenote-ref" data-label="footnote1">1</sup> and their dog <sup class="sidenote-ref" data-label="footnote2">2</sup> loves Git. I know I do. It works, it’s efficient, it has a brilliant data model, and it sports <a href="https://git-scm.com/book/en/v2/Git-Tools-Rerere">every feature under the sun</a>. In 13 years of using it, I’ve never found myself needing a feature it didn’t have. 
Until recently.</p><p>But before I tell you about it, let’s talk about GitHub.</p><p>There are three groups of GitHub users, distinguished by how they prefer to merge pull requests:</p><img src="/img/blog/3-groups-of-gh-users.png">
<p>Merge commit, squash, or rebase? There’s no single best answer to that question. A number of factors are at play in choosing the merge strategy: the type of the project, the size, workflow and preferences of the team, business considerations, and so on. You probably have your own preference if you’ve used GitHub to collaborate with a team.</p><p>I’ll talk for a while about the pros and cons of each approach. But first, let’s establish a setting. Imagine that your project has a <code>main</code> branch, from which a <code>feature</code> branch was branched off at one point. Since then, both branches have seen development, and now that <code>feature</code> has undergone reviews and testing, it’s ready to be merged back to <code>main</code>:</p><img src="/img/blog/orig.svg">
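<p>If you’d like to follow along in a terminal, this setting can be recreated in a scratch repository. This is only a sketch: the file names, the commit messages, and the assumption that <code>feature</code> forks off at commit 3 are mine, chosen so that the numbering lines up with the commits discussed in the rest of this post.</p>

```shell
set -eu

# Work in a throwaway directory so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main" whatever Git's default is
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: record one commit that adds a file named after its number.
commit() {
  echo "change $1" > "file$1.txt"
  git add "file$1.txt"
  git commit -q -m "commit $1"
}

for n in 1 2 3; do commit "$n"; done    # shared history on main
git checkout -q -b feature
for n in 4 5 6; do commit "$n"; done    # work done on the feature branch
git checkout -q main
for n in 7 8; do commit "$n"; done      # main moves on in the meantime

# The commit where the two branches diverged (commit 3 under the assumption above).
fork_subject=$(git log -1 --format=%s "$(git merge-base main feature)")
# Number of commits reachable from feature but not from main.
feature_count=$(git rev-list --count main..feature)

echo "fork point: $fork_subject"
echo "commits only on feature: $feature_count"
git log --graph --oneline --all
```

<p>The final <code>git log --graph</code> call draws a text version of the picture above, which is handy for checking what each merge strategy does to the history.</p>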
<h2 id="create-a-merge-commit">Create a merge commit</h2><p>Merge commits are the original answer that Git has to combining changes. A merge commit has two or more parents and brings in all the changes from them and their ancestors:</p><img src="/img/blog/merge-commit.svg">
<p>In this example, Git has created a new commit, number 9, that merges commits 6 and 8. The branch <code>main</code> now points to that new commit, and so contains all changes in the range 1–8.</p><p>Merge commits are extremely versatile and scale well, especially for complicated workflows with multiple maintainers, each responsible for a different part of the code; for example, they’re pervasively used by the Linux kernel developers. However, for small, agile teams (especially in the business context), they can be overkill and pose potential problems.</p><p>In such a team, you typically have one eternal branch, from which production releases are made, and to which people merge changes from short-lived feature branches. In such a setting, it’s hard to tell how the history of a project has progressed. <a href="https://nvie.com/posts/a-successful-git-branching-model/">GitFlow</a>, a popular way of working with Git, advocates merge commits everywhere, and <a href="https://www.endoflineblog.com/gitflow-considered-harmful">people are struggling with it</a>.</p><p>I’ll refer you to the visual argument from that last post:</p><img src="/img/blog/gitflow-mess.png">
<p>Setting aside the fact that this history is littered with merge commits, the author makes a point that with this kind of an entangled graph, it’s practically impossible to find anything in it. Whether that’s true or not I’ll leave for you to decide, but there’s definitely a case for linear history there.</p><p>There’s another, oft-overlooked quirk here. Quick: look again at the second image above, the one with merge commit number 9. Can you tell, from the image alone, which commit was the tip of <code>main</code> before the merge happened? Surely it must be 8, because it’s on the gray line, right?</p><p>Yeah: on the image. But when you look at the merge commit itself, it’s not that obvious. Under the hood, all the commit really says is:</p><pre><code>Merge: 8 6
</code></pre><p>So it tells you that these two parents have been merged together, <em>but it doesn’t tell you which one used to be <code>main</code></em>. You might guess 8, because it’s the leftmost one, but you don’t know for sure. (Remember, branches in Git are just pointers to commits.) The only way (that I know of) to be sure is to use the <a href="https://git-scm.com/docs/git-reflog">reflog</a>, but that is ephemeral: Git occasionally prunes old entries from reflogs.</p><p>So this prevents you from being able to confidently answer questions such as: “which features were released over the given time period?”, or “what was the state of <code>main</code> as of a given date?”.</p><p>That’s also why you can’t <code>git revert</code> a merge commit—that is, unless you tell Git which of the parent commits you want to keep and which to discard.</p><h2 id="squash-and-merge">Squash and merge</h2><p>In the merge commit-based approach, we don’t rewrite history: once a commit is made, it stays; the repository only grows by accretion. In contrast, the other two approaches use Git’s facilities for rewriting history. As we’ll see, the fundamentals are the same: where they differ is commit granularity.</p><p>Coming back to our example: when squashing, we mash together the changes introduced by commits 4, 5, and 6 into a single commit (“S”), and then replay that commit on top of <code>main</code>.</p><img src="/img/blog/squash-and-merge.svg">
<p>The <code>feature</code> branch is still there, but I didn’t include it on this picture because it’s no longer relevant—it typically gets deleted upon merge (which, as we will see, might not actually be a good idea).</p><p>There’s a lot to like about this approach, and <a href="https://blog.dnsimple.com/2019/01/two-years-of-squash-merge/">some teams</a> <a href="https://christopher.xyz/2020/07/13/squash-merge.html">advocate for it</a>. The biggest and most obvious benefit is likely that <em>the history becomes very legible</em>. It’s linear and there’s a one-to-one correspondence between commits on <code>main</code> and pull requests (and, mostly, either features or bugfixes). Such a history can be of great help in project management: it becomes very easy to answer the questions which were nigh impossible to answer in the merge-commit approach.</p><h2 id="rebase-and-merge">Rebase and merge</h2><p>This situation is similar to the previous one, except that we don’t squash commits 4–6 together. Instead, we directly replay them on top of <code>main</code>.</p><img src="/img/blog/rebase-and-merge.svg">
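<p>To make the difference in granularity concrete, here is a rough sketch that applies both history-rewriting strategies to the same scratch repository using plain Git commands. GitHub’s merge buttons may do things differently under the hood; the branch names <code>main-squash</code>, <code>main-rebase</code> and <code>feature-rebased</code>, the file names, and the fork point at commit 3 are all made up for the illustration.</p>

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit per numbered file.
commit() {
  echo "change $1" > "file$1.txt"
  git add "file$1.txt"
  git commit -q -m "commit $1"
}

for n in 1 2 3; do commit "$n"; done
git checkout -q -b feature
for n in 4 5 6; do commit "$n"; done
git checkout -q main
for n in 7 8; do commit "$n"; done

# Squash and merge, tried out on a throwaway copy of main.
git checkout -q -b main-squash main
git merge -q --squash feature            # stages the combined changes without committing
git commit -q -m "S: feature (commits 4-6 squashed)"
squash_added=$(git rev-list --count main..main-squash)

# Rebase and merge, tried out on another copy of main.
git checkout -q -b feature-rebased feature
git rebase -q main                       # replay commits 4, 5 and 6 on top of commit 8
git checkout -q -b main-rebase main
git merge -q --ff-only feature-rebased   # main just fast-forwards, no merge commit
rebase_added=$(git rev-list --count main..main-rebase)

echo "squash-and-merge added $squash_added commit(s)"
echo "rebase-and-merge added $rebase_added commit(s)"
```

<p>Both results are linear. The only difference is how many commits land on top of <code>main</code>: one for the squash, and one per original commit for the rebase.</p>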
<p>Let me start with a long digression. You might guess, from the GitHub screenshot at the top of this post, that I’m in this camp, and you’d be right. In fact, I used to squash and merge feature branches, but I switched to the rebase-and-merge approach after introducing probably the single biggest improvement to the quality of my work over recent years:</p><p>I started writing <a href="https://chris.beams.io/posts/git-commit/">meaningful commit messages</a>.</p><p>In the not-too-distant past, my commit messages used to be one-liners, as evidenced, for example, in the <a href="https://github.com/nathell/skyscraper/commits/master">history of Skyscraper</a>. These first lines haven’t changed much, but now I strive to augment them with explanation of <em>why</em> the change is being made. When it fixes a bug, I explain what was causing it and how the change makes the bug go away; when it implements a feature, I highlight the specifics of the implementation. I might not write more code these days, but I certainly write more prose: it’s not uncommon for me to write two or three paragraphs about a +1/−1 change.</p><p>So my commit messages now look like this (I’m taking a recent random example from the <a href="https://iamfy.co">Fy!</a> app’s repo):</p><pre><code class="hljs text">app/tests: allow to mock config
Tests expected the code-push events to fire, but now that I’ve
disabled CP in dev, and the tests are built with the dev aero profile,
they’d fail.
This could have been fixed by building them with AERO_PROFILE=staging
in CI, but it doesn’t feel right: I think tests shouldn’t depend on
varying configuration. If a test requires a given bit of configuration
to be present, it’s better to configure it that way explicitly.
Hence this commit. It adds a wrap-config mock and a corresponding
:extra-config fixture, which, when present (and it is by default),
will merge the value onto generated-config.
</code></pre><p>I’m very conscious about having a clean history. I’m aiming for each commit to be small (with the threshold at approximately +20/−20 LOCs) and introduce a coherent, logical change.</p><p>That’s not to say I always <em>develop</em> that way, of course. If you looked at a <code>git log</code> of my work-in-progress branch, chances are you’d see something like this:</p><pre><code class="hljs text">5d64b71 wip
392b1e0 wip
0a3ad89 more wip
3db02d3 wip
</code></pre><p>But before declaring the PR ready for review, I’ll throw <em>this</em> history away (by <code>git reset --mixed $(git merge-base feature main)</code>) and re-commit the changes, dividing them into logical units and writing the rationales, bit by bit.</p><p>The net result of rigorously applying this practice is that</p><p><strong>you can do <code>git annotate</code> anywhere, and learn about why any line of code in the codebase is the way it is.</strong></p><p>I can’t emphasize enough what a huge, huge impact this has on a developer’s wellbeing. These commit messages, when I read them back weeks or months later, working on something different but related, almost read as little love letters from me-in-the-past to me-now. They reduce the all-important WTFs/minute metric to zero.</p><img src="/img/blog/wtfm.jpg">
<p>They’re also an aid in reviewing code. My PR notes usually say “please read each commit in isolation.” I’ve found it easier to follow a PR when it tries to tell a story, and each commit is a milestone down that road.</p><p>Ending the digression: can you see why I prefer rebase-and-merge over squash-and-merge? Because, all the benefits notwithstanding, squashing <em>irrevocably loses context</em>.</p><p>Now, instead of each line being a result of a small, +20/−20 change, you can only tell that it’s part of a set of such changes: maybe ten of them, maybe fifty. You don’t know. Sure, you can go look in the original branch, but that’s overhead, and what if it’s been deleted?</p><p>So yeah. Having those love letters all in place, each carefully placed and not glued to others, is just too much of a boon to let go. But that’s not to say that rebasing-and-merging is without downsides.</p><p>For example, it’s again hard to tell how many features were deployed over a given period of time. More troublesomely, it’s harder to revert changes: typically you want to operate on a feature level there. With squash-and-merge, it takes one <code>git revert</code> to revert a buggy feature. With rebase-and-merge, you need to know the range.</p><p>Worse yet: it’s more likely for a squashed-and-merged commit to be cleanly undone (or cherry-picked) than for a series of small commits. (I sometimes deliberately commit wrong or half-baked approaches that are changed in subsequent commits, just to tell the story more convincingly, and it’s possible that each of these changes individually causes trouble but that they cancel each other out in a squash.)</p><p>So I’m not completely happy with any of the three approaches. Which finally brings me to my preferred fourth approach, one that Git (yet?) doesn’t allow for:</p><h2 id="rebase,-group-and-merge">Rebase, group and merge</h2><p>You know the “group” facility of vector graphics programs? 
You draw a couple of shapes, you group them together, and then you can apply transformations to the entire group at once, operating on it as if it were an atomic thing. But when need arises, you can “ungroup” it and look deeper.</p><p>That’s because sometimes there’s a need to have a “high-level” view of things, and sometimes you need to delve deeper. Each of these needs is valid. Each is prompted by different circumstances that we all encounter.</p><p>I’d love to see that same idea applied to Git commits. In Git, a commit group might just be a named and annotated range of commits: <code>feature-a</code> might be the same as <code>5d64b71..3db02d3</code>. Every Git command that currently accepts commit ranges could accept group names. I envision groups to have descriptions, so that <code>git log</code>, <code>git blame</code>, etc could take <code>--grouped</code> or <code>--ungrouped</code> options and act appropriately.</p><p>Obviously, details would need to be fleshed out (can groups overlap? can groups be part of other groups?), and I’m not that familiar with Git innards to say with confidence that it’s doable. But the more I think about it, the more sound the idea seems to me.</p><p>I think creating a group when doing a rebase-and-merge could bring together the best of all three worlds, so that we can have all our cakes and eat them too.</p><hr /><div class="footnote"><p><sup class="footnote-ref" data-ref="0">1</sup> <a href="https://www.mercurial-scm.org/">Well,</a> <a href="https://fossil-scm.org/">almost</a> <a href="https://pijul.org/">everyone</a>.</p></div><div class="footnote"><p><sup class="footnote-ref" data-ref="1">2</sup> It’s Dog Day here in Poland as I write these words. Happy Dog Day!</p></div></div>tag:blog.danieljanus.pl,2020-11-08:post:coronalottoI made a website to guess tomorrow’s number of COVID-19 cases, and here’s what happened2020-11-08T00:00:00Z<div><h2 id="before">Before</h2><p>It seems so obvious in hindsight. 
Here in Poland, people have been guessing it ever since the pandemic outbreak: in private conversations, in random threads on social media, in comments under governmental information outlets. It seemed a matter of time before someone came up with something like this. In fact, on one Sunday evening in October, I found myself flabbergasted that apparently no one had yet.</p><p>I shelled out $4 for a domain, <a href="http://koronalotek.pl">koronalotek.pl</a> (can be translated as “coronalotto” or “coronalottery” – occurrences of the name on Twitter date back at least as far as April), and fired up a REPL. A few hours and 250 Clojure LOCs later, the site was up.</p><p>I wanted it to be as simple as possible. A form with two fields: “your name” and “how many cases tomorrow?” A top-ten list of today’s winners, sorted by the absolute difference between the guess and the actual number of cases, as <a href="https://twitter.com/mz_gov_pl">reported daily on Twitter</a> by the Polish Ministry of Health. The official number, prominently displayed. And that’s all.</p><img src="/img/blog/koronalotek.png">
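<p>The winner-picking rule described above fits in a few lines of Clojure. This is only a minimal sketch, not the site’s actual code: the function name <code>top-guesses</code> and the <code>:name</code>/<code>:guess</code> keys are assumptions made up for illustration.</p><pre><code class="hljs clojure">;; Hypothetical sketch: rank guesses by how close they are to the official number.
;; The map keys :name and :guess are assumed here, not taken from the real site.
(defn top-guesses
  "Returns up to n guesses, closest to the actual case count first."
  [guesses actual n]
  (take n (sort-by (fn [{:keys [guess]}]
                     ;; absolute difference between the guess and the real count
                     (Math/abs (long (- guess actual))))
                   guesses)))

;; Example: with 24785 actual cases, Ola (off by 215) beats Ala (off by 785).
(top-guesses [{:name "Ala" :guess 24000}
              {:name "Ola" :guess 25000}
              {:name "Jan" :guess 100}]
             24785
             2)
</code></pre><p>Because <code>sort-by</code> is stable, guesses that are equally far off keep their submission order in this sketch; whether the real site breaks ties that way is not something the post says.</p>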
<p>On 17 October, I posted the link on my Facebook and Twitter feeds, and waited. The stream of guesses started to trickle in.</p><h2 id="after">After</h2><p>It never grew to be more than a stream, but it hasn’t gone completely unnoticed either.</p><img src="/img/blog/koronalotek-g1.png">
<p>The above plot shows the daily number of accepted guesses (i.e., those that were used to generate the next day’s winners) over time – a metric of popularity. Each day’s number means guesses cast in the 24 hours up until 10:30 (Warsaw time) on that day, which is when the official numbers are published by the Ministry of Health.</p><p>I’ve been filtering out automated submissions, as well as excess manual submissions from the same IP that seemed to skew the results too much – I’ve arbitrarily set the “excess” threshold at 10. The missing datapoint for 19 October is not a zero, but an N/A: I’ve lost that datapoint due to a glitch. More on this below.</p><p>The interest peaked on October 23, with more than a thousand guesses for that day (I think it was reposted by someone with a significant reach back then), and has been slowly declining since.</p><p>I have privately received some feedback. One person has pointed out that they found the site distasteful and that making fun of pandemic tragedies made them uncomfortable. (I empathise; for me it’s not so much making fun as it is a coping mechanism, a way to put distance between my thoughts and the difficult times we’re in and to keep fears at bay.) Some people, however, have thanked me for making them smile when they guessed more or less correctly.</p><p>Back to data. Being a data junkie, I looked at what I had been collecting. First things first: how accurate is the collective predictive power of the guessers?</p><img src="/img/blog/koronalotek-g2.png">
<p>Quite accurate, in fact! Data for this plot has only been slightly preprocessed, by filtering out “unreasonable” guesses that don’t fall within the range <code>[100; 50000]</code>.</p><p>People have over- and underguesstimated the number of new cases, but not by much. There were only a few occasions where the actual case count didn’t fall within one standard deviation of the mean of guesses (represented by the whiskers around blue bars on the plot). Granted, the daily standard deviation tends to be large (on the order of a few thousand), but still, I’m impressed. A paper on estimating the growth of pandemic based on coronalottery results coming soon to a journal near you! ;-)</p><p>Just for the heck of it, I’ve also been looking at individual votes. Specifically, names. Here’s a snapshot of unique guessers’ names sorted by decreasing length, on 23 October. (NSFW warning: expletives ahead!)</p><img src="/img/blog/koronalotek-names.jpg">
<p>Let me translate a few of these for those of you who don’t speak Polish:</p><p>1 is “Sasin has fucked over 70 million zlotys for elections that didn’t take place and was never held responsible.” This alludes to the <a href="https://notesfrompoland.com/2020/05/27/70-million-zloty-bill-for-polands-abandoned-presidential-election/">ghost election in Poland</a> from May. This news had gone memetic, going so far as Minister Sasin’s name being ironically used as a dimensionless unit of 70 million (think Avogadro’s number). You’ll discover the same theme in #2, #3, #5, and others.</p><p>6 is “CT {Constitutional Tribunal}, you focking botch, stop repressing my abortion”. Just a day before, the Polish constitutional court (whose current legality is <a href="https://en.wikipedia.org/wiki/Constitutional_Tribunal_(Poland)#2015%E2%80%93present:_Polish_Constitutional_Court_crisis">disputed at best</a>) had <a href="https://notesfrompoland.com/2020/10/22/constitutional-court-ruling-ends-almost-all-legal-abortion-in-poland/">decreed a ban on almost all legal abortion</a> in Poland, giving rise to <a href="https://edition.cnn.com/2020/10/31/europe/poland-abortion-protests-scli-intl/index.html">the biggest street protests in decades</a>.</p><p>Not all is political: 4 is “Why study for the exam if we’re not gonna survive until November anyway?”. I hope whoever wrote this is alive and well.</p><p>Corollary? Give people a text field, and they’ll use it to express themselves: politically or otherwise.</p><p>In fact, I have taken the liberty of chiming in. Shortly after, I altered the thank-you page (which used to just say “thanks for guessing”) to proudly display one of the emblems of the Women’s Strike, along with a link to a <a href="https://zrzutka.pl/kasa-na-aborcyjny-dream-team-55g5gx">crowdfunding campaign</a> for an NGO that supports women needing abortion.</p><img src="/img/blog/koronalotek-thanks.jpg">
<h2 id="inside-out">Inside out</h2><p>I’m not much of a DevOps person, so I deployed it the quick and dirty way, not caring about scalability or performance. The maxim “make it as simple as possible” permeates the setup.</p><p>I just started a REPL within a <code>screen</code> session on the tiny Scaleway C1 server that also hosts this blog and some of my other personal stuff. I launched a Jetty server within it, and set up an nginx proxy. And that’s pretty much it. I liberally tinker with the app’s state in “production,” evaluating all kinds of expressions when I feel like it.</p><p>Code changes are deployed by <code>git pull</code>ing new developments and doing <code>(require 'koronalotek.core :reload)</code> in the REPL.</p><p>Someone tried a SQL injection attack. This is doomed to fail because there’s no SQL involved. In fact, there’s no database at all. The entire state is kept in an in-memory atom and periodically synced out to an EDN file. In addition, state is reset and archived daily at the time of announcing winners. (I’ve added the archiving after forgetting it on one occasion – hence the lack of data for 19 October.)</p><p>I also don’t yet have a mechanism for automatically pulling in the Ministry of Health’s data. Every morning, I spend two minutes checking if there are excess automated votes, removing any I find, and then filling in the blanks:</p><pre><code class="hljs clojure">(<span class="hljs-name">new-data!</span> #inst <span class="hljs-string">"2020-11-08T10:30+01:00"</span> <span class="hljs-number">24785</span>)
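;; What follows is NOT the site's code: it is a hypothetical sketch of how a
;; function like new-data! might work, assuming the state is a map shaped as
;; {:guesses [...] :archive [...]} held in an atom. All names below are made up.
(defonce sketch-state (atom {:guesses [] :archive []}))

(defn sketch-new-data!
  "Archives the current guesses together with the official case count
  for the given date, then starts the next round with no guesses."
  [state date cases]
  (swap! state
         (fn [{:keys [guesses archive]}]
           {:guesses []
            :archive (conj archive {:date date :cases cases :guesses guesses})})))

(defn sketch-sync!
  "Writes a snapshot of the state to an EDN file (the path is an assumption)."
  [state path]
  (spit path (pr-str (deref state))))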
</code></pre><p>For all the violations of good practices in this setup, it has worked out surprisingly well so far. I’ve resorted to removing automated votes a handful of times, and blacklisting IPs of voting bots in the nginx setup twice, but otherwise it’s been a low-maintenance toy. People seem to be willing to have fun, and I’m just not interfering.</p><h2 id="takeaways">Takeaways</h2><ol><li><span>You should call on your country’s authorities to exert pressure on the Polish government to respect women’s choices and stop actively repressing them.</span></li><li><span>Give people a text field, and they’ll use it to express themselves.</span></li><li><span>Release early, release often.</span></li></ol></div>tag:blog.danieljanus.pl,2020-05-08:post:making-of-clojure-dependencyMaking of “Clojure as a dependency”2020-05-08T00:00:00Z<div><p>In my previous post, <a href="/2020/05/02/clojure-dependency/">“Clojure as a dependency”</a>, I’ve presented the results of some toy research on Clojure version numbers seen in the wild. I’m a big believer in <a href="https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research">reproducible research</a>, so I’m making available a <a href="https://github.com/nathell/versions">Git repo</a> that contains code you can run yourself to reproduce these results. This post is an experience report from writing that code.</p><p>There are two main components to this project: acquisition and analysis of data (implemented in the namespaces <code>versions.scrape</code> and <code>versions.analyze</code>, respectively). 
Let’s look at each of these in turn.</p><h2 id="data-acquisition">Data acquisition</h2><p>This step uses the <a href="https://developer.github.com/v3/">GitHub API v3</a> to:</p><ul><li><span>retrieve the 1000 most popular Clojure repositories (using the <a href="https://developer.github.com/v3/search/#search-repositories">Search repositories</a> endpoint and going through all <a href="https://developer.github.com/v3/#pagination">pages</a> of the paginated result);</span></li><li><span>for each of these repositories, look at its file list (in the master branch) and pick up any files named <code>project.clj</code> or <code>deps.edn</code> in the root directory (using the <a href="https://developer.github.com/v3/repos/contents/">Contents</a> endpoint);</span></li><li><span>parse each of these files and extract the list of dependencies.</span></li></ul><p>As hinted by the namespace, I’ve opted to use <a href="https://github.com/nathell/skyscraper">Skyscraper</a> to orchestrate the process. It would arguably have been simpler to use GitHub’s <a href="https://developer.github.com/v4/">GraphQL v4 API</a>, but I wanted to showcase Skyscraper’s custom parsing facilities.</p><p>There’s no actual HTML scraping going on (all processors use either JSON or Clojure parsers), but Skyscraper is still able to “restructure” the result – traverse the graph endpoint in a manner similar to that of GraphQL – with very little effort. The same would have been possible with any other RESTful API. Plus, we get goodies like caching or tree pruning for free.</p><p>Most of the code is straightforward, but parsing of <code>project.clj</code> merits some explanation. Some of my initial assumptions proved incorrect, and it’s fun to see how. I initially tried to use <a href="https://clojure.github.io/clojure/clojure.edn-api.html#clojure.edn/read"><code>clojure.edn</code></a>, but Leiningen project definitions are not actually EDN – they are Clojure code, which is a superset of EDN. 
So I had to resort to <code>read-string</code> from core – with <code>*read-eval*</code> bound to nil (otherwise the code would have a Clojure injection vulnerability – think <a href="https://xkcd.com/327/">Bobby Tables</a>). Needless to say, some <code>project.clj</code>s turned out to depend on read-eval.</p><p>Some projects (I’m looking at you, <a href="https://github.com/dundalek/closh">Closh</a>, <a href="https://github.com/borkdude/babashka">Babashka</a> and <a href="https://github.com/borkdude/sci">sci</a>) keep the version number outside of <code>project.clj</code>, in a text file (typically in <code>resources/</code>), and slurp it back into <code>project.clj</code> with a read-eval’d expression:</p><pre><code class="hljs clojure">(<span class="hljs-name">defproject</span> closh-sci
#=(<span class="hljs-name">clojure.string/trim</span>
#=(<span class="hljs-name"><span class="hljs-built_in">slurp</span></span> <span class="hljs-string">"resources/CLOSH_VERSION"</span>))
…)
</code></pre><p>A trick employed by one project, <a href="https://github.com/metabase/metabase">Metabase</a>, is to dynamically generate JVM options containing a port number at parse time, so that test suites running at the same time don’t clash with each other:</p><pre><code class="hljs clojure">#=(<span class="hljs-name"><span class="hljs-built_in">eval</span></span> (<span class="hljs-name"><span class="hljs-built_in">format</span></span> <span class="hljs-string">"-Dmb.jetty.port=%d"</span> (<span class="hljs-name"><span class="hljs-built_in">+</span></span> <span class="hljs-number">3001</span> (<span class="hljs-name"><span class="hljs-built_in">rand-int</span></span> <span class="hljs-number">500</span>))))
</code></pre><p>Finally, it turned out that <code>defproject</code> is not always the first form in <code>project.clj</code>. Some projects, like <a href="https://github.com/robert-stuttaford/bridge">bridge</a>, only contain a placeholder <code>project.clj</code> with no forms; others, like <a href="https://github.com/ztellman/aleph">aleph</a>, first define some constants, and then refer to them in a <code>defproject</code> form. If those constants contain parts of the dependencies list, then those dependencies won’t be processed correctly. Fortunately, not a lot of projects do this, so it doesn’t skew the results much.</p><p>Anyway, the end result of the acquisition phase is a sequence of maps describing project definitions. They look like this:</p><pre><code class="hljs clojure">{<span class="hljs-symbol">:name</span> <span class="hljs-string">"clojure-koans"</span><span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:full-name</span> <span class="hljs-string">"functional-koans/clojure-koans"</span><span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:deps-type</span> <span class="hljs-symbol">:leiningen</span><span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:page</span> <span class="hljs-number">1</span><span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:deps</span> {org.clojure/clojure #:mvn{<span class="hljs-symbol">:version</span> <span class="hljs-string">"1.10.0"</span>}<span class="hljs-punctuation">,</span>
koan-engine #:mvn{<span class="hljs-symbol">:version</span> <span class="hljs-string">"0.2.5"</span>}}<span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:profile-deps</span> {<span class="hljs-symbol">:dev</span> {lein-koan #:mvn{<span class="hljs-symbol">:version</span> <span class="hljs-string">"0.1.5"</span>}}}}
</code></pre><p>Homogeneity is important: every dependency description has been converted to the cli-tools format, even if it comes from a <code>project.clj</code>.</p><h2 id="data-analysis">Data analysis</h2><p>I’ve long been searching for a way to do exploratory programming in Clojure without turning the code to a tangled mess, portable only along with my computer.</p><p>Exploratory (or research) programming is very different from “normal” programming. In the latter, most of the time you typically focus on a coherent project – a program or a library. In contrast, in the former, you spend a lot of time in the REPL, trying all sorts of different things and <code>def</code>ing new values derived from already computed ones.</p><p>This is very convenient, but it’s extremely easy to get carried away in the REPL and get lost in a sea of <code>def</code>s. If you want to redo your computations from scratch, just about your only option is to take your REPL transcript and re-evaluate the expressions one by one, in the correct order. Cleaning up the code (e.g. deglobalizing) as you go is very difficult.</p><p>I’ve found an answer: <a href="https://plumatic.github.io/prismatics-graph-at-strange-loop">Plumatic Graph</a>, part of the <a href="https://github.com/plumatic/plumbing">plumbing</a> library. There are a plethora of uses for it: for example, at <a href="https://iamfy.co">Fy</a>, my current workplace, we’re using it to define our test fixtures. But as it turns out, it makes exploratory programming enjoyable.</p><p>The bulk of code in <a href="https://github.com/nathell/versions/blob/master/src/clj/versions/analyze.clj#L41"><code>versions.analyze</code></a> consists of a big definition of a graph, with nodes representing computations – things that I’d normally have <code>def</code>’d in a REPL. Consequently, most of these definitions are short and to the point. I also gave the nodes verbose, descriptive, explicit names. Name and conquer. 
<code>raw-repos</code> is the output from data acquisition, <code>repos</code> is an all-important node containing those <code>raw-repos</code> that were successfully parsed, and most other things depend on it.</p><p>It also doesn’t obstruct much the normal REPL research flow. My normal workflow with REPL and Graph is something along the lines of:</p><ol><li><span><code>(def result (main))</code></span></li><li><span>evaluate something using inputs from <code>result</code></span></li><li><span>nah, it leads nowhere</span></li><li><span>evaluate something else</span></li><li><span>hey, that’s interesting!</span></li><li><span>add a new node to the graph definition</span></li><li><span>GOTO 1</span></li></ol><p>Thanks to Graph’s lazy compiler, I can re-evaluate anything at need and have it evaluate only the things needed, and nothing else. Also, because the graph is explicit, it’s fairly easy to <a href="https://github.com/RedBrainLabs/graph-fnk-viz">visualize it</a>. (Click the image to open it in full-size in another tab.)</p><p><a href="/img/blog/computation-graph.png" target="_blank"><img src="/img/blog/computation-graph.png"></a></p><p>Because it’s lazy, it doesn’t hurt to put extra things in there just in case, even when you’re not going to report them. For example, I was curious what things besides a version number people put in dependencies. <code>:exclusions</code>, for sure, but what else? This is the <code>:what-other-things-besides-versions</code> node.</p><p>Imagine my surprise when I found <code>:exlusions</code> (<em>sic</em>) in there, which turned out to be a typo in shadow-cljs’ <code>project.clj</code>! 
I submitted <a href="https://github.com/thheller/shadow-cljs/pull/699">a PR</a>, and Thomas Heller merged it a few days later.</p><p>My only gripe with Graph is that it runs somewhat contrary to the current trends in the Clojure community: for example, it doesn’t support namespaced keywords (although there’s an <a href="https://github.com/plumatic/plumbing/issues/126">open ticket</a> for that). But on the whole, I’m sold. I’ll definitely be using it in the next piece of research in Clojure, and I’m on the lookout for something similar in pure R. If you know something, do tell me!</p><h2 id="some-words-on-plotting">Some words on plotting</h2><p>The plot from the previous post was generated in pure R, using <a href="https://ggplot2.tidyverse.org">ggplot2</a> (an extremely versatile API). Clojure generates a CSV with munged data, and then R reads that CSV as a data frame and generates the plot in a few lines.</p><p>I’ve briefly played around with <a href="https://github.com/scicloj/clojisr">clojisr</a>, a bridge between Clojure and R. It was an enlightening experiment, and it would let me avoid the intermediate CSV, but I decided to ditch it for a few reasons:</p><ul><li><span>It pulls in quite a few dependencies (I wanted to keep them down to a minimum), and requires some prior setup on the R side.</span></li><li><span>I’d much rather write my R as R, since I’m comfortable with it, rather than spend time wondering how it maps to Clojure. This is similar to the SQL story: these days I prefer <a href="https://www.hugsql.org">HugSQL</a> over <a href="https://github.com/korma/Korma">Korma</a>, unless I have good reasons to choose otherwise.</span></li><li><span>clojisr opens up a child R process just by <code>require</code>ing a namespace. I’m not a fan of that.</span></li></ul><p>But it’s definitely very promising! 
I applaud the effort and I’ll keep a close eye on it.</p><h2 id="key-takeaways">Key takeaways</h2><ul><li><span>Skyscraper makes data acquisition bearable, if not fun.</span></li><li><span>Plumatic Graph makes writing research code in Clojure fun.</span></li><li><span>ggplot makes plotting data fun.</span></li><li><span>Clojure makes programming fun. (But you knew that already.)</span></li></ul></div>tag:blog.danieljanus.pl,2020-05-02:post:clojure-dependencyClojure as a dependency2020-05-02T00:00:00Z<div><p>I have a shameful confession to make: I have long neglected an open-source library that I maintain, <a href="https://github.com/nathell/clj-tagsoup">clj-tagsoup</a>.</p><p>This would have been less of an issue, but this is my second-most-starred project on GitHub. Granted, I don’t feel a need for it anymore, but apparently people do. I wish I had spent some time reviewing and merging the incoming PRs.</p><p>Anyway, I’ve recently been prompted to revive it, and I’m preparing a new release. While on it, I’ve been updating dependencies to their latest versions, and upon seeing a dependency on <code>[org.clojure/clojure "1.2.0"]</code> in <code>project.clj</code> (yes, it’s been neglected for that long), I started wondering: which Clojure to depend on? Actually, should Clojure itself be a dependency at all?</p><p>I’ve googled around for best practices, but with no conclusive answer. So I set out to do some research.</p><p><strong>TLDR:</strong> with Leiningen, add it with <code>:scope "provided"</code>; with cli-tools, you don’t have to, unless you want to be explicit.</p><h2 id="is-it-possible-for-a-clojure-project-to-declare-no-dependency-on-clojure-at-all?">Is it possible for a Clojure project to declare no dependency on Clojure at all?</h2><p>Quite possible, as it turns out. But the details depend on the build tool.</p><p>Obviously, this only makes sense for libraries. 
Or, more broadly, for projects that are not meant to be used standalone, but rather included in other projects (which will have a Clojure dependency of their own).</p><h3 id="leiningen">Leiningen</h3><p>If you try to create a Leiningen project that has no dependencies:</p><pre><code class="hljs clojure">(<span class="hljs-name">defproject</span> foo <span class="hljs-string">"0.1.0"</span>
<span class="hljs-symbol">:dependencies</span> [])
</code></pre><p>then Leiningen (as of version 2.9.3, but I’d guess older versions behave similarly) won’t allow you to launch a REPL:</p><pre><code>$ lein repl
Error: Could not find or load main class clojure.main
Caused by: java.lang.ClassNotFoundException: clojure.main
Subprocess failed (exit code: 1)
</code></pre><p>But all is not lost: <code>lein jar</code> works just fine (as long as you don’t AOT-compile any namespaces), as does <code>lein install</code>. The resulting library will happily function as a dependency of other projects.</p><p>The upside of depending on no particular Clojure version is that you don’t impose it on your consumers. If a library depends on Clojure 1.9.0, but a project that uses it depends on Clojure 1.10.1, then Leiningen will fetch 1.9.0’s <code>pom.xml</code> (it’s smart enough to figure out that the jar itself won’t be needed, as the conflict will always be resolved in favour of the direct dependency), and <code>lein deps :tree</code> will report “possibly confusing dependencies”.</p><p>It’s not very useful to have a library that you can’t launch a REPL against, though. So what some people do is declare a dependency on Clojure not in the main <code>:dependencies</code>, but in a profile.</p><pre><code class="hljs clojure">(<span class="hljs-name">defproject</span> foo <span class="hljs-string">"0.1.0"</span>
<span class="hljs-symbol">:dependencies</span> []
<span class="hljs-symbol">:profiles</span> {<span class="hljs-symbol">:dev</span> {<span class="hljs-symbol">:dependencies</span> [[org.clojure/clojure <span class="hljs-string">"1.10.1"</span>]]}})
</code></pre><p>This avoids conflicts and brings back the possibility to launch a REPL. Sometimes, people create multiple profiles for different Clojure versions; <a href="https://github.com/technomancy/leiningen/blob/master/doc/PROFILES.md">Leiningen’s documentation</a> mentions this possibility.</p><p>Unfortunately, with this approach it’s still not possible to AOT-compile things or create uberjars with Leiningen. (Putting Clojure in the <code>:provided</code> profile causes building the uberjar to succeed, but the resulting <code>-standalone</code> jar doesn’t actually contain Clojure.)</p><p>Another option is to add Clojure to the main <code>:dependencies</code>, but with <code>:scope "provided"</code>. Per the <a href="http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html">Maven documentation</a>, this means:</p><blockquote><p>This is much like <code>compile</code>, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope <code>provided</code> because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.</p></blockquote><p>The key is in the last words: “not transitive.” If project A depends on a library B that declares a “provided” dependency C, then C won’t be automatically put in A’s dependencies, and A is expected to explicitly declare its own C.</p><p>This means that it’s adequate for both libraries and standalone projects when it comes to declaring a Clojure dependency. It doesn’t break anything, doesn’t cause any ephemeral conflicts, and can be combined with the profiles approach when multiple configurations are called for.</p><h3 id="cli-tools">cli-tools</h3><p>cli-tools will accept a <code>deps.edn</code> as simple as <code>{}</code>. 
Even passing <code>-Srepro</code> to <code>clojure</code> or <code>clj</code> (which excludes the Clojure dependency that you probably have in your <code>~/.clojure/deps.edn</code>) doesn’t break anything: cli-tools will just use 1.10.1 (at least as of version 1.10.1.536).</p><p>With cli-tools, as a library author you probably don’t have to declare a Clojure dependency at all. But things are less uniform in this land than they are in Leiningen (for example, there are quite a few uberjarrers to choose from), so it’s reasonable to check with your tooling first.</p><h3 id="boot">Boot</h3><p>I’m no longer a Boot user, so I can’t tell. But from what I know, it uses Aether just like Leiningen and Maven do, so I’d wager a guess the same caveats apply as for Leiningen. Haven’t checked, though.</p><h2 id="so-what-do-the-existing-projects-do?">So what do the existing projects do?</h2><p>I figured it would be a fun piece of research to examine how the popular projects depend (or don’t depend) on Clojure. I queried GitHub’s API for the 1000 most starred Clojure projects, fetched and parsed their <code>project.clj</code>s and/or <code>deps.edn</code>s, and tallied things up.</p><p>I’ll write a separate “making of” post, because it turned out to be an even more fun weekend project than I had anticipated. But for now, let me share the conclusions.</p><p>I ended up with 968 project definition files that I was able to successfully parse: 140 <code>deps.edn</code>s and 828 <code>project.clj</code>s. Here’s a breakdown of Clojure version declared as a “main” dependency (i.e., not in a profile or alias):</p><img src="/img/blog/clojure-versions.png">
<p>N/A means that there’s no dependency on Clojure declared, and “other” is an umbrella for the zoo of alphas, betas and snapshots.</p><p>As expected, not depending on Clojure is comparatively more popular in the cli-tools land: almost half (48.6%) of cli-tools projects don’t declare a Clojure dependency, versus 21.5% (174 projects) for Leiningen.</p><p>That Leiningen number still seemed quite high to me, so I dug a little deeper. Out of those 174 projects, 100 have Clojure somewhere in their <code>:profiles</code>. The remaining 74 are somewhat of outliers:</p><ul><li><span>some, like Ring or Pedestal, are umbrella projects composed of sub-projects (with the <code>lein-sub</code> plugin) that have actual dependencies themselves;</span></li><li><span>some, like Klipse or Reagent, are essentially ClojureScript-only;</span></li><li><span>some, like Overtone, use the <code>lein-tools-deps</code> plugin to store their dependencies in <code>deps.edn</code> while using Leiningen for other tasks.</span></li></ul><p>Finally, the popularity of <code>:scope "provided"</code> is much lower. Only 68 Leiningen projects specify it (8.9% of those that declare any dependencies), and only two <code>deps.edn</code> files do so (re-frame and fulcro – note that re-frame actually has both a <code>project.clj</code> and a <code>deps.edn</code>).</p></div>tag:blog.danieljanus.pl,2020-02-10:post:cond-indentationIndenting cond forms2020-02-10T00:00:00Z<div><p>Indentation matters when reading Clojure code. It is the primary visual cue that helps the reader discern the code structure. Most Clojure code seen in the wild conforms to either the <a href="https://github.com/bbatsov/clojure-style-guide#source-code-layout-organization">community style guide</a> or the proposed <a href="https://tonsky.me/blog/clojurefmt/">simplified rules</a>; the existing editors make it easy to reformat code to match them.</p><p>I find both these rulesets to be helpful when reading code. 
But there’s one corner-case that’s been irking me: <code>cond</code> forms.</p><p><code>cond</code> takes an even number of arguments: alternating test-expression pairs. They are commonly put next to each other, two forms per line.</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  test expr-1
  another-test expr-2
  <span class="hljs-symbol">:else</span> expr-3)
</code></pre><p>Sometimes, people align the expressions under one another, in a tabular fashion:</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  test         expr-1
  another-test expr-2
  <span class="hljs-symbol">:else</span>        expr-3)
</code></pre><p>But things get out of hand when either <code>tests</code> or <code>exprs</code> get longer and call for multiple lines themselves. There are several options here, all of them less than ideal.</p><h3 id="tests-and-expressions-next-to-each-other">Tests and expressions next to each other</h3><p>In other words, keep the above rule. Because we’ll have multiple lines in a form, this tends to make the resulting code axe-shaped:</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  (<span class="hljs-name"><span class="hljs-built_in">=</span></span> (<span class="hljs-name">some-function</span> something) expected-value) (<span class="hljs-name"><span class="hljs-built_in">do</span></span>
                                                 (<span class="hljs-name">do-this</span>)
                                                 (<span class="hljs-name">and-also-do-that</span>))
  (<span class="hljs-name">another-predicate</span> something-else) (<span class="hljs-name"><span class="hljs-built_in">try</span></span>
                                       (<span class="hljs-name">do-another-thing</span>)
                                       (<span class="hljs-name">catch</span> Exception _
                                         (<span class="hljs-name">println</span> <span class="hljs-string">"Whoops!"</span>))))
</code></pre><p>This yields code that is indented abnormally far to the right, forcing the reader’s eyeballs to move in two dimensions – even more so if the tabular feel is desired. If <em>both</em> the test and the expression are multi-line, it just looks plain weird.</p><h3 id="stack-all-forms-vertically,-no-extra-spacing">Stack all forms vertically, no extra spacing</h3><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  (<span class="hljs-name"><span class="hljs-built_in">=</span></span> (<span class="hljs-name">some-function</span> something) expected-value)
  (<span class="hljs-name"><span class="hljs-built_in">do</span></span>
    (<span class="hljs-name">do-this</span>)
    (<span class="hljs-name">and-also-do-that</span>))
  (<span class="hljs-name">another-predicate</span> something-else)
  (<span class="hljs-name"><span class="hljs-built_in">try</span></span>
    (<span class="hljs-name">do-another-thing</span>)
    (<span class="hljs-name">catch</span> Exception _
      (<span class="hljs-name">println</span> <span class="hljs-string">"Whoops!"</span>))))
</code></pre><p>This gets rid of the long lines, but introduces another problem: it’s hard to tell at a glance</p><ul><li><span>where a given test or expression starts or ends;</span></li><li><span>which tests are paired with which expression;</span></li><li><span>whether a given line corresponds to a test or an expression, and which one.</span></li></ul><h3 id="stack-all-forms-vertically,-blank-lines-between-test/expr-pairs">Stack all forms vertically, blank lines between test/expr pairs</h3><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  (<span class="hljs-name"><span class="hljs-built_in">=</span></span> (<span class="hljs-name">some-function</span> something) expected-value)
  (<span class="hljs-name"><span class="hljs-built_in">do</span></span>
    (<span class="hljs-name">do-this</span>)
    (<span class="hljs-name">and-also-do-that</span>))

  (<span class="hljs-name">another-predicate</span> something-else)
  (<span class="hljs-name"><span class="hljs-built_in">try</span></span>
    (<span class="hljs-name">do-another-thing</span>)
    (<span class="hljs-name">catch</span> Exception _
      (<span class="hljs-name">println</span> <span class="hljs-string">"Whoops!"</span>))))
</code></pre><p>The Style Guide <a href="https://github.com/bbatsov/clojure-style-guide#short-forms-in-cond">says</a> that this is an “ok-ish” thing to do.</p><p>With the added blank lines, the logical structure of the code is much more apparent. However, it breaks another assumption that I make when reading the code: <em>functions contain no blank lines.</em> The Style Guide even <a href="https://github.com/bbatsov/clojure-style-guide#no-blank-lines-within-def-forms">mentions it</a>, saying that <code>cond</code> forms are an acceptable exception.</p><p>It is now harder to tell at a glance where the enclosing function starts or ends. And once this assumption has been broken, the brain expects it to be broken again, causing reading disruption across the entire file.</p><h3 id="forms-one-under-another,-extra-indentation-for-expressions-only">Forms one under another, extra indentation for expressions only</h3><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  (<span class="hljs-name"><span class="hljs-built_in">=</span></span> (<span class="hljs-name">some-function</span> something) expected-value)
    (<span class="hljs-name"><span class="hljs-built_in">do</span></span>
      (<span class="hljs-name">do-this</span>)
      (<span class="hljs-name">and-also-do-that</span>))
  (<span class="hljs-name">another-predicate</span> something-else)
    (<span class="hljs-name"><span class="hljs-built_in">try</span></span>
      (<span class="hljs-name">do-another-thing</span>)
      (<span class="hljs-name">catch</span> Exception _
        (<span class="hljs-name">println</span> <span class="hljs-string">"Whoops!"</span>))))
</code></pre><p>I’ve resorted to this several times. The lines are not too long; the visual cues are there; it’s obvious what is the test, what is the expression, and what goes with what.</p><p>Except… it’s against the rules. List items stacked vertically should be aligned one under the other. I have to actively fight my Emacs to enforce this formatting, and it will be lost next time I press <code>C-M-q</code> on this form. No good.</p><h3 id="forms-one-under-another,-expressions-prefixed-by-#-=>">Forms one under another, expressions prefixed by <code>#_=></code></h3><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">cond</span></span>
  (<span class="hljs-name"><span class="hljs-built_in">=</span></span> (<span class="hljs-name">some-function</span> something) expected-value)
  #_=> (<span class="hljs-name"><span class="hljs-built_in">do</span></span>
         (<span class="hljs-name">do-this</span>)
         (<span class="hljs-name">and-also-do-that</span>))
  (<span class="hljs-name">another-predicate</span> something-else)
  #_=> (<span class="hljs-name"><span class="hljs-built_in">try</span></span>
         (<span class="hljs-name">do-another-thing</span>)
         (<span class="hljs-name">catch</span> Exception _
           (<span class="hljs-name">println</span> <span class="hljs-string">"Whoops!"</span>))))
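
;; A quick illustrative check (a sketch, not taken from any tooling) that
;; the reader really drops #_=> : both strings read as the same form.
(= (read-string "(cond test #_=> expr)")
   (read-string "(cond test expr)"))
;; evaluates to true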
</code></pre><p>This one is my own invention: I haven’t seen it anywhere else. But I think it manages to avoid most problems.</p><p><code>#_</code> is a reader macro that causes the next form to be elided and not seen by the compiler. <code>=></code> is a valid form. Thus, <code>#_=></code> is effectively whitespace as far as the compiler is concerned, and the indentation rules treat it as yet another symbol (although it technically isn’t one). No tooling is broken, no assumptions are broken, and the <code>#_=></code> tends to be syntax-highlighted unintrusively so it doesn’t stand out. I tend to read it aloud as “then.”</p><h3 id="meanwhile,-in-another-galaxy">Meanwhile, in another galaxy</h3><p>Other Lisps (Scheme and CL) wrap each test/expression pair in an extra pair of parens, thereby avoiding the blending of conditions and expressions when indented one under the other. But I’m still happy Clojure went with fewer parens. As I say, this is a corner case where an additional pair of parens would help somewhat, but most of the time I find the extra parens less aesthetic, mere visual clutter.</p></div>tag:blog.danieljanus.pl,2020-01-21:post:middlewareCareful with that middleware, Eugene2020-01-21T00:00:00Z<div><h2 id="prologue">Prologue</h2><p>I’ll be releasing version 0.3 of <a href="https://github.com/nathell/skyscraper">Skyscraper</a>, my Clojure framework for scraping entire sites, in a few days.</p><p>More than three years have passed since its last release. 
During that time, I’ve made a number of attempts at redesigning it to be more robust, more usable, and faster; the last one, resulting in an almost complete rewrite, is now almost ready for public use as I’m ironing out the rough edges, documenting it, and adding tests.</p><p>It’s been a long journey and I’ll blog about it someday; but today, I’d like to tell another story: one of a nasty bug I had encountered.</p><h2 id="part-one:-wrap,-wrap,-wrap,-wrap">Part One: Wrap, wrap, wrap, wrap</h2><p>While updating the code of one of my old scrapers to use the API of Skyscraper 0.3, I noticed an odd thing: some of the output records contained scrambled text. Apparently, the character encoding was not recognised properly.</p><p>“Weird,” I thought. Skyscraper should be extra careful about honoring the encoding of pages being scraped (declared either in the headers, or the <code><meta http-equiv></code> tag). In fact, I remembered having seen it working. What was wrong?</p><p>For every page that it downloads, Skyscraper 0.3 caches the HTTP response body along with the headers so that it doesn’t have to be downloaded again; the headers are needed to ensure proper encoding when parsing a cached page. The headers are lower-cased, so that Skyscraper can then call <code>(get all-headers "content-type")</code> to get the encoding declared in headers. If this step is missed, and the server returns the encoding in a header named <code>Content-Type</code>, it won’t be matched. Kaboom!</p><p>I looked at the cache, and sure enough, the header names in the cache were not lower-cased, even though they should be. But why?</p><p>Maybe I was mistaken, and I had forgotten the lower-casing after all? A glance at the code: no. The lower-casing was there, right around the call to the download function.</p><p>Digression: Skyscraper uses <a href="https://github.com/dakrone/clj-http">clj-http</a> to download pages. 
clj-http, in turn, uses the <a href="http://clojure-doc.org/articles/cookbooks/middleware.html">middleware pattern</a>: there’s a “bare” request function, and then there are wrapper functions that implement things like redirects, OAuth, exception handling, and what have you. I say “wrapper” because they literally wrap the bare function: <code>(wrap-something request)</code> returns another function that acts just like <code>request</code>, but with added functionality. And that other function can in turn be wrapped with yet another one, and so on.</p><p>There’s a default set of middleware wrappers defined by clj-http, and it also provides a macro, <code>with-additional-middleware</code>, which allows you to specify additional wrappers. One such wrapper is <code>wrap-lower-case-headers</code>, which, as the name suggests, causes the response’s header keys to be returned in lower case.</p><p>Back to Skyscraper. We’re ready to look at the code now. Can you spot the problem?</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [request-fn (<span class="hljs-name"><span class="hljs-built_in">or</span></span> (<span class="hljs-symbol">:request-fn</span> options)
                     http/request)]
  (<span class="hljs-name">http/with-additional-middleware</span> [http/wrap-lower-case-headers]
    (<span class="hljs-name">request-fn</span> req
                success-fn
                error-fn)))
</code></pre><p>I stared at it for several minutes, did some dirty experiments in the REPL, perused the code of clj-http, until it dawned on me.</p><p>See that <code>request-fn</code>? Even though Skyscraper uses <code>http/request</code> by default, you can override it in the options to supply your own way of doing HTTP. (Some of the tests use it to mock calls to an HTTP server.) In this particular case, it was not overridden, though: the usual <code>http/request</code> was used. So things looked good: within the body of <code>http/with-additional-middleware</code>, headers should be lower-cased because <code>request-fn</code> is <code>http/request</code>.</p><p>Or is it?</p><p>Let me show you how <code>with-additional-middleware</code> is implemented. It expands to another macro, <code>with-middleware</code>, which is defined as follows (docstring omitted):</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defmacro</span> <span class="hljs-title">with-middleware</span>
  [middleware & body]
  `(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [m# ~middleware]
     (<span class="hljs-name">binding</span> [*current-middleware* m#
               clj-http.client/request (<span class="hljs-name"><span class="hljs-built_in">reduce</span></span> #(%<span class="hljs-number">2</span> %<span class="hljs-number">1</span>)
                                               clj-http.core/request
                                               m#)]
       ~@body)))
</code></pre><p>That’s right: <code>with-middleware</code> works by dynamically rebinding <code>http/request</code>. Which means the <code>request-fn</code> I was calling is not actually the wrapped version, but the one captured by the outer <code>let</code>, the one that wasn’t rebound, the one without the additional middleware!</p><p>After this light-bulb moment, I moved <code>with-additional-middleware</code> outside of the <code>let</code>:</p><pre><code class="hljs clojure">(<span class="hljs-name">http/with-additional-middleware</span> [http/wrap-lower-case-headers]
  (<span class="hljs-name"><span class="hljs-built_in">let</span></span> [request-fn (<span class="hljs-name"><span class="hljs-built_in">or</span></span> (<span class="hljs-symbol">:request-fn</span> options)
                       http/request)]
    (<span class="hljs-name">request-fn</span> req
                success-fn
                error-fn)))
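
;; A minimal sketch of the same pitfall without clj-http. The var *request*
;; and both functions below are hypothetical, made up for illustration only.
(def ^:dynamic *request* (fn [_req] :bare))

(defn captured-before-binding []
  (let [request-fn *request*]                    ; grabs the root value
    (binding [*request* (fn [_req] :wrapped)]
      (request-fn {}))))                         ; calls the bare function

(defn captured-inside-binding []
  (binding [*request* (fn [_req] :wrapped)]
    (let [request-fn *request*]                  ; grabs the rebound value
      (request-fn {}))))                         ; calls the wrapped function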
</code></pre><p>And, sure enough, it worked.</p><h2 id="part-two:-the-tests-are-screaming-loud">Part Two: The tests are screaming loud</h2><p>Is it the end of the story? I’m guessing you’re thinking it is. I thought so too. But I wanted to add one last thing: a regression test, so I’d never run into the same problem in the future.</p><p>I whipped up a test in which one ISO-8859-2-encoded page was scraped, and a check for the correct string was made. I ran it against the fixed code. It was green. I ran it against the previous, broken version…</p><p>It was <em>green</em>, too.</p><p>At this point, I knew I had to get to the bottom of this.</p><p>Back to experimenting. After a while, I found out that extracting encoding from a freshly-downloaded page actually worked fine! It only failed when parsing headers fetched from a cache. But the map was the same in both cases! In both cases, the code was effectively doing</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">get</span></span> {<span class="hljs-string">"Content-Type"</span> <span class="hljs-string">"text/html; charset=ISO-8859-2"</span>}
     <span class="hljs-string">"content-type"</span>)
</code></pre><p>This lookup <em>shouldn’t</em> succeed: in map lookup, string comparison is case-sensitive. And yet, for freshly-downloaded headers, it <em>did</em> succeed!</p><p>I checked the <code>type</code> of both maps. One of them was a <code>clojure.lang.PersistentHashMap</code>, as expected. The other one was not. It was actually a <code>clj_http.headers.HeaderMap</code>.</p><p>I’ll let the comment of that one speak for itself:</p><blockquote><p>a map implementation that stores both the original (or canonical) key and value for each key/value pair, but performs lookups and other operations using the normalized – this allows a value to be looked up by many similar keys, and not just the exact precise key it was originally stored with.</p></blockquote><p>And so it turned out that the library authors have actually foreseen the need for looking up headers irrespective of case, and provided a helpful means for that. The whole lowercasing business was not needed, after all!</p><p>I stripped out the <code>with-additional-middleware</code> altogether, added some code elsewhere to ensure that the header map is a <code>HeaderMap</code> regardless of whether it comes from the cache or not, and they lived happily ever after.</p><h2 id="epilogue">Epilogue</h2><p>Moral of the story? It’s twofold.</p><ul><li><span>Dynamic rebinding can be dangerous. Having a public API that is implemented in terms of dynamic rebinding, even more so. I’d prefer if clj-http just allowed the custom middleware to be explicitly specified as an argument, thusly:</span></li></ul><pre><code class="hljs clojure">(<span class="hljs-name">http/request</span> req
              <span class="hljs-symbol">:additional-middleware</span> [http/wrap-lower-case-headers])
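
;; A sketch of what such explicit wrapping could look like. All names below
;; (fake-request, wrap-lower-case-keys, request-with) are hypothetical;
;; this is not clj-http's API, just the middleware pattern in miniature.
(require '[clojure.string :as str])

(defn fake-request [_req]
  {:status 200, :headers {"Content-Type" "text/html; charset=ISO-8859-2"}})

(defn wrap-lower-case-keys [handler]
  (fn [req]
    (update (handler req) :headers
            #(into {} (map (fn [[k v]] [(str/lower-case k) v])) %))))

(defn request-with [base-request middleware req]
  ;; wrap the base function with each middleware in turn, then call it
  ((reduce (fn [handler wrap] (wrap handler)) base-request middleware) req))

(get-in (request-with fake-request [wrap-lower-case-keys] {})
        [:headers "content-type"])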
</code></pre><ul><li><span>Know your dependencies. If you have a problem that might be generically addressed by the library you’re using, look deeper. It might be there already.</span></li></ul><p>Thanks to <a href="https://www.3jane.co.uk">3Jane</a> for proofreading this article.</p></div>tag:blog.danieljanus.pl,2020-01-03:post:word-championsWord Champions2020-01-03T00:00:00Z<div><p>This story begins on August 9, 2017, when a friend messaged me on Facebook: “Hey, I’m going to be on a TV talent show this weekend. They’ll be giving me this kind of problems. Any ideas how to prepare?”</p><p>He attached a link to this video:</p><iframe width="100%" height="500" src="https://www.youtube.com/embed/34AcKyYdNBo" frameborder="0" allowfullscreen></iframe>
<p>Now, we’re both avid Scrabble players, so we explored some ideas about extracting helpful data out of the <a href="http://www.pfs.org.pl/english.php">Official Polish Scrabble Player’s Dictionary</a>. I launched a Clojure REPL and wrote some throwaway code to generate sample training problems for Krzysztof. The code used a brute-force algorithm, so it was dog slow, but it was a start. It was Wednesday.</p><p>I woke up next morning with the problem still in my head. Clearly, I had found myself in a <a href="https://xkcd.com/356/">nerd sniping</a> situation.</p><img src="https://imgs.xkcd.com/comics/nerd_sniping.png">
<p>There was only one obvious way out—to write a full-blown training app so that Krzysztof could practice as if he were in the studio. The clock was ticking: we had two days left.</p><p>After work, I started a fresh <a href="https://github.com/day8/re-frame/">re-frame</a> project. (I was a recent re-frame convert those days, so I wanted to see how well it could cope with the task at hand.) Late that night, or rather early next morning, the prototype was ready.</p><p>It had very messy code. It only worked on Chrome. It failed miserably on mobile. It took ages to load. It had native JS dependencies, notably <a href="https://material-ui.com/">Material-UI</a> and <a href="https://react-dnd.github.io/react-dnd/about">react-dnd</a>, and for some reason it would not compile with ClojureScript’s advanced optimization turned on; so it weighed in at more than 6 MB, slurping in more than 300 JS files on load.</p><p>But it worked.</p><p>Krzysztof didn’t win his episode against the other contestants, ending up third, but he completed his challenge successfully. It took him 3 minutes and 42 seconds, out of 5 minutes allotted. The episode aired on 24 October.</p><iframe width="100%" height="500" src="https://www.youtube.com/embed/7ec6j31nlAk" frameborder="0" allowfullscreen></iframe>
<p>Krzysztof said that the problem he ended up solving on the show was way easier than the ones generated by the app: had they been more difficult, the wow factor might have been higher.</p><p>Several months later, we met at a Scrabble tournament, and I received a present. I wish I had photographed that bottle of wine, so I could show it here, but I hadn’t.</p><p>Meanwhile, the code remained messy and low-priority. But I kept returning to it when I felt like it, fixing up things one at a time. I’ve added difficulty levels, so you can have only one diagram, or three. I’ve made it work on Firefox. I’ve done a major rewrite, restructuring the code in a sane way and removing the JS dependencies other than React. I’ve made advanced compilation work, getting the JS down to 400K. I’ve made it work on mobile devices. I’ve written a puzzle generator in C, which ended up several orders of magnitude faster than the prototype Clojure version (it’s still brute-force, but uses some dirty C tricks to speed things up; I hope to rewrite it in Rust someday).</p><p>And now, 2½ years later, I’ve added an English version, with an accompanying set of puzzles (generated from a wordlist taken from <a href="https://github.com/first20hours/google-10000-english">this repo</a>), for the English-speaking world to enjoy.</p><p><a href="http://danieljanus.pl/wladcyslow/">Play Word Champions now!</a></p><p>The code is <a href="https://github.com/nathell/wordchampions">on GitHub</a> if you’d like to check it out or try hacking on it. It’s small, less than 1KLOC in total, so I think it can be a learning tool for re-frame or ClojureScript.</p><p>(This game as featured on the TV shows is called Gridlock. The name “Word Champions” was inspired by the title of Krzysztof’s video on YouTube, literally meaning “Lord of the Words”. 
There is no pun in the Polish title.)</p></div>tag:blog.danieljanus.pl,2019-10-07:post:web-of-documentsWeb of Documents2019-10-07T00:00:00Z<div><p>In 1960, Ted Nelson envisioned a web of documents.</p><p>It was called <a href="https://en.wikipedia.org/wiki/Project_Xanadu">Xanadu</a>. It was a grand, holistic vision: of documents that, once published, are available basically forever; of bidirectional links that could glue together not just documents, but parts thereof; of managing copyright and royalties. It was complex. And it never really came to fruition.</p><p>But thirty-one years later, another web of documents took off. A much more modest undertaking than Xanadu, with a simple markup language, simple protocol to retrieve the documents, unidirectional, ever-rotting links, and not much else. The World Wide Web. It was prototyped by one man in a few months. And then its popularity exploded.</p><p>As the WWW spread, it grew features. Soon, it was not enough for the documents to contain just text: support for images was added. People wanted to customize the look of the documents, so HTML gained presentational markup abilities, eventually obsoleted by CSS. It was not enough to be able to view the menu of your local pizza store – people wanted to actually order a pizza: the need for sessions yielded cookies and non-idempotent HTTP methods. And people wanted the pages to be interactive, so they became scriptable.</p><p>All these features were good. They helped the Web meet actual needs. But having them has a significant consequence, one that is seldom realized:</p><p>We don’t have a Web of Documents anymore.</p><p>Let me pause at this point. I’ve been using the word “document” intuitively and vaguely so far, so let’s try to pinpoint it. I don’t have a precise definition in mind, but I’ll share some examples. A book is a document, to me. So is a picture, an illustrated text, a scientific paper, a MP3 song, or a video. 
By contrast, a page that lets you play Tetris isn’t. The essence of this distinction seems to be that documents have well-defined <em>content</em> that does not change between viewings and does not depend on the state of the outside world. A document is stateless. It exists in and of itself; it is its own microcosm. It may be experienced interactively, but only insofar as it enables the experiencer to focus their attention on the part of their own choosing; the potential state of that interaction is external to the document, not part of itself.</p><p>Obviously, this is not very accurate: there are border cases. For example, does a film DVD with a menu qualify as a document? Or how about a choose-your-own-adventure book? Or a HTML page with links to other pages? On the surface, the latter does provide out-of-microcosm interactivity; but viewed from another angle, it is no different than putting a reference in a book. The browser just makes it very easy to go to a shelf and pick another book.</p><p>The distinction is there, and it’s important. And with it in mind, let me reiterate:</p><p><em>We don’t have a Web of Documents anymore.</em></p><p>These days, the WWW is mostly a <em>Web of Applications</em>. An application is a broader concept: it can display text or images, but also lets you interact not just with itself, but with the world at large. And that’s all well and good, as long as you consciously intend these interactions to happen.</p><p>A document is safe. A book is safe: it will not explode in your hands, it will not magically alter its contents tomorrow, and if it happens to be illegal to possess, it will not call the authorities to denounce you. You can implicitly trust a document by virtue of it being one. An application, not so much.</p><p>I don’t want to name names, but it’s all too easy these days to follow a link to a news site, expecting an article, only to be greeted with “You have read N articles this month, please register to continue”. 
Definitely an application-y thing to say, not a document-y one. Now, the purveyors of such sites typically have legitimate economic interest in doing so—but once you sign up, they are able to record your actions, link them with your identity and build your shadow profile. This way, we have applications actively <em>masquerading</em> as documents, when in reality they <em>do non-documenty things</em> without telling you.</p><p>Legislation such as the EU Cookie Law and the GDPR (insofar as it requires disclosure of data processing) tries to remedy this. But the more I think about it, the more sense it makes to me to attack the problem closer to its root: to decomplect the notions of a document and an application; to keep the Web of Applications as it is, and to recreate a Web of Documents—either parallel to it, or as its sub-web.</p><p>To do this, we need to take a step back. (Or do a clean start and invent a whole new technology, but this is unlikely to succeed). Fortunately, we don’t have to travel all the way back to 1992, when the WWW was still a Web of Documents. (I still remember table-based layouts and spacer gifs, and the very memory makes me shudder). I think we can base the new Web of Documents on ol’ trusty HTTP (or, better, HTTPS), HTML and CSS as we know them today, with just three restraints:</p><ol><li><p><em>No methods other than GET</em> (and perhaps HEAD). POST, PUT, DELETE and friends just have no place in a world of documents. They are not idempotent; they potentially modify the state of the world, which documents should not be able to do. (I was also thinking “no forms”, but with #1 in place, it seems like an unnecessary refinement. After all, forms that translate to GET requests just facilitate creating URLs: a user could just as well have typed the resulting URL by hand.)</p></li><li><p><em>No scripts of any kind.</em> Not JavaScript, not WebAssembly. Not even to enrich a document, such as syntax-highlight the code snippets. 
This one may seem too stringent, but I think it’s better to err on the safe side, and it’s very easy to enforce.</p></li><li><p><em>No cookies.</em> Cookies by themselves aren’t interactive, but having them makes it all too easy to abuse the semantics of HTTP to recreate sessions, and on top of them reinvent the app-wheel and eventually forfeit the Web of Documents again.</p></li></ol><p>Again, there may be corner cases that have escaped me. But if a WWW page conforms to these restrictions, I think it may be pretty safe to call it a “document” and make it a part of the Web of Documents.</p><p>How do we achieve this? I don’t know, really. I don’t have a concrete proposal. Perhaps we could have dedicated browsers for the WoD; perhaps we could make existing browsers prominently advertise to the user whether they are browsing a document or an application. On top of all the technical decisions to make, there’ll be significant campaigning and lobbying needed if the idea is ever to take off.</p><p>I don’t dare dream that it ever will. My intent in this article is to provide food for thought. All I ask from you, my reader, is consideration and attention. And if you got this far, chances are I got them. I’m grateful.</p><p>This page is a document. Thank you for reading it.</p></div>tag:blog.danieljanus.pl,2019-02-05:post:clj-tvisionRe-framing text-mode apps2019-02-05T00:00:00Z<div><h2 id="intro">Intro</h2><blockquote><p>“But, you know, many explorers liked to go to places that are unusual. And, it’s only for the fun of it.” – Richard P. Feynman</p></blockquote><p>A couple of nights ago, I hacked together a small Clojure program.</p><p>All it does is display a terminal window with a red rectangle in it. You can use your cursor keys to move it around the window, and space bar to change its colour. 
It’s fun, but it doesn’t sound very useful, does it?</p><p>In this post, I’ll try to convince you that there’s more to this little toy than might at first sight appear. 
You may want to check out <a href="https://github.com/nathell/clj-tvision">the repo</a> as you go along.</p><h2 id="in-which-an-unexpected-appearance-is-made">In which an unexpected appearance is made</h2><p>(I’ve always envied <a href="https://technomancy.us">Phil Hagelberg</a> this kind of headlines.)</p><p>As you might have guessed from this article’s title, clj-tvision (a working name for the program) is a <a href="https://github.com/Day8/re-frame">re-frame</a> app.</p><p>For those of you who haven’t heard of re-frame, a word of explanation: it’s a ClojureScripty way of writing React apps, with Redux-like management of application state. If you do know re-frame (shameless plug: we at <a href="https://works-hub.com">WorksHub</a> do, and use it a lot: it powers the site you’re looking at right now!), you’ll instantly find yourself at home. However, a few moments later, a thought might dawn upon you, and you might start to feel a little uneasy…</p><p>Because I’ve mentioned React and ClojureScript, and yet I’d said earlier that we’re talking a text-mode application here. And I’ve mentioned that it’s written in Clojure. It is, in fact, not using React at all, and it has nothing to do whatsoever with ClojureScript, JavaScript, or the browser.</p><p>How is that even possible?</p><p>Here’s the catch: re-frame is implemented in <code>.cljc</code> files. So while it’s mostly used in the ClojureScript frontend, it <em>can</em> be used from Clojure. You may know this if you’re testing your events or subscriptions on the JVM.</p><p>While it’s mostly – if not hitherto exclusively – used for just that, I wanted to explore whether it could be used to manage state in an actual, non-web app. Text-mode is a great playground for this kind of exploration. 
Rather than picking a GUI toolkit and concerning myself with its intricacies, I chose to just put things on a rectangular sheet of text characters.</p><p>(But if you are interested in pursuing a React-ish approach for GUIs, check out what Bodil Stokke’s been doing in <a href="https://github.com/bodil/vgtk">vgtk</a>.)</p><h2 id="living-without-the-dom">Living without the DOM</h2><p>The building blocks of a re-frame app are subscriptions, events, and views. While the first two work in Clojureland pretty much the same way they do in the browser (although there are differences, of which more anon), views are a different beast.</p><p><a href="https://github.com/Day8/re-frame/blob/master/docs/SubscriptionFlow.md">re-frame’s documentation</a> says that views are “data in, Hiccup out. Hiccup is ClojureScript data structures which represent DOM.” But outside of the browser realm, there’s no DOM. So let’s rephrase that more generally: re-frame views should produce <em>data structures which declaratively describe the component’s appearance to the user</em>. In web apps, those structures correspond to the DOM. What they will look like outside is up to us. We’ll be growing our own DOM-like model, piecemeal, as needs arise.</p><p>For clj-tvision, I’ve opted for a very simple thing. Let’s start with a concrete example. Here’s a view:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">view</span> []
[{<span class="hljs-symbol">:type</span> <span class="hljs-symbol">:rectangle</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:x1</span> <span class="hljs-number">10</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:y1</span> <span class="hljs-number">5</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:x2</span> <span class="hljs-number">20</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:y2</span> <span class="hljs-number">10</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:color</span> <span class="hljs-symbol">:red</span>}])
</code></pre><p>Unlike in the DOM, in this model the UI state isn’t a tree. It’s a flat sequence of maps that each represent individual “primitive elements”. We could come up with a fancy buzzword-compliant name and call it Component List Model, or CLiM for short, in homage to <a href="https://en.wikipedia.org/wiki/Common_Lisp_Interface_Manager">the venerable GUI toolkit</a>.</p><p>Like normal re-frame views, CLiM views can include subviews. An example follows:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">square</span> [left top size color]
[{<span class="hljs-symbol">:type</span> <span class="hljs-symbol">:rectangle</span><span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:x1</span> left<span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:y1</span> top<span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:x2</span> (<span class="hljs-name"><span class="hljs-built_in">+</span></span> left size <span class="hljs-number">-1</span>)<span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:y2</span> (<span class="hljs-name"><span class="hljs-built_in">+</span></span> top size <span class="hljs-number">-1</span>)<span class="hljs-punctuation">,</span>
<span class="hljs-symbol">:color</span> color}])
(<span class="hljs-keyword">defn</span> <span class="hljs-title">view</span> []
[[square <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">5</span> <span class="hljs-symbol">:red</span>]
[square <span class="hljs-number">9</span> <span class="hljs-number">9</span> <span class="hljs-number">5</span> <span class="hljs-symbol">:blue</span>]])
</code></pre><p>How to render a view? Simple. First, flatten the list, performing funcalls on subviews so that you get a sequence containing only primitives. Then, draw each of them in order. (If there is an overlap, the trailers will obscure the leaders. Almost biblical.)</p><p>I’ve defined a multimethod, <code>render-primitive</code>, dispatching on <code>:type</code>. Its methods draw the corresponding primitive to a Lanterna screen.</p><p>Oh, didn’t I mention <a href="https://github.com/mabe02/lanterna">Lanterna</a>? It’s a Java library for terminals. Either real ones or emulated in Swing (easier to work with when you’re in a CIDER REPL). Plus, it sports virtual screens which can be blitted to a real terminal. This gives us a rough poor man’s equivalent of React’s VDOM. And it has a <a href="https://github.com/AvramRobert/clojure-lanterna">Clojure wrapper</a>!</p><h2 id="events-at-eventide">Events at eventide</h2><p>So now we know how to draw our UI. But an app isn’t made up of just drawing. It has a main loop: it listens to events, which cause the app state to change and the corresponding components to redraw.</p><p>re-frame does provide an event mechanism, but it doesn’t <em>define</em> any events per se. So we need to ask ourselves: who calls <em>dispatch</em>? How do events originate? How to write the main loop?</p><p>clj-tvision is a proof-of-concept, so it doesn’t concern itself with mouse support. There’s only one way a user can interact with the app: via the keyboard. So keystrokes will be the only “source events”, as it were, for the app; and so writing the event loop should be simple. Sketching pseudocode:</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">loop</span></span> []
(<span class="hljs-name">render-app</span>)
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [keystroke (<span class="hljs-name">wait-for-key</span>)] <span class="hljs-comment">;; blocking!</span>
(<span class="hljs-name">dispatch</span> [<span class="hljs-symbol">:key-pressed</span> keystroke])
(<span class="hljs-name"><span class="hljs-built_in">recur</span></span>)))
</code></pre><p>Simple as that, should work, right?</p><p>Wrong.</p><p>If you actually try that, it’ll <em>somewhat</em> work. Hit right arrow to move the rectangle, nothing happens! Hit right arrow again, it moves. Hit left, it moves right. Hit right, it moves left. Not what you want.</p><p>You see, there’s a complication stemming from the fact that re-frame’s events are asynchronous by default. (Hence the <code>dispatch</code> vs. <code>dispatch-sync</code> dichotomy.) They don’t get dispatched immediately; rather, re-frame places them on a queue and processes them asynchronously, so that they don’t hog the browser. The Clojure version of re-frame handles that using a single-threaded executor with a dedicated thread.</p><p>We <em>almost</em> could use <code>dispatch-sync</code> everywhere, but for re-frame that’s a no-no: once within a <code>dispatch-sync</code> handler, you cannot dispatch other events. If you try anyway, re-frame will detect it and politely point its dragon-scaly head at you, explaining it doesn’t like it. (It is a benevolent dragon, you know.)</p><p>So we need to hook into that “next-tick” machinery of re-frame’s somehow. There are probably better ways of doing this, but I opted to blatantly redefine <code>re-frame.interop/next-tick</code> to tell the main loop: “hey, events have been handled and we have a new state, dispatch an event so we can redraw.” This is one of the rare cases where monkey-patching third-party code with <code>alter-var-root</code> saves you the hassle of forking that entire codebase.</p><p>So now we have <em>two</em> sources of events: keystrokes, and <code>next-tick</code>. To multiplex them, I’ve whipped up a channel with core.async. It feels hacky, but it leaves room for adding mouse support in the future. Or time-based events that will be fired periodically.</p><p>For completeness, I should also add that Clojure-side re-frame doesn’t have the luxury of having reactive atoms provided by Reagent. 
Its ratoms are ordinary Clojure atoms. Unlike in ClojureScript, any time the app state changes, <em>every</em> subscription in the signal graph will be recomputed. It may well be possible to port Reagent’s ratoms to Clojure, but it is a far more advanced exercise. For simple apps, what re-frame provides on its own might just be enough.</p><p>And with that final bit, we can sweep all that hackitude under the carpet… or, should I say, tuck it into an internal ns that hopefully no-one will ever look into. And we’re left with shiny, declarative, re-framey, beautiful UI code on the surface. <a href="https://github.com/nathell/clj-tvision/blob/master/src/tvision/core.clj">Just look</a>.</p><h2 id="closing-thoughts">Closing thoughts</h2><blockquote><p>“Within C++, there is a much smaller and cleaner language struggling to get out.” – Bjarne Stroustrup</p></blockquote><p>If you’ve ever encountered legacy C++ code, this will ring true. Come to think of it, Stroustrup’s words are true of every system that has grown organically over its lifetime, with features being added to it but hardly ever removed.</p><p>And modern webapps may well be the epitome of that kind of system. We now have desktop apps that are fully self-contained on a single machine, yet use an overwhelmingly complex and vast machinery grown out of a simple system originally devised to <a href="http://info.cern.ch/hypertext/WWW/TheProject.html">view static documents over the Internet</a>.</p><p>For all that complexity, we continue to use it. Partly owing to its ubiquity, partly for convenience. In my experience, the abstractions provided by re-frame allow you to wrap your head around large apps and reason about them much more easily than, say, object-oriented approaches. It just feels right. Conversely, writing an app in, say, GTK+ would now feel like a setback by some twenty years.</p><p>So this toy, this movable rectangle on a black screen, is not so much an app as it is a philosophical exercise. 
It is what my typing fingers produced while I pondered, weak and weary: “can we throw away most of that cruft, while still enjoying the abstractions that make life so much easier?”</p><p>Can we?</p><p>This post was originally published on <a href="https://functional.works-hub.com/learn/re-framing-text-mode-apps-fd5cf">Functional Works</a>.</p></div>tag:blog.danieljanus.pl,2014-09-13:post:happy-programmers-dayHappy Programmers’ Day!2014-09-13T00:00:00Z<div><p>Happy <a href="https://en.wikipedia.org/wiki/Programmers'_Day">Programmers’ Day</a>, everyone!</p><p>A feast isn’t a feast, though, until it has a proper way of celebrating it. The <a href="https://en.wikipedia.org/wiki/Pi_Day">Pi Day</a>, for instance, has one: you eat a pie (preferably exactly at 1:59:26.535am), but I haven’t heard of any way of celebrating the Programmers’ Day, so I had to invent one. An obvious way would be to write a program, preferably a non-trivial one, but that requires time and dedication, which not everyone is able to readily spare.</p><p>So here’s my idea: on Programmers’ Day, dust off a program that you wrote some time ago — something that is just lying around in some far corner of your hard disk, that you haven’t looked at in years, but that you had fun writing — and put it on <a href="https://github.com/">GitHub</a> for all the world to see, to share the joy of programming.</p><p>Let me initialize the new tradition by doing this myself. Here’s <a href="https://github.com/nathell/haze">HAZE</a>, the Haskellish Abominable Z-machine Emulator. It was my final assignment for a course in Advanced Functional Programming, in my fourth year at the Uni, way back in 2004. It is an emulator for an ancient kind of virtual machine, the <a href="https://en.wikipedia.org/wiki/Z-machine">Z-machine</a>, written from scratch in Haskell. 
It allows you to play text adventure games, such as <a href="https://en.wikipedia.org/wiki/Zork">Zork</a>, much in the vein of <a href="https://davidgriffith.gitlab.io/frotz/">Frotz</a>. It’s not very complete, and supports versions of the Z-machine up to 3 only, so newer games won’t run on it as it stands, but Zork is playable.</p><p>It probably won’t even compile in modern Haskell systems: it was originally written for GHC version 6.2.1, and extensively uses the FiniteMap data type, which was obsoleted shortly after and is no longer found in modern systems. I should have Linux and Windows binaries lying around (yes, I had compiled it under Windows, using MinGW/PDCurses); I’ll put them on GitHub once I find them.</p><p>My mind now wanders ten years back in time, to the days when I was writing it. It took me about three summer weeks to write HAZE from scratch, most of that time on a slow laptop where it took quite a lot of seconds to get GHC to compile even a simple thing. I would do some of it differently if I were doing it now — for one, the state of a <code>ZMachine</code> is a central datatype to HAZE, and you’ll find a lot of functions that take and return ZMachines, so a state monad is an obvious choice; I didn’t understand monads well enough back then. But I still remember how I had the framework in place already and I was adding implementations of Z-code opcodes, one by one, to <code>ZMachine/ZCode/Impl.hs</code>, recompiling, rerunning, getting messages about unimplemented opcodes, when all of a sudden I got the familiar message about a white house and a small mailbox. Freude!</p><p>I hope you enjoy looking at it at least half as much as I had enjoyed writing it.</p></div>tag:blog.danieljanus.pl,2014-05-20:post:you-already-use-lisp-syntaxYou already use Lisp syntax2014-05-20T00:00:00Z<div><p><strong>Unix Developer:</strong> I’m not going to touch Lisp. It’s horrible!</p><p><strong>Me:</strong> Why so?</p><p><strong>UD:</strong> The syntax! 
This illegible prefix-RPN syntax that nobody else uses. And just look at all these parens!</p><p><strong>Me:</strong> Well, many people find it perfectly legible, although most agree that it takes some time to get accustomed to. But I think you’re mistaken. Lots of people are using Lisp syntax on a daily basis…</p><p><strong>UD:</strong> I happen to know no one doing this.</p><p><strong>Me:</strong> …without actually realizing this. In fact, I think <em>you</em> yourself are using it.</p><p><strong>UD:</strong> Wait, <em>what</em>?!</p><p><strong>Me:</strong> And the particular variant of Lisp syntax you’re using is called Bourne shell.</p><p><strong>UD:</strong> Now I don’t understand. What on earth does the shell have to do with Lisp?</p><p><strong>Me:</strong> Just look: in the shell, you put the name of the program first, followed by the arguments, separated by spaces. In Lisp it’s exactly the same, except that you put an opening paren at the beginning and a closing paren at the end.</p><p>Shell: <code>run-something arg1 arg2 arg3</code></p><p>Lisp: <code>(run-something arg1 arg2 arg3)</code></p><p><strong>UD:</strong> I still don’t get the analogy.</p><p><strong>Me:</strong> Then you need a mechanism for expression composition — putting the output of one expression as an input to another. In Lisp, you just nest the lists. And in the shell?</p><p><strong>UD:</strong> Backticks.</p><p><strong>Me:</strong> That’s right. Or <code>$()</code>, which has the advantage of being more easily nestable. Let’s try arithmetic. How do you do arithmetic in the shell?</p><p><strong>UD:</strong> <code>expr</code>. Or the Bash builtin <code>let</code>. For example,</p><pre><code class="hljs bash">$ <span class="hljs-built_in">let</span> x=<span class="hljs-string">'2*((10+4)/7)'</span>; <span class="hljs-built_in">echo</span> <span class="hljs-variable">$x</span>
4
</code></pre><p><strong>Me:</strong> Now wouldn’t it be in line with the spirit of Unix — to have programs do just one thing — if we had one program to do addition, and another to do subtraction, and yet another to do multiplication and division?</p><p>It’s trivial to write it in C:</p><pre><code class="hljs c"><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><stdio.h></span></span>
<span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><stdlib.h></span></span>
<span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><string.h></span></span>
<span class="hljs-type">int</span> <span class="hljs-title function_">main</span><span class="hljs-params">(<span class="hljs-type">int</span> argc, <span class="hljs-type">char</span> **argv)</span> {
<span class="hljs-type">int</span> mode = <span class="hljs-number">-1</span>, cnt = argc - <span class="hljs-number">1</span>, val, i;
<span class="hljs-type">char</span> **args = argv + <span class="hljs-number">1</span>;
<span class="hljs-keyword">switch</span> (argv[<span class="hljs-number">0</span>][<span class="hljs-built_in">strlen</span>(argv[<span class="hljs-number">0</span>]) - <span class="hljs-number">1</span>]) {
<span class="hljs-keyword">case</span> <span class="hljs-string">'+'</span>: mode = <span class="hljs-number">0</span>; <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-string">'-'</span>: mode = <span class="hljs-number">1</span>; <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-string">'x'</span>: mode = <span class="hljs-number">2</span>; <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-string">'d'</span>: mode = <span class="hljs-number">3</span>; <span class="hljs-keyword">break</span>;
}
<span class="hljs-keyword">if</span> (mode == <span class="hljs-number">-1</span>) {
<span class="hljs-built_in">fprintf</span>(<span class="hljs-built_in">stderr</span>, <span class="hljs-string">"invalid math operation\n"</span>);
<span class="hljs-keyword">return</span> <span class="hljs-number">1</span>;
}
<span class="hljs-keyword">if</span> ((mode == <span class="hljs-number">1</span> || mode == <span class="hljs-number">3</span>) && !cnt) {
<span class="hljs-built_in">fprintf</span>(<span class="hljs-built_in">stderr</span>, <span class="hljs-string">"%s requires at least one arg\n"</span>, argv[<span class="hljs-number">0</span>]);
<span class="hljs-keyword">return</span> <span class="hljs-number">1</span>;
}
<span class="hljs-keyword">switch</span> (mode) {
<span class="hljs-keyword">case</span> <span class="hljs-number">0</span>: val = <span class="hljs-number">0</span>; <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-number">2</span>: val = <span class="hljs-number">1</span>; <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">default</span>: val = atoi(*args++); cnt--; <span class="hljs-keyword">break</span>;
}
<span class="hljs-keyword">while</span> (cnt--) {
<span class="hljs-keyword">switch</span> (mode) {
<span class="hljs-keyword">case</span> <span class="hljs-number">0</span>: val += atoi(*args++); <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-number">1</span>: val -= atoi(*args++); <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-number">2</span>: val *= atoi(*args++); <span class="hljs-keyword">break</span>;
<span class="hljs-keyword">case</span> <span class="hljs-number">3</span>: val /= atoi(*args++); <span class="hljs-keyword">break</span>;
}
}
<span class="hljs-built_in">printf</span>(<span class="hljs-string">"%d\n"</span>, val);
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre><p>This dispatches on the last character of its name, so it can be symlinked to <code>+</code>, <code>-</code>, <code>x</code> and <code>d</code> (I picked unusual names for multiplication and division to make them legal and avoid escaping).</p><p>Now behold:</p><pre><code class="hljs bash">$ x 2 $(d $(+ 10 4) 7)
4
</code></pre><p><strong>UD:</strong> Wow, this sure looks a lot like Lisp!</p><p><strong>Me:</strong> And yet it’s the shell. Our two basic rules — program-name-first and <code>$()</code>-for-composition — allowed us to explicitly specify the order of evaluation, so there was no need to do any fancy parsing beyond what the shell already provides.</p><p><strong>UD:</strong> So is the shell a Lisp?</p><p><strong>Me:</strong> Not really. The shell is <a href="http://blog.codinghorror.com/new-programming-jargon/">stringly typed</a>: a program takes textual parameters and produces textual output. To qualify as a Lisp, it would have to have a composite type: a list or a cons cell to build lists on top of. Then, you’d be able to represent code as this data structure, and write programs to transform code to other code.</p><p>But the Tao of Lisp lingers in the shell syntax.</p><hr>
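<p>A footnote for anyone who wants to try the nested form from the dialogue without compiling the C program: here is a minimal sketch using plain shell functions. The names <code>add</code>, <code>mul</code> and <code>div</code> are made up for this illustration (they stand in for the <code>+</code>, <code>x</code> and <code>d</code> symlinks), and the arithmetic is integer-only, much like the <code>atoi</code>-based code above. Unlike the C version, the sketch does no argument checking.</p><pre><code class="hljs bash"># Hypothetical stand-ins for the symlinked C program; the names are illustrative only.
# Each function folds its arguments with one integer operation and prints the result.
add() { r=0; for a in "$@"; do r=$((r + a)); done; echo "$r"; }
mul() { r=1; for a in "$@"; do r=$((r * a)); done; echo "$r"; }
div() { r=$1; shift; for a in "$@"; do r=$((r / a)); done; echo "$r"; }

# Same expression as in the dialogue: 2 * ((10 + 4) / 7)
mul 2 $(div $(add 10 4) 7)
</code></pre><p>The last line prints 4, as before. Each <code>$()</code> runs in its own subshell, so the innermost call is evaluated first and its textual output becomes an argument of the enclosing call, which is exactly the composition rule discussed above.</p>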
<p>I know I’ve glossed over many details here, like the shell syntax for redirection, globbing, subprocesses, the fact that programs have standard input in addition to command-line arguments, pipes, etc. — all these make the analogy rather weak. But I think it’s an interesting way to teach Lisp syntax to people.</p></div>tag:blog.danieljanus.pl,2014-04-06:post:dos-debugging-quirkDOS debugging quirk2014-04-06T00:00:00Z<div><p>While hacking on Lithium, I’ve noticed an interesting thing. Here’s a sample DOS program in assembly (TASM syntax):</p><pre><code class="hljs x86asm"><span class="hljs-meta">.model</span> tiny
<span class="hljs-meta">.code</span>
org <span class="hljs-number">100h</span>
N <span class="hljs-built_in">equ</span> <span class="hljs-number">2</span>
<span class="hljs-symbol">
start:</span>
<span class="hljs-keyword">mov</span> <span class="hljs-built_in">bp</span>,<span class="hljs-built_in">sp</span>
<span class="hljs-keyword">mov</span> <span class="hljs-built_in">ax</span>,<span class="hljs-number">100</span>
<span class="hljs-keyword">mov</span> [<span class="hljs-built_in">bp</span>-N],<span class="hljs-built_in">ax</span>
<span class="hljs-keyword">mov</span> <span class="hljs-built_in">cx</span>,[<span class="hljs-built_in">bp</span>-N]
<span class="hljs-keyword">cmp</span> <span class="hljs-built_in">cx</span>,<span class="hljs-built_in">ax</span>
<span class="hljs-keyword">jne</span> wrong
<span class="hljs-keyword">mov</span> <span class="hljs-built_in">dx</span>,offset msg
<span class="hljs-keyword">jmp</span> disp
<span class="hljs-symbol">wrong:</span>
<span class="hljs-keyword">mov</span> <span class="hljs-built_in">dx</span>,offset msg2
<span class="hljs-symbol">disp:</span>
<span class="hljs-keyword">mov</span> <span class="hljs-number">ah</span>,<span class="hljs-number">9</span>
<span class="hljs-keyword">int</span> <span class="hljs-number">21h</span>
<span class="hljs-keyword">mov</span> <span class="hljs-built_in">ax</span>,<span class="hljs-number">4c00h</span>
<span class="hljs-keyword">int</span> <span class="hljs-number">21h</span>
msg <span class="hljs-built_in">db</span> <span class="hljs-string">"ok$"</span>
msg2 <span class="hljs-built_in">db</span> <span class="hljs-string">"wrong$"</span>
end start
</code></pre><p>If you assemble, link and then execute it normally, typing <code>prog</code> in the DOS command line, it will output the string “ok”. But if you trace through the program in a debugger instead, it will say “wrong”! What’s wrong?</p><p>The problem is in lines 10-11 (instructions 3-4). Here’s what happens when you trace through this program in DOS 6.22’s <code>DEBUG.EXE</code>:</p><img src="/img/blog/debug.png">
<p>Note how in instruction 3 (actually displayed as the second above) we set the word <code>SS:0xFFFC</code> to <code>100</code>. When about to execute the following instruction, we would expect that word to continue to hold the value <code>100</code>, because nothing which could have changed that value has happened in between. Instead, the debugger still reports it as <code>0x0D8A</code>, as if instruction 3 had not been executed at all — and, interestingly, after actually executing this instruction, <code>CX</code> gets yet another value of <code>0x7302</code>!</p><p>Normally, thinking of DOS <code>.COM</code> programs, you assume a 64KB-long chunk of memory that the program has all to itself: the code starts at <code>0x100</code>, the stack grows from <code>0xFFFE</code> downwards (at any given time, the region from <code>SP</code> to <code>0xFFFE</code> contains data currently on the stack), and all memory in between is free for the program to use however it deems fit. It turns out that, when debugging, it is not the case: the debuggers need to manipulate the region just underneath the program’s stack in order to handle the tracing/breakpoint interrupt traps.</p><p>I’ve verified that both DOS’s DEBUG and Borland’s Turbo Debugger 5 do this. The unsafe-to-touch amount of space below SP that they need, however, varies. Manipulating the N constant in the original program, I’ve determined that DEBUG only needs 8 bytes below SP, whereas for TD it is a whopping 18 bytes.</p></div>tag:blog.danieljanus.pl,2014-04-02:post:20482048: A close look at the source2014-04-02T00:00:00Z<div><p>Dust has now mostly settled down on <a href="https://gabrielecirulli.github.io/2048/">2048</a>. Yet, in all the deluge of variants and clones that has swept through <a href="https://news.ycombinator.com/">Hacker News</a>, little has been written about the experience of modifying the game. 
As I too have jumped on the 2048-modding bandwagon, it’s time to fill that gap, because, as we shall see, the code more than deserves a close look.</p><p>I’ll start with briefly describing my variant. It’s called <a href="http://danieljanus.pl/wosg">“words oh so great”</a> (a rather miserable attempt at a pun on “two-oh-four-eight”) and is a consequence of a thought I had, being an avid Scrabble player, after seeing the <a href="http://joppi.github.io/2048-3D/">3D</a> and <a href="http://huonw.github.io/2048-4D/">4D</a> versions: “what if we mashed 2048 and Scrabble together?” The answer just lent itself automatically.</p><p>Letters instead of number tiles, that was obvious. And you use them to form words. It is unclear how merging tiles should work: merging two identical tiles, as in the original, just wouldn’t make sense here, so drop the concept of merging and make the tiles disappear instead when you form a word. In Scrabble, the minimum length of a word is two, but allowing two-letter words here would mean too many words formed accidentally, so make it at least three. And 16 squares sounds like too tight a space, so increase it to 5x5. And there you have the modified rules.</p><p>I <a href="https://github.com/nathell/wosg">cloned</a> the Git repo, downloaded an English word list (<a href="http://dreamsteep.com/projects/the-english-open-word-list.html">EOWL</a>), and set out to work. It took me just over three hours from the initial idea to putting the modified version online and submitting a link to HN. I think three hours is not bad, considering that I’ve significantly changed the game mechanics. And, in my opinion, this is a testimony to the quality of Gabriele Cirulli’s code.</p><p>The code follows the MVC pattern, despite not relying on any frameworks or libraries. 
The model is comprised of the <code>Tile</code> and <code>Grid</code> classes, laying out the universe for the game as well as some basic rules governing it, and the <code>GameManager</code> that implements the game mechanics: how tiles move around, when they can merge together, when the game ends, and so on. It also uses a helper class called <code>LocalStorageManager</code> to keep the score and save it in the browser’s local storage.</p><p>The view part is called an “actuator” in 2048 parlance. The <code>HTMLActuator</code> takes the game state and updates the DOM tree accordingly. It also uses a micro-framework for animations. The controller takes the form of a <code>KeyboardInputManager</code>, whose job is to receive keyboard events and translate them to changes of the model.</p><p>The <code>GameManager</code> also contains some code to tie it all together — not really a part of the model as in MVC. Despite this slight inconsistency, the separation of concerns is very neatly executed in 2048’s code; I would even go so far as to say that it could be used as a demonstration in teaching MVC to people.</p><p>The only gripe I had with the code is that it violates the DRY principle in several places. Specifically, to change the board size to 5x5, I had to modify as many as three places: the HTML (it contains the initial definition for the DOM, including 16 empty divs making up the grid, which is unfortunate — I’d change it to set up the DOM at runtime during initialization); the model (instantiation of <code>GameManager</code>); and the <code>.scss</code> file from which the CSS is generated.</p><p>While on this topic, let me add that 2048’s usage of SASS is a prime example of its capabilities. It is very instructive to see how the sizing and positioning of the grid, and also styling for the tiles down to the glow, is done programmatically. I was aware of the existence of SASS before, but never got around to explore it. 
Now, I’m sold on it.</p><p>To sum up: 2048 rocks. And it’s fun to modify. Go try it.</p></div>tag:blog.danieljanus.pl,2013-05-26:post:lithium-revisitedLithium revisited: A 16-bit kernel (well, sort of) written in Clojure (well, sort of)2013-05-26T00:00:00Z<div><p>Remember <a href="http://blog.danieljanus.pl/blog/2012/05/14/lithium/">Lithium</a>? The x86 assembler written in Clojure, and a simple stripes effect written in it? Well, here’s another take on that effect:</p><img src="/img/blog/stripes2.png">
<p>And here is the source code:</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">do</span></span> (<span class="hljs-name">init-graph</span>)
(<span class="hljs-name"><span class="hljs-built_in">loop</span></span> [x <span class="hljs-number">0</span> y <span class="hljs-number">0</span>]
(<span class="hljs-name">put-pixel</span> x y (<span class="hljs-name"><span class="hljs-built_in">let</span></span> [z (<span class="hljs-name"><span class="hljs-built_in">mod</span></span> (<span class="hljs-name"><span class="hljs-built_in">+</span></span> (<span class="hljs-name"><span class="hljs-built_in">-</span></span> <span class="hljs-number">319</span> x) y) <span class="hljs-number">32</span>)]
(<span class="hljs-name"><span class="hljs-built_in">if</span></span> (<span class="hljs-name"><span class="hljs-built_in"><</span></span> z <span class="hljs-number">16</span>) (<span class="hljs-name"><span class="hljs-built_in">+</span></span> <span class="hljs-number">16</span> z) (<span class="hljs-name"><span class="hljs-built_in">+</span></span> <span class="hljs-number">16</span> (<span class="hljs-name"><span class="hljs-built_in">-</span></span> <span class="hljs-number">31</span> z)))))
(<span class="hljs-name"><span class="hljs-built_in">if</span></span> (<span class="hljs-name"><span class="hljs-built_in">=</span></span> y <span class="hljs-number">200</span>)
<span class="hljs-literal">nil</span>
(<span class="hljs-name"><span class="hljs-built_in">if</span></span> (<span class="hljs-name"><span class="hljs-built_in">=</span></span> x <span class="hljs-number">319</span>)
(<span class="hljs-name"><span class="hljs-built_in">recur</span></span> <span class="hljs-number">0</span> (<span class="hljs-name"><span class="hljs-built_in">inc</span></span> y))
(<span class="hljs-name"><span class="hljs-built_in">recur</span></span> (<span class="hljs-name"><span class="hljs-built_in">inc</span></span> x) y)))))
</code></pre><p>I implemented this several months ago and pushed it to GitHub, and development has pretty much stalled since then. And after seeing <a href="https://news.ycombinator.com/item?id=5771276">this recent post</a> on HN today, I’ve decided to give Lithium a little more publicity, in the hope that it will provide a boost of motivation to me. Because what we have here is pretty similar to Rustboot: it’s a 16-bit kernel written in Clojure.</p><p>Well, sort of.</p><p>After writing a basic assembler capable of building bare binaries of simple x86 real-mode programs, I’ve decided to make it a building block of a larger entity. So I’ve embarked on a project to implement a compiler for a toy Lisp-like language following the paper <a href="http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf">“An Incremental Approach to Compiler Construction”</a>, doing it in Clojure and making the implemented language similar to Clojure rather than to Scheme.</p><p>(Whether it actually can be called Clojure is debatable. It’s unclear what the definition of Clojure the language is. Is running on JVM a part of what makes Clojure Clojure? Or running on any host platform? Is ClojureScript Clojure? 
What about ClojureCLR, or clojure-py?)</p><p>So far I’ve only gotten to step 7 of 24 or so, but that’s already enough to have a working <code>loop/recur</code> implementation, and it was trivial to throw in some graphical mode 13h primitives to be able to implement this effect.</p><p>By default I’m running Lithium programs as DOS .COM binaries under DOSBox, but technically, the code doesn’t depend on DOS in any way (it doesn’t ever invoke interrupt 21h) and so it can be combined with a simple bootloader into a kernel runnable on the bare metal.</p><p>The obligatory HOWTO on reproducing the effect: install DOSBox and Leiningen, check out [the code][3], launch a REPL with <code>lein repl</code>, execute the following forms, and enjoy the slowness with which individual pixels are painted:</p><pre><code class="hljs clojure">(<span class="hljs-name">require</span> 'lithium.compiler)
(<span class="hljs-name"><span class="hljs-built_in">in-ns</span></span> 'lithium.compiler)
(<span class="hljs-name">run!</span> (<span class="hljs-name">compile-program</span> <span class="hljs-string">"/path/to/lithium/examples/stripes-grey.clj"</span>))
</code></pre></div>tag:blog.danieljanus.pl,2012-09-13:post:my-top-three-ios-apps-for-mappingMy top three iOS apps for mapping2012-09-13T00:00:00Z<div><p>Living in London means that I now have a whole lot of new area to explore by cycling or walking. I try to take every opportunity to spend a free day or weekend out. One of the most important things when on the move is knowing where you are, where to go, and how to get there, and for that you need a map. As I soon learned, the maps to use in the UK are the Ordnance Survey ones (either the Landranger/Explorer series, or maps by another publisher, such as AA, based on OS data). However, the Landranger series encompasses over 200 1:50000 maps, standing at some £8 each, and when that level of detail is not enough, there are more than 400 Explorer maps on top of that. Not only does this get pricey after a while, but the sheer volume of map juggling also quickly becomes impractical when you cycle a lot outside of town.</p><p>So I’ve turned to my old trusty iPhone 3GS as a mapping device instead, and set out to assemble a set of mapping apps that do the job for me. In this post, I’d like to share my list.</p><p>I briefly thought of directly using OS maps on the iPhone via the <a href="http://itunes.apple.com/gb/app/outdoors-gb-national-parks/id336150457?mt=8">Outdoors GPS GB app</a>; it does meet my requirement of being accessible off-network, but the pricing of individual maps is on par with the paper version, so I ruled it out.</p><p>Instead, I am using this trio now:</p><ol><li><p>The official <a href="http://itunes.apple.com/gb/app/the-complete-national-cycle/id436521445?mt=8&ls=1">National Cycle Network</a> app by Sustrans. Besides being free, it has the advantage of detailing every numbered national cycle route, as well as most local routes (that often predate NCN or are not yet integrated into the network). 
At high detail, the data seem to be OS-sourced, which is good.<br><br> It downloads maps from the Internet on demand, but you can also save a map portion for future use. The app asks you how much detail you want, tells you how large the download will be, then proceeds to get the data. The nuisance here is that you can only download 40 MB in one go, which corresponds to an area stretching for approximately 50-60 km at 1:50000 (and correspondingly smaller at 1:25000), so it takes a lot of tapping and downloading if you’re planning a longer trip.<br><br> The other downsides are that the app is a little shaky at times, and GPS positioning sometimes displays your position somewhat misplaced. I mitigate this by using this app in combination with the next one…<br><br></p></li><li><p>…which is <a href="http://www.mapswithme.com/">MapsWithMe</a>. The tagline “Offline Mobile Maps” nails it down: it’s just maps, easily downloadable, covering the entire world, and nothing else. This really does one thing well. The map data source is OpenStreetMap, so all the maps are available for free as well; one ‘Download’ tap and you’ve got the whole country covered, once and for all. It also displays GPS position much more reliably than NCN. On the other hand, it can’t offer quite the same level of detail as NCN, and doesn’t know anything about cycle routes, but it’s still highly convenient.<br><br> My typical flow when cycling in the UK is: check my position with MapsWithMe, then optionally switch to NCN, locate the same position on the map by hand and see where the route goes. I’ve also done one continental three-day trip, from Dunkirk in France to Hoek van Holland in the Netherlands, using just MapsWithMe to navigate, and it worked out very well.<br><br></p></li><li><p>Unlike the other two, the last app I want to point out, <a href="http://www.codeartisans.co.uk/index.html">GPS2OS</a>, is paid. And it’s more than worth its meager price, despite being next to useless when cycling. 
But when hiking, especially in remote mountainous areas, it can literally be a lifesaver. Here’s the catch: my basic navigation tools in harsh conditions are a compass and a plain ol’ paper map, and the iPhone is treated only as a supplementary aid (you never know when the battery goes out). However, instead of indicating the latitude and longitude in degrees/minutes/seconds, OS maps use <a href="http://en.wikipedia.org/wiki/Ordnance_Survey_National_Grid">their own grid</a>. So you cannot use the default Compass app, which tells you your position in degrees, directly with them, and you need a tool just like this one to do the coordinate translation. Works very well; it helped me find my way in dense mist down from the summit of Ben Macdui during my recent holiday in Scotland.</p></li></ol><p>One final tip: when you want to conserve battery as much as possible, airplane mode is a real saver. However, GPS doesn’t seem to work when airplane mode is on. So the next best thing is to remove the SIM card (you can then reinsert it, just don’t enter the PIN), so that the phone doesn’t keep trying to connect to cellular networks. And keep it warm in a pocket beside your body: cold devices discharge much faster.</p></div>tag:blog.danieljanus.pl,2012-05-14:post:lithiumLithium: an x86 assembler for Clojure2012-05-14T00:00:00Z<div><p>Ah, the golden days of childhood’s hackage. Don’t you have fond memories of them?</p><p>I got my first PC when I was 10. It was a 486DX2/66 with 4 megs of RAM and a 170 meg HDD; it ran DOS and had lots of things installed on it, notably Turbo Pascal 6. I hacked a lot in it. 
These were pre-internet days when knowledge was hard to come by, especially for someone living in a <a href="http://en.wikipedia.org/wiki/W%C4%85chock">small town in Poland</a>; my main sources were the software I had (TP’s online help was of excellent quality), a couple of books, and a <a href="http://www.cpcwiki.eu/index.php/Bajtek">popular computing magazine</a> that published articles on programming. From the latter, I learned how to program the VGA: how to enter mode 13h, draw pixels on screen, wait for vertical retrace, manipulate the palette and how to combine these things into neat effects. One of the very first things I discovered was that when you plot every pixel using the sum of its coordinates modulo 40 as the color, you get a nice-looking diagonal stripes effect. Because of the initially incomprehensible inline assembly snippets appearing all over the place, I eventually learned x86 assembly, too.</p><img src="/img/blog/stripes.png">
<p>Back to 2012: I’ve long been wanting to hack on something just for pure fun, a side pet project. Writing code for the bare metal is fun because it’s just about as close as you can get to wielding the ultimate power. And yet, since Clojure is so much fun too, I wanted the project to have something to do with Clojure.</p><p>So here’s <a href="http://github.com/nathell/lithium">Lithium</a>, an x86 16-bit assembler written in pure Clojure and capable of assembling a binary version of the stripes effect.</p><p>To try it, clone the git repo to your Linux or OS X machine, install DOSBox, launch a REPL with Leiningen, change to the <code>lithium</code> namespace and say:</p><pre><code class="hljs clojure">(<span class="hljs-name">run!</span> <span class="hljs-string">"/home/you/lithium/src/stripes.li.clj"</span>)
</code></pre><h3 id="faq">FAQ</h3><p>(Well, this is not really a FAQ since nobody has actually asked me any questions about Lithium yet. This is more in anticipation of questions that may arise.)</p><p><strong>What’s the importance of this?</strong></p><p><a href="http://www.physics.ohio-state.edu/~kilcup/262/feynman.html">None whatsoever</a>. It’s just for fun.</p><p><strong>How complete is it?</strong></p><p>Very incomplete. To even call it pre-pre-alpha would be an exaggeration. It’s currently little more than the bare minimum required to assemble <code>stripes.li.clj</code>. As for output formats, it only produces bare binaries (similar to DOS .COMs), and that’s unlikely to change anytime soon.</p><p><strong>Do you intend to continue developing it?</strong></p><p>Absolutely. I will try to make it more complete, add 32- and possibly 64-bit modes, see how to add a macro system (since the input is s-expressions, it should be easy to produce Clojure macros to write assembly), write something nontrivial in it, and see how it can be used as a backend for some higher-level language compiler (I’m not sure yet which language that will turn out to be).</p></div>tag:blog.danieljanus.pl,2012-04-25:post:how-to-call-a-private-function-in-clojureHow to call a private function in Clojure2012-04-25T00:00:00Z<div><p><strong>tl;dr:</strong> Don’t do it. If you really have to, use <code>(#'other-library/private-function args)</code>.</p><hr>
<p>A private function in Clojure is one that has been defined using the <code>defn-</code> macro, or equivalently by setting the metadata key <code>:private</code> to <code>true</code> on the var that holds the function. It is normally not allowed in Clojure to call such functions from outside of the namespace where they have been defined. Trying to do so results in an <code>IllegalStateException</code> stating that the var is not public.</p><p>It is possible to circumvent this and call the private function, but it is not recommended. That the author of the library decided to make a function private probably means that he considers it to be an implementation detail, subject to change at any time, and that you should not rely on it being there. If you think it would be useful to have this functionality available as part of the public API, your best bet is to contact the library author and discuss the change, so that it may be included officially in a future version.</p><p>Contacting the author, however, is not always feasible: she may not be available or you might be in a hurry. In this case, several workarounds are available. The simplest is to use <code>(#'other-library/private-function args)</code>, which works in Clojure 1.2.1 and 1.3.0 (it probably works in other versions of Clojure as well, but I haven’t checked that).</p><p>Why does this work? When the Clojure compiler encounters a form <code>(sym args)</code>, it invokes <code>analyzeSeq</code> on that form. If its first element is a symbol, it proceeds to analyze that symbol. One of the first operations in that analysis is checking if it names an inline function, by calling <code>isInline</code>. That function looks into the metadata of the Var named by the symbol in question. If it’s not public, it <a href="https://github.com/clojure/clojure/blob/clojure-1.3.0/src/jvm/clojure/lang/Compiler.java#L6281">throws an exception</a>.</p><p>On the other hand, <code>#'</code> is the reader macro for <code>var</code>. 
So our workaround is equivalent to <code>((var other-library/private-function) args)</code>. In this case, the first element of the form is not a symbol, but a form that evaluates to a var. The compiler is not able to check for this, so it does not insert a check for privateness. So the code compiles to calling a Var object.</p><p>Here’s the catch: Vars are callable, just like functions. They <a href="https://github.com/clojure/clojure/blob/clojure-1.3.0/src/jvm/clojure/lang/Var.java#L18">implement <code>IFn</code></a>. When a var is called, it delegates the call to the <code>IFn</code> object it is holding. This was recently <a href="https://groups.google.com/d/msg/clojure/1Su9o_8JZ8g/uZL-n4uRSiUJ">discussed on the Clojure group</a>. Since that delegation does not check for the var’s privateness either, the net effect is that we are able to call a private function this way.</p></div>tag:blog.danieljanus.pl,2012-04-12:post:lifehacking-gumtreeLifehacking: How to get cheap home equipment using Clojure2012-04-12T00:00:00Z<div><p>I moved to London last September. Like many new Londoners, I have changed accommodation fairly quickly, having already moved once, with another move looming in a couple of months; my current flat was largely unfurnished when I moved in, so I had to buy some basic homeware. I didn’t want to invest much in it, since it’d be only for a few months. Luckily, it is not hard to do that cheaply: many people are moving out and getting rid of their stuff, so quite often you can search for the desired item on <a href="http://www.gumtree.com/london">Gumtree</a> and find there’s a cheap one a short bike ride away.</p><p>Except when there isn’t. In this case, it’s worthwhile to check again within a few days as new items are constantly being posted. Being lazy, I decided to automate this. 
A few hours and a hundred lines of Clojure later, <a href="https://github.com/nathell/gumtree-scraper">gumtree-scraper</a> was born.</p><p>I’ve packaged it using <code>lein uberjar</code> into a standalone jar, which, when run, produces a <code>gumtree.rss</code> that is included in my Google Reader subscriptions. This way, whenever something I’m interested in appears, I get notified within an hour or so.</p><p>It’s driven by a Google spreadsheet. I’ve created a sheet that has three columns: item name, minimum price, maximum price; then I’ve made it available to anyone who knows the URL. This way I can edit it pretty much from everywhere without touching the script. Each time the script is run (by cron), it downloads that spreadsheet as a CSV that looks like this:</p><pre><code>hand blender,,5
bike rack,,15
</code></pre><p>For each row the script queries Gumtree’s category “For Sale” within London given the price range, gets each result and transforms it to an RSS entry.</p><p>Gumtree has no API, so I’m using screenscraping to retrieve all the data. Because the structure of the pages is much simpler, I’m actually scraping the <a href="http://m.gumtree.com/">mobile version</a>; a technical twist here is that the mobile version is only served to actual browsers so I’m supplying a custom User-Agent, pretending to be Safari. For actual scraping, the code uses <a href="https://github.com/cgrand/enlive">Enlive</a>; it works out nicely.</p><p>About half of the code is RSS generation, mostly XML emitting. I’d use <code>clojure.xml/emit</code> but it’s known to <a href="http://clojure-log.n01se.net/date/2012-01-03.html#17:28a">produce malformed XML</a> at times, so I include a variant that should work.</p><p>In case anyone wants to try it out, be aware that the location and category are hardcoded in the search URL template; if you want, change the template line in <code>get-page</code>. The controller spreadsheet URL is not, however, hardcoded; it’s built up using the <code>spreadsheet.key</code> system property. Here’s the wrapper script I use that is actually run by cron:</p><pre><code class="hljs bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-keyword">if</span> [ <span class="hljs-string">"`ps ax | grep java | grep gumtree`"</span> ]; <span class="hljs-keyword">then</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"already running, exiting"</span>
<span class="hljs-built_in">exit</span> 0
<span class="hljs-keyword">fi</span>
<span class="hljs-built_in">cd</span> <span class="hljs-string">"`dirname <span class="hljs-variable">$0</span>`"</span>
java -Dspreadsheet.key=MY_SECRET_KEY -jar <span class="hljs-variable">$HOME</span>/gumtree/gumtree.jar
<span class="hljs-built_in">cp</span> <span class="hljs-variable">$HOME</span>/gumtree/gumtree.rss <span class="hljs-variable">$HOME</span>/public_html
</code></pre><p>Now let me remove that entry for a blender: I bought one yesterday for £4…</p></div>tag:blog.danieljanus.pl,2012-03-21:post:court-with-an-apiEver wanted to programmatically file a lawsuit? In Poland, you can.2012-03-21T00:00:00Z<div><p>This has somehow escaped me: just over a year ago, the Sixth Civil Division of the Lublin-West Regional Court in Lublin, Poland, opened its <a href="https://www.e-sad.gov.pl/">online branch</a>. It serves the entire territory of Poland and is competent to hear lawsuits concerning payment claims. There is <a href="https://www.e-sad.gov.pl/Subpage.aspx?page_id=35">basic information</a> available in English. It has proven immensely popular, having processed about two million cases in its first year of operation.</p><p>And the really cool thing is, <em>they have an API</em>.</p><p>It’s SOAP-based and has a <a href="https://www.e-sad.gov.pl/Subpage.aspx?page_id=32">publicly available spec</a>. (Due to the way their web site is constructed, I cannot link to the spec directly; this last link leads to a collection of files related to the web service. The spec is called <code>EpuWS_ver.1.14.1.pdf</code>; it’s in Polish only, but it should be easy to run it through Google Translate.) There are a couple of XML schemas as well, plus the spec contains links to a WSDL and some code samples (in C#) at the end.</p><p>To actually use the API, you need to get yourself an account of the appropriate type (there are two types corresponding to two groups of methods one can use: that of a bailiff and of a mass plaintiff). You then log on to the system, where you can create an API key that is later used for authentication. They throttle the speed down to 1 req/s per user to mitigate DoS attacks.</p><p>The methods include <code>FileLawsuits</code>, <code>FileComplaints</code>, <code>SupplyDocuments</code>, <code>GetCaseHistory</code> and so on (the actual names are in Polish). 
To give you an example, the <code>FileLawsuits</code> method returns a structure that consists of, <em>inter alia</em>, the amount of the court fee to pay, the value of the matter of dispute (both broken down into individual lawsuits), and a status code with a description.</p><p>iOS app, anyone?</p></div>tag:blog.danieljanus.pl,2011-12-09:post:combining-virtual-sequencesCombining virtual sequences<br>or, Sequential Fun with Macros<br>or, How to Implement Clojure-Like Pseudo-Sequences with Poor Man’s Laziness in a Predominantly Imperative Language2011-12-09T00:00:00Z<div><h2 id="sequences-and-iteration">Sequences and iteration</h2><p>There are a number of motivations for this post. One stems from my extensive exposure to Clojure over the past few years: this was, and still is, my primary programming language for everyday work. Soon, I realized that much of the power of Clojure comes from a <em>sequence</em> abstraction being one of its central concepts, and a standard library that contains many sequence-manipulating functions. It turns out that by combining them it is possible to solve a wide range of problems in a concise, high-level way. In other words, it pays to think in terms of whole sequences, rather than individual elements.</p><p>Another motivation comes from a classic piece of functional programming humour, [The Evolution of a Haskell Programmer][1]. If you don’t know it, go check it out: it consists of several Haskell implementations of factorial, starting out from a straightforward recursive definition, passing through absolutely hilarious versions involving category-theoretical concepts, and finally arriving at this simple version that is considered most idiomatic:</p><pre><code class="hljs haskell"><span class="hljs-title">fac</span> n = product [<span class="hljs-number">1</span>..n]
</code></pre><p>This is very Clojure-like in that it involves a sequence (the range <code>[1..n]</code>). In Clojure, this could be implemented as</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">fac</span> [n]
(<span class="hljs-name"><span class="hljs-built_in">reduce</span></span> * <span class="hljs-number">1</span> (<span class="hljs-name"><span class="hljs-built_in">range</span></span> <span class="hljs-number">1</span> (<span class="hljs-name"><span class="hljs-built_in">inc</span></span> n))))
</code></pre><p>Now, I thought to myself, how would I write factorial in an imperative language? Say, Pascal?</p><pre><code class="hljs pascal"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">fac</span><span class="hljs-params">(n : integer)</span> :</span> integer;
<span class="hljs-keyword">var</span>
i, res : integer;
<span class="hljs-keyword">begin</span>
res := <span class="hljs-number">1</span>;
<span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span> <span class="hljs-keyword">to</span> n <span class="hljs-keyword">do</span>
res := res * i;
fac := res;
<span class="hljs-keyword">end</span>;
</code></pre><p>This is very different from the functional version that works with sequences. It is much more elaborate, introducing an explicit loop. On the other hand, it’s memory efficient: it’s clear that its memory requirements are O(1), whereas a naïve implementation of a sequence would need O(n) to construct it all in memory and then reduce it down to a single value.</p><p>Or is it really that different? Think of the changing values of <code>i</code> in that loop. On first iteration it is 1, on second iteration it’s 2, and so on up to n. Therefore, one can really think of a <code>for</code> loop as a sequence! I call it a “virtual” sequence, since it is not an actual data structure; it’s just a snippet of code.</p><p>To rephrase it as a definition: a virtual sequence is a snippet of code that (presumably repeatedly) <em>yields</em> the member values.</p><h2 id="let’s-write-some-code!">Let’s write some code!</h2><p>To illustrate it, throughout the remainder of this article I will be using Common Lisp, for the following reasons:</p><ul><li><span>It allows for imperative style, including GOTO-like statements. This will enable us to generate very low-level code.</span></li><li><span>Thanks to macros, we will be able to obtain interesting transformations.</span></li></ul><p>Okay, so let’s have a look at how to generate a one-element sequence. Simple enough:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> vsingle (<span class="hljs-name">x</span>)
`(yield ,x))
</code></pre><p>The name <code>VSINGLE</code> stands for “Virtual sequence that just yields a SINGLE element”. (In general, I will try to define virtual sequences named and performing similarly to their Clojure counterparts here; whenever there is a name clash with an already existing CL function, the name will be prefixed with <code>V</code>.) We will not concern ourselves with the actual definition of <code>YIELD</code> at the moment; for debugging, we can define it just as printing the value to the standard output.</p><pre><code class="hljs lisp">(<span class="hljs-name">defun</span> yield (<span class="hljs-name">x</span>)
(<span class="hljs-name">format</span> <span class="hljs-literal">t</span> <span class="hljs-string">"~A~%"</span> x))
</code></pre><p>We can also convert a Lisp list to a virtual sequence which just yields each element of the list in turn:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> vseq (<span class="hljs-name">list</span>)
`(loop for x in ,list do (yield x)))
(<span class="hljs-name">defmacro</span> vlist (<span class="hljs-name">&rest</span> elems)
`(vseq (list ,@elems)))
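
;; Usage sketch, assuming the debugging YIELD defined above
;; (it prints each yielded value on its own line):
(vsingle 42)    ; prints 42
(vlist 1 2 3)   ; prints 1, 2 and 3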
</code></pre><p>Now let’s try to define <code>RANGE</code>. We could use <code>loop</code>, but for the sake of example, let’s pretend that it doesn’t exist and write a macro that expands to low-level GOTO-ridden code. For those of you who are not familiar with Common Lisp, <code>GO</code> is like GOTO, except it takes a label that should be established within a <code>TAGBODY</code> container.</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> range (<span class="hljs-name">start</span> <span class="hljs-symbol">&optional</span> end (<span class="hljs-name">step</span> <span class="hljs-number">1</span>))
(<span class="hljs-name">unless</span> end
(<span class="hljs-name">setf</span> end start start <span class="hljs-number">0</span>))
(<span class="hljs-name">let</span> ((<span class="hljs-name">fv</span> (<span class="hljs-name">gensym</span>)))
`(let ((,fv ,start))
(tagbody
loop
(when (>= ,fv ,end)
(go out))
(yield ,fv)
(incf ,fv ,step)
(go loop)
out))))
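
;; Usage sketch, again assuming the printing YIELD defined earlier:
(range 3)       ; prints 0, 1, 2 (a single argument is taken as END, with START = 0)
(range 2 10 3)  ; prints 2, 5, 8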
</code></pre><p><em>Infinite</em> virtual sequences are also possible. After all, there’s nothing preventing us from considering a snippet of code that loops infinitely, executing <code>YIELD</code>, as a virtual sequence! We will define the equivalent of Clojure’s iterate: given a function <code>fun</code> and initial value <code>val</code>, it will repeatedly generate <code>val</code>, <code>(fun val)</code>, <code>(fun (fun val))</code>, etc.</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> iterate (<span class="hljs-name">fun</span> val)
(<span class="hljs-name">let</span> ((<span class="hljs-name">fv</span> (<span class="hljs-name">gensym</span>)))
`(let ((,fv ,val))
(tagbody loop
(yield ,fv)
(setf ,fv (funcall ,fun ,fv))
(go loop)))))
</code></pre><p>So far, we have defined a number of ways to create virtual sequences. Now let’s ask ourselves: is there a way, given code for a virtual sequence, to yield only the elements from the original that satisfy a certain predicate? In other words, can we define a <code>filter</code> for virtual sequences? Sure enough. Just replace every occurrence of <code>yield</code> with code that checks whether the yielded value satisfies the predicate, and only if it does invokes <code>yield</code>.</p><p>First we write a simple code walker that applies some transformation to every <code>yield</code> occurrence in a given snippet:</p><pre><code class="hljs lisp">(<span class="hljs-name">defun</span> replace-yield (<span class="hljs-name">tree</span> replace)
(<span class="hljs-name">if</span> (<span class="hljs-name">consp</span> tree)
(<span class="hljs-name">if</span> (<span class="hljs-name">eql</span> (<span class="hljs-name">car</span> tree) 'yield)
(<span class="hljs-name">funcall</span> replace (<span class="hljs-name">cadr</span> tree))
(<span class="hljs-name">loop</span> for x in tree collect (<span class="hljs-name">replace-yield</span> x replace)))
tree))
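
;; Illustration only; the (print ...) replacement is made up for this example.
;; Every (yield X) form is rewritten and the rest of the tree is copied as-is:
(replace-yield '(progn (yield 1) (foo (yield x)))
               (lambda (v) `(print ,v)))
;; => (PROGN (PRINT 1) (FOO (PRINT X)))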
</code></pre><p>We can now write <code>filter</code> like this:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> filter (<span class="hljs-name">pred</span> vseq <span class="hljs-symbol">&environment</span> env)
(<span class="hljs-name">replace-yield</span> (<span class="hljs-name">macroexpand</span> vseq env)
(<span class="hljs-name">lambda</span> (<span class="hljs-name">x</span>) `(when (funcall ,pred ,x) (yield ,x)))))
</code></pre><p>It is important to point out that since <code>filter</code> is a macro, the arguments are passed to it unevaluated, so if <code>vseq</code> is a virtual sequence definition like <code>(range 10)</code>, we need to macroexpand it before replacing <code>yield</code>.</p><p>We can now verify that <code>(filter #'evenp (range 10))</code> works. It macroexpands to something similar to</p><pre><code class="hljs lisp">(<span class="hljs-name">LET</span> ((<span class="hljs-name">#</span><span class="hljs-symbol">:G70192</span> <span class="hljs-number">0</span>))
(<span class="hljs-name">TAGBODY</span>
LOOP (<span class="hljs-name">IF</span> (<span class="hljs-name">>=</span> #<span class="hljs-symbol">:G70192</span> <span class="hljs-number">10</span>)
(<span class="hljs-name">PROGN</span> (<span class="hljs-name">GO</span> OUT)))
(<span class="hljs-name">IF</span> (<span class="hljs-name">FUNCALL</span> #'EVENP #<span class="hljs-symbol">:G70192</span>)
(<span class="hljs-name">PROGN</span> (<span class="hljs-name">YIELD</span> #<span class="hljs-symbol">:G70192</span>)))
(<span class="hljs-name">SETQ</span> #<span class="hljs-symbol">:G70192</span> (<span class="hljs-name">+</span> #<span class="hljs-symbol">:G70192</span> <span class="hljs-number">1</span>))
(<span class="hljs-name">GO</span> LOOP)
OUT))
</code></pre><p><code>concat</code> is extremely simple. To produce all elements of <code>vseq1</code> followed by all elements of <code>vseq2</code>, just execute code corresponding to <code>vseq1</code> and then code corresponding to <code>vseq2</code>. Or, for multiple sequences:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> concat (<span class="hljs-name">&rest</span> vseqs)
`(progn ,@vseqs))
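
;; Usage sketch with the printing YIELD: the two snippets simply run back to back.
(concat (range 2) (vlist 10 20))  ; prints 0, 1, 10, 20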
</code></pre><p>To define <code>take</code>, we’ll need to wrap the original code in a block that can be escaped from by means of <code>return-from</code> (which is just another form of <code>goto</code>). We’ll add a counter that will start from <code>n</code> and keep decreasing on each <code>yield</code>; once it reaches zero, we escape the block:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> take (<span class="hljs-name">n</span> vseq <span class="hljs-symbol">&environment</span> env)
(<span class="hljs-name">let</span> ((<span class="hljs-name">x</span> (<span class="hljs-name">gensym</span>))
(<span class="hljs-name">b</span> (<span class="hljs-name">gensym</span>)))
`(let ((,x ,n))
(block ,b
,(replace-yield (macroexpand vseq env)
(lambda (y) `(progn (yield ,y)
(decf ,x)
(when (zerop ,x)
(return-from ,b)))))))))
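
;; Usage sketch with the printing YIELD: the counter cuts an infinite sequence short.
(take 3 (iterate #'1+ 0))  ; prints 0, 1, 2 and then leaves the block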
</code></pre><p><code>rest</code> (or, rather, <code>vrest</code>, as that name is taken) can be defined similarly:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> vrest (<span class="hljs-name">vseq</span> <span class="hljs-symbol">&environment</span> env)
(<span class="hljs-name">let</span> ((<span class="hljs-name">skipped</span> (<span class="hljs-name">gensym</span>)))
(<span class="hljs-name">replace-yield</span>
`(let ((,skipped <span class="hljs-literal">nil</span>)) ,(macroexpand vseq env))
(<span class="hljs-name">lambda</span> (<span class="hljs-name">x</span>) `(if ,skipped (yield ,x) (setf ,skipped <span class="hljs-literal">t</span>))))))
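
;; Usage sketch with the printing YIELD: the first value only flips the flag.
(vrest (range 4))  ; prints 1, 2, 3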
</code></pre><p><code>vfirst</code> is another matter. It should return a value instead of producing a virtual sequence, so we need to actually execute the code — but with <code>yield</code> bound to something else. We want to establish a block as with <code>take</code>, but our <code>yield</code> will immediately return from the block once the first value is yielded:</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> vfirst (<span class="hljs-name">vseq</span>)
(<span class="hljs-name">let</span> ((<span class="hljs-name">block-name</span> (<span class="hljs-name">gensym</span>)))
`(block ,block-name
(flet ((yield (x) (return-from ,block-name x)))
,vseq))))
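
;; Usage sketch: the local YIELD returns from the block at the first value,
;; so even an infinite sequence terminates.
(vfirst (range 5 10))      ; => 5
(vfirst (iterate #'1+ 7))  ; => 7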
</code></pre><p>Note that so far we’ve seen three classes of macros:</p><ul><li><span>macros that create virtual sequences;</span></li><li><span>macros that transform virtual sequences into other virtual sequences;</span></li><li><span>and finally, <code>vfirst</code> is our first example of a macro that produces a result out of a virtual sequence.</span></li></ul><p>Our next logical step is <code>vreduce</code>. Again, we’ll produce code that rebinds <code>yield</code>: this time to a function that replaces the value of a variable (the accumulator) with the result of calling a function on the accumulator’s old value and the value being yielded.</p><pre><code class="hljs lisp">(<span class="hljs-name">defmacro</span> vreduce (<span class="hljs-name">f</span> val vseq)
`(let ((accu ,val))
(flet ((yield (x) (setf accu (funcall ,f accu x))))
,vseq
accu)))
</code></pre><p>We can now build a construct that executes a virtual sequence and wraps the results up as a Lisp list, in terms of <code>vreduce</code>.</p><pre><code class="hljs lisp">(<span class="hljs-name">defun</span> conj (<span class="hljs-name">x</span> y)
(<span class="hljs-name">cons</span> y x))
(<span class="hljs-name">defmacro</span> realize (<span class="hljs-name">vseq</span>)
`(nreverse (vreduce #'conj <span class="hljs-literal">nil</span> ,vseq)))
</code></pre><p>Let’s verify that it works:</p><pre><code class="hljs lisp">CL-USER> (<span class="hljs-name">realize</span> (<span class="hljs-name">range</span> <span class="hljs-number">10</span>))
(<span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span> <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span> <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">9</span>)
CL-USER> (<span class="hljs-name">realize</span> (<span class="hljs-name">take</span> <span class="hljs-number">5</span> (<span class="hljs-name">filter</span> #'oddp (<span class="hljs-name">iterate</span> #'<span class="hljs-number">1</span>+ <span class="hljs-number">0</span>))))
(<span class="hljs-number">1</span> <span class="hljs-number">3</span> <span class="hljs-number">5</span> <span class="hljs-number">7</span> <span class="hljs-number">9</span>)
</code></pre><p>Hey! Did we just manipulate an <em>infinite</em> sequence and get the result in a <em>finite</em> amount of time? And that without explicit support for laziness in our language? How cool is that?!</p><p>Anyway, let’s finally define our factorial:</p><pre><code class="hljs lisp">(<span class="hljs-name">defun</span> fac (<span class="hljs-name">n</span>)
(<span class="hljs-name">vreduce</span> #'* <span class="hljs-number">1</span> (<span class="hljs-name">range</span> <span class="hljs-number">1</span> (<span class="hljs-number">1</span>+ n))))
</code></pre><h2 id="benchmarking">Benchmarking</h2><p>Factorials grow too fast, so for the purpose of benchmarking let’s write a function that adds numbers from 0 below n, in sequence-y style. First using Common Lisp builtins:</p><pre><code class="hljs lisp">(<span class="hljs-name">defun</span> sum-below (<span class="hljs-name">n</span>)
(<span class="hljs-name">reduce</span> #'+ (<span class="hljs-name">loop</span> for i from <span class="hljs-number">0</span> below n collect i) <span class="hljs-symbol">:initial-value</span> <span class="hljs-number">0</span>))
</code></pre><p>And now with our virtual sequences:</p><pre><code class="hljs lisp">(<span class="hljs-name">defun</span> sum-below-2 (<span class="hljs-name">n</span>)
(<span class="hljs-name">vreduce</span> #'+ <span class="hljs-number">0</span> (<span class="hljs-name">range</span> n)))
</code></pre><p>Let’s try to time the two versions. On my Mac running Clozure CL 1.7, this gives:</p><pre><code class="hljs lisp">CL-USER> (<span class="hljs-name">time</span> (<span class="hljs-name">sum-below</span> <span class="hljs-number">10000000</span>))
(<span class="hljs-name">SUM-BELOW</span> <span class="hljs-number">10000000</span>) took <span class="hljs-number">8</span>,<span class="hljs-number">545</span>,<span class="hljs-number">512</span> microseconds (<span class="hljs-number">8.545512</span> seconds) to run
with <span class="hljs-number">2</span> available CPU cores.
During that period, <span class="hljs-number">2</span>,<span class="hljs-number">367</span>,<span class="hljs-number">207</span> microseconds (<span class="hljs-number">2.367207</span> seconds) were spent in user mode
<span class="hljs-number">270</span>,<span class="hljs-number">481</span> microseconds (<span class="hljs-number">0.270481</span> seconds) were spent in system mode
<span class="hljs-number">5</span>,<span class="hljs-number">906</span>,<span class="hljs-number">274</span> microseconds (<span class="hljs-number">5.906274</span> seconds) was spent in GC.
<span class="hljs-number">160</span>,<span class="hljs-number">000</span>,<span class="hljs-number">016</span> bytes of memory allocated.
<span class="hljs-number">39</span>,<span class="hljs-number">479</span> minor page faults, <span class="hljs-number">1</span>,<span class="hljs-number">359</span> major page faults, <span class="hljs-number">0</span> swaps.
<span class="hljs-number">49999995000000</span>
CL-USER> (<span class="hljs-name">time</span> (<span class="hljs-name">sum-below-2</span> <span class="hljs-number">10000000</span>))
(<span class="hljs-name">SUM-BELOW-2</span> <span class="hljs-number">10000000</span>) took <span class="hljs-number">123</span>,<span class="hljs-number">081</span> microseconds (<span class="hljs-number">0.123081</span> seconds) to run
with <span class="hljs-number">2</span> available CPU cores.
During that period, <span class="hljs-number">127</span>,<span class="hljs-number">632</span> microseconds (<span class="hljs-number">0.127632</span> seconds) were spent in user mode
<span class="hljs-number">666</span> microseconds (<span class="hljs-number">0.000666</span> seconds) were spent in system mode
<span class="hljs-number">4</span> minor page faults, <span class="hljs-number">0</span> major page faults, <span class="hljs-number">0</span> swaps.
<span class="hljs-number">49999995000000</span>
</code></pre><p>As expected, <code>SUM-BELOW-2</code> is much faster, causes fewer page faults and presumably conses less. (Critics will be quick to point out that we could idiomatically write it using <code>LOOP</code>’s <code>SUM/SUMMING</code> clause, which would probably be faster still, and I agree; yet if we were reducing by something other than <code>+</code>, something that <code>LOOP</code> does not have built in as a clause, this would not be an option.)</p><h2 id="conclusion">Conclusion</h2><p>We have seen how snippets of code can be viewed as sequences and how to combine them to produce other virtual sequences. As we are nearing the end of this article, it is perhaps fitting to ask: what are the limitations and drawbacks of this approach?</p><p>Clearly, this kind of sequence is less powerful than “ordinary” sequences such as Clojure’s. The fact that we’ve built them on macros means that once we escape the world of code transformation by invoking some macro of the third class, we can’t manipulate them anymore. In the Clojure world, <code>first</code> and <code>rest</code> are very similar; in virtual sequences, they are altogether different: they belong to different worlds. The same goes for <code>map</code> (had we defined one) and <code>reduce</code>.</p><p>But imagine that instead of having just one programming language, we have a high-level language A in which we are writing macros that expand to code in a low-level language B. It is important to point out that the generated code is very low-level. It could almost be assembly: in fact, most of the macros we’ve written don’t even require language B to have composite data-types beyond the type of elements of collections (which could be simple integers)!</p><p>Is there a practical side to this? I don’t know: to me it just seems to be something with hack value. 
Time will tell if I can put it to good use.</p></div>tag:blog.danieljanus.pl,2011-07-11:post:color-your-own-europeColor your own Europe with Clojure!2011-07-11T00:00:00Z<div><p>This is a slightly edited translation of <a href="http://plblog.danieljanus.pl/zippery-w-clojure">an article</a> I first published on my Polish blog on January 19, 2011. It is meant to target newcomers to Clojure and show how to use Clojure to solve a simple real-life problem.</p><h2 id="the-problem">The problem</h2><p>Some time ago I was asked to prepare a couple of differently-colored maps of Europe. I got some datasets which mapped countries of Europe to numerical values: the greater the value, the darker the corresponding color should be. A sample colored map looked like this:</p><img src="/img/blog/europa.png">
<p>I began by downloading an easily editable <a href="http://commons.wikimedia.org/wiki/File:Blank_map_of_Europe.svg">map</a> from Wikipedia Commons, calculated the required color intensities for the first dataset, launched <a href="http://www.inkscape.org">Inkscape</a> and started coloring. After half an hour of tedious clicking, I realized that I would be better off writing a simple program in Clojure that would generate the map for me. It turned out to be an easy task: the remainder of this article will be an attempt to reconstruct my steps.</p><h2 id="svg">SVG</h2><p>The format of the source image is SVG. I knew it was an XML-based vector graphics format, I’d often encountered images in this format on Wikipedia — but editing it by hand was new to me. Luckily, it turned out that the image has a simple structure. Each country’s envelope curve is described with a <code>path</code> element that looks like this:</p><pre><code class="hljs xml"><span class="hljs-tag"><<span class="hljs-name">path</span>
<span class="hljs-attr">id</span>=<span class="hljs-string">"pl"</span>
<span class="hljs-attr">class</span>=<span class="hljs-string">"eu europe"</span>
<span class="hljs-attr">d</span>=<span class="hljs-string">"a long list of curve node coordinates"</span> /></span>
</code></pre><p>An important thing to note here is the <code>id</code> attribute — this is the two-letter ISO-3166-1-ALPHA2 country code. In fact, there is an informative comment right at the beginning of the image that explains the naming conventions used. Having such a splendid input was of great help.</p><p>Just like HTML, SVG <a href="http://www.w3.org/TR/SVG/styling.html">uses CSS stylesheets</a> to define the look of an element. All that is needed to color Poland red is to style the element with a <code>fill</code> attribute:</p><pre><code class="hljs xml"><span class="hljs-tag"><<span class="hljs-name">path</span>
<span class="hljs-attr">id</span>=<span class="hljs-string">"pl"</span>
<span class="hljs-attr">style</span>=<span class="hljs-string">"fill: #ff0000;"</span>
<span class="hljs-attr">class</span>=<span class="hljs-string">"eu europe"</span>
<span class="hljs-attr">d</span>=<span class="hljs-string">"a long list of curve node coordinates"</span> /></span>
</code></pre><p>Now that we know all this, let’s start coding!</p><h2 id="xml-in-clojure">XML in Clojure</h2><p>The basic way to handle XML in Clojure is to use the <code>clojure.xml</code> namespace, which contains functions that parse XML (on a DOM basis, i.e., into an in-memory tree structure) and serialize such structures back into XML. Let us launch a REPL and start by reading our map and parsing it:</p><pre><code class="hljs clojure">> (<span class="hljs-name"><span class="hljs-built_in">use</span></span> 'clojure.xml)
<span class="hljs-literal">nil</span>
> (<span class="hljs-keyword">def</span> <span class="hljs-title">m</span> (<span class="hljs-name">parse</span> <span class="hljs-string">"/home/nathell/eur/Blank_map_of_Europe.svg"</span>))
[...a long while...]
Unexpected end of file from server
[Thrown class java.net.SocketException]
</code></pre><p>Hold on there! What’s that <code>SocketException</code> doing here? Firefox displays this map properly, so does Chrome, WTF?! Shouldn’t everything work fine in such a great language as Clojure?</p><p>Well, the language is as good as its libraries, and when it comes to Clojure, one can stretch that thought further: Clojure libraries are as good as the Java libraries they use under the hood. In this case, we’ve encountered a feature of the standard Java XML parser (from the <code>javax.xml</code> package). It is restrictive and tries to reject invalid documents (even if they are well-formed). If the file being parsed contains a <code>DOCTYPE</code> declaration, the Java parser, and hence <code>clojure.xml/parse</code>, tries to download the DTD from the given address and validate the document against it. This is unfortunate in many respects, especially from the point of view of the <a href="http://www.w3.org">World Wide Web Consortium</a>, since their servers hold the Web standards. One can easily imagine the volume of network traffic this generates: W3C has a <a href="http://www.w3.org/blog/systeam?cat=68">blog post</a> about it. Many Java programmers have encountered this problem at some time. There are a few solutions; we will go the simplest way and just manually remove the offending <code>DOCTYPE</code> declaration.</p><pre><code class="hljs clojure">> (<span class="hljs-keyword">def</span> <span class="hljs-title">m</span> (<span class="hljs-name">parse</span> <span class="hljs-string">"/home/nathell/eur/bm.svg"</span>))
#'user/m
> m
[...many screenfuls of numbers...]
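
;; A hedged alternative to deleting the DOCTYPE by hand (a sketch, not part of
;; the original session): clojure.xml/parse accepts a custom startparse function
;; as its second argument. The name startparse-no-dtd is made up here, and the
;; feature URI is an assumption that holds for Xerces-based parsers such as the
;; one bundled with the JDK; it asks the parser not to fetch external DTDs.
(require '[clojure.xml :as xml])
(import 'javax.xml.parsers.SAXParserFactory)

(defn startparse-no-dtd [source content-handler]
  (let [factory (doto (SAXParserFactory/newInstance)
                  (.setFeature "http://apache.org/xml/features/nonvalidating/load-external-dtd" false))]
    (.parse (.newSAXParser factory) source content-handler)))

;; With the unmodified file, the call would then presumably look like this:
;; (def m (xml/parse "/home/nathell/eur/Blank_map_of_Europe.svg" startparse-no-dtd))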
</code></pre><p>This time we managed to parse the image. Viewing the structure is not easy because of its sheer size (as expected: the file weighs in at over 0.5 MB!), but from the very first characters of the REPL’s output we can make out that it’s a Clojure map (no pun intended). Let’s examine its keys:</p><pre><code class="hljs clojure">> (<span class="hljs-name"><span class="hljs-built_in">keys</span></span> m)
(<span class="hljs-symbol">:tag</span> <span class="hljs-symbol">:attrs</span> <span class="hljs-symbol">:content</span>)
</code></pre><p>So the map contains three entries with descriptive names. <code>:tag</code> contains the name of the XML element, <code>:attrs</code> is a map of attributes for this element, and <code>:content</code> is a vector of its subelements, each in turn being represented by a similarly structured map (or a string if it’s a text node):</p><pre><code class="hljs clojure">> (<span class="hljs-symbol">:tag</span> m)
<span class="hljs-symbol">:svg</span>
> (<span class="hljs-symbol">:attrs</span> m)
{<span class="hljs-symbol">:xmlns</span> <span class="hljs-string">"http://www.w3.org/2000/svg"</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:width</span> <span class="hljs-string">"680"</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:height</span> <span class="hljs-string">"520"</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:viewBox</span> <span class="hljs-string">"1754 161 9938 7945"</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:version</span> <span class="hljs-string">"1.0"</span><span class="hljs-punctuation">,</span> <span class="hljs-symbol">:id</span> <span class="hljs-string">"svg2"</span>}
> (<span class="hljs-name"><span class="hljs-built_in">count</span></span> (<span class="hljs-symbol">:content</span> m))
<span class="hljs-number">68</span>
</code></pre><p>Just for the sake of practice, let’s try to write the parsed structure back out as XML. The function <code>emit</code> should be able to do it, but it prints XML to standard output. We can use the <code>with-out-writer</code> macro from the namespace <code>clojure.contrib.io</code> to dump the XML to a file:</p><pre><code class="hljs clojure">> (<span class="hljs-name"><span class="hljs-built_in">use</span></span> 'clojure.contrib.io)
<span class="hljs-literal">nil</span>
> (<span class="hljs-name">with-out-writer</span> <span class="hljs-string">"/tmp/a.svg"</span> (<span class="hljs-name">emit</span> m))
<span class="hljs-literal">nil</span>
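
;; An aside resting on an assumption about your setup: with-out-writer comes
;; from the old monolithic clojure.contrib, which later Clojure releases no
;; longer ship. A minimal sketch of the same idea using only clojure.java.io;
;; emit-to-file is a hypothetical helper name, not something from the article.
(require '[clojure.java.io :as io]
         '[clojure.xml :as xml])

(defn emit-to-file [element path]
  (with-open [w (io/writer path)]
    (binding [*out* w]            ; emit prints to *out*, so rebind it to the file
      (xml/emit element))))

;; (emit-to-file m "/tmp/a.svg")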
</code></pre><p>We try to view <code>a.svg</code> in Firefox and…</p><pre><code>Error parsing XML: not well-formed
Area: file:///tmp/a.xml
Row 15, column 44: Updated to reflect dissolution of Serbia & Montenegro: http://commons.wikimedia.org/wiki/User:Zirland
-------------------------------------------^
</code></pre><p>It turns out that using <code>clojure.xml/emit</code> is not recommended, because it does not handle XML entities in comments correctly; we should use <code>clojure.contrib.lazy-xml</code> instead. For the sake of example, though, let’s stay with <code>emit</code> and manually remove the offending line once again (we can safely do it, since that’s just a comment).</p><h2 id="coloring-poland">Coloring Poland</h2><p>We saw earlier that our main XML node contains 68 subnodes. Let’s see what they are — tag names will suffice:</p><pre><code class="hljs clojure">> (<span class="hljs-name"><span class="hljs-built_in">map</span></span> <span class="hljs-symbol">:tag</span> (<span class="hljs-symbol">:content</span> m))
(<span class="hljs-symbol">:title</span> <span class="hljs-symbol">:desc</span> <span class="hljs-symbol">:defs</span> <span class="hljs-symbol">:rect</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span 
class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:g</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:g</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span> <span class="hljs-symbol">:path</span>)
</code></pre><p>So far, so good. Seems that all country descriptions are contained directly in the main node. Let us try to find Poland:</p><pre><code>> (count (filter #(and (= (:tag %) :path)
(= ((:attrs %) :id) "pl"))
(:content m)))
1
</code></pre><p>(This snippet of code filters the list of subnodes of <code>m</code> to pick only those elements whose tag name is <code>path</code> and value of attribute <code>id</code> is <code>pl</code>, and returns the length of such list.) Let’s try to add a <code>style</code> attribute to that element, according to what we said earlier. Because Clojure data structures are immutable, we have to define a new top-level element which will be the same as <code>m</code>, except that we will set the style of the appropriate subnode:</p><pre><code class="hljs clojure">> (<span class="hljs-keyword">def</span> <span class="hljs-title">m2</span> (<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> m
<span class="hljs-symbol">:content</span>
(<span class="hljs-name"><span class="hljs-built_in">map</span></span> #(<span class="hljs-name"><span class="hljs-built_in">if</span></span> (<span class="hljs-name"><span class="hljs-built_in">and</span></span> (<span class="hljs-name"><span class="hljs-built_in">=</span></span> (<span class="hljs-symbol">:tag</span> %) <span class="hljs-symbol">:path</span>)
(<span class="hljs-name"><span class="hljs-built_in">=</span></span> ((<span class="hljs-symbol">:attrs</span> %) <span class="hljs-symbol">:id</span>) <span class="hljs-string">"pl"</span>))
(<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> % <span class="hljs-symbol">:attrs</span> (<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> (<span class="hljs-symbol">:attrs</span> %) <span class="hljs-symbol">:style</span> <span class="hljs-string">"fill: #ff0000;"</span>))
%)
(<span class="hljs-symbol">:content</span> m))))
#'user/m<span class="hljs-number">2</span>
> (<span class="hljs-name">with-out-writer</span> <span class="hljs-string">"/tmp/a.svg"</span> (<span class="hljs-name">emit</span> m2))
<span class="hljs-literal">nil</span>
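
;; A possible shorthand, shown on a hand-made element rather than the real map
;; (sample-path is a made-up name): assoc-in reaches into the nested :attrs map
;; in one step and gives the same result as the two nested assoc calls above.
(def sample-path {:tag :path, :attrs {:id "pl", :class "eu europe"}, :content nil})

(= (assoc-in sample-path [:attrs :style] "fill: #ff0000;")
   (assoc sample-path :attrs (assoc (:attrs sample-path) :style "fill: #ff0000;")))
;; true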
</code></pre><p>We open the created file and see a map with Poland colored red. Yay!</p><h2 id="generalization">Generalization</h2><p>We will generalize our code a bit. Let us write a function that colors a single state, taking a <code>path</code> element (subnode of <code>svg</code>) as an argument:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">color-state</span>
[{<span class="hljs-symbol">:keys</span> [tag attrs] <span class="hljs-symbol">:as</span> element} colorize-fn]
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [state (<span class="hljs-symbol">:id</span> attrs)]
(<span class="hljs-name"><span class="hljs-built_in">if-let</span></span> [color (<span class="hljs-name">colorize-fn</span> state)]
(<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> element <span class="hljs-symbol">:attrs</span> (<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> attrs <span class="hljs-symbol">:style</span> (<span class="hljs-name"><span class="hljs-built_in">str</span></span> <span class="hljs-string">"fill:"</span> color)))
element)))
</code></pre><p>This function is similar to the anonymous one we used above in the <code>map</code> call, but differs in some respects. It takes two arguments. As mentioned, the first one is the XML element (destructured into <code>tag</code> and <code>attrs</code>: you can read more about destructuring in <a href="http://clojure.org/special_forms">the appropriate part of Clojure docs</a>), and the second argument is… a function that should take a two-letter country code and return an HTML color description (or <code>nil</code>, if that country’s color is not specified; <code>color-state</code> will cope with this and return the element unchanged).</p><p>Now that we have <code>color-state</code>, we can easily write a higher-level function that processes and writes XML in one step:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">save-color-map</span>
[svg colorize-fn outfile]
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [colored-map (<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> svg <span class="hljs-symbol">:content</span> (<span class="hljs-name"><span class="hljs-built_in">map</span></span> #(<span class="hljs-name">color-state</span> % colorize-fn) (<span class="hljs-symbol">:content</span> svg)))]
(<span class="hljs-name">with-out-writer</span> outfile
(<span class="hljs-name">emit</span> colored-map))))
</code></pre><p>Let’s test it:</p><pre><code class="hljs clojure">> (<span class="hljs-name">save-color-map</span> m {<span class="hljs-string">"pl"</span> <span class="hljs-string">"#00ff00"</span>} <span class="hljs-string">"/tmp/a.svg"</span>)
<span class="hljs-literal">nil</span>
</code></pre><p>This time Poland is green (we passed a country→color map as the <code>colorize-fn</code> argument, which ends up in <code>color-state</code>; this works because Clojure maps are callable like functions). Let’s try to add blue Germany:</p><pre><code class="hljs clojure">> (<span class="hljs-name">save-color-map</span> m {<span class="hljs-string">"pl"</span> <span class="hljs-string">"#00ff00"</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"de"</span> <span class="hljs-string">"#0000ff"</span>} <span class="hljs-string">"/tmp/a.svg"</span>)
<span class="hljs-literal">nil</span>
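
;; Why a plain map works as colorize-fn (a small illustration with a made-up
;; name, sample-colors): calling a map looks up the key, and a missing key
;; yields nil, which is the case color-state treats as "leave unchanged".
(def sample-colors {"pl" "#00ff00", "de" "#0000ff"})
(sample-colors "de")   ; "#0000ff"
(sample-colors "fr")   ; nil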
</code></pre><p>It works!</p><h2 id="problem-with-the-uk">Problem with the UK</h2><p>Inspired by our success, we try to color different countries. It mostly works, but the United Kingdom remains gray, regardless of whether we specify its code as “uk” or “gb”. We resort to the source of our image, and the beginning comment once again proves helpful:</p><blockquote><p>Certain countries are further subdivided the United Kingdom has gb-gbn for Great Britain and gb-nir for Northern Ireland. Russia is divided into ru-kgd for the Kaliningrad Oblast and ru-main for the Main body of Russia. There is the additional grouping #xb for the “British Islands” (the UK with its Crown Dependencies – Jersey, Guernsey and the Isle of Man)</p></blockquote><p>Perhaps we have to specify “gb-gbn” and “gb-nir”, instead of just “gb”? We try that, but still no luck. After a while of thought: oh yes! Our initial assumption that <em>all</em> the country definitions are <code>path</code> subnodes of the toplevel <code>svg</code> node is false. We have to fix that.</p><p>So far we have been doing a “flat” transform of the SVG tree: we only changed the subnodes of the toplevel node, but no deeper. We should change all the <code>path</code> elements (and <code>g</code>, if we want to color groups of paths like the UK), regardless of how deep they occur in the tree.</p><p>We can use a <a href="http://clojure.org/other_libraries">zipper</a> to do a depth-first walk of the SVG tree. Let us define a function that takes, in this order, the transformation function to apply to a node, a predicate that tells whether to edit the node in question, and a zipper (the code assumes that <code>clojure.zip</code> has been required under the alias <code>zip</code>):</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">map-zipper</span> [f pred z]
(<span class="hljs-name"><span class="hljs-built_in">if</span></span> (<span class="hljs-name">zip/end?</span> z)
(<span class="hljs-name">zip/root</span> z)
(<span class="hljs-name"><span class="hljs-built_in">recur</span></span> f pred (<span class="hljs-name"><span class="hljs-built_in">-></span></span> z (<span class="hljs-name">zip/edit</span> #(<span class="hljs-name"><span class="hljs-built_in">if</span></span> (<span class="hljs-name">pred</span> %) (<span class="hljs-name">f</span> %) %)) zip/next))))
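
;; To see the order in which such a walk visits nodes, here is a self-contained
;; sketch on a hand-made tree. tiny-svg and tags-in-walk-order are illustrative
;; names; the zip alias is assumed to come from this require, which map-zipper
;; above needs as well (evaluated before its definition).
(require '[clojure.zip :as zip])

(def tiny-svg
  {:tag :svg, :attrs {}, :content
   [{:tag :g, :attrs {:id "gb"}, :content
     [{:tag :path, :attrs {:id "gb-gbn"}, :content nil}]}
    {:tag :path, :attrs {:id "pl"}, :content nil}]})

(defn tags-in-walk-order [root]
  (loop [z (zip/xml-zip root), acc []]
    (if (zip/end? z)
      acc
      (recur (zip/next z) (conj acc (:tag (zip/node z)))))))

(tags-in-walk-order tiny-svg)   ; [:svg :g :path :path]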
</code></pre><p>Now we rewrite <code>save-color-map</code> as:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">save-color-map</span>
[svg colorize-fn outfile]
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [colored-map (<span class="hljs-name">map-zipper</span> #(<span class="hljs-name">color-state</span> % colorize-fn) (<span class="hljs-name"><span class="hljs-built_in">fn</span></span> [x] (#{<span class="hljs-symbol">:g</span> <span class="hljs-symbol">:path</span>} (<span class="hljs-symbol">:tag</span> x))) (<span class="hljs-name">zip/xml-zip</span> svg))]
(<span class="hljs-name">with-out-writer</span> outfile
(<span class="hljs-name">emit</span> colored-map))))
</code></pre><p>This time the UK can be colored.</p><h2 id="colorizers">Colorizers</h2><p>We have automated the process of styling countries to make them appear in color, but translating particular numbers to RGB is tedious. In the last part of this article we will see how to ease this: we are going to write a <em>colorizer</em>, i.e., a function suitable for passing to <code>color-state</code> and <code>save-color-map</code> (so far we’ve been using maps for this).</p><p>Let’s start by writing a function that translates a triplet of numbers into HTML RGB notation, because it will be easier for us to work with integers than with strings:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">htmlize-color</span>
[[r g b]]
(<span class="hljs-name"><span class="hljs-built_in">format</span></span> <span class="hljs-string">"#%02x%02x%02x"</span> r g b))
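
;; What the format string does, on literal values (a quick illustration):
;; %02x prints an integer as at least two lowercase hex digits, zero-padded.
(format "%02x" 10)                    ; "0a"
(format "#%02x%02x%02x" 255 128 0)    ; "#ff8000"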
</code></pre><p>Now we insert a call to <code>htmlize-color</code> into the appropriate place in <code>color-state</code>:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">color-state</span>
[{<span class="hljs-symbol">:keys</span> [tag attrs] <span class="hljs-symbol">:as</span> element} colorize-fn]
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [state (<span class="hljs-symbol">:id</span> attrs)]
(<span class="hljs-name"><span class="hljs-built_in">if-let</span></span> [color (<span class="hljs-name">colorize-fn</span> state)]
(<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> element <span class="hljs-symbol">:attrs</span> (<span class="hljs-name"><span class="hljs-built_in">assoc</span></span> attrs <span class="hljs-symbol">:style</span> (<span class="hljs-name"><span class="hljs-built_in">str</span></span> <span class="hljs-string">"fill:"</span> (<span class="hljs-name">htmlize-color</span> color))))
element)))
</code></pre><p>Now imagine we have a table with numeric values for states, like this:</p><table class="entry">
<tr class="header"><th>State</th><th>Value</th></tr>
<tr><td>Poland</td><td class="center">20</td></tr>
<tr><td>Germany</td><td class="center">15</td></tr>
<tr><td>Netherlands</td><td class="center">30</td></tr>
</table>
<p>We want to have a function that assigns colors to states, such that the intensity of a color should be proportional to the value assigned to a given state. To be more general, assume we have two colors, c1 and c2, and for a given state, for each of the R, G, B components we assign a value proportional to the difference between the state’s value and the smallest value in the dataset, normalized to lie between c1 and c2.</p><p>This sounds complex, but I hope an example will clear things up. This is the Clojure implementation of the described algorithm:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defn</span> <span class="hljs-title">make-colorizer</span>
[dataset ranges]
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [minv (<span class="hljs-name"><span class="hljs-built_in">apply</span></span> min (<span class="hljs-name"><span class="hljs-built_in">vals</span></span> dataset))
maxv (<span class="hljs-name"><span class="hljs-built_in">apply</span></span> max (<span class="hljs-name"><span class="hljs-built_in">vals</span></span> dataset))
progress (<span class="hljs-name"><span class="hljs-built_in">map</span></span> (<span class="hljs-name"><span class="hljs-built_in">fn</span></span> [[min-col max-col]] (/ (<span class="hljs-name"><span class="hljs-built_in">-</span></span> max-col min-col) (<span class="hljs-name"><span class="hljs-built_in">-</span></span> maxv minv))) ranges)]
(<span class="hljs-name"><span class="hljs-built_in">into</span></span> {}
(<span class="hljs-name"><span class="hljs-built_in">map</span></span> (<span class="hljs-name"><span class="hljs-built_in">fn</span></span> [[k v]] [(<span class="hljs-name">.toLowerCase</span> k) (<span class="hljs-name"><span class="hljs-built_in">map</span></span> (<span class="hljs-name"><span class="hljs-built_in">fn</span></span> [progress [min-color _]] (<span class="hljs-name"><span class="hljs-built_in">int</span></span> (<span class="hljs-name"><span class="hljs-built_in">+</span></span> min-color (<span class="hljs-name"><span class="hljs-built_in">*</span></span> (<span class="hljs-name"><span class="hljs-built_in">-</span></span> v minv) progress)))) progress ranges)])
dataset))))
</code></pre><p>Let us see how it works on our sample data:</p><pre><code class="hljs clojure">> (<span class="hljs-name">make-colorizer</span> {<span class="hljs-string">"pl"</span> <span class="hljs-number">20</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"de"</span> <span class="hljs-number">15</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"nl"</span> <span class="hljs-number">30</span>} [[<span class="hljs-number">0</span> <span class="hljs-number">255</span>] [<span class="hljs-number">0</span> <span class="hljs-number">0</span>] [<span class="hljs-number">0</span> <span class="hljs-number">0</span>]])
{<span class="hljs-string">"pl"</span> (<span class="hljs-number">85</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span>)<span class="hljs-punctuation">,</span> <span class="hljs-string">"de"</span> (<span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span>)<span class="hljs-punctuation">,</span> <span class="hljs-string">"nl"</span> (<span class="hljs-number">255</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span>)}
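;; Worked by hand for the red component (an illustrative check, not REPL output):
;; progress = (255 - 0) / (30 - 15) = 17, so "pl" gets 0 + (20 - 15) * 17 = 85
;; and "nl" gets 0 + (30 - 15) * 17 = 255.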
</code></pre><p>The second argument means that the red component is to range between 0 and 255, and the green and blue components are to be fixed at 0.</p><p>Like we wanted, Germany ends up darkest (because it has the lowest value), the Netherlands is lightest (because it has the greatest value), and Poland’s intensity is one third that of the Netherlands (because 20 is one third of the way between 15 and 30).</p><h2 id="wrapping-up">Wrapping up</h2><p>The application we created can be further developed in many ways. One can, for instance, add a Web interface for it, or write many different colorizers (e.g., a discrete colorizer with fixed colors for ranges of input values, or a temperature colorizer transitioning smoothly from blue through white to red; to do this we would have to pass through the HSV color space).</p><p>What is your idea to improve on it? For those of you who are tired of pasting snippets of code into the REPL, I’m putting the complete source code with a Leiningen project on <a href="https://github.com/nathell/color-europe">GitHub</a>. Forks are welcome.</p></div>tag:blog.danieljanus.pl,2011-07-08:post:meet-my-little-friend-createtreeMeet my little friend createTree2011-07-08T00:00:00Z<div><p>I’ve recently been developing an iPhone application in my spare time. I’m not going to tell you what it is just yet (I will post a separate entry once I manage to get it into the App Store); for now, let me just say that I’m writing it in JavaScript and HTML5, using [PhoneGap][1] and [jQTouch][2] to give it a native touch.</p><p>After having written some code, I began testing it on a real device and encountered a nasty issue. It turned out that some of the screens of my app, containing dynamically generated content, sometimes would not show up. I tried to chase the problem down, but it seemed totally random. 
Finally, I googled up [this blog post][3] that gave me a clue.</p><p>My code was using jQuery’s <code>.html()</code> method (and hence <code>innerHTML</code> under the hood) to display the dynamic content. It turns out that, on Mobile Safari, using <code>innerHTML</code> is highly unreliable (at least on iOS 4.3, but this seems to be a long-standing bug). Sometimes, the change just does not happen. I changed one of my screens to build and insert DOM objects explicitly, and sure enough, it started to work predictably.</p><p>So I had to remove all usages of <code>.html()</code> from my app. The downside was that explicit DOM-building code is much more verbose than the version that constructs HTML and then sets it. It’s tedious to write and contains a lot of boilerplate.</p><p>To avoid having to change code, the above-quoted article advocates using a pure-JavaScript HTML parser outputting DOM to replace jQuery’s <code>.html()</code> method. I considered this for a while, but finally decided against it: I didn’t want to include another big, complex dependency that potentially could misbehave at times (writing HTML parsers is <em>hard</em>).</p><p>Instead, I came up with this:</p><pre><code class="hljs javascript"><span class="hljs-keyword">function</span> <span class="hljs-title function_">createTree</span>(<span class="hljs-params">tree</span>) {
<span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> tree === <span class="hljs-string">'string'</span> || <span class="hljs-keyword">typeof</span> tree === <span class="hljs-string">'number'</span>)
<span class="hljs-keyword">return</span> <span class="hljs-variable language_">document</span>.<span class="hljs-title function_">createTextNode</span>(tree);
<span class="hljs-keyword">var</span> tag = tree[<span class="hljs-number">0</span>], attrs = tree[<span class="hljs-number">1</span>], res = <span class="hljs-variable language_">document</span>.<span class="hljs-title function_">createElement</span>(tag);
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> attr <span class="hljs-keyword">in</span> attrs) {
<span class="hljs-keyword">var</span> val = attrs[attr];
<span class="hljs-keyword">if</span> (attr === <span class="hljs-string">'class'</span>)
res.<span class="hljs-property">className</span> = val;
<span class="hljs-keyword">else</span>
$(res).<span class="hljs-title function_">attr</span>(attr, val);
}
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> i = <span class="hljs-number">2</span>; i < tree.<span class="hljs-property">length</span>; i++)
res.<span class="hljs-title function_">appendChild</span>(<span class="hljs-title function_">createTree</span>(tree[i]));
<span class="hljs-keyword">return</span> res;
}
</code></pre><p>This is very similar in spirit to <code>.html()</code>, except that instead of passing HTML, you give it a data structure representing the DOM tree to construct. It can be either a string or a number (either of which yields a text node), or an array consisting of the HTML tag name, an object mapping attributes to their values, and zero or more subtrees of the same form. Compare:</p><p>Using <code>.html()</code>:</p><pre><code class="hljs javascript"><span class="hljs-keyword">var</span> html = <span class="hljs-string">'<p>This is an <span class="red">example.</span></p>'</span>;
$(<span class="hljs-string">'#myDiv'</span>).<span class="hljs-title function_">html</span>(html);
</code></pre><p>Using <code>createTree</code>:</p><pre><code class="hljs javascript"><span class="hljs-keyword">var</span> tree = [<span class="hljs-string">'p'</span>, {},
<span class="hljs-string">'This is an '</span>,
[<span class="hljs-string">'span'</span>, {<span class="hljs-string">'class'</span>: <span class="hljs-string">'red'</span>}, <span class="hljs-string">'example.'</span>]];
$(<span class="hljs-string">'#myDiv'</span>).<span class="hljs-title function_">empty</span>().<span class="hljs-title function_">append</span>(<span class="hljs-title function_">createTree</span>(tree));
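
// A hedged sketch, not taken from the original app: since a tree is plain
// data, it can be assembled with ordinary array operations before being
// handed to createTree. The helper name listTree and its argument are
// hypothetical, chosen only for illustration.
function listTree(items) {
    var tree = ['ul', {}];
    items.forEach(function (item) {
        // each item becomes an li subtree with no attributes
        tree.push(['li', {}, item]);
    });
    return tree;
}
// Possible usage: $('#myDiv').empty().append(createTree(listTree(['one', 'two'])));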
</code></pre><p>A side benefit is that it is just as easy to build up a tree dynamically as it is to create HTML, and the code often gets clearer. Note how the <code>createTree</code> version above does not mix single and double quotes, which is easy to mess up in the <code>.html()</code> version.</p></div>tag:blog.danieljanus.pl,2011-05-15:post:a-quirk-with-javascript-closuresA quirk with JavaScript closures2011-05-15T00:00:00Z<div><p>I keep running into this obstacle every now and then. Consider this example:</p><pre><code class="hljs javascript">> q = []
[]
> <span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> i = <span class="hljs-number">0</span>; i < <span class="hljs-number">3</span>; i++)
q.<span class="hljs-title function_">push</span>(<span class="hljs-keyword">function</span>(<span class="hljs-params"></span>) { <span class="hljs-variable language_">console</span>.<span class="hljs-title function_">log</span>(i); });
> q[<span class="hljs-number">0</span>]()
<span class="hljs-number">3</span>
</code></pre><p>I wanted an array of three closures, each printing a different number to the console when called. Instead, each prints 3 (or, rather, whatever the value of the variable <code>i</code> happens to be).</p><p>This happens because <code>var</code> declares one variable for the whole enclosing function (or the global scope) rather than one per iteration: the <code>i</code> in each lambda refers to that shared <em>variable</em>, not to the value it had when the function was created.</p><p>One solution is to enforce the bindings explicitly on each iteration, like this:</p><pre><code class="hljs javascript"><span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> i = <span class="hljs-number">0</span>; i < <span class="hljs-number">3</span>; i++)
(<span class="hljs-keyword">function</span>(<span class="hljs-params">v</span>) {
q.<span class="hljs-title function_">push</span>(<span class="hljs-keyword">function</span>(<span class="hljs-params"></span>) { <span class="hljs-variable language_">console</span>.<span class="hljs-title function_">log</span>(v); });
})(i);
</code></pre><p>Or use <a href="http://documentcloud.github.com/underscore/">Underscore.js</a>, which is what I actually do:</p><pre><code class="hljs javascript"><span class="hljs-title function_">_</span>([<span class="hljs-number">1</span>,<span class="hljs-number">2</span>,<span class="hljs-number">3</span>]).<span class="hljs-title function_">each</span>(<span class="hljs-keyword">function</span>(<span class="hljs-params">i</span>) {
q.<span class="hljs-title function_">push</span>(<span class="hljs-keyword">function</span>(<span class="hljs-params"></span>) { <span class="hljs-variable language_">console</span>.<span class="hljs-title function_">log</span>(i); });
});
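
// A further option, sketched under the assumption of an ES2015-or-later
// engine (which postdates this post): a loop counter declared with `let`
// gets a fresh binding on every iteration, so each closure keeps its own
// value. The array name r is only illustrative, and the closures return
// the number instead of logging it so the result is easy to inspect.
var r = [];
for (let j = 0; j < 3; j++)
    r.push(function() { return j; });
// r[0]() is 0, r[1]() is 1, r[2]() is 2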
</code></pre></div>tag:blog.danieljanus.pl,2011-03-28:post:the-dijkstran-wheel-of-fortuneThe Dijkstran wheel of fortune: SPSS, Excel, VBA2011-03-28T00:00:00Z<div><blockquote><p>It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.</p><p style="text-align: right;">— Edsger W. Dijkstra, EWD 498</p>
</blockquote><p>I like to think of myself somewhat egotistically as a counterexample to Dijkstra’s statement above. Granted, some of my code is definitely of poor quality, and I dare not call myself a good programmer. But, having started with BASIC on a Commodore 64, then proceeding to learn Pascal (of the Turbo/Borland flavour), then C, x86 assembly, OCaml, Smalltalk, Java, C++, Haskell, Common Lisp, Clojure, and a couple of other languages, with a few enlightenments achieved along the way, I do think I managed to regenerate from the mental wounds that BASIC had inflicted upon me. And now I feel a strange sensation, now that the Dijkstran wheel of fortune has made a full spin: I’ve spent the last few days writing BASIC code. I’ve written several Excel macros in Visual Basic for Applications.</p><p>Why the strange selection of a language? Well, this was simply the best tool for the job. What I needed to do was postprocess the output of some statistical analyses performed in [SPSS][1] running under Windows, altering the way the results were presented. SPSS can export data to HTML, Word, and Excel; of these three, the last is most convenient, because it preserves the structure of the output tables most thoroughly. (In principle, HTML does too, and in fact my first stab was with Clojure, but I stopped after realizing just how much ad-hoc, throwaway code that parses the SPSS-generated HTML, munges it several times to and fro, and then outputs back HTML I’d have to write.) So I went the Excel way, and in this post I’d like to share my mixed feelings from that encounter.</p><p>Visual Basic the language is icky. It is certainly a step forward from the BASIC I remember from decades ago, in that I didn’t have to number my lines, and it is possible to structure the code nicely so that it doesn’t contain any GOTOs, GOSUBs or RETURNs. And it has this object-oriented feel to it. 
But compared to modern languages, programming in it resembles voluntarily putting on handcuffs, and then jumping around to avoid stumbling over the logs it throws under your feet. Not quite such big and scary logs as C++ throws, but still. I mean, why on earth does VB have to distinguish between expressions and statements? Many languages do, but in most of them an expression is at least a valid statement. Not so in VB. Also, VB is still line-oriented: whether or not you require an <code>End If</code> in the conditional statement depends on whether it fits in one line or not. But my biggest pain was with the assignments. VB makes a distinction between reference assignments and other assignments, requiring a <code>Set</code> statement in the first case, and disallowing it in the second. So, <code>Set myCell = thatOtherCell</code> but <code>foo = 42</code>. Worse, forgetting the <code>Set</code> in the first case does not result in an error, which makes such bugs very hard to track down. Yurgh.</p><p>Also, the IDE built into Excel for developing VB macros is mediocre. There is an editor, which highlights the syntax and automatically reformats the code, inserting spaces as appropriate, which is nice. It slaps me in the face with a modal dialog whenever I make a syntax error and move off the line, which is not so nice. There is a REPL of sorts, taking the form of an “Immediate” window, into which you can type statements (not expressions, remember?) and tap Enter to execute them. You can also <code>Debug.Print</code> to it, as you would to a JavaScript console. It is not reachable by Ctrl-Tab from the editor, so I ended up using the mouse much more often than I normally do. I want my Emacs back!</p><p>On the other hand, I find the object-oriented API for actually accessing the spreadsheets quite well-designed and pleasant to use. 
You just grab the object representing your worksheet from the global <code>Worksheets</code> object (indexable by number or by name), and from there you access your cells. The basic object you work with is the <code>Range</code> object, representing either a single cell or a bunch of them; you can get or set cell values, change the formatting, call <code>Offset</code> to navigate around as if with cursor keys. You can also search for specific content in the sheet. Simple enough, easy to use and pick up; and above all, it lets you get the job done without getting in the way much.</p><p>As for SPSS itself: it sucks. In fact, it sucks so badly and in so many different ways that it merits its own blog entry (which will follow someday). For now, I’ll only note down the things pertaining to Excel interop; hopefully it will save somebody some time.</p><p>Problem is, SPSS 19’s Excel export is buggy. In fact, it’s so unreliable that I’ve wasted more hours struggling with it than actually writing my macros. (We’re talking SPSS 19 here; I’ve also tried version 17, with the same results.) It exports small data chunks fine, but the larger your output, the more likely it is that Excel alerts about unreadable content in your file. Excel then offers to repair the data, which mostly succeeds, but inevitably loses the formatting, which for me was a no-no.</p><p>So, after long hours of experimentation and attempting different workarounds, I found that it is much, much more reliable to just copy your data and paste it into Excel directly, without exporting to a temporary file. Just do <code>Edit → Copy special</code> and select Excel BIFF format, to make sure you’re copying the right data. If Excel complains about not being able to understand the copied content (turn on the Clipboard preview to find out), save your output to .spv, restart SPSS, re-run your syntax and try again. With luck, it will eventually work. 
At least for me it did.</p></div>tag:blog.danieljanus.pl,2011-03-11:post:hello-world-againHello world, again2011-03-11T00:00:00Z<div><p>I’ve been quiet on the English blogging front recently. But that doesn’t mean I’ve given up.</p><p>After more than a year, I had become tired of maintaining a <a href="http://blosxom.com/">Blosxom</a> installation. I greatly admire Blosxom, its minimalism and extensibility, but the default installation is just too minimal for my needs. And the plugins tend to have rough edges. Like the Disqus comments that I enabled at one point on the otherwise static blog pages: the correct number of comments appears in some places but not all; besides, they just don’t feel right.</p><p>So I’ve embarked on an experiment with a blogging platform, namely Posterous. I’ve started a <a href="http://plblog.danieljanus.pl/">blog in Polish</a> there to comment on local affairs in my mother tongue and to popularize Clojure among Polish programmers. And after a few months, I consider this experiment successful. Posterous supports Markdown, which I grew accustomed to while using Blosxom. It automatically syntax-highlights snippets of Clojure code that I post, which is a big win. It is highly customizable, easy to use (blogging via email FTW!), and lets me control my data. It does have its deficiencies, but on the whole it gets in the way less. So I’m switching to Posterous for “Musings of a Lispnik” too.</p><p>It is unclear to me how to migrate the old content to the new platform, so for now I’ll leave it as is under <a href="http://danieljanus.pl/oldblog">a temporary address</a>, while posting new things exclusively here. (<em>Update 2012-09-24:</em> After another migration, this time to Octopress, I’ve merged the old content back where it belongs.)</p><p>In the near future, I plan to translate a few articles about Clojure I’d written in Polish and post their English versions here. 
Stay tuned!</p></div>tag:blog.danieljanus.pl,2011-03-10:post:last-post-hereLast post here2011-03-10T00:00:00Z<div><p>I’ve decided to move my English blog to Posterous. The new address is <a href="http://danieljanus-en.posterous.com"><code>http://danieljanus-en.posterous.com</code></a>. This URL (<a href="http://blog.danieljanus.pl"><code>http://blog.danieljanus.pl</code></a>) will point to the new blog in about a week’s time.</p><p>I’m only posting this to let people update their RSS feeds to the new address, which is</p><p><a href="http://feeds.feedburner.com/MusingsOfALispnik"><code>http://feeds.feedburner.com/MusingsOfALispnik</code></a></p></div>tag:blog.danieljanus.pl,2010-05-04:post:defnkKeyword arguments2010-05-04T00:00:00Z<div><p>There’s been an <a href="http://stuartsierra.com/2010/01/15/keyword-arguments-in-clojure">ongoing</a> <a href="http://www.fatvat.co.uk/2009/01/passing-parameters-in-clojure.html">debate</a> about how to pass optional named arguments to Clojure functions. One way to do this is the <a href="http://richhickey.github.com/clojure-contrib/def-api.html#clojure.contrib.def/defnk">defnk</a> macro from <code>clojure.contrib.def</code>; I hesitate to call it <em>canonical</em>, since apparently not everyone uses it, but I’ve found it useful a number of times. Here’s a sample:</p><pre><code class="hljs clojure">user> (<span class="hljs-name"><span class="hljs-built_in">use</span></span> 'clojure.contrib.def)
<span class="hljs-literal">nil</span>
user> (<span class="hljs-name">defnk</span> f [<span class="hljs-symbol">:b</span> <span class="hljs-number">43</span>] (<span class="hljs-name"><span class="hljs-built_in">inc</span></span> b))
#'user/f
user> (<span class="hljs-name">f</span>)
<span class="hljs-number">44</span>
user> (<span class="hljs-name">f</span> <span class="hljs-symbol">:b</span> <span class="hljs-number">100</span>)
<span class="hljs-number">101</span>
</code></pre><p>This is an example of <em>keyword arguments</em> in action. Keyword arguments are a core feature of some languages, notably <a href="http://www.gigamonkeys.com/book/functions.html#keyword-parameters">Common Lisp</a> and <a href="http://caml.inria.fr/pub/docs/manual-ocaml/manual006.html#htoc38">Objective Caml</a>. Clojure doesn’t have them, but it’s pretty easy to emulate their basic usage with macros, as <code>defnk</code> does.</p><p>But there’s more to Common Lisp’s keyword arguments than <code>defnk</code> provides. In CL, the default value of a keyword argument can be an expression referring to other arguments of the same function. For example:</p><pre><code class="hljs lisp">CL-USER> (<span class="hljs-name">defun</span> f (<span class="hljs-name">&key</span> (<span class="hljs-name">a</span> <span class="hljs-number">1</span>) (<span class="hljs-name">b</span> a))
(<span class="hljs-name">+</span> a b))
F
CL-USER> (<span class="hljs-name">f</span>)
<span class="hljs-number">2</span>
CL-USER> (<span class="hljs-name">f</span> <span class="hljs-symbol">:a</span> <span class="hljs-number">45</span>)
<span class="hljs-number">90</span>
CL-USER> (<span class="hljs-name">f</span> <span class="hljs-symbol">:b</span> <span class="hljs-number">101</span>)
<span class="hljs-number">102</span>
</code></pre><p>I wish <code>defnk</code> had this feature. Or is there some better way that I don’t know of?</p></div>tag:blog.danieljanus.pl,2010-04-18:post:sunflowerSunflower2010-04-18T00:00:00Z<div><p>The program I’ve been [writing about recently][1] has come to a point where I think it can be shown to the wider public. It’s called [Sunflower][2] and has its home on GitHub. It’s nowhere near complete, and is of alpha quality right now, but even at this stage it might be useful.</p><p>Just as sunflower seed kernels come wrapped in hulls, most HTML documents seen in the wild come wrapped in noise that is not really part of the document itself. Take any news site: a document from such a site contains things such as advertisements, header, footer, and many links. Now suppose you have many documents grabbed from the same site. Is it possible to somehow automate the extraction of the document “essences”?</p><p>Sunflower to the rescue. It relies on the assumption that documents coming from the same source have the same structure. It presents a list of strings to the user, and asks them to pick those that are contained in the text essence. Then it finds the coordinates of the smallest HTML subtree that contains all those strings, and uses those coordinates to extract information from all documents. And it comes with a nice, easily understandable GUI for that.</p><p>This technique works remarkably well for many collections, although not all. An earlier, proof-of-concept implementation (in Common Lisp) has been used to extract many press texts for the [National Corpus of Polish][3].</p><p>I’ve given up on the symbol-capturing approach to wizards I presented in my previous posts. Inspired by the DOM tree in Web apps, with a bag of elements with identifiers, I now have a central bag of Swing widgets (implemented as an atom) identified by keywords. This bag contains tidbits of the mutable state of Sunflower. 
This means that I can write callback functions like this:</p><pre><code class="hljs clojure">#(<span class="hljs-name">with-components</span> [strings-model selected-dir]
(<span class="hljs-name">.removeAllElements</span> strings-model)
(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [p (<span class="hljs-name"><span class="hljs-built_in">-></span></span> selected-dir htmls first parse)]
(<span class="hljs-name">add-component</span> <span class="hljs-symbol">:parsed</span> p)
(<span class="hljs-name"><span class="hljs-built_in">doseq</span></span> [x (<span class="hljs-name">strings</span> p)]
(<span class="hljs-name">.addElement</span> strings-model x))))
</code></pre><p>Name and conquer: having parts of state explicitly named means that I can reliably access them from just about anywhere. This reduces confusion and allows for less tangled, more self-contained and understandable code.</p></div>tag:blog.danieljanus.pl,2010-04-05:post:symbol-captureA case for symbol capture2010-04-05T00:00:00Z<div><p>Clojure by default protects macro authors from accidentally capturing a local symbol. Stuart Halloway <a href="http://blog.thinkrelevance.com/2008/12/17/on-lisp-clojure-chapter-9">describes this</a> in more detail, explaining why this is a Good Thing. However, sometimes this kind of symbol capture is called for. I encountered one such case today while hacking a Swing application.</p><p>As I develop the app, I find new ways to express Swing concepts and interact with Swing objects in a more Clojuresque way, so a library of GUI macros and functions gets written. One of them is a <code>wizard</code> macro for easy creation of installer-like wizards, where there is a sequence of screens that can be navigated with <em>Back</em> and <em>Next</em> buttons at the bottom of the window.</p><p>The API (certainly not finished yet) currently looks like this:</p><pre><code class="hljs clojure">(<span class="hljs-name">wizard</span> & components)
</code></pre><p>where each Swing <code>component</code> corresponding to one wizard screen can be augmented by a supplementary map, which can contain, <em>inter alia</em>, a function to execute upon showing the screen in question.</p><p>Now, I want those functions to be able to access the <em>Back</em> and <em>Next</em> buttons in case they want to disable or enable them as needed. I thus want the API user to be able to use two symbols, <code>back-button</code> and <code>next-button</code>, in the macro body, and have them bound to the corresponding buttons.</p><p>It is crucial that these bindings be lexical and not dynamic. If they were dynamic, they would only be in effect during the definition of the wizard, but not when my closures are invoked later on. Thus, my implementation looks like this:</p><pre><code class="hljs clojure">(<span class="hljs-keyword">defmacro</span> <span class="hljs-title">wizard</span> [& panels]
`(<span class="hljs-name"><span class="hljs-built_in">let</span></span> [~'back-button (<span class="hljs-name">button</span> <span class="hljs-string">"< Back"</span>)
~'next-button (<span class="hljs-name">button</span> <span class="hljs-string">"Next >"</span>)]
(<span class="hljs-name">do-wizard</span> ~'back-button ~'next-button ~(<span class="hljs-name"><span class="hljs-built_in">vec</span></span> panels))))
</code></pre><p>where <code>do-wizard</code> is a private function implementing the actual wizard creation, and the <code>~'foo</code> syntax forces symbol capture (it inserts the plain, unqualified symbol instead of the namespace-qualified one that syntax-quote would otherwise produce).</p><p>By the way, if all goes well, this blog post should be the first one syndicated to Planet Clojure. Hello, Planet Clojure readers!</p></div>tag:blog.danieljanus.pl,2010-04-04:post:apennines-hikingHiking in the Apennines2010-04-04T00:00:00Z<div><p>I recently did a week-long hike in the Umbria-Marche region of the Italian Apennines (the vicinity of <a href="http://en.wikipedia.org/wiki/Monte_Catria">Monte Catria</a>, near <a href="http://en.wikipedia.org/wiki/Cantiano">Cantiano</a>, to be more precise), and here are some tips I’d like to share.</p><ul><li><span>The Umbria-Marche Apennines don’t seem to be frequented by a lot of tourists, especially in mid-March. The information offices, although helpful, are often closed (this is not only the case with the mountain region: contrary to information available on the Web, the tourist information at Forlì airport was closed on Sunday morning), and most of the Italians we met didn’t speak English.</span></li><li><span>The tourist trails in the region are not well marked. Direction marks are nowhere to be found, nor are there any visible signs at junctions. We had to ask the locals when leaving Cantiano for Monte Tenetra (and ended up on M. Alto instead anyway).</span></li><li><span>There are a lot of <em>rifugi</em> (mountain huts), but most of them are closed at this time of year. We passed by six or seven, out of which only one was available for sleeping: Rifugio Fonte del Faggio (depicted), merely a small bothy with one worm-eaten bunk bed. 
Another one, <a href="http://www.montecatria.com/it/rifugi/rifugio_cupa_delle_cotaline.aspx">Cupa delle Cotaline</a>, with restaurant facilities and situated by a station of a local ski lift, opened in the morning, but was closed for the night.</span></li></ul><figure class="image"><img alt="" src="/img/blog/faggio.jpg" /><figcaption>Rifugio Fonte del Faggio</figcaption></figure></div>tag:blog.danieljanus.pl,2010-03-31:post:lein-swankThe pitfalls of <code>lein swank</code>2010-03-31T00:00:00Z<div><p>A couple of weeks ago I finally got around to acquainting myself with [Leiningen][1], one of the most popular build tools for Clojure. What had held me back the most was that Leiningen uses [Maven][2] under the hood, which seemed a scary beast at first sight, but once I overcame the initial fear, it turned out to be quite a simple and useful tool.</p><p>One feature in particular is very useful for Emacs users like me: <code>lein swank</code>. You define all dependencies in <code>project.clj</code> as usual, add a magical line to <code>:dev-dependencies</code>, then say</p><pre><code>$ lein swank
</code></pre><p>and lo and behold, you can <code>M-x slime-connect</code> from your Emacs and have all the code at your disposal.</p><p>There is, however, an issue that you must be aware of when using <code>lein swank</code>: Leiningen uses a custom class loader — [AntClassLoader][3] to be more precise — to load the Java classes referenced by the code. Despite being a seemingly irrelevant thing — an implementation detail — this can bite you in a number of most surprising and obscure ways. Try evaluating the following code in a Leiningen REPL:</p><pre><code class="hljs clojure">(<span class="hljs-name"><span class="hljs-built_in">str</span></span> (<span class="hljs-name">.decode</span>
(<span class="hljs-name">java.nio.charset.Charset/forName</span> <span class="hljs-string">"ISO-8859-2"</span>)
(<span class="hljs-name">java.nio.ByteBuffer/wrap</span>
(<span class="hljs-name"><span class="hljs-built_in">into-array</span></span> Byte/TYPE (<span class="hljs-name"><span class="hljs-built_in">map</span></span> byte [<span class="hljs-number">-79</span> <span class="hljs-number">-26</span> <span class="hljs-number">-22</span>])))))
<span class="hljs-comment">;=> "???"</span>
</code></pre><p>The same code evaluated in a plain Clojure REPL will give you <code>"ąćę"</code>, which is a string represented in ISO-8859-2 by the three bytes from the above snippet.</p><p>Whence the difference? Internally, each charset is represented as a unique instance of its specific class. These are loaded lazily as needed by the <code>Charset/forName</code> method. Presumably, the system class loader is used for that, and somewhere along the way a <code>SecurityException</code> gets thrown and caught.</p><p>Note also that there are parts of Java API which use the charset lookup under the hood and are thus vulnerable to the same problem, for example <code>Reader</code> constructors taking charset names. If you use <code>clojure.contrib.duck-streams</code>, then rebinding <code>*default-encoding*</code> will not work from a Leiningen REPL. Jars and überjars produced by Leiningen should be fine, though.</p></div>tag:blog.danieljanus.pl,2010-02-16:post:downcasingDowncasing strings2010-02-16T00:00:00Z<div><p>I just needed to convert a big (around 200 MB) text file, encoded in UTF-8 and containing Polish characters, all into lowercase. <code>tr</code> to the rescue, right? Well, not quite.</p><pre><code>$ echo ŻŹŚÓŃŁĘĆĄ | tr A-ZĄĆĘŁŃÓŚŹŻ a-ząćęłńóśźż
żźśóńłęćą
</code></pre><p>Looks reasonable (apart from the fact that I need to specify an explicit character mapping — it would be handy to just have a lcase utility or suchlike); but here’s what happens on another random string:</p><pre><code>$ echo abisyński | tr A-ZĄĆĘŁŃÓŚŹŻ a-ząćęłńóśźż
abisyŅski
</code></pre><p>I was just about to report this as a bug, when I spotted the following in the manual:</p><blockquote><p>Currently <code>tr</code> fully supports only single-byte characters. Eventually it will support multibyte characters; when it does, the <code>-C</code> option will cause it to complement the set of characters, whereas <code>-c</code> will cause it to complement the set of values.</p></blockquote><p>Turns out some of the basic tools don’t support multibyte encodings. <code>dd conv=lcase</code>, for instance, doesn’t even pretend to touch non-ASCII letters, and perl’s <code>tr</code> operator likewise fails miserably even when one specifies <code>use utf8</code>.</p><p>This is a sad, sad state of affairs. It’s 2010, UTF-8 has been around for seventeen years, and it’s still not supported by one of the core operating system components as other encodings are becoming more and more obsolete. I’m dreaming of the day my system uses it internally for everything.</p><p>Fortunately, not everything is broken. Gawk, for example, works:</p><pre><code>$ echo koŃ i żÓłw | gawk '{ print tolower($0); }'
koń i żółw
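$ # One possible sed counterpart, sketched under an assumption: GNU sed in a UTF-8 locale.
$ # The \L escape is a GNU extension; it lowercases the text that follows in the replacement.
$ echo koŃ i żÓłw | sed 's/\(.*\)/\L\1/'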
</code></pre><p>and so does sed.</p><p><em>Update 2010-04-04:</em> I should have been more specific. The above rant applies to the GNU tools (<code>tr</code> and <code>dd</code>) as found in most Linux distributions; other versions can be more featureful. As <a href="http://alexott.net/">Alex Ott</a> points out in an email comment, <code>tr</code> on OS X works as expected for characters outside of ASCII, and also supports character classes as in <code>tr '[:upper:]' '[:lower:]'</code>. This is yet another testimony to the general high quality of Apple software; in this particular case, though, it may well be a direct effect of OS X’s BSD heritage. Does it work on *BSD?</p></div>tag:blog.danieljanus.pl,2010-02-10:post:clojure-setClojure SET2010-02-10T00:00:00Z<div><p>I’ve just taken a short breath off work to put <a href="http://github.com/nathell/setgame">some code</a> on GitHub that I had written over one night some two months ago. It is an implementation of the <a href="http://en.wikipedia.org/wiki/Set_(game)">Set</a> game in Clojure, using Swing for the GUI.</p><p>I do not have time to clean up or comment the code, so I’m leaving it as is for now; however, I hope that even in its current state it can be of interest, especially for Clojure learners.</p><p>Some random notes on the code:</p><ul><li><span>Clojure is concise! The whole thing is just under 250 lines of code, complete with game logic and the GUI. Of these, the logic is about 50 LOC. Despite this it reads clearly and has been a pleasure to write, thanks to Clojure’s support for sets as a data structure (in the vein of the game’s title and theme).</span></li><li><span>There are no graphics included. All the drawing is done in the GUI part of the code (I’ve replaced the canonical squiggle shape by a triangle and stripes by gradients, for the sake of easier drawing).</span></li><li><span>I’ve toyed around with different Swing layout managers for this game. 
Back in the days when I wrote in plain Java, I used to use <a href="https://tablelayout.dev.java.net/">TableLayout</a>, but it has a non-free license; <a href="http://www.jgoodies.com/freeware/forms/">JGoodies Forms</a> is also nice, but has a slightly more complicated API (and it’s an additional dependency, after all). In the end I settled on the standard GridBagLayout, which is similar in spirit to those two, but requires more boilerplate to set up. As it turned out, simple macrology makes it quite pleasurable to use; see <code>add-gridbag</code> in the code for details.</span></li><li><span>Other things of interest might be my function to randomly shuffle seqs, which strikes a nice balance between simplicity/conciseness of implementation and randomness; and a useful debugging macro.</span></li></ul><p>Comments?</p></div>tag:blog.danieljanus.pl,2010-01-18:post:reactivationReactivation (and some ramblings on my blogging infrastructure)2010-01-18T00:00:00Z<div><p>This blog has not seen content updates in more than a year. Plenty of things can happen in such a long period, and in fact many aspects of my life have seen major changes over this time. I’m not, however, going to write a lengthy post about all that right now. Instead, I would just like to announce the reactivation of the blog.</p><p>You might have noticed that many things have changed. First, the blog has a new address: <a href="http://blog.danieljanus.pl"><code>http://blog.danieljanus.pl</code></a>; the address of the RSS feed has also changed and is now <a href="http://blog.danieljanus.pl/index.rss"><code>http://blog.danieljanus.pl/index.rss</code></a> — please update your readers!</p><p>Probably the most important change is that you now may post comments under the entries, even though this blog continues to be just a bunch of static HTML pages. This is possible thanks to the <a href="http://disqus.com/">Disqus</a> service. 
I wonder whether it will encourage people to give feedback: I have received very few email comments since I started blogging. Also, the static calendar at the top of each page is gone, replaced by a bunch of links to archive posts.</p><p>I have long been considering changing <a href="http://www.blosxom.com/">Blosxom</a> to something else. The main reason for such a step is that it’s written in Perl, which makes it particularly hard to debug upon encountering an unexpected behaviour. The single most irritating thing was that Blosxom would unexpectedly change the date of a post that was edited (which did not let me fix typos and other glitches); I found a patch for this somewhere, but lost it.</p><p>On the other hand, I really liked — and still like — Blosxom’s minimalistic approach and the ease of adding posts. (The very idea of installing a monstrosity such as Wordpress, with its gazillion of features I don’t need, posts kept in a database and what not, makes me feel dizzy.) I fiddled for a while with the thought of reimplementing Blosxom in Common Lisp, but that turned out to be a more time-consuming project than it initially seemed. So when I found <a href="http://blosxom.ookee.com/">The Unofficial Blosxom User Group</a> and learned that, contrary to my belief, Blosxom is still actively maintained and has a thriving community, I ended up staying with the original Perl version, refining my installation so that it no longer gets in the way (<a href="http://blosxom.ookee.com/blog/help/howto_update_posts_without_making_the_date_change.html">this FAQ entry</a> did the trick). 
I also rewrote all my source text files to <a href="http://daringfireball.net/projects/markdown/">Markdown</a>, which made them vastly more readable and easy to edit, updating links and adding short followup notes where appropriate, but otherwise leaving old entries as they were.</p><p>I’d like to thank <a href="http://www.3ofcoins.net/2010/01/08/revive-the-blog-project-52/">Maciek Pasternacki</a> for inspiring me to finally get around to this. While my plans are not as ambitious as his — I am not courageous enough to publicly prove my perseverance, so my blogging will likely continue to be irregular — I plan to write more (having accumulated many ideas for blog posts) and I hope the periods of silence will be much shorter than hitherto.</p><p>I would like to take this opportunity to wish my readers all the best in the New Year!</p></div>tag:blog.danieljanus.pl,2009-01-02:post:google-booksGoogle Books2009-01-02T00:00:00Z<div><p>Yesterday, upon a midnight dreary, while I pondered, weak and weary, over <a href="http://mitpress.mit.edu/algorithms/">a renowned volume of the olden lore</a> (and specifically, upon one of the problems contained in the Polish translation of the first edition), I suddenly felt a need to consult the original version, to check whether there are no mistranslations or unincluded corrections for my copy. So I headed for <a href="http://books.google.com/">Google Book Search</a>, and apart from finding what I needed, I followed a link that sounded interesting. Quoth the link, <a href="http://books.google.com/googlebooks/agreement/">“Groundbreaking Agreement”</a>.</p><p>Basically, what it all boils down to is two pieces of news — you guessed it, a good one and a bad one. The good news is that Google have come to an agreement with several major U.S. publishers that will allow them to provide online access to digitized copies of out-of-print but still copyrighted books. 
<em>Lots</em> of books, and even though the service is not going to be free, that means all this richness will be at our fingertips — no more need to travel half the world to the Library of Congress to get one of the rare copies we’re after. Sounds cool, huh? Well, here comes the bad news: it will only be available to U.S. citizens.</p><p>Or will it?</p><p>I wonder how they are going to check for this precondition. IP-based geolocation springs to mind. And unless they blacklist some IPs or restrict the credit cards used for payment, all I will need is some proxy on some server physically in the U.S. Say, a shell account on someone’s Linux box. I remember reading a Polish blog post about gaining access to American-exclusive content of some website (<a href="http://last.fm">last.fm</a> I believe it was) in a similar way. Hmm, hmm. We will see.</p><p>So, anyone got a shell account to spare?</p></div>tag:blog.danieljanus.pl,2008-12-18:post:fighting-procrastinationanti-procrastination.el2008-12-18T00:00:00Z<div><p>Fighting procrastination has been my major concern these days. I’ve devised a number of experimental tools to help me with that. One of them is called <a href="http://bach.ipipan.waw.pl/~nathell/projects/snafu.php">snafu</a> and can generate reports of your activity throughout the whole day of work. It’s in a preliminary state, but works (at least since I’ve found and fixed a long-standing bug in it which would cause it to barf every now and then), and I already have a number of ideas for its further expansion.</p><p>Reports alone, however, do not quite muster enough motivation for work. I’m doing most of my editing/programming work in Emacs, so yesterday I grabbed the Emacs Lisp manual and came up with a couple of extra lines at the end of my <code>.emacs</code>.</p><pre><code class="hljs lisp"><span class="hljs-comment">;;; Written by Daniel Janus, 2008/12/18.</span>
<span class="hljs-comment">;;; This snippet is placed into the public domain. Feel free</span>
<span class="hljs-comment">;;; to use it in any way you wish. I am not responsible for</span>
<span class="hljs-comment">;;; any damage resulting from its usage.</span>
<span class="hljs-comment">;; Turned off while write-lmt runs, so that writing the file does not count as an edit.</span>
(<span class="hljs-name">defvar</span> store-last-modification-time <span class="hljs-literal">t</span>)
(<span class="hljs-name">defvar</span> last-modification-time <span class="hljs-literal">nil</span>)
(<span class="hljs-name">defun</span> mark-last-modification-time (<span class="hljs-name">beg</span> end len)
  (<span class="hljs-name">let</span> ((<span class="hljs-name">b1</span> (<span class="hljs-name">substring</span> (<span class="hljs-name">buffer-name</span> (<span class="hljs-name">current-buffer</span>)) <span class="hljs-number">0</span> <span class="hljs-number">1</span>)))
    <span class="hljs-comment">;; Ignore buffers whose names start with a space or an asterisk.</span>
    (<span class="hljs-name">when</span> (<span class="hljs-name">and</span> store-last-modification-time
               (<span class="hljs-name">not</span> (<span class="hljs-name">string=</span> b1 <span class="hljs-string">" "</span>))
               (<span class="hljs-name">not</span> (<span class="hljs-name">string=</span> b1 <span class="hljs-string">"*"</span>)))
      (<span class="hljs-name">setq</span> last-modification-time (<span class="hljs-name">current-time</span>)))))
(<span class="hljs-name">add-hook</span> 'after-change-functions 'mark-last-modification-time)
(<span class="hljs-name">defun</span> write-lmt ()
  (<span class="hljs-name">setq</span> store-last-modification-time <span class="hljs-literal">nil</span>)
  (<span class="hljs-name">when</span> last-modification-time
    (<span class="hljs-name">with-temp-file</span> <span class="hljs-string">"/tmp/emacs-lmt"</span>
      <span class="hljs-comment">;; The time is a list; a and b are the high and low 16-bit halves of the seconds.</span>
      (<span class="hljs-name">multiple-value-bind</span> (<span class="hljs-name">a</span> b c) last-modification-time
        (<span class="hljs-name">princ</span> a (<span class="hljs-name">current-buffer</span>))
        (<span class="hljs-name">terpri</span> (<span class="hljs-name">current-buffer</span>))
        (<span class="hljs-name">princ</span> b (<span class="hljs-name">current-buffer</span>)))))
  (<span class="hljs-name">setq</span> store-last-modification-time <span class="hljs-literal">t</span>))
(<span class="hljs-name">run-at-time</span> <span class="hljs-literal">nil</span> <span class="hljs-number">1</span> 'write-lmt)
</code></pre><p>Every second (to change that to every 10 seconds, change the <code>1</code> to <code>10</code> in the last line) it creates a file named <code>/tmp/emacs-lmt</code> which contains the time of last modification of any non-system buffer.</p><p>That’s all there is to it, at least on the Emacs side. The other part is a simple shell script, which uses <a href="http://www.mplayerhq.hu/">MPlayer</a> to display a nag-screen for five seconds, and then give me some time to start doing anything useful before nagging me again:</p><pre><code class="hljs bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-comment"># Seconds without an edit before the nag-screen appears.</span>
TIMEOUT=300
<span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">cat</span> /tmp/emacs-lmt | (
        <span class="hljs-comment"># The file holds two integers: the high and low 16-bit halves of the epoch seconds.</span>
        <span class="hljs-built_in">read</span> a; <span class="hljs-built_in">read</span> b;
        c=<span class="hljs-string">"`date +%s`"</span>;
        <span class="hljs-comment"># x is the number of seconds elapsed since the last modification.</span>
        <span class="hljs-built_in">let</span> x=c-65536*a-b;
        <span class="hljs-keyword">if</span> <span class="hljs-built_in">test</span> <span class="hljs-variable">$x</span> -gt <span class="hljs-variable">$TIMEOUT</span>;
        <span class="hljs-keyword">then</span> mplayer -fs <span class="hljs-variable">$HOME</span>/p.avi;
            <span class="hljs-built_in">sleep</span> 15;
        <span class="hljs-keyword">fi</span>)
    <span class="hljs-built_in">sleep</span> 1
<span class="hljs-keyword">done</span>
</code></pre><p>The nag-screen in my case is an animation which I’ve created using MEncoder from a single frame which looks <a href="http://bach.ipipan.waw.pl/~nathell/procrastination.png">like this</a>. Beware the expletives! (This is one of the few cases where I find their usage justified, as the strong message bites the conscience more strongly.)</p><p>I’ve only been testing this setup for one day, but so far it’s working flawlessly: I got more done yesterday than in the two previous days combined, and that’s excluding the hour or so that it took me to write these snippets.</p><p>If anyone else happens to give it a try, I’d love to hear any comments.</p></div>tag:blog.danieljanus.pl,2008-09-23:post:immensely-powerful-toolThe immensely powerful tool2008-09-23T00:00:00Z<div><p>A pen and a sheet of paper are simple utilities; but there lies vast and sheer power in them that I was not aware of. Up until now. So what can they be used for that one might possibly not realize?</p><p>Short answer: serializing the stream of consciousness.</p><p>Yes, it’s simple, and you may laugh at me now. I myself am a little amazed that I haven’t noticed this before. But this answer leads to another question: what good is this serialization, and what exactly do I mean by it, anyway? And the answer to <em>that</em> is a little longer. So here goes.</p><p>I’m one of the people who tend to have problems with concentrating when thinking, especially when thinking hard. This is not to say that I am not capable of thinking hard: I am, but doing so requires a level of concentration that is tricky for me to exert for a prolonged period. (Unless, of course, I am in a state of absolute fascination, where this is taken care of subconsciously. But that’s another story.) More often than not, a tough problem requiring a significant amount of work just has to be dealt with. And then things start to distract attention. 
There is an itch to scratch, thoughts are shreds, each one pertaining to a tiny bit of the problem, but intertwined with hundreds of other bits of other problems, forming a dense, tangled web, hard to navigate over, and jumping fast from one to another, it becomes more and more unclear what’s next.</p><p>So what can one do? One way is to grab a writing device and just <em>start writing</em>. Running text is linear in nature, so you end up traversing the thought graph depth-first and writing down each thought as you traverse its node. And what’s more, translating ideas to written language <em>slows you down</em>, which is a <a href="http://www.catb.org/jargon/html/G/Good-Thing.html">Good Thing</a> because it makes you see your way through the graph more consciously. It might take you longer to walk from point A to point B than to drive there by car, but definitely you will see more of the landscape as you go. Arriving at the final destination, or simply putting down the pen because enough thoughts have been collected and serialized (there’s never really any end of the stream), makes you end up with a half-product: an unsmithed lump of ore out of which you can forge ingots.</p><p>But why a pen and paper, as opposed to, say, a text editor? I think any writing utensil would work to some extent, but for me this seems to be the best option, for several reasons. First of all, I can type on the keyboard much faster than I can write legibly by hand, so this further slows down the pace (which is a Good Thing as we have observed already).</p><p>Second, there is something magical in handwriting which a text editor will never be able to achieve: it’s hard to describe. But the net effect is a very evident focus on Here and Now, the pen moving across the paper, the sheet filling up with more and more lines of script. 
This environment is naturally single-tasked: no Alt-Tab to press to switch to another terminal, no blinking icon of an instant-messaging program (unless a phone happens to ring). This causes synergy with the concentration caused by serializing thoughts.</p><p>If you have never tried this approach, feel free to do so. Although I cannot guarantee it will work for you, it certainly does work for me.</p></div>tag:blog.danieljanus.pl,2008-08-09:post:small-lispWho said Common Lisp programs cannot be small?2008-08-09T00:00:00Z<div><p>So, how much disk space does your average CL image eat up? A hundred megs? Fifty? Twenty? Five, perhaps, if you’re using LispWorks with a tree-shaker? Well then, how about this?</p><pre><code>[nathell@chamsin salza2-2.0.4]$ ./cl-gzip closures.lisp test.gz
[nathell@chamsin salza2-2.0.4]$ gunzip test
[nathell@chamsin salza2-2.0.4]$ diff closures.lisp test
[nathell@chamsin salza2-2.0.4]$ ls -l cl-gzip
-rwxr-xr-x 1 nathell nathell 386356 2008-08-09 11:08 cl-gzip
</code></pre><p>That’s right. A standalone executable of a mini-gzip, written in Common Lisp, taking up <em>under 400K!</em> And it only depends on glibc and GMP, which are available by default on pretty much every Linux installation. (This is on a 32-bit x86 machine, by the way).</p><p>I used the most recent version of <a href="http://ecls.sourceforge.net/">ECL</a> for compiling this tiny example. The key to the size was configuring ECL with <code>--disable-shared --enable-static CFLAGS="-Os -ffunction-sections -fdata-sections" LDFLAGS="-Wl,-gc-sections"</code>. This essentially gives you a poor man’s tree shaker for free at a linker level. And ECL in itself produces comparatively tiny code.</p><p>I built this example from <a href="http://www.xach.com/lisp/salza2">Salza2</a>’s source by loading the following code snippet:</p><pre><code class="hljs lisp">(<span class="hljs-name">defvar</span> salza
  '(<span class="hljs-string">"package"</span> <span class="hljs-string">"reset"</span> <span class="hljs-string">"specials"</span>
    <span class="hljs-string">"types"</span> <span class="hljs-string">"checksum"</span> <span class="hljs-string">"adler32"</span> <span class="hljs-string">"crc32"</span> <span class="hljs-string">"chains"</span>
    <span class="hljs-string">"bitstream"</span> <span class="hljs-string">"matches"</span> <span class="hljs-string">"compress"</span> <span class="hljs-string">"huffman"</span>
    <span class="hljs-string">"closures"</span> <span class="hljs-string">"compressor"</span> <span class="hljs-string">"utilities"</span> <span class="hljs-string">"zlib"</span>
    <span class="hljs-string">"gzip"</span> <span class="hljs-string">"user"</span>))
(<span class="hljs-name">defvar</span> salza2
  (<span class="hljs-name">mapcar</span> (<span class="hljs-name">lambda</span> (<span class="hljs-name">x</span>) (<span class="hljs-name">format</span> <span class="hljs-literal">nil</span> <span class="hljs-string">"~A.lisp"</span> x))
          salza))
(<span class="hljs-name">defvar</span> salza3
  (<span class="hljs-name">mapcar</span> (<span class="hljs-name">lambda</span> (<span class="hljs-name">x</span>) (<span class="hljs-name">format</span> <span class="hljs-literal">nil</span> <span class="hljs-string">"~A.o"</span> x))
          salza))
(<span class="hljs-name">defun</span> build-cl-gzip ()
  (<span class="hljs-name">dolist</span> (<span class="hljs-name">x</span> salza2)
    (<span class="hljs-name">load</span> x)
    (<span class="hljs-name">compile-file</span> x <span class="hljs-symbol">:system-p</span> <span class="hljs-literal">t</span>))
  (<span class="hljs-name">c</span><span class="hljs-symbol">:build-program</span>
   <span class="hljs-string">"cl-gzip"</span>
   <span class="hljs-symbol">:lisp-files</span> salza3
   <span class="hljs-symbol">:epilogue-code</span>
   '(progn
      (in-package :salza2)
      (gzip-file (second (si::command-args))
                 (third (si::command-args))))))
(<span class="hljs-name">build-cl-gzip</span>)
</code></pre><p>(Sadly enough, there’s no ASDF in here. I have yet to figure out how to leverage ASDF to build small binaries in this constrained environment.)</p><p>This gave me a standalone executable 1.2 meg in size. I then proceeded to compress it with <a href="http://upx.sourceforge.net/">UPX</a> (with arguments <code>--best --crp-ms=999999</code>) and got the final result. How cool is that?</p><p>I am actively looking for a new job. If you happen to like my writings and think I might be just the right man for the team you’re building up, please feel free to consult my <a href="http://bach.ipipan.waw.pl/~nathell/cv-en.pdf">résumé</a> or pass it on.</p><p><em>Update 2010-Jan-17</em>: the above paragraph is no longer valid.</p></div>tag:blog.danieljanus.pl,2008-06-23:post:you-win-someYou win some, you lose some, you talk some2008-06-23T00:00:00Z<div><p>After my <a href="http://blog.danieljanus.pl/im-not-playing-this-stupid-game-anymore.html">shameful performance</a> in the previous tournament, this weekend saw my greatest achievement in tournament Scrabble to date: that of advancing to the quarterfinals of the Cup of Poland. For the record, <a href="http://www.pfs.org.pl/turnieje/2008/w080622.txt">here</a> are the final standings. In the quarterfinal, I lost both games to Tomasz Zwoliński (the former Champion of Poland), who went on to win the Cup.</p><p>On Thursday, I will be delivering a presentation about the dark side of programming: error handling and how to cope with Murphy’s law. The talk will last around 30 minutes and be held at <a href="http://aulapolska.pl/">TechAula</a>, a place to hear about exciting and revolutionary technologies in software engineering. 
Feel invited to register and show up.</p><p>(Postscriptum 11 July: By public demand, the slides from my talk are now <a href="http://danieljanus.pl/2008-aula-errorhandling.pdf">available for download</a>.)</p></div>tag:blog.danieljanus.pl,2008-06-23:post:cl-morfeuszcl-morfeusz: A ninety minutes’ hack2008-06-23T00:00:00Z<div><p>Here’s what I came up with today, after no more than 90 minutes of coding (complete with comments and all):</p><pre><code>MORFEUSZ> (morfeusz-analyse "zażółć gęślą jaźń")
((0 1 "zażółć" "zażółcić" "impt:sg:sec:perf")
(1 2 "gęślą" "gęśl" "subst:sg:inst:f")
(2 3 "jaźń" "jaźń" "subst:sg:nom.acc:f"))
</code></pre><p>This is <a href="http://danieljanus.pl/code/morfeusz.lisp">cl-morfeusz</a> in action, a Common Lisp interface to <a href="http://nlp.ipipan.waw.pl/~wolinski/morfeusz/">Morfeusz</a>, the morphological analyser for Polish.</p><p>It’s a single Lisp file, so there’s no ASDF system definition or asdf-installability for now. I’m not putting it under version control, either. Or, should I say, not yet. When I get around to it, I plan to write a simple parser and write a Polish-language version of <a href="http://en.wikipedia.org/wiki/Colossal_Cave_Adventure">the text adventure that started it all</a>.</p><p>Meanwhile, you may use cl-morfeusz for anything you wish (of course, as long as you comply with Morfeusz’s license). Have fun!</p><p><em>Update 2010-Jan-17</em>: With the advent of UTF-8 support in CFFI, the ugly workarounds in the code are probably no longer necessary; I don’t have time to check it right now, though.</p></div>tag:blog.danieljanus.pl,2008-06-19:post:another-metapostAnother metapost2008-06-19T00:00:00Z<div><p>No, I am not going to write about <a href="http://foundry.supelec.fr/projects/metapost/">the programming language for generating vector graphics</a>. This is not a real post, but rather a note to self to write ones on certain topics once I get ready for that. And as for today’s title, I just couldn’t resist the pun. ;-)</p><p>I’ve been rewriting the Poliqarp Java (GUI) client for the last two weeks or so. The point is to convert it to use <a href="http://bach.ipipan.waw.pl/~nathell/blog/poliqarp-new-protocol.html">the new protocol</a>, and to take the opportunity of turning a messy, kludgey and bit-rotting pile of code into a neatly decoupled, cleanly designed, robust and comprehensible utility. And the further I get, the more I see how much it’s worth it. 
I strongly believe that at the end of this path, upon remergence with the mainline, Poliqarp will become better than ever before.</p><p>Thus, when I have something to show (I hope to deliver a preliminary working version, though not yet feature-complete, within the upcoming week or so), I will probably brag about the design solutions I’ve taken. While I’m at it, I will possibly also write the long-delayed description of the new build system.</p><p>If there is anybody out there besides me who actually cares about that, stay tuned!</p></div>tag:blog.danieljanus.pl,2008-06-11:post:mind-the-symlinksToday’s lesson: Mind the symlinks2008-06-11T00:00:00Z<div><p>Probably every day I keep learning new things, without even realizing it most of the time. The vast majority of them are minor or even tiny tidbits of knowledge; but even these might be worth noting down from time to time, especially when they are tiny pitfalls I’d fallen into and spent a couple of minutes getting out of. By sharing them, I might hopefully prevent someone else from slipping and falling in.</p><p>So here’s a simple Unix question: If you enter a subdirectory of the current directory and then go back to <code>..</code>, where will you end up? The most obvious answer is, of course, “in the original directory”, and is mostly correct. But is it always? Let’s see.</p><pre><code>nathell@breeze:~$ pwd
/home/nathell
nathell@breeze:~$ cd foobar
nathell@breeze:~/foobar$ cd ..
nathell@breeze:~$ pwd
/home/nathell
</code></pre><p>So the hypothesis seems to be right. But let’s try doing this in Python, just for the heck of it:</p><pre><code>nathell@breeze:~$ python
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print os.getcwd()
/home/nathell
>>> os.chdir("foobar")
>>> os.chdir("..")
>>> print os.getcwd()
/var
</code></pre><p>Whoa, hang on! What’s that <code>/var</code> doing there? Of course the one thing I didn’t tell you is that <code>foobar</code> is not really a directory, but rather a symlink pointing to one (<code>/var/log</code> in this case).</p><p>The corollary is that the shell builtin <code>cd</code> is <em>not the same</em> as the Unix <code>chdir()</code> call (it is easily checked that both Perl and C exhibit the same behaviour as Python). In fact, the shell builtin has an oft-forgotten command-line switch, <code>-P</code>, which causes it to follow the physical instead of the logical path structure.</p><p>On a closing note: I have somewhat neglected the blog throughout the previous month, but I hope to revive it soon. It is not unlikely that such irregularities will recur.</p></div>tag:blog.danieljanus.pl,2008-05-19:post:recently-read-1Recently read #1: Akhmatova meets Bashō (Vasil Bykaŭ, “The Wall”)2008-05-19T00:00:00Z<div><p>(Introductory note: This post marks the beginning of a new series on this blog, aptly titled “Recently read.” Every now and then I will try to verbalize afterthoughts inspired by the books I happen to read, and post them here. I hope these recommendations or anti-recommendations might turn out to be useful for someone.)</p><blockquote><p>Give me<br> a kiss to build a dream on,<br> and my imagination<br> will thrive upon that kiss;<br> sweetheart,<br> I ask no more than this —<br> a kiss to build a dream on.</p></blockquote><p>Thusly starts the <a href="http://www.youtube.com/watch?v=e3PXiV95kwA">Fallout 2 intro</a> — a mini-movie that can be considered a piece of art in its own right. Louis Armstrong sings these words in an abandoned underground cinema, wherein a movie is displayed, touching on nostalgia for pre-War times as well as severe dangers that lurk on the surface of the earth. 
And then come these words…</p><p><a href="http://www.youtube.com/watch?v=_mcJAI6oRYY">“War, war never changes.”</a></p><p>Just like throughout the entire Fallout saga, these words reverberated in my mind as I read “The Wall,” a collection of short stories by Vasil Bykaŭ, the late Belarussian writer.</p><p>For the war indeed does not change. And wherever it appears, it carries around such an amount of destruction and utter wrongness that it is next to unimaginable for a generation grown up in a relatively peaceful place and time such as ours. In fact, even the words “utter wrongness” do not do justice to what was once an inescapable reality. There is only one way to find appropriate words: to show it, show it without overlooking anything, show it dryly and aloofly in all its hideousness.</p><p>And Bykaŭ does. There is not a single word of moralizing in these stories. There are no high words, and barely even a human thought beyond fear for life. There is pure depiction; and yet every word in this depiction stands firm and cannot be removed without losing the level of detail called for. This induces associations with Anna Akhmatova and her famous “Requiem,” which begins as follows:</p><blockquote><p>Это было, когда улыбался<br> только мертвый, спокойствию рад.</p></blockquote><p>“Это было.” “It happened.” These simple words are immensely powerful. And the same two words, unspoken, echo throughout the entire book. It happened, and it was like this. Nothing more can, or should, be told.</p><p>I’ve mentioned the level of detail; this aspect of these stories deserves a longer comment. It is evident that they would not be quite as powerful were it not for the detail. One can almost sense the chill of a dawn rising up above some godforsaken trench somewhere on the battlefront. Or shudder at the cold dampness of the soil inside it. Or smell the stench of decay rising above a corpse shot several days ago. 
Or feel the almost palpable fear floating in the crossfire of danger, one on the enemy side, the other shaped as one’s own command. Or the gloom with which a small group of soldiers sets out to dig their final resting place before committing suicide, lest a worse fate befall them.</p><p>These stories are almost “haikuistic,” so to speak, in that each one of them resembles a very thinly cut and faithfully portrayed slice of reality from which there is no escape. This shows most strongly in the case of the shortest ones, like “The Hill,” which are just several pages long, but it arguably holds even for the longest text in the book, the opening novella, “Love Me, Soldier.” (In which, by the way, Falloutesque associations are particularly strong: imagine a Belarussian sergeant who finds a fellow countrygirl hiding in a village in Austria, at the very end of the war, and falls in love with her. Doesn’t that sound like “a kiss to build a dream on?” Well, in Bykaŭ’s world, good dreams never come true.)</p><p>I quoted Akhmatova’s “it happened.” Yet, perhaps, the most striking and saddening impression from “The Wall” as a whole, and one that sets it apart from other war literature, is that this should really read “it still happens.” For the war has ended, but it takes a long time for a nation to recover from the scars it left; especially the Belarussian nation, who have been held captive by various regimes for too long and have never actually experienced the freedom that we take for granted. Once a wound has been healed, it is all too easy to reopen it. And fear of speaking one’s own language remains the same, war or no war.</p><p>No wonder Bykaŭ’s writings are still censored in his homeland. Thanks to the Internet, though, <a href="http://kamunikat.org/4108.html">the full Polish text</a> of all the stories is available online. 
Highly recommended.</p></div>tag:blog.danieljanus.pl,2008-05-05:post:inward-ripenessInward ripeness2008-05-05T00:00:00Z<div><blockquote><p>How soon hath Time, the subtle thief of youth,<br> Stol’n on his wing my three and twentieth year!<br> My hasting days fly on with full career,<br> But my late spring no bud or blossom shew’th.<br> Perhaps my semblance might deceive the truth<br> That I to manhood am arrived so near;<br> And inward ripeness doth much less appear,<br> That some more timely-happy spirits endu’th.<br> Yet be it less or more, or soon or slow,<br> It shall be still in strictest measure ev’n<br> To that same lot, however mean or high,<br> Toward which Time leads me, and the will of Heav’n:<br> All is, if I have grace to use it so,<br> As ever in my great Task-Master’s eye.</p></blockquote><p>John Milton is said to have composed this sonnet on his twenty-fourth birthday, and his thoughts (including, but not limited to, the criticism of the achievements so far) are very much in line with mine on my very own 24th birthday. One wonders what the programming equivalent of “Paradise Lost” is.</p></div>tag:blog.danieljanus.pl,2008-04-30:post:cl-netstringscl-netstrings2008-04-30T00:00:00Z<div><p>I’ve just packaged up the Common Lisp netstring handling code that I <a href="http://blog.danieljanus.pl/hacking-away-with-json-rpc.html">wrote a week ago</a> into a neat library. Unsurprisingly enough, it is called cl-netstrings and has its own <a href="http://github.com/nathell/cl-netstrings">home on the Web</a>. It’s even asdf-installable! I wonder whether this one will turn out to be useful for anybody besides me…</p><p>The other thing I’ve been working on is a new build system for Poliqarp. 
But that’s a story for another post — most probably I will write about it when it gets out of a state of constant flux.</p><p><em>Update 2010-Jan-17</em>: cl-netstrings is now hosted on GitHub; I’ve updated the link.</p></div>tag:blog.danieljanus.pl,2008-04-25:post:best-os-everBest OS ever2008-04-25T00:00:00Z<div><p>If you are reading this on a box that does not have an impressive amount of RAM (say, 512 MB or less) and is running a fairly recent Linux, then for goodness’ sake, drop everything you are doing right now and follow the instructions in this entry. I’m going to show you how to make your system use the memory in a more efficient way, <em>yielding an effect almost equivalent to increasing its amount — with no expenses whatsoever!</em> Sounds good? Read on.</p><p>You see, there’s this Linux kernel module for kernels 2.6.17 and up (that’s what the phrase “fairly recent” in the previous paragraph macroexpands to), called <a href="http://code.google.com/p/compcache">Compcache</a>. It works by slicing out a contiguous chunk of your RAM (25% by default, but it’s settable, of course) and setting it up as a swap space with the highest priority. The trick is that pages that are swapped out to this area are compressed using the <a href="http://www.oberhumer.com/opensource/lzo/">LZO</a> algorithm, which provides very fast compression/decompression while maintaining a decent compression ratio. In this way, more unused pages can fit in memory, and fewer of them are swapped out to disk, which can considerably cut down disk swap usage. I’ve enabled it in my system and it doesn’t seem to cause any problems, while providing a visible efficiency boost. 
Here’s how I did it on a freshly-installed <a href="http://www.ubuntu.com/products/whatisubuntu/804features/">Ubuntu Hardy</a>:</p><ul><li><span>I installed the Ubuntu package <code>build-essential</code>, then downloaded Compcache from its site, extracted it, entered its directory and compiled it by saying <code>make</code>. 
So far, so easy.</span></li><li><span>Unfortunately, one cannot say <code>make install</code> — creating a flexible cross-distro <code>install</code> target is admittedly hard. So I installed it by hand, ensuring that my system enables it automatically on boot-up.</span></li><li><span>I created a directory <code>/lib/modules/2.6.24-16-generic/ubuntu/compcache/</code> and copied the four kernel modules (<code>compcache.ko</code>, <code>lzo1x_compress.ko</code>, <code>lzo1x_decompress.ko</code>, and <code>tlsf.ko</code>) created by the compilation to that directory.</span></li><li><span>Next, I ran <code>depmod -a</code> to make the modules loadable by <code>modprobe</code>.</span></li><li><span>I edited the file <code>/etc/modules</code> and added a line at the end, containing the single word <code>compcache</code>.</span></li><li><span>I copied the shell scripts <code>use_compcache.sh</code> and <code>unuse_compcache.sh</code> that come with compcache to <code>/usr/local/bin</code>.</span></li><li><span>I created an executable script <code>/etc/init.d/compcache</code> with the following contents:</span></li></ul><pre><code class="hljs bash"><span class="hljs-meta">#!/bin/sh</span>
<span class="hljs-keyword">case</span> <span class="hljs-string">"<span class="hljs-variable">$1</span>"</span> <span class="hljs-keyword">in</span>
start)
/usr/local/bin/use_compcache.sh ;;
stop)
/usr/local/bin/unuse_compcache.sh ;;
<span class="hljs-keyword">esac</span>
</code></pre><ul><li><span>The last step was to create a symlink <code>/etc/rc2.d/S02compcache</code> pointing to that script.</span></li></ul><p>I then rebooted the system and verified that the new swapspace is in use:</p><pre><code class="hljs bash">nathell@chamsin:~$ <span class="hljs-built_in">cat</span> /proc/swaps
Filename Type Size Used Priority
/dev/sdb2 partition 996020 0 -1
/dev/ramzswap0 partition 128896 111396 100
</code></pre><p>With the final release of Hardy installed on my main box and compcache optimizing its memory usage, I do not hesitate to call this combo the best OS I have ever had installed.</p><p>And no, I don’t own a Mac. :-/</p></div>tag:blog.danieljanus.pl,2008-04-24:post:forgettingForgetting2008-04-24T00:00:00Z<div><p>It has just occurred to me that the best way to throw things out of one’s mind is to let the mind be absorbed by something else. I guess this is an oft-overlooked fact, even though it seems to be quite obvious. In particular, forcing oneself not to think about something is <em>not</em> a wise strategy, since it leads to mental strain and thinking about it more and more, eventually yielding dejectedness that can be hard to get over.</p><p>And what could be more absorbing than debugging a <code>SIGSEGV</code> buried deeply in the innards of some library early in the morning? ;–)</p></div>tag:blog.danieljanus.pl,2008-04-24:post:hacking-away-with-json-rpcHacking away with JSON-RPC2008-04-24T00:00:00Z<div><p>Let’s try:</p><pre><code class="hljs lisp">(<span class="hljs-name">let</span> ((<span class="hljs-name">s</span> (<span class="hljs-name">socket-stream</span>
(<span class="hljs-name">socket-connect</span> <span class="hljs-string">"localhost"</span> <span class="hljs-number">10081</span>
<span class="hljs-symbol">:element-type</span> '(unsigned-byte <span class="hljs-number">8</span>)))))
(<span class="hljs-name">write-netstring</span> <span class="hljs-string">"{\"method\":\"ping\",\"params\":[],\"id\":1}"</span> s)
(<span class="hljs-name">finish-output</span> s)
(<span class="hljs-name">princ</span> (<span class="hljs-name">read-netstring</span> s))
(<span class="hljs-name">close</span> s))
<span class="hljs-comment">; { "result": "pong" }</span>
<span class="hljs-comment">; --> T</span>
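
; A minimal sketch of the netstring framing used above. This is an
; assumption for illustration only, not the actual code behind
; write-netstring; the name sketch-netstring-octets is hypothetical
; and the function handles ASCII payloads only.
(defun sketch-netstring-octets (string)
  "Return STRING framed as a netstring (LENGTH:PAYLOAD,) as a vector of octets."
  (map '(vector (unsigned-byte 8)) #'char-code
       (format nil "~D:~A," (length string) string)))
; (sketch-netstring-octets "ping") frames the payload as "4:ping,"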
</code></pre><p>Yay! This is Common Lisp talking to a <a href="http://json-rpc.org/">JSON-RPC</a> server written in C. This means that I now have the foundations for rewriting Poliqarp on top of JSON-RPC (according to the <a href="http://blog.danieljanus.pl/poliqarp-new-protocol.html">protocol spec</a> I have recently posted) up and running, and all that remains is to fill in the rest.</p><p>Well, to be honest, this is not exactly JSON-RPC. First off, as you might have noticed, the above snippet of code sends JSON-RPC requests as <a href="http://cr.yp.to/proto/netstrings.txt">netstrings</a>. This is actually intentional, and the reasons for adopting this encoding have been described in detail in the spec (it basically boils down to the fact that it greatly simplifies reading from and writing to the network, especially in C). I wrote some crude code to handle netstrings in CL — it has now occurred to me that it might actually be worthwhile to polish it up a little, write some documentation and put it on <a href="http://www.cliki.net/">CLiki</a> as an asdf-installable library. I’ll probably get round to this quite soon.</p><p>Second, the resulting JSON object does not have all the necessary stuff. It contains the result, but not the error or id (as mandated by the <a href="http://json-rpc.org/wiki/specification">JSON-RPC spec</a>). This is actually a deficiency of the <a href="http://www.big-llc.com/software.jsp">JSON-RPC C library</a> I’m currently using. It places the burden of constructing objects that are proper JSON-RPC responses on the programmer, instead of doing that itself. This will be easy to sort out, however, because the library is small and adheres to the <a href="http://en.wikipedia.org/wiki/KISS_principle">KISS principle</a>. 
More of a problem is that the licensing of that library is unclear; I emailed the maintainers asking them to clarify the status.</p></div>tag:blog.danieljanus.pl,2008-04-22:post:eclm-2008ECLM 20082008-04-22T00:00:00Z<div><blockquote><p><em>What is there left for me to do in this life</em>?<br> <em>Did I achieve what I had set in my sights?</em><br> <em>Am I a happy man, or is this sinking sand?</em><br> <em>Was it all worth it?—was it all worth it?</em><br></p><p style="text-align: right;">— Queen</p>
</blockquote><p>Gusts of moderate wind are blowing at my face through the open window of a train from Amsterdam to Warsaw as I write these words. The last buildings of Amsterdam have vanished a while ago, giving ground to the damp, low countryside of the Netherlands — not quite a fascinating sight to watch — so I decided to fire up my laptop and write down some impressions while they are sharp and vivid — impressions from the <a href="http://www.weitz.de/eclm2008/">European Common Lisp Meeting</a> that was held in Amsterdam yesterday.</p><p>I was there with <a href="http://blog.pasternacki.net/">Maciek</a> and <a href="http://lisp.jogger.pl/">Richard</a>. Amsterdam did not receive us warmly, pouring some mild yet cold rain on us, but our hosts — Lispniks from the Hague, Gabriele and Victor from <a href="http://streamtech.nl/">Streamtech</a> — turned out to be really nice guys. I’m not going to go into a very detailed description of the social aspects of our trip, instead focusing on the conference itself. And that is definitely a topic worth talking about for a long time.</p><p>The first man to speak was Jeremy Jones of <a href="http://www.clozure.com/">Clozure Associates</a>, talking about InspireData and how they did it in Lisp. Although they also seem to be the people behind Clozure CL, the implementation of Common Lisp, InspireData, the product their presentation was about, seems to have been written in LispWorks. It is a quite interesting application for browsing datasets in many interesting ways and drawing conclusions from them. Jeremy started off with a demonstration presenting the key ideas of InspireData and what it can do, and this almost instantly hooked the attention of most of the gathered Lispers; mine, at least, definitely. First off, it seems to be quite a nice success story of a real-world application of Lisp, well worth learning about and mentioning where it deserves a mention. 
Second, one of its great features shown by the demo is that one can copy HTML tables from a Web browser and paste them as InspireData datasets. Given that Poliqarp now has statistical features and can export its data to HTML, I wonder whether it is possible to couple it with InspireData to interactively explore linguistic material in an absorbing way. That’s certainly a topic worthy of further research.</p><p>And last but not least, Jeremy outlined the points they did wrong and those they got right. Among those latter were two letters that now constitute a huge part of my professional life: <strong>QA</strong>. He just couldn’t emphasize enough how crucial the fact that they had a serious quality assurance process from the very beginning proved to yield the final quality of the product. That’s the lesson I’m now quickly learning. When I learned that InspireData was mostly tested by hand by a skilled QA team, I felt somewhat proud of being able to automate large parts of the process at Sentivision. I’m very curious where this path will lead me to. Let’s hope for the best!</p><p>The next speaker was Nicolas Neuss of University of Karlsruhe, talking about <a href="http://www.femlisp.org/">Femlisp</a>, a framework for solving partial differential equations in Common Lisp. I have little to say about this one, since I lack the mathematical background needed to fully comprehend and appreciate the topic; it’s just not my kettle of fish. Undoubtedly, though, Femlisp seems to be filling its niche in a neat way, as the demonstrations showed.</p><p>After a coffee break, Stefan Richter came up with the one presentation that I’ve been looking forward to the most; that of using Common Lisp for large, scalable Internet systems. After all the talks were over, Maciek dubbed it a “very nice anti-FUD presentation” and I could not agree more. I didn’t learn many new things from it, but the author clearly knows how to attempt to convince non-Lispers to try out Lisp. 
The talk started off with outlining the typical designs of Web apps and portals, starting with simple one-server scenarios that don’t scale well and progressing in the direction of more scalable and extensible ones. Stefan then pointed out that in some mainstream languages like Java there exists a mature and proven infrastructure for employing such designs. And then came the key point — <em>that this is the case also for Common Lisp!</em> All the necessary tools are there, ready to use Right Now and free for the most part; they’re just not as mature in some cases. This is not much of a problem, though, given the incremental nature of Lisp development: any problem at hand is typically fixable much faster than in the case of other languages. The only weird thing was that the author advocated using continuation-based Web frameworks (such as <a href="http://common-lisp.net/project/cl-weblocks/">Weblocks</a> or <a href="http://common-lisp.net/project/ucw/">UnCommon Web</a>) just a couple of minutes after discouraging using sticky sessions.</p><p>Next came Killian Sprotte with a speech about <a href="http://www2.siba.fi/PWGL/">PWGL</a>, a program for computer-aided music composition. I have very mixed feelings about it. Notice that I didn’t use the word “presentation” — there was no presentation at all. Yes, that’s right. The speaker was just talking and showing off various things in the musical engine. Now, having no presentation accompanying the talk is not necessarily a bad thing in itself; but without one, it’s a little harder to draw the attention of the audience and a whole lot easier to deliver a chaotic talk instead of a cleanly-structured and well-organized one. Such was the case with this speech. 
Some features were shown, but with a fair amount of obscurity and boredom thrown in, leaving me with a rather low overall impression.</p><p>As for PWGL itself, some of the ideas employed in it seem a bit peculiar (for want of a better word) to me. 
As befits a Lisp program, it is an extensible utility that actually allows users to program music just as one programs software. But the way that programming is done… well, think of a graphical editor for Lisp programs. An editor in which to write, say, a factorial function, you right-click an initially blank sheet, select <code>defun</code> from a pop-up with a complicated set of menus and submenus… and kaboom! up comes a box divided into several sub-boxes. They correspond to — what else could they? — the name of the function, list of arguments, and a body. You can draw boxes representing computations, drag them around and link them with arrows — this is supposed to build complicated expressions out of simpler ones. And there is a huge library of musical tools, all available for the convenience of a programmer. Or, should I say, a composer.</p><p>Sounds cool? Maybe — for a newbie. I can’t really say. As someone who has a lot of experience with Lisp and programming in general, I can only speak for myself. And for me all this click-and-drag-programming seems to be an unnecessarily tedious, obscure and error-prone way of doing things. Stuff like score editors, chord editors or various transformations is admittedly cool, but for lower-level matters the kind of visualization PWGL offers (and it obviously has its rough edges) seems to get in the way rather than staying out of it. But perhaps that’s just me?</p><p>By the time the fourth talk ended, most Lispers were already hungry, so a lunch break followed. I talked to some guy (I don’t remember his name, alas) who’s working on porting Clozure CL to 64-bit Windows. 
This is great news — when the port’s complete, it has a good chance of becoming the free Common Lisp implementation of choice for many Windows Lisp hackers.</p><p>Juan José García-Ripoll then <a href="http://ecls.wiki.sourceforge.net/space/showimage/eclm2008.pdf">talked</a> about <a href="http://ecls.sourceforge.net/">ECL</a>, another CL implementation that is characterized by a fairly small memory and disk footprint, while still managing to achieve decent performance (via compilation to C) and good standard compliance. It was good to see that ECL is still quite alive and getting better and better with each release. Just for the heck of it, I attempted in the evening to reproduce the problem I had with ECL a while ago on a fresh CVS checkout. I managed to reproduce it (for the curious, it was an issue with ECL failing to build after having been <code>configure</code>d with the option <code>--disable-shared</code>). So I reported the bug to Juan, and he promised to look into it within the next few days. And I must say that reporting bugs IRL to open source projects’ maintainers is a very nice experience. :-)</p><p>And then came a really big surprise, and I mean a <em>nice</em> surprise. It took the form of Kristofer Kvello of <a href="http://www.selvaag.no/">Selvaag</a>, a Norway-based house-building company, and his presentation on <a href="http://www.selvaag.no/en/Companies/Selvaagbluethink/aboutus/Sider/default.aspx">House Designer</a>, a Common Lisp program for aiding in designing residences, as the name suggests. Yet another example of a success story in an area CL can really excel at. 
Basically, what House Designer can do is that you give it a <em>sketch</em>, containing a rough description of the shape of a flat or residence and layout of rooms, and out comes a very detailed project with all sorts of bells and whistles: the program automatically figures out what the number of windows should be and where they should be located, the number and location of electric outlets, the optimal types of walls, layout of water installation and what not. It’s transfixing when you think of the sheer amount of tedious labour it automates, taking into account all of the professional knowledge about designing houses accumulated over years, some parts of which a human can easily omit. And it’s been Lisp all the lifetime of this project, and it’s Lisp all the way down (except for the GUI in Java)! Very, very impressive!</p><p>Marc Battyani’s talk about <a href="http://www.hpcplatform.com/">programming FPGAs in Lisp</a> probably should not have been stacked so late in the programme. I mean, the topic seems to be quite interesting (though a bit low-level for my interests), but there was something about Marc’s way of talking and showing things that sent me off dozing almost instantaneously. I’d been a bit tired after the many hours of sitting and listening to speeches, especially after having woken up at six o’clock, and so I somewhat regret missing large parts of the talk. It’s nice to know, though, that it is possible to do such things with Lisp. Seems to have a high hack value, as in: “Why do it this way? Because we <em>can</em>!”</p><p>And what better end of a conference could one ask for than a rant by Kenny Tilton? If you have only encountered Kenny on the Usenet (some of the crème de la crème of his postings is <a href="http://www.pasternacki.net/en/ken-tilton-fortunes">meticulously collected by Maciek</a>) and think he’s one heck of a freak, you definitely should listen to him live. 
Here was another talk without slides — just talking and demonstrating stuff — but this time, it was a totally different thing. Kenny sure knows how to attract the attention of the audience and how not to let it loose throughout an hour’s worth of talking. And he changes topics with mastery, using digressions to a great effect to avoid the boredom slipping in, caused by bragging about one thing all the time. There was <a href="http://smuglispweeny.blogspot.com/2008/02/cells-manifesto.html">Cells</a> in that talk, there was <a href="http://www.theoryyalgebra.com/">teaching of algebra</a>, and there was high-speed driving through the streets of New York. I only hope someone has recorded that to put it online.</p><p>So, this was it. There was much talk afterwards, there was much beer, there was much socializing, there was much rejoicing. I saw a real <a href="http://laptop.org/">XO-1</a> and played with it for a while, and boy, isn’t it cute! And then we all came back. And here I am, sitting at my desk in Warsaw (it’s the next day already; I really wish my laptop had a better battery), finishing up this longish blog entry and asking myself: was this 50 euro well spent?</p><blockquote><p><em>Yes, it was a worthwhile experience, hahhahahahahhaaaa!</em><br> (evil chuckle à la Kenny)<br> <em>It was worth it!</em></p></blockquote></div>tag:blog.danieljanus.pl,2008-04-16:post:poliqarp-new-protocolPoliqarp’s new protocol2008-04-16T00:00:00Z<div><p>The first version of the document I’ve been writing about <a href="http://blog.danieljanus.pl/tex-hackery.html">a couple of days ago</a> is now <a href="http://bach.ipipan.waw.pl/~nathell/new-protocol.pdf">ready for public review</a>. 
I’ll be making an initial attempt at the implementation once I return from the <a href="http://weitz.de/eclm2008/">European Common Lisp Meeting ‘08</a> and write a report.</p></div>tag:blog.danieljanus.pl,2008-04-14:post:im-not-playing-this-stupid-game-anymoreI’m not playing this stupid game anymore2008-04-14T00:00:00Z<div><p>Not until the next tournament, that is. My achievements in the 12th Scrabble Championship of Warsaw can be described as “mediocre” at best; four won, one drawn and seven lost games mean that my general rating will drop down by two points or so. Oh well. Everybody knows it’s a stupid game. ;-) At least I’ve managed to get a decent small score, with an average of 377 points per game.</p><p>Random resolutions for the indefinite future:</p><ol><li><p>Get a final draft of the C++09 standard when it’s ready and acquaint myself with it as closely as possible. I strongly dislike C++ (and I’m not alone in this — see the <a href="http://yosefk.com/c++fqa/">Frequently Questioned Answers</a> about C++ for very detailed criticisms); however, I’ve long wanted to learn that language better just to know all the strengths and weaknesses of the enemy. The ideal moment for this will be when the new standard is out; this will give me the advantage of not having to unlearn the things changed by the standard, while staying on a cutting and competitive edge.</p></li><li><p>Get a copy of <a href="http://en.wikipedia.org/wiki/Federico_Garc%C3%ADa_Lorca">Federico García Lorca</a>’s poems translated into Polish by Jerzy Ficowski. 
I have only a very vague knowledge of Lorca (just his Romance of the Spanish Civil Guard (<a href="http://www.poesia-inter.net/index214.htm">Romance de la Guardia Civil Española</a>)), but I very much like what little I know.</p></li></ol></div>tag:blog.danieljanus.pl,2008-04-10:post:recipe-for-successful-presentationRecipe for a successful presentation2008-04-10T00:00:00Z<div><p><a href="http://www.latex-project.org/">LaTeX</a> + <a href="http://latex-beamer.sourceforge.net/">Beamer</a> (for typesetting the presentation in a visually pleasant, clean, simple and consistent way) + <a href="http://impressive.sourceforge.net/">KeyJNote</a> (for presenting it stylishly to the audience) = a recipe for success. In particular, KeyJNote, which I found only yesterday, seems to be a fine and tremendously useful piece of software, despite being very young. The only annoyance I have found in it is that it doesn’t respond to Alt-Tab when in fullscreen mode. On the typographical side, I used the <a href="http://www.cert.fr/dcsd/THESES/sbouveret/francais/LaTeX.html">progressbar</a> Beamer theme and the <a href="http://www.nowacki.strefa.pl/torunska-e.html">Torunian Antiqua</a> font, both to great effect.</p><p>While I’m at this topic, <a href="http://jan.rychter.com/">Jan Rychter</a> has recently posted <a href="http://jan.rychter.com/blog/files/sztuka-prezentacji-03-2008.html">a great guide to giving presentations</a>, especially short ones. I heartily recommend it to those of you who speak Polish (is there actually any non-Polish-speaking person reading this?)</p><p><em>Update 2010-Jan-17</em>: KeyJNote is now called <a href="http://impressive.sourceforge.net/">Impressive</a>.</p></div>tag:blog.danieljanus.pl,2008-04-07:post:yavpYAVP2008-04-07T00:00:00Z<div><pre><code>Jezus, the high elven wizard, saved the world with his brave efforts and became a great ruler while saving himself 34 times.
He scored 24999532 points and advanced to level 50.
He survived for 0 years, 123 days, 0 hours, 11 minutes and 39 seconds (176207 turns).
Jezus visited 127 places.
His strength score was modified by +26 during his career.
His learning score was modified by +17 during his career.
His willpower score was modified by +10 during his career.
His dexterity score was modified by +7 during his career.
His toughness score was modified by +25 during his career.
His charisma score was modified by +9 during his career.
His appearance score was modified by +6 during his career.
His mana score was modified by +13 during his career.
His perception score was modified by +15 during his career.
He was unnaturally aged by 76 years.
He was the champion of the arena.
He was a member of the thieves guild.
He made a little water dragon very happy.
He defeated the arch enemy of a mighty karmic wyrm.
He adhered to the principles of the Cat Lord and thus rose to great fame.
He saved Khelavaster from certain death.
He left the Drakalor Chain after completing his quest and became a great leader and famous hero.
</code></pre><p>Yay! Now that I have finished ADOM for the first time ever, after something like six years of trying, I can finally get back to work with peace of mind. :-)</p></div>tag:blog.danieljanus.pl,2008-04-07:post:ubuntu-postinstallUbuntu post-installation tricks2008-04-07T00:00:00Z<div><p>Yesterday, my level of frustration with my old operating system at work exceeded a critical point, and I installed a fresh daily build of the not-yet-released <a href="https://wiki.ubuntu.com/HardyHeron">Ubuntu 8.04</a> in place of it. Then, in addition to usual post-installation chores like setting up mail, hardware, etc., I performed a couple of steps to make the system more pleasurable to use. Here’s what I did, just in case someone finds this useful.</p><ol><li><p>First, I tweaked the font rendering. This was one area that has long been a PITA for Linux users (at least for me, since 2000 or so), but as far as Ubuntu is concerned, they introduced a change to Freetype somewhere along the way between Feisty and Gutsy which, when set up properly, makes the font rendering on LCD displays far superior for me to that of, say, Windows XP, in particular at small font sizes. The way to enable it is to enable sub-pixel rendering, and set the hinting level to “slight.” This results in a rendering very close to what the author of <a href="http://www.antigrain.com/research/font_rasterization">Texts Rasterization Exposures</a> managed to achieve.</p></li><li><p>I installed the package <code>msttcorefonts</code> to get Microsoft’s free-as-in-beer set of core TrueType fonts, including Times New Roman, Arial, Georgia, etc. There are very many sites out there on the Web that were designed with these fonts in mind, and this is one of the few areas Microsoft doesn’t completely suck at.</p></li><li><p>Next I enabled bitmap fonts. 
The way to do this is to become root, <code>cd</code> to <code>/etc/fonts/conf.d</code>, remove the symlink named <code>70-no-bitmaps.conf</code>, and make a symlink pointing to <code>/etc/fonts/conf.avail/70-yes-bitmaps.conf</code> instead. This would come in handy in the next step.</p></li><li><p>Which was installing my favourite console font. Unfortunately, it doesn’t come preinstalled with the Gnome-based Ubuntu, but it was no big deal. The font is named console8x16 and it comes with Kubuntu’s (and KDE’s) default terminal emulator, Konsole. So I downloaded <a href="http://packages.ubuntu.com/hardy/konsole">an appropriate package</a> (manually, without the help of APT, because all I wanted was the font, not the package itself). I then installed Midnight Commander (which I use a lot, if only for its great VFS feature, which allows one to access, inter alia, Debian/Ubuntu packages as if they were directories), grabbed the file <code>console8x16.pcf.gz</code>, installed it in <code>/usr/share/fonts/X11/misc</code>, changed to that directory, ran <code>mkfontdir</code> and <code>mkfontscale</code>, logged out and restarted the X server.</p></li></ol><p>The last step was to use this font for Emacs, too. So I installed Emacs, created the file <code>~/.Xdefaults</code> containing the single line</p><pre><code>Emacs*font: -misc-console-medium-r-normal--16-160-72-72-c-80-iso10646-1
</code></pre><p>and ran <code>xrdb ~/.Xdefaults</code>.</p><p>Then I got round to configuring Emacs itself. But that’s a story for another post.</p></div>tag:blog.danieljanus.pl,2008-04-06:post:tex-hackeryThe TeX Hackery2008-04-06T00:00:00Z<div><p>After a longish while of inactivity, I finally got around to finishing the draft spec of a next-generation protocol for <a href="http://poliqarp.sf.net/">Poliqarp</a>, the be-all-end-all corpus concordance tool that I maintain. The spec is being written in LaTeX, and it has a number of subsections that describe particular methods of the protocol. Each one of those is further divided into sub-subsections that describe the method’s signature, purpose, syntax of request, syntax of response, and an optional example. I thought to write a couple of macros to help me separate the document’s logic from details of formatting, so that I could say:</p><pre><code class="hljs latex"><span class="hljs-keyword">\synopsis</span>/() -> {version : int; extensions : string*}/
</code></pre><p>and have it expanded into:</p><pre><code class="hljs latex"><span class="hljs-keyword">\paragraph</span>{Synopsis}
<span class="hljs-keyword">\verb</span>/<span class="hljs-string">() -> {version : int; extensions : string*}</span>/
</code></pre><p>Being a casual LaTeX user who hardly ever writes his own macros, I first thought to use LaTeX’s command-defining commands, <code>\newcommand</code> and <code>\renewcommand</code>. However, I quickly ran into the limitation that the arguments of commands defined in such a way can only be delimited by curly braces, which I could not use because they might appear in the argument itself.</p><p>I googled around and found that this limitation can be overcome by using <code>\def</code> instead, which is not a LaTeX macro but rather an incantation of plain TeX, and allows arbitrary syntax for delimiting arguments. Having found that, my first shot was:</p><pre><code class="hljs latex"><span class="hljs-keyword">\def</span><span class="hljs-keyword">\synopsis</span>/<span class="hljs-params">#1</span>/{<span class="hljs-keyword">\paragraph</span>{Synopsis}<span class="hljs-keyword">\verb</span>/<span class="hljs-string">#1</span>/}
</code></pre><p>which, obviously enough, turned out not to work, producing errors about <code>\verb</code> being ended by an end-of-line.</p><p>“What the heck?” I thought, and resorted to Google again, this time searching for “tex macros expanding to verb”. This yielded an entry from some TeX FAQ, which basically states that <code>\verb</code> is a “fragile” command, and as such it cannot appear in the bodies of macros (it has to read its argument verbatim from the input, which it cannot do once the text has already been swallowed as a macro argument). Ook. So it can’t be done?</p><p>“But,” I thought, “TeX is such a flexible and powerful tool, there must be some way around this!” And, as it turned out, there is. Yet more googling led me to <a href="http://groups.google.pl/group/comp.text.tex/browse_thread/thread/5bca05fb8865a9c2">this thread</a> on comp.text.tex, where someone gives the following answer to a similar question:</p><pre><code class="hljs latex"><span class="hljs-keyword">\def</span><span class="hljs-keyword">\term</span><span class="hljs-params">#</span>{ <span class="hljs-comment">%</span>
<span class="hljs-keyword">\afterassignment</span><span class="hljs-keyword">\Term</span> <span class="hljs-keyword">\let</span><span class="hljs-keyword">\TErm</span>= }<span class="hljs-comment">%</span>
<span class="hljs-keyword">\edef</span><span class="hljs-keyword">\Term</span>{<span class="hljs-keyword">\noexpand</span><span class="hljs-keyword">\verb</span> \<span class="hljs-string">string}}
</span></code></pre><p>Now this is overkill. Why in the world am I forced to stuff such incomprehensible hackery into my document just to perform a seemingly simple task?! Easy things should be easy: that’s one of the principles of good design.</p><p>Reluctantly, I copied it over, and tried to adjust it to my needs. After a number of failed attempts, I thought that I might actually try to understand what all these <code>\afterassignment</code>s, <code>\noexpand</code>s and <code>\edef</code>s are for, so I downloaded the <a href="http://www-cs-faculty.stanford.edu/~knuth/abcde.html">TeXbook</a> and dived straight in.</p><p>I spent another fifteen minutes or so reading bits of it and trying to understand tokens, macros, when they are expanded and when merely carried over, etc. But a spark of inspiration made me replace the whole complicated thingy with a simple snippet that actually worked:</p><pre><code class="hljs latex"><span class="hljs-keyword">\def</span><span class="hljs-keyword">\synopsis</span>{<span class="hljs-keyword">\paragraph</span>{Synopsis}<span class="hljs-keyword">\verb</span>}<span class="hljs-string">
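% Usage sketch (an illustration, assuming the one-line definition above):
% \synopsis ends in \verb, so \verb itself reads whatever follows,
% delimiters included. Any delimiter that \verb accepts should work.
\synopsis/() -> {version : int; extensions : string*}/
\synopsis|() -> {version : int; extensions : string*}|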
</span></code></pre><p>That’s right. This superficially resembles a C preprocessor macro, and works because I was lucky enough to have <code>\verb</code> appear last in the definition, thus allowing the “arguments” of <code>\synopsis</code> to be specified just like arguments to <code>\verb</code> and land in exactly the right place. I’m almost certain that this trick will not always work, but for now it’ll suffice.</p><p>Oh well. TeX is undoubtedly a fine piece of software that provides splendid results if used right. But I can’t get over the impression that there are a great many more idiosyncrasies like this in it than in, say, Common Lisp, even though the latter’s heritage traces back as far as 1958, a whopping twenty years further than TeX’s. (On a side note, as it turns out, someone has already written <a href="http://www3.interscience.wiley.com/cgi-bin/abstract/98518913/ABSTRACT">a Lisp-based preprocessor for TeX macros</a>. Gotta check it out someday.)</p><p>As for the TeXbook itself: it is a fine piece of documentation that I will definitely have to add to my must-read list, though it admittedly has a math-textbookish feel to it. First, however, I want to finish “Shaman’s Crossing” by Robin Hobb (which I will probably brag about in a separate post once I’m finished with it) and tackle Christian Queinnec’s “Lisp in Small Pieces”.</p></div>tag:blog.danieljanus.pl,2008-04-04:post:introductionIntroduction2008-04-04T00:00:00Z<div><p>So, there. Inspired by the newly-started blogs of some of <a href="http://www.joemonster.org/blog/pietshaq">my</a> <a href="http://jan.rychter.com/blog/">acquaintances</a>, I had this thought that I might actually have a word or two to say on a number of subjects, and that it might even be worth sharing. 
And here I am, typing this introductory entry in my Emacs.</p><p>I guess the first thing one usually does in a blog is introduce oneself to the public, so for those of you who have arrived here through some links on the Web and don’t know me, here goes. I am a 23-year-old (soon to be 24, though) programmer geek, living in Warsaw, Poland. In 2006, I graduated from Warsaw University, where I majored in computer science, and am now working full-time as a senior software engineer at <a href="http://www.sentivision.com/">Sentivision</a>. I have a <a href="http://danieljanus.pl/">homepage</a> (currently in Polish only, though I probably will translate it into English some day). These days, I tend to use Common Lisp for most of my programming work, though I also occasionally use Python, Ruby, C, Perl, OCaml, Haskell, Java, and a handful of other languages.</p><p>This is going to be a blog about my personal interests. This means mostly programming, with a fair share of posts about books, poetry, Scrabble, music, biking, cats, and a bunch of other topics thrown in. New entries will most likely be added irregularly, whenever I feel like sharing a thought. My mother tongue is Polish, but I will try to maintain this blog in English just to polish up my English writing skills (no pun intended) and for greater worldwide understandability.</p><p>As you might have noticed, there is no way to leave comments. This is a side-effect of the fact that this blog is, technically, just a bunch of static HTML pages (automatically generated with Blosxom), and is in line with my idea of blog comments: I view them as a way of providing direct feedback to the author, not as a publicly available message board with discussions having a heavy tendency to drift off topic. So, should you like to comment on some post, feel free to drop me an email; I’ll be happy to respond to interesting emails on the blog. 
I can be contacted at dj at danieljanus dot pl.</p><p><em>Update 2010-Jan-17</em>: Several things have changed (in particular, Sentivision doesn’t exist anymore), but I haven’t edited this post other than to update the links.</p></div>