<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:blog.danieljanus.pl,2019:category:lifehacking</id>
  <title>Daniel Janus – lifehacking</title>
  <link href="http://blog.danieljanus.pl/category/lifehacking/"/>
  <updated>2012-04-12T00:00:00Z</updated>
  <author>
    <name>Daniel Janus</name>
    <uri>http://danieljanus.pl</uri>
    <email>dj@danieljanus.pl</email>
  </author>
  <entry>
    <id>tag:blog.danieljanus.pl,2012-04-12:post:lifehacking-gumtree</id>
    <title>Lifehacking: How to get cheap home equipment using Clojure</title>
    <link href="http://blog.danieljanus.pl/lifehacking-gumtree/"/>
    <updated>2012-04-12T00:00:00Z</updated>
    <content type="html">&lt;div&gt;&lt;p&gt;I moved to London last September. Like many new Londoners, I changed accommodation fairly quickly: I have already moved once, and another move is looming in a couple of months. My current flat was largely unfurnished when I moved in, so I had to buy some basic homeware. I didn’t want to invest much in it, since I’d only be using it for a few months. Luckily, it is not hard to do that cheaply: many people are moving out and getting rid of their stuff, so quite often you can search for the item you want on &lt;a href="http://www.gumtree.com/london"&gt;Gumtree&lt;/a&gt; and find a cheap one a short bike ride away.&lt;/p&gt;&lt;p&gt;Except when there isn’t one. In that case, it’s worth checking again a few days later, as new items are posted all the time. Being lazy, I decided to automate this. A few hours and a hundred lines of Clojure later, &lt;a href="https://github.com/nathell/gumtree-scraper"&gt;gumtree-scraper&lt;/a&gt; was born.&lt;/p&gt;&lt;p&gt;I packaged it with &lt;code&gt;lein uberjar&lt;/code&gt; into a standalone jar which, when run, produces a &lt;code&gt;gumtree.rss&lt;/code&gt; file that I’m subscribed to in Google Reader. This way, whenever something I’m interested in appears, I get notified within an hour or so.&lt;/p&gt;&lt;p&gt;The script is driven by a Google spreadsheet. I created a sheet with three columns (item name, minimum price, maximum price) and made it accessible to anyone who knows the URL, so I can edit it from pretty much anywhere without touching the script. Each time cron runs the script, it downloads that spreadsheet as CSV, which looks like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;hand blender,,5
bike rack,,15
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For each row, the script queries Gumtree’s “For Sale” category in London within the given price range, fetches each result and turns it into an RSS entry.&lt;/p&gt;&lt;p&gt;Gumtree has no API, so I’m using screen scraping to retrieve all the data. I’m actually scraping the &lt;a href="http://m.gumtree.com/"&gt;mobile version&lt;/a&gt;, because its pages have a much simpler structure. One technical twist is that the mobile version is only served to real browsers, so the script sends a custom User-Agent header that pretends to be Safari. For the scraping itself, the code uses &lt;a href="https://github.com/cgrand/enlive"&gt;Enlive&lt;/a&gt;, which works out nicely.&lt;/p&gt;&lt;p&gt;About half of the code is RSS generation, which is mostly XML emitting. I’d have used &lt;code&gt;clojure.xml/emit&lt;/code&gt;, but it is known to &lt;a href="http://clojure-log.n01se.net/date/2012-01-03.html#17:28a"&gt;produce malformed XML&lt;/a&gt; at times, so the code includes a variant that should work.&lt;/p&gt;&lt;p&gt;If you want to try it out, be aware that the location and category are hardcoded in the search URL template; to change them, edit the template line in &lt;code&gt;get-page&lt;/code&gt;. The URL of the controlling spreadsheet, however, is not hardcoded: it is built from the &lt;code&gt;spreadsheet.key&lt;/code&gt; system property. Here’s the wrapper script that cron actually runs:&lt;/p&gt;&lt;pre&gt;&lt;code class="hljs bash"&gt;&lt;span class="hljs-meta"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;if&lt;/span&gt; [ &lt;span class="hljs-string"&gt;&amp;quot;`ps ax | grep java | grep gumtree`&amp;quot;&lt;/span&gt; ]; &lt;span class="hljs-keyword"&gt;then&lt;/span&gt;
  &lt;span class="hljs-built_in"&gt;echo&lt;/span&gt; &lt;span class="hljs-string"&gt;&amp;quot;already running, exiting&amp;quot;&lt;/span&gt;
  &lt;span class="hljs-built_in"&gt;exit&lt;/span&gt; 0
&lt;span class="hljs-keyword"&gt;fi&lt;/span&gt;
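&lt;span class="hljs-comment"&gt;# Untested, sturdier alternative to the ps/grep check above (the lock file&lt;/span&gt;
&lt;span class="hljs-comment"&gt;# path is arbitrary): flock(1) from util-linux takes an exclusive lock that&lt;/span&gt;
&lt;span class="hljs-comment"&gt;# stays held until both this script and the java process it starts have&lt;/span&gt;
&lt;span class="hljs-comment"&gt;# exited, so overlapping cron runs give up early without matching on names:&lt;/span&gt;
&lt;span class="hljs-comment"&gt;#   exec 9&amp;gt;/tmp/gumtree.lock&lt;/span&gt;
&lt;span class="hljs-comment"&gt;#   flock -n 9 || { echo &amp;quot;already running, exiting&amp;quot;; exit 0; }&lt;/span&gt;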
&lt;span class="hljs-built_in"&gt;cd&lt;/span&gt; &lt;span class="hljs-string"&gt;&amp;quot;$(dirname &amp;quot;&lt;span class="hljs-variable"&gt;$0&lt;/span&gt;&amp;quot;)&amp;quot;&lt;/span&gt;
java -Dspreadsheet.key=MY_SECRET_KEY -jar &lt;span class="hljs-variable"&gt;$HOME&lt;/span&gt;/gumtree/gumtree.jar
&lt;span class="hljs-built_in"&gt;cp&lt;/span&gt; &lt;span class="hljs-variable"&gt;$HOME&lt;/span&gt;/gumtree/gumtree.rss &lt;span class="hljs-variable"&gt;$HOME&lt;/span&gt;/public_html
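&lt;span class="hljs-comment"&gt;# Illustrative crontab line (hourly schedule; the script name is made up):&lt;/span&gt;
&lt;span class="hljs-comment"&gt;#   0 * * * * $HOME/gumtree/gumtree.sh&lt;/span&gt;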
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now let me remove that blender entry: I bought one yesterday for £4…&lt;/p&gt;&lt;/div&gt;</content>
  </entry>
</feed>
