Thursday, March 26, 2015

A Strategy of No Skill

I love this:

Russ: I get an email from a football predictor who says, 'I know who is going to win Monday night. I know which team you should bet on for Monday night football.'

And I get this email, and I think, well, these guys are just a bunch of hacks. I'm not going to pay any attention to it. But it turns out to be right; and of course who knows? It's got a 50-50 chance. But then, for the next 10 weeks he keeps sending me the picks, and I happen to notice that for 10 weeks in a row he gets it right every time. And I know that that can't be done by chance, 10 picks in a row.

He must be a genius. And of course, I'm a sucker. Why?

Guest: So, let's say after those 10 weeks in a row you actually subscribe to this person's predictions. And then they don't do so well, after the 10 weeks.

And the reason is that the original strategy was basically: Send an email to 100,000 people, and in 50,000 of those emails you say that Team A is going to win on Monday. And in 50,000 you say Team B is going to win on Monday.

And then, if Team A wins, the next week you only send to the people that got the correct prediction. So, the next week you do the same thing: 25,000 for Team A, 25,000 for Team B. And you continue doing this. The number of emails decreases every single week, until after that 10th week, there are 97 people that got 10 picks in a row correct. So you harvest 97 suckers out of this.

Or in other words:

$ irb
2.1.3 :001 > people = 100_000
2.1.3 :002 > { people /= 2 }
  [0] 50000,
  [1] 25000,
  [2] 12500,
  [3] 6250,
  [4] 3125,
  [5] 1562,
  [6] 781,
  [7] 390,
  [8] 195,
  [9] 97

Saturday, February 21, 2015

A minimalistic node version manager

If you pay attention to the nodejs world & suddenly find yourself using 3 versions of node simultaneously, you may start thinking about a version manager.

There are some existing ones, like nvm & n. They are nice, but both are written in bash & may require periodic updates after a new node/iojs release.

What I want from the 'manager' is that it doesn't integrate itself w/ a shell & doesn't require constant updating.

The 'non-updating' feature results in a drastic code simplification: if a version manager (VM) doesn't know how to install a new node version whatsoever, then you don't need to update its code either (hopefully).

The non-bash requirement dates back to rvm, which has been redefining cd for us since 2009. It doesn't mean, of course, that a VM written in bash would necessarily modify built-in shell commands, but observing rvm's struggle w/ bash has discouraged me from sh-like solutions.

The VM should be fast, so writing it in Ruby (unfortunately) is not an option, due to the small (but noticeable) startup overhead that any ruby CLI util has. Ideally it also should have no dependencies.

This leaves us w/ several options. We can use mruby or plain C or, wait, there is Golang! In the past its selling point was a 'system' language feeling.

Well. I can tell that it's not as poignant as Ruby for sure, but it's hyper fast & quite consistent. It took me roughly a day to feel more or less comfortable w/ it, which is incomparable w/ a garbage like C++. Frankly I was surprised myself that it went so smoothly.

Back to the YA version manager for node. It's called nodever, it uses a 'subshell' approach via installing system-wide wrappers & it's a tiny Go program.

Saturday, February 7, 2015

node.js 0.12, stdin & spawnSync

If you have in your code a quick hack like this:

stdin = fs.readFileSync('/dev/stdin').toString()

& it works fine & nothing bad really happens, you may start wondering one day why everyone considers it a temporary solution.

Node's readFileSync() uses stat(2) to get the size of the file it tries to read. By definition, you can't know the size of stdin ahead of time. As one dude put it on SO:

Imagine stdin is like a water tap. What you are asking is the same as "How much water is there in a tap?".

By using stat(2), readFileSync() will read only up to whatever length value the kernel lies/guesses about /dev/stdin.

Another issue comes w/ testing. If you have a CL utility & want to write an acceptance test for it using the 'new' node 0.12 child_process.spawnSync() API, expect funny errors.

Suppose we have a node version of cat that's written in a dumb 'synchronous' way. Call it cat-1.js:

#!/usr/bin/env node

var rfs = require('fs').readFileSync

if (process.argv.length == 2) {
        process.stdout.write(rfs('/dev/stdin'))
} else {
        process.argv.slice(2).forEach(function(file) {
                process.stdout.write(rfs(file))
        })
}

Now we write a simple test for it:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-1.js', { input: 'hello' })
assert.equal('hello', r.stdout.toString())

& run:

$ node test-cat-1-1.js

  throw new assert.AssertionError({
AssertionError: 'hello' == ''
    at Object.<anonymous> (/home/alex/lib/writing/

What just happened? (I've cut irrelevant trace lines.) Why is the captured stdout empty? Let's change the test to print the child's stderr:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-1.js', { input: 'hello' })
console.log(r.stderr.toString())

then run:

$ node test-cat-1-2.js
Error: ENXIO, no such device or address '/dev/stdin'
    at Error (native)
    at Object.fs.openSync (fs.js:502:18)
    at fs.readFileSync (fs.js:354:15)

At this point unless you want to dive into libuv internals, that quick hack of explicitly reading /dev/stdin should be changed to something else.

In the past node maintainers disdained the stdin sync read & called it an antipattern. The recommended way was to use streams API, where you employed process.stdin as a readable stream. Still, what if we really want a sync read?

The easiest way is to make a wrapper around readFileSync() that checks the filename argument & invokes the real readFileSync() when it's not equal to /dev/stdin. For example, let's create a simple module, readFileSync:

var fs = require('fs')

module.exports = function(file, opt) {
        if ( !(file && file.trim() === '/dev/stdin'))
                return fs.readFileSync(file, opt)

        var BUFSIZ = 65536
        var chunks = []
        while (1) {
                try {
                        var buf = new Buffer(BUFSIZ)
                        var nbytes = fs.readSync(process.stdin.fd, buf, 0, BUFSIZ, null)
                } catch (err) {
                        if (err.code === 'EAGAIN') {
                                // node is funny
                                throw new Error("interactive mode isn't supported, use pipes")
                        }
                        if (err.code === 'EOF') break
                        throw err
                }

                if (nbytes === 0) break
                chunks.push(buf.slice(0, nbytes))
        }

        return Buffer.concat(chunks)
}

It's far from ideal, but at least it doesn't use stat(2) for determining stdin size.

We modify our cat version to use this module:

#!/usr/bin/env node

var rfs = require('./readFileSync')

if (process.argv.length == 2) {
        process.stdout.write(rfs('/dev/stdin'))
} else {
        process.argv.slice(2).forEach(function(file) {
                process.stdout.write(rfs(file))
        })
}

& modify the original version of the acceptance test to use it too:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-2.js', { input: 'hello' })
assert.equal('hello', r.stdout.toString())

& run:

$ node test-cat-2-1.js

Yay, it doesn't throw an error & apparently works!

To be sure, generate a big file, like 128MB:

$ head -c $((128*1024*1024)) < /dev/urandom > 128M

then run:

$ cat 128M | ./cat-2.js > 1
$ cmp 128M 1
$ echo $?

which should print 0 if everything was fine & no bytes were lost.

Sunday, December 14, 2014

A Naive Benchmark of GnuPG 2.1 Symmetric Algorithms

Some symmetric algo benchmarks already exist, but they still don't answer a typical question for a typical setup:

I do a regular backup of N (or even K) gigabytes. I don't want the backup to be readable by a random hacker from Russia (if he breaks into my server). What algo should I use to encrypt the backup as fast as possible?

This rules out many existing benchmarks.

The typical setup also includes gpg2. I don't care about synthetic algo tests (like 'I read once that Rijndael is fast & 3DES is slow'), I'm interested in a particular implementation that runs on my machines.

(Note that the benchmarks below are not 'scientific' in any way; they are meant to be useful for 1 specific operation only: encrypting binary blobs through ruby-gpgme.)

gpg2 cli program

The first thing I did was to run

$ gpg2 --batch --passphrase 12345 -o out --compress-algo none \
    --cipher-algo '<ALGO>' -c < file.tar.gz

But I was quickly saddened because the results weren't consistent: the deviation between runs was too big.

What we needed here was to dissociate the crypto from the IO.


libgcrypt

'Modern' versions of GnuPG have detached a big chunk of the crypto magic into a separate low-level library, libgcrypt. If we want to test symmetric ciphers w/o any additional overhead, we can write a nano version of gpg2.

It'll read some bytes from /dev/urandom, pad them (if a block cipher mode requires it), generate an IV, encrypt, prepend the IV to an encrypted text, append a MAC, run that for all libgcrypt supported ciphers. Then we can draw a pretty graph & brag about it to coworkers.

The problem is that there are no docs (at least I haven't found any) about the general format that gpg2 uses for block ciphers. And you need it, because a decipherer must be able to know what algo was used, its cipher mode, where to search for the stored IV, etc.

There is OpenPGP RFC 4880 of course:

The data is encrypted in CFB mode, with a CFB shift size equal to the cipher's block size. The Initial Vector (IV) is specified as all zeros. Instead of using an IV, OpenPGP prefixes a string of length equal to the block size of the cipher plus two to the data before it is encrypted.

That's better than nothing, but still leaves us w/ n hours of struggling to write & test code that will produce an encrypted stream suitable for gpg2.


gpgme

GnuPG has an official library that even has bindings for such languages as Ruby. It's the opposite of libgcrypt: it does all the work for you, whereas libgcrypt doesn't even provide auto padding.

The trouble w/ gpgme is that it was unusable for automated testing purposes until GnuPG hit version 2.1 this fall.

For instance,

  • Versions 2.0.x cannot read passwords w/o pinentry.
  • At the time of writing, 2.1 isn't available on any major Linux distribution (except Arch, but I'm not using it anywhere (maybe I should)).
Writing a Benchmark

ruby-gpgme has a nifty example for symmetric ciphers:

crypto = password: '12345'
r = crypto.encrypt "Hello world!\n", symmetric: true

where r will contain the encrypted data.

We have 2 problems here:

  1. There is absolutely no way to change the symmetric cipher through the API. (The default is CAST5.) This isn't a fault of ruby-gpgme, but of the gpgme library underneath it.

    GnuPG has a concept of a 'home' directory (it has nothing to do w/ the user's home directory, it just uses it as the default location). Each 'home' can have its own set of configuration files. We need a gpg.conf file there w/ a line:

    personal-cipher-preferences <algo>
  2. The modest password: '12345' option does nothing unless the archaic gpg1 is used. W/ gnupg 2.0.x an annoying pinentry window will pop up.

    I.e., installing 2.1 is the only option. Instead of overwriting the existing 2.0.x installation (and possibly breaking your system), install 2.1 under a separate prefix (for example, ~/tmp/gnupg).

    Next, for each gpg 'home' directory we need to add to gpg.conf another line:

    pinentry-mode loopback

    & create a gpg-agent.conf file w/ a line:

    allow-loopback-pinentry
The benchmark works like this:

  1. Before running any crypto operations, for each cipher we create a 'home' directory & fill it w/ custom gpg.conf & gpg-agent.conf files.
  2. Start a bunch of copies of gpg-agent, each for a different 'home' dir.
  3. Add a bin directory of our fresh gnupg 2.1 installation to the PATH, for example ~/tmp/gnupg/bin.
  4. Set LD_LIBRARY_PATH to ~/tmp/gnupg/lib.
  5. Generate 'plain text' as n bytes from /dev/urandom.
  6. Encode 'plain text' w/ a list of all supported symmetric ciphers.
  7. Print the results.

The Ruby script that does this can be cloned from the repo. You'll need the gpgme & benchmark-ips gems. Run the benchmark file from the cloned dir.


AMD Sempron 145, Linux 3.11.7-200.fc19.x86_64

$ ./benchmark /opt/tmp/gnupg $((256*1024*1024))
Plain text size: 268,435,456B
Calculating -------------------------------------
                idea     1.000  i/100ms
                3des     1.000  i/100ms
               cast5     1.000  i/100ms
            blowfish     1.000  i/100ms
                 aes     1.000  i/100ms
              aes192     1.000  i/100ms
              aes256     1.000  i/100ms
             twofish     1.000  i/100ms
         camellia128     1.000  i/100ms
         camellia192     1.000  i/100ms
         camellia256     1.000  i/100ms
                idea      0.051  (± 0.0%) i/s -      1.000  in  19.443114s
                3des      0.037  (± 0.0%) i/s -      1.000  in  27.137538s
               cast5      0.059  (± 0.0%) i/s -      1.000  in  16.850647s
            blowfish      0.058  (± 0.0%) i/s -      1.000  in  17.183059s
                 aes      0.059  (± 0.0%) i/s -      1.000  in  17.080337s
              aes192      0.057  (± 0.0%) i/s -      1.000  in  17.516253s
              aes256      0.057  (± 0.0%) i/s -      1.000  in  17.673528s
             twofish      0.057  (± 0.0%) i/s -      1.000  in  17.533964s
         camellia128      0.054  (± 0.0%) i/s -      1.000  in  18.359755s
         camellia192      0.053  (± 0.0%) i/s -      1.000  in  18.712756s
         camellia256      0.054  (± 0.0%) i/s -      1.000  in  18.684303s

               cast5:        0.1 i/s
                 aes:        0.1 i/s - 1.01x slower
            blowfish:        0.1 i/s - 1.02x slower
              aes192:        0.1 i/s - 1.04x slower
             twofish:        0.1 i/s - 1.04x slower
              aes256:        0.1 i/s - 1.05x slower
         camellia128:        0.1 i/s - 1.09x slower
         camellia256:        0.1 i/s - 1.11x slower
         camellia192:        0.1 i/s - 1.11x slower
                idea:        0.1 i/s - 1.15x slower
                3des:        0.0 i/s - 1.61x slower

Algo         Total Iterations
       idea          2
       3des          2
      cast5          2
   blowfish          2
        aes          2
     aes192          2
     aes256          2
    twofish          2
camellia128          2
camellia192          2
camellia256          2

As we see, 3DES is indeed slower than Rijndael.

(The plot is written in Grap. It doesn't really matter but I wanted to show off that I was tinkering w/ a Bell Labs language from 1984 that nobody is using anymore.)

In the repo above there is the result for a 3G blob (w/ compression turned on), where the Ruby garbage collector ran amok.

Wednesday, December 3, 2014


It has been almost 2 months since the YC folks announced their official Hacker News API & threatened us w/ an imminent HN design change. (When I say 'us' I mean authors of various Chrome extensions or web scrapers.)

Writing YA interface on top of a common backend is exciting only if you are 17 y.o. Instead of inventing a 'new' forum-like view, I've decided to make a one-way HN to NNTP 'convertor', so that I can read HN in mutt. Like this:


Because of the history of newsreader UIs, reading something that represents a newsgroup means:

  1. Being able not to read the same post (article) twice (the client software marks old articles).
  2. Local filtering. Highlighting favourite authors, hiding trolls, sorting by date, thread, etc.
  3. The offline mode (if you have a local NNTP server on your laptop).

Some time ago I specifically wrote a Chrome extension for items 1-2, but never implemented the custom thread sorting in it.

Moving your reading activities to mutt has its disadvantages:

  • No up-voting.

  • No score updates.

  • Once an article is fetched & posted, it's very cumbersome to post it again if its content changes. You have to check w/ the server whether it has the article w/ a particular message id, check for body differences, change the message id of the new article (otherwise the server will reject it as a duplicate), & possibly modify its References header to point to the old version.

    In short, I didn't do that. Once the article is posted it stays the same.

The original idea was to run some-gateway as a daemon that would monitor HN updates & immediately convert new stories/comments. That turned out to be impractical because my laptop isn't on 24/365. Instead I took the old usenet path: download a bunch of articles & read them later.

The old way has 2 primary advantages:

  • There is no need to save the program state, because if we download an article twice (now & in a previous run), the NNTP server will reject the duplicate.
  • It can help w/ HN addiction. You run some-convertor once a day & read all the interesting stuff in your scheduled 'HN time'.

Then, if we use a decent article injector, it'll spool undelivered articles (for example if the NNTP server isn't responding) & post them in the next run automatically.

In the end, I run

  $ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v | sudo rnews -N  

once a day & practically never visit the HN website.

You can read more about the convertor here:

Saturday, September 13, 2014

Porting Code to MRuby

If you take a random library from Ruby stdlib & try to use it under mruby, expect failure. If everything seems to work out of the box it's either (a) a miracle or (b) (more likely) you haven't tested the library enough.

The 1st thing I tried to bring to minirake was FileList. It turned out that FileList uses Dir.glob (glob wasn't implemented in mruby-dir). It turned out that Dir.glob internally uses File.fnmatch (fnmatch wasn't implemented in mruby-io).

Have you ever used File.fnmatch in your code? You usually stumble across its pattern language only as sub-patterns of Dir.glob patterns. For example, Dir.glob adds ** & { } syntax.

In MRI, File.fnmatch is implemented in C. Extracting it into a plain C library w/o a Ruby dependency is relatively quick & simple. This is how the Rubinius team ported it & so did I. There's nothing interesting about the library, except maybe the note that for some reason the MRI version returns 0 as a mark of a successful match & 1 otherwise.
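From the Ruby side, File.fnmatch returns true/false (the 0-on-match convention lives only in the C implementation, as in fnmatch(3)):

```ruby
p File.fnmatch('c?t', 'cat')                    # => true
p File.fnmatch('*.rb', 'main.rb')               # => true
p File.fnmatch('*', 'a/b', File::FNM_PATHNAME)  # => false: '*' won't cross '/'
```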

Dir.glob is a more complex story. Again, in MRI it's implemented in C. At 1st I wanted to do for glob the same job as for fnmatch, but glob has too many calls to the MRI API that have no direct equivalents in mruby. I was lucky not to have to mess with C, because Rubinius had its own version of Dir.glob written in Ruby.

It didn't go as smoothly as I hoped, because the code isn't 'pure' Ruby but a Rubinius version of it, with annoying calls like Rubinius::LRUCache, Regexp.match_from & String.byteslice. (The last one is from Ruby 1.9+, but mruby still lacks it.)

After the porting struggle I checked the result with the unit tests for Dir.glob from MRI, & amazingly they passed, which was a pleasant surprise, because I wasn't expecting a good outcome.

Then came FileList's turn

Like every library written by Jim Weirich, it's (a) very well documented, (b) uses metaprogramming a lot.

While changing class_eval calls with interpolated strings to class_eval with blocks & define_method was easy, bugs started to arrive from unexpected & funny areas. For example:

$ ruby -e "p File.join ['a', 'b', 'c']"
"a/b/c"

$ mruby -e "p File.join ['a', 'b', 'c']"
["a", "b", "c"]

Or even better:

$ ruby -e 'p [nil] <=> [nil]'
0
$ mruby -e 'p [nil] <=> [nil]'
mrblib/array.rb:166: undefined method '<=>' for nil (NoMethodError)
        [1] mrblib/array.rb:166:in Array.<=>
        [0] -e:1

The same goes for NilClass & <=>. File.extname behaves differently, File.split is missing, etc.

In many cases it isn't mruby's fault but the mrbgem libraries', but the whole ecosystem is in a state that isn't suitable for people with weak nerves. Sometimes I thought that the 'm' in mruby actually means 'masochistic'.

After the porting struggle with Array methods like | & +, I took the unit tests from Rake & amazingly they worked almost fine (there is no StringIO in mruby), which wasn't a pleasant surprise, because at that point I got angry.


Do you know that __FILE__ is a keyword & __dir__ is a method? You can monkey patch __dir__ at any moment, but can do nothing to __FILE__. I didn't know that.

Making an executable with mruby involves producing bytecode, which can be statically linked into the executable & loaded via the mrb_read_irep function at runtime.

Bytecode can be generated with the mrbc CL utility that ships with mruby. It sets the value of __FILE__ according to its CL arguments. For example:

$ mrbc -b mycode foo/bar/main.rb

will set __FILE__, for the bytecoded main.rb, to foo/bar/main.rb. If you have an executable named foobar & use main.rb as the entry point in your Ruby code, the classic trick

do_something if __FILE__ == $0

won't give the result you expected.

At 1st I thought of overriding __FILE__, but it turned out that that wasn't possible. Then I thought of setting __FILE__ after the bytecode was generated, but wasn't able to figure out how to do it w/o coredumping. In the end I patched mrbc to be able to pass the required value from the CL, which means that, to be compiled, minirake now requires a patched version of mruby. Great. :(


The last missing part of Rake I wanted to have was FileUtils. It may seem useless & superfluous, but we like Ruby for DSLs, thus it's more idiomatic to write

mkdir 'foo/bar'

than

sh "mkdir -p foo/bar"

or even

exit 1 unless system "mkdir -p foo/bar" # [1]

FileUtils has some nice properties, like the ability to print on demand what is happening, or to turn on a 'no write' mode. For example, if you

include FileUtils::NoWrite

any 'destructive' command like rm or touch will do nothing.

I've looked into stdlib's fileutils.rb & quickly gave up. It's too much work to port it to mruby. Then I thought of making a thin wrapper around system commands with a FileUtils-compatible API.

The idea is to generate several sets of wrappers around simple methods in some FileUtilsSimple::Commands namespace, so that the user never executes them directly, but only through pre-generated static wrappers that decide what to do with a command.

Acquiring a list of singleton methods is easy, but mruby never makes your life easy enough. The next mruby present was the absence of the Kernel.method method. I don't even.

Unit Tests

Don't get tempted to test the ported code under MRI just because your favorite test framework runs only under cruby. I've bumped into several occasions where tests pass fine under cruby & fail miserably under mruby.

[1] Did I mention that Kernel.system just returns a boolean & doesn't set $?? (Make a random guess in which implementation.)

Saturday, August 23, 2014

Mruby & A Self-Contained Subset of Rake

Since the last time I checked mruby, many things have changed. The biggest one was the introduction of compile-time plugins that are confusingly called mrbgems. I have a completely different image in mind when I hear the words ruby & gem together.

Still no love for require from matz.

To get an interpreter that is useful IRL, it's possible to cherry-pick from a list of mrbgems. The mruby-require plugin, sorry, gem is the most confusing one. If you specify it before other plugins, sorry, gems, all other gems (below it) will be compiled as .so libs, & to use them you would write require 'foo' & would immediately lose compatibility with MRI. After that, the helper

def mruby?
  RUBY_ENGINE == 'mruby'
end

& conditional checks are the only answer.

mruby's build system is interesting. It uses a nano-version of Rake called minirake. For an unknown reason it's incompatible with mruby. At that point I thought: "How cool would it be to have rake as a standalone executable that doesn't depend on Ruby at all?"

What does this have to do with mruby? It turns out mruby can produce an array of bytecode that can be compiled with your C program into 1 executable.

It sounds cool but has its limitations. Firstly, you'll need to inline all your require statements to get 1 .rb source file. Secondly, remember, there is no stdlib in mruby. Plugins, sorry, gems that try to bring it to mruby are nice but incomplete (for example, Dir misses glob).

You'll find problems in areas you've never imagined. For example, the Ruby ISO standard doesn't mention ARGV & $0 (that's what I heard, the pdf paper is behind a 198 CHF paywall), which means, right, no ARGV & $0 by default--you'll need to look in the mirb src to guess how to inject them.

Btw, googling won't help much, because most blog posts about mruby were written in 2012 & the API is different now => old examples are mostly useless.

Back to rake. Porting the 'real' rake is a daunting task. I just took the minirake source, tweaked it a bit & wrote a tiny C wrapper with a couple of rakefiles. Amazingly, it seems to work. Glory to Japan!

Wednesday, April 30, 2014


Last weekend I wrote a small Chrome extension that helps me avoid facebook & livejournal. I mean, I allow myself to stare at them for 10 minutes max, & then Antislaker (the extension) kicks in & blocks those 2 domains for the rest of the day.

The idea is this:

  1. We look for a domain name match. If the match is successful, we inject a chunk of JS code.
  2. The injected piece of code contains a counter that updates a record in localStorage every 5 seconds.
  3. When the time limit comes, we move the user to an internal page within the Chrome extension that shows a random Dilbert comic.

The trickiest part was making a 'mutex' for localStorage records, because the user can open several facebook pages & the counter (in the worst case) will count 2 times faster. It's actually a pity that we don't have any concurrency primitives in JS, so we have to invent poor man's busy waiting when using timers.

Thursday, October 24, 2013

Multi-Lingual Interface With Jekyll

Imagine you have a site in N languages, for example, English & Ukrainian. The content of the articles is different & cannot be auto-translated, but we can ensure that the GUI on each part of the site is localized. All menus, buttons, tooltips, etc. can be edited or localized without modifying the app source code.

Let's start with the example.

$ jekyll --version
jekyll 1.2.1

$ jekyll new blog && cd blog && mkdir _plugins

$ for i in en uk; do \
  (mkdir $i \
  && cd $i && ln -s ../css ../_l* ../_plugins . \
  && cp -r ../*yml ../_posts ../*html .); \
  done
$ rm -rf _config.yml _posts index.html

For each section we copied _posts, _config.yml & index.html, because they differ for each site, & we symlinked the css & _layouts directories, because they are the same for each site.

The site's structure looks like this:

_layouts/
|__ default.html
|__ post.html
_plugins/
css/
|__ main.css
|__ syntax.css
en/
|__ _layouts@ -> ../_layouts/
|__ _plugins@ -> ../_plugins/
|__ _posts/
|__ css@ -> ../css/
|__ _config.yml
|__ index.html
uk/
|__ _layouts@ -> ../_layouts/
|__ _plugins@ -> ../_plugins/
|__ _posts/
|__ css@ -> ../css/
|__ _config.yml
|__ index.html

Now, install jekyll-msgcat:

$ gem install jekyll-msgcat

Create a _plugins/req.rb file & add 1 line to it:

require 'jekyll/msgcat'

Add to uk/_config.yml:

msgcat:
  locale: uk
  # may be 'domain' or 'nearby'
  deploy: nearby

Then open _layouts/default.html, find the line

<a class="extra" href="/">home</a>

& replace it with:

<a class="extra" href="/">{{ 'Return to Home' | mc }}</a>

(Quotes are essential.)

As you see, we are using an unknown Liquid filter, mc. If you check the uk site

$ (cd uk; jekyll serve)

& browse it, nothing will change; everything will be in English as before. To automatically substitute the 'Return to Home' string with something else, we need to create a message catalog.

In our case, the message catalog is just a .yaml file:

$ cat uk/_msgcat.yaml
  'Return to Home': На головну сторiнку

What is handy about this is that if a string is missing from the message catalog, or if there is no _msgcat.yaml file at all, the default English string is used. Kill jekyll's server & start it again to test.
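The fallback logic amounts to a catalog lookup with a default. This is a conceptual sketch only, not jekyll-msgcat's actual code:

```ruby
require 'yaml'
require 'tmpdir'

# Look the string up in the locale's catalog; fall back to the
# English original when the file or the key is missing.
def mc(msg, catalog_file)
  catalog = File.exist?(catalog_file) ? YAML.load_file(catalog_file) : nil
  (catalog || {}).fetch(msg, msg)
end

Dir.mktmpdir do |dir|
  cat = File.join(dir, '_msgcat.yaml')
  File.write(cat, "'Return to Home': На головну сторінку\n")
  puts mc('Return to Home', cat)     # the Ukrainian translation
  puts mc('Some other string', cat)  # falls back to the English original
end
```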

Links to Localized Versions

The other problem you may have is how to generate a link from the current page to the same page in the other language.

If you choose to host each site on a separate subdomain, e.g. &, set the value of the msgcat.deploy key in the site's _config.yml to domain. If you prefer a scheme without subdomains, e.g. &, set the key's value to nearby.

Make sure you have url & baseurl in _config.yml. In Liquid templates use cur_page_in_another_locale filter. For example, in _layouts/default.html:

{{ 'en' | cur_page_in_another_locale }}
{{ 'uk' | cur_page_in_another_locale }}

will generate on the en site (msgcat.deploy == domain):

<a href='#' class='btn btn-primary btn-xs disabled'>en</a>
<a href='' class='btn btn-primary btn-xs '>uk</a>

or for msgcat.deploy == nearby:

<a href='#' class='btn btn-primary btn-xs disabled'>en</a>
<a href='/blog/uk/index.html' class='btn btn-primary btn-xs '>uk</a>

If you don't like injected names of Bootstrap's CSS classes, use the filter with an empty parameter:

{{ 'en' | cur_page_in_another_locale: "" }}
{{ 'uk' | cur_page_in_another_locale: "" }}

Or provide your own class name(s) instead of the empty string.

Friday, March 1, 2013

Creating Emacs Multi-file Packages

(This text assumes your familiarity with the difference between simple vs. multi-file packages in Emacs, how to create them, etc.)

After writing NAME-pkg.el, creating the tar file & successfully installing the package from your local test archive, you may notice a small problem: the package meta information (its version, name, etc.) appears in 2 or 3 places. Take, for example, the version number:

  • it's sitting somewhere in the code as a variable value;
  • it exists in NAME-pkg.el;
  • it's stored in Makefile because your target must be aware of the output file name (which must contain the version number).

Some even prefer to include it in README.

In other package systems like npm this is a non-issue, because their package.json file, which contains all the meta, can be a first-class citizen in the libraries that npm delivers. It's trivial to parse & there are nice CLI tools like jsontool that can be used in Makefiles to extract any data from package.json.

Of course we can 'parse' our NAME-pkg.el file too. This snippet reads a foobar-pkg.el file & returns the version string from it:

(nth 2 (package-read-from-string
        (with-temp-buffer
          (insert-file-contents "foobar-pkg.el")
          (buffer-string))))

But it won't solve the problem with the Makefile. For instance, you'd need to write a custom CLI util just to grab the package's name & version from NAME-pkg.el.


Instead we'll take another path & store all information about our package in a .json file. JSON can be easily parsed in elisp & with jsontool's help we can extract all data within Makefile.

meta.json may look like this:

{
    "name" : "foobar",
    "version" : "0.0.1",
    "docstring" : "Free variables and bound variables",
    "reqs" : {
        "emacs" : "24.3"
    },
    "repo" : {
        "type": "git",
        "url" : "git://"
    },
    "homepage" : "",
    "files" : [
    ]
}

If you're not familiar with jsontool, install it via npm install -g jsontool & play:

$ json name < meta.json
$ json files < meta.json | json -a
$ json -a -d- name version < meta.json

It's very handy.

Getting Meta Into Elisp

That .json file can be parsed once while our package is loading into Emacs. We can wrap that in a library, for example, foo-metadata.el:

(require 'json)

(defvar foo-meta (json-read-file
                 (concat (file-name-directory load-file-name) "meta.json")))

(defconst foo-meta-version (cdr (assoc 'version foo-meta)))
(defconst foo-meta-name (cdr (assoc 'name foo-meta)))

(provide 'foo-metadata)

Then you just write (require 'foo-metadata) in your code.

Package Generation

Consider the minimal multi-file structure of some Foobar project:

|__ ..
|__ bin/
|   |__ ..
|   |__ foo-make-pkg
|__ Makefile
|__ fb-bar.el
|__ fb-foo.el
|__ fb-foobar.el
|__ meta.json

Notice that the file foobar-pkg.el is missing. Instead we have a bin/foo-make-pkg utility that generates it. If we write it carefully, we can reuse it in other Emacs projects:

:; exec emacs -Q --script "$0" -- "$@" # -*- mode: emacs-lisp; lexical-binding: t -*-

(setq
 debug-on-error t                     ; show stack trace
 argv (cdr argv))                     ; remove '--' from CL arguments

(require 'json)

(when (not (= 2 (length argv)))
  (message "Usage: %s meta.json some-pkg.el" (file-name-base load-file-name))
  (kill-emacs 1))

(setq data (json-read-file (car argv)))

(setq reqs (cdr (assoc 'reqs data)))
(when reqs
  (let (rlist)
    (dolist (idx reqs)
      (push (list (car idx) (cdr idx)) rlist))
    (setq reqs `(quote ,rlist))))

(with-temp-file (nth 1 argv)
  (insert (prin1-to-string
           (list 'define-package
                 (cdr (assoc 'name data))
                 (cdr (assoc 'version data))
                 (cdr (assoc 'docstring data))
                 reqs))))

Test it by running:

$ bin/foo-make-pkg meta.json foobar-pkg.el && cat !#:2
(define-package "foobar" "0.0.1" "Free variables and bound variables" (quote ((emacs "24.3"))))

To bring it all together we need 2 targets in the Makefile: foobar-pkg.el, which generates that file, & a phony target package, which creates an ELPA-compatible tar:

.PHONY: clean package

JSON := json
TAR := tar
METADATA := meta.json
PKG_NAME := $(shell $(JSON) -a -d- name version < $(METADATA))

foobar-pkg.el: $(METADATA)
    bin/foo-make-pkg $< $@

package: foobar-pkg.el
    $(TAR) --transform='s,^,$(PKG_NAME)/,S' -cf $(PKG_NAME).tar \
        `$(JSON) files < $(METADATA) | $(JSON) -a`

clean:
    rm -f foobar-pkg.el $(PKG_NAME).tar

Recall that with meta.json we have 1 definitive source of all project metadata: when you need to update the version number, the project dependencies, the contents of the tar or whatever--you edit only 1 file.

There is, of course, another route--without any file generation at all. For example, you can parse foobar-pkg.el in elisp & have a utility that produces JSON from the static foobar-pkg.el, which then goes to jsontool's input.

Thursday, February 28, 2013

Emacs, ERT & Structuring Unit Tests

The ERT framework, which everyone is using these days in Emacs, provides very little guidance on how to organize & structure unit tests.

Running tests in the same Emacs you are working in is quite idiotic. Not only can you easily pollute the editor's global namespace in case of a typo, but unit tests run in such a mode cannot be reliable at all, because it's possible to create unwanted dependencies on data structures that weren't properly destroyed in previous test invocations.

Emacs batch mode

The only right way to execute tests is to use Emacs batch mode. The idea is: your Makefile contains a test target which goes to the test directory, which contains several test_*.el files. Each test_*.el file can be run independently & has a test selector (a regexp) that you may optionally provide as a command line parameter.

For example, consider some Foobar project:

|__ ..
|__ test/
|   |__ ..
|   |__ test_bar.el
|   |__ test_foo.el
|   |__ test_utils.el
|__ Makefile
|__ foo-bar.el
|__ foo-foo.el
|__ foo-foobar.el
|__ foo-utils.el

To make this work, each test_* file must know where to find the foo-*.el libraries & how to run its tests. Ideally it should not depend on the current directory from which the user actually runs it.

test_utils.el script then looks like:

:; exec emacs -Q --script "$0" -- "$@"

(setq tdd-lib-dir (concat (file-name-directory load-file-name) ".."))
(push tdd-lib-dir load-path)
(push (file-name-directory load-file-name) load-path)

(setq argv (cdr argv))

(require 'foo-utils)

(ert-deftest ignorance-is-strength()
  (should (equal (foo-utils-agenda) "war is peace")))

(ert-run-tests-batch-and-exit (car argv))

That's quite a header before the ert-deftest definition.

The 1st line is a way to tell your kernel & bash to run emacs with the current file as an argument. The -Q option forces Emacs not to read your ~/.emacs file, not to process X resources, etc. This helps (a) to start Emacs as quickly as possible & (b) to force your code not to depend on your local customizations.
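
The mechanics of that line are easier to see with a toy stand-in, where echo plays the role of emacs (a sketch: the file name & payload are made up):

```shell
# To sh, `:` is a no-op and `exec` replaces the shell with the named
# program, so sh never reads past line 1.  Emacs, in turn, evaluates
# `:` as a keyword symbol & treats everything after `;` as a comment.
cat > /tmp/polyglot-demo <<'EOF'
:; exec echo "interpreter got: $0 -- $@"
EOF
chmod +x /tmp/polyglot-demo
sh /tmp/polyglot-demo a b   # → interpreter got: /tmp/polyglot-demo -- a b
```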

The next 3 lines modify the load-path list, which Emacs uses to search for files when you require or load something. We add to that list the parent directory, where our *.el files are. Note that load-file-name contains an absolute path to the current test_utils.el file.

The next line removes the '--' cell from the argv list, so that (car argv) gives you the 1st command line parameter passed to the script.

The (require 'foo-utils) line loads the ../foo-utils.el file (if you have (provide 'foo-utils) in it, of course).

The next 2 lines are a usual ERT test definition, with 1 assertion in this example.

The last line is an ERT command that runs your unit tests. Notice its argument--it allows you to optionally run the script as:

$ ./test_utils.el regexp

to filter out unmatched ert-deftest definitions.


You can add 2 more useful targets to the Makefile: test & compile. The latter transforms .el files into .elc & sometimes produces useful info about unused variables, etc:

.PHONY: test compile clean

ELC := $(patsubst %.el,%.elc,$(wildcard *.el))

%.elc: %.el
    emacs -Q -batch -L `pwd` -f batch-byte-compile $<

test:
    @for idx in test/test_*; do \
        printf '* %s\n' $$idx ; \
        ./$$idx ; \
        [ $$? -ne 0 ] && exit 1 ; \
    done; :

compile: $(ELC)

clean:
    rm $(ELC)


Try to make every test non-interactive. For example, if your command asks the user for confirmation via (y-or-n-p), Emacs even in batch mode stops and waits for input from the terminal. If you need to answer "yes", just monkey patch the function:

(setq tdd-y-or-n nil) ;; by default say "no"
(defun y-or-n-p (prompt)
  tdd-y-or-n)

and then write an assert as:

(let ((tdd-y-or-n t))
  (should (freedom-is-slavery)))

You can monkey patch any elisp function except those which are compiled in (e.g. come from .c files & are 'primitive' in Emacs terminology).

Unfortunately, the famous (message) function is built-in & cannot be monkey patched. If you use it heavily in your code, non-interactive tests will fill stderr with garbage that will distract you. It's better to use a global (to your project namespace) verbosity flag & a wrapper for (message):

(defconst foo-meta-name "foobar")
(defvar foo-verbose 1)

(defun foo-warn (level str &rest args)
  "Print a message via (message) according to LEVEL."
  (when (<= level foo-verbose)
    (if (/= 0 level) (setq str (concat foo-meta-name ": " str)))
    (message (apply 'format str args))))

Then use (foo-warn 1 "hi, mom") in the code instead of (message). In .el libraries foo-verbose variable can be equal to 1, but in your tests set it to -1 to prevent printing to stderr.

Friday, January 25, 2013

ssh command quoting hell

When you type

$ ssh user@host 'cat /tmp/foo.txt'

the cat /tmp/foo.txt part of that string is evaluated twice: 1) by your current shell, as a single-quoted string; 2) by the shell on the remote host.
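
The double evaluation is easy to model locally, with sh -c standing in for the remote shell (a toy sketch, nothing here is part of ssh itself):

```shell
# 1st evaluation: your local shell strips the single quotes & hands
# ssh the bare string.  2nd evaluation: the remote shell (sh -c here)
# parses that string again as a brand new command line.
cmd='echo /tmp/foo.txt'   # what would travel over the wire
sh -c "$cmd"              # → /tmp/foo.txt
```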

Let's assume you want to write a script that backs up some directory from a remote machine. A naive version:

$ cat

[ -z "$1" -o -z "$2" ] && exit 1

tcd=$1
tdir=$2

ssh user@host "tar cvf - -C $tcd $tdir | gzip" > foo.tar.gz

and if you run it like this:

$ ./ /home joe

If everything goes OK, you'll get foo.tar.gz which will contain joe's home directory files. But what if the $1 or $2 arguments contain spaces and/or quotes? I'll tell you:

$ ./ /home/joe 'tmp/foo "bar'
bash: -c: line 0: unexpected EOF while looking for matching `"'
bash: -c: line 1: syntax error: unexpected end of file

This is a bash error from a remote host because it tries to run

tar cvf - -C /home/joe tmp/foo "bar | gzip

and "bar contains an unmatched quote. Obviously this is not the command you had in mind.
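
The same failure reproduces locally without ssh; here sh -c plays the remote shell:

```shell
# The unmatched double quote makes the command string unparsable,
# so tar is never executed at all: the shell itself exits with a
# syntax error.
sh -c 'tar cvf - -C /home/joe tmp/foo "bar | gzip' 2>/dev/null \
    || echo "the remote shell would fail with status $?"
```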

How can we fix that? Another naive approach would be to single-quote some variables in the script:

ssh user@host "tar cvf - -C '$tcd' '$tdir' | gzip" > foo.tar.gz

And this will work for our example, but will fail if the tmp/foo "bar directory instead had the name tmp/foo 'bar (with a single quote instead of a double one).

To make it work regardless of such nuances we need to transform the $1 and $2 script arguments into quoted strings. The transformed strings are then safe to embed in the command string that gets executed on the remote host.

One nuance: transforming must be done not by the rules of /bin/sh or your current local shell, but by the rules of user's shell on a remote host. (See do_child() function in session.c of openssh source: it extracts user's shell name from users db on a remote machine & constructs arguments for execve(2) as "/path/to/shell_name", "shell_name", "-c", "foo", "bar".)

If the remote shell is a sh-derived one, the transformation function can look like:

sq() {
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e 1s/^/\'/ -e \$s/\$/\'/
}

(Taken from an existing shell-quoting recipe.)
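
A quick sanity check of sq (redefined here so the snippet is self-contained): single quotes in the input are the only characters that need escaping; everything else is protected by the outer quotes.

```shell
sq() {
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e 1s/^/\'/ -e \$s/\$/\'/
}

sq 'tmp/foo "bar'   # → 'tmp/foo "bar'
sq "tmp/foo 'bar"   # → 'tmp/foo '\''bar'
```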

Then, a final version of the 'backup' script would be:


#!/bin/sh

sq() {
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e 1s/^/\'/ -e \$s/\$/\'/
}

[ -z "$1" -o -z "$2" ] && exit 1

tcd=$1
tdir=$2
out=`basename "$tdir"`.tar.gz

cmd="tar cvf - -C `sq $tcd` `sq $tdir` | gzip"
echo "$cmd"
ssh user@host "$cmd" > "$out"

Hint: when in doubt, run (openssh) ssh with -v option and search for 'debug1: Sending command' string in the output.

Monday, July 30, 2012

Gmake Acrobatics

Let's start with the obvious. Suppose you have several variables in your Makefile:

DB_PORT := 5432
DB_USER := joe
DB_NAME := test

Variable values come from some configuration file outside of this Makefile. There is no point in replicating such information & holding it in 2 places (the config & the Makefile). So you start thinking like this: "I'll just read my config file from my Makefile and assign variables dynamically. That's easy."

Suppose the config is in json format. Using handy jsontool we can write:

  DB_PORT := $(shell json db.port < myconfig.json)

Okay. But with this approach we need to execute jsontool every time, for each variable. For n variables this means exactly n forks. Suddenly every task in your Makefile becomes a little (or not a little) sluggish.

It is possible, of course, to execute jsontool only once and get a newline separated 'list' of all values:

$ json db.port db.user < myconfig.json

But how do you map those into Makefile variables?

In every other language this would be very simple: iterate over a list of variable names; for each name, construct a string var := value and feed it to an eval function. For example, in lovely CoffeeScript:

values = ['a', 'b']
MyEval "#{v} := #{values[count]}" for v, count in ['DB_HOST', 'DB_PORT']

Try to translate this into gmake code & you will struggle badly. gmake can iterate over a string, splitting it by spaces. It has an eval function which can parse & evaluate its makefile language. It even has some simple helper functions for manipulating strings; for example, $(word 2,this is nice) returns the word 'is'.

What it doesn't have is basic arithmetic support. You can't add 1 + 1 in it without executing a shell script or whatever.

Googling may bring up this link, which is hilarious in its very own way but has a great idea: if we need a simple counter, we can start with an empty string (yes, a string), call the gmake function $(words $(string)), which will return 0. Then we concatenate the string with another string containing a space & a char (for example, ' x'), call $(words $(string)) again & it will return 1. And so on.
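
The counter trick is easy to try in isolation with a throwaway makefile (a sketch, assuming GNU make is installed):

```shell
# _n starts as an empty string; every appended ' x' bumps $(words) by 1.
cat > /tmp/counter.mk <<'EOF'
_n :=
$(info counter=$(words $(_n)))
_n := $(_n) x
$(info counter=$(words $(_n)))
_n := $(_n) x
$(info counter=$(words $(_n)))
all: ;
EOF
make -sf /tmp/counter.mk
# prints:
# counter=0
# counter=1
# counter=2
```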

This is what I ended with:

# Create a batch of variables on-the-fly.
# _left contains variable names, _right--their corresponding values.
_left := DB_NAME DB_USER DB_PORT
_right := $(shell json db_name db.user db.port < myconfig.json)
_n := 1
define _dvars
$(i) := $$(word $$(words $$(_n)),$$(_right))
_n := $$(_n) x
endef
$(foreach i,$(_left),$(eval $(_dvars)))

It works but looks ugly. The define...endef construction is just a way to define a variable that contains newlines. The eval function evaluates it twice for every iteration; this is why we need $$ in front of every dynamic construction except the $(i) parameter.

Hint: use Rake & don't waste your time.

Monday, June 18, 2012

On-the-fly Generator of Preferences Pages for Opera Extensions

If you've ever tried to write an Opera extension, you've probably stumbled upon the process of handling preferences for your extension.

When a user clicks 'Preferences' in an extension menu button, Opera reads the options.html file from the installed extension. What goes into that options.html is up to the developer. Nothing prevents him from displaying a lolcat video instead of html forms.

The process of writing options.html is everything except creative--it's the same boring crap over & over again for every new extension. I don't get why you even have to do this--Opera could have an API to help automatically generate preferences pages, like it has internally for the browser (opera:config). But there is no API for this & nobody pushes for it.

So imagine you're writing, in a declarative way, what preferences your extension needs & the browser is drawing GUI elements according to your specification. No code, just a declaration.

If you agree with this approach & don't want to waste your time on dumb & repetitive stuff, see the weakspec on-the-fly generator. You'll probably like it.

Friday, April 13, 2012

How To Disable Rack Logging in Sinatra

To completely & finally turn the annoying Rack logging off, a simple 'set :logging, nil' doesn't help. Instead, insert this monkey patch at the beginning of your Sinatra application:

# Disable useless rack logger completely! Yay, yay!
module Rack
  class CommonLogger
    def call(env)
      # do nothing but pass the request down the middleware stack
      @app.call(env)
    end
  end
end
Custom RDoc's Darkfish Templates

RDoc 3.12 allows us to specify a template for a formatter. A formatter can be 'ri' or one that emits html; the latter is called 'darkfish' in RDoc.

The problem with darkfish is that albeit it contains quite nice navigation, it hurts my eyes:

Dark grey on light grey! A very artistic choice, of course. I believe it's quite possible to invent an even worse combination, like red on green, but I still don't get how anybody can like the absence of contrast.

Anyway, here is a solution: another template for darkfish. (Not another formatter.)

RDoc allows that if you install another template as a gem, because it looks for templates only in rdoc/generator/template directory in Ruby's $LOAD_PATH.

But what if you want to generate alternate-looking html from a particular Rakefile without messing with system gems?

  1. Copy the original template to some editable place, for example:

    % cp -r /usr/[...]/1.9/gems/rdoc-3.12/lib/rdoc/generator/template/darkfish \
         lightfish

    'lightfish' is our new template in this example.

  2. Edit lightfish/rdoc.css to remove ugly colors, fonts, etc.

  3. Add a small monkey patch to your project's Rakefile:

    require 'rdoc/task'

    class RDoc::Options
      def template_dir_for template
        '/home/alex/Desktop/' + template
      end
    end

    RDoc::Task.new('html') do |i|
      i.template = 'lightfish'
      i.main = 'README.rdoc'
      i.rdoc_files = FileList['doc/*', 'lib/**/*.rb', '*.rb']
    end
    The template_dir_for() function is the key to success.

  4. Run rake html. RDoc should not complain about 'could not find template lightfish'.

But there is an even simpler method with $LOAD_PATH. Make an rdoc/generator/template directory somewhere, for example in the project's root directory. Move into it a modified template from steps 1-2 above and just run (assuming rake's target for generating documentation is still called 'html'):

% rake -I . html