Sunday, August 16, 2015

What is Ruby power_assert gem & why you may need it

After upgrading from Ruby 2.1.3 to 2.2.2 I noticed a new bundled gem called power_assert. It turned out that test-unit has required it for about a year now. That was a 2nd surprise, because I thought everyone had moved to minitest years ago & test-unit was kept around only for backward compatibility's sake.

A 'power assert'-enabled test-unit has an enhanced version of assert() that can take a block & in case of failure print the value of each object in a method chain. If no block is given to the new assert(), the old version is invoked.

$ cat example-1.rb
require 'test/unit'

class Hello < Test::Unit::TestCase
  def test_smoke
    assert { 3.times.include? 10 }
  end
end

$ ruby example-1.rb | sed -n '/==/,/==/p'
===============================================================================
Failure:
      assert { 3.times.include? 10 }
                 |     |
                 |     false
                 #<Enumerator: 3:times>
test_smoke(Hello)
/home/alex/.rvm/gems/ruby-2.2.2@global/gems/power_assert-0.2.2/lib/power_assert.
rb:29:in `start'
example-1.rb:5:in `test_smoke'
     2:
     3: class Hello < Test::Unit::TestCase
     4:   def test_smoke
  => 5:     assert { 3.times.include? 10 }
     6:   end
     7: end
===============================================================================

As I understand, Kazuki Tsujimoto (the author of the power_assert gem) got the idea of drawing a pretty picture of a method chain from the Groovy language. Before the power_assert gem we could only use Object#tap for peeking into the chain:

> ('a'..'c').to_a.tap {|i| p i}.map {|i| i.upcase }
["a", "b", "c"]
[
  [0] "A",
  [1] "B",
  [2] "C"
]

Using power_assert we can write an enhanced version of Kernel#p that, in the spirit of the new assert(), prints a fancy picture if the user provides a block:

$ cat super_duper_p.rb
require 'power_assert'

def p *args
  if block_given?
    PowerAssert.start(Proc.new, assertion_method: __callee__) do |pa|
      val = pa.yield
      str = pa.message_proc.call
      if str == "" then Kernel.p(val) else puts str end
      val
    end
  else
    Kernel.p(*args)
  end
end

$ cat example-2.rb
require './super_duper_p'

p {3.times.to_a.map {|i| "i=#{i}" }.include? 3}
p [1,2,3], [4,5,6], "7"
p { [1,2,3] }

$ ruby example-2.rb
p {3.times.to_a.map {|i| "i=#{i}" }.include? 3}
     |     |    |                   |
     |     |    |                   false
     |     |    ["i=0", "i=1", "i=2"]
     |     [0, 1, 2]
     #<Enumerator: 3:times>
[1, 2, 3]
[4, 5, 6]
"7"
[1, 2, 3]

Unfortunately, it won't work in irb.

If you're like the rest of us who prefer minitest instead of test-unit, you'll need a separate gem for it.

Thursday, July 16, 2015

iojs API docs in Texinfo format

I wanted to hold on until the day of node & iojs convergence, but sadly the convergence apparently ain't gonna happen this year.

So, for those who like to read docs in Emacs & not in a browser, I wrote a simple converter from iojs .md files to the Texinfo format. As a byproduct, it's now possible to automatically check for broken cross-references in the iojs docs.

Why read docs in Emacs? We automatically get

  • Searching
  • Index

(Neither of which is available in the current md->html iojs tooling.)

To play w/ the index, go to the iojs node & press i. Using the index is unbelievably handy after you get used to it.

If you think that Texinfo is a complex, outdated & obscure thing, I have a quote for you from Eli Zaretskii:

What is it with you young people that you are so afraid of "barriers"? Did someone sell you a fairy tale that there are no barriers in life, except in Emacs and Texinfo? If you cannot negotiate these ridiculously low "barriers", how will you ever succeed in your life out there?

Tuesday, July 14, 2015

Firefox & Antialiasing

Firefox is the only browser that continually annoys me with its 'liberal' reading of my fontconfig configuration. I don't use Firefox as my primary browser, so when I need to run it to test some new API hotness I usually cry out of frustration.

Take fonts for example. In ~/.config/fontconfig/fonts.conf I have this:

<fontconfig>
  [...]

  <!-- antialiasing is off for truetype fonts -->
  <match target="font">
    <test name="fontformat">
      <string>TrueType</string>
    </test>
    <edit mode="assign" name="antialias">
      <bool>false</bool>
    </edit>
  </match>

</fontconfig>

that allows any local TrueType font to be rendered (by a program that abides by the fontconfig rules) w/o antialiasing. Webfonts that a browser downloads almost never (in 99.(9)% of cases) come in the TrueType format, so any web page that uses them renders w/ antialiasing as usual. This trick works flawlessly w/ Chrome but fails w/ Firefox.

A week ago the nightly version (or whatever they call it, 'mozilla-central'?) suddenly started to behave like Chrome, but the surprise didn't last very long: today, simultaneously w/ the never-ending Adobe Flash brouhaha, they broke the font rendering again.

Wednesday, April 8, 2015

GNU Make Shellquote

Sometimes you may have a filename that contains quotes & your usual makefile routines break. For example, if you generate

index.Ukrayins'ka.html from index.Ukrayins'ka.md

& add index.Ukrayins'ka.html to the clean variable, this classic pattern won't work anymore:

.PHONY: clean
clean:
      rm -rf $(clean)

because your shell will complain about a quote mismatch.

So you need to 'shellquote' the clean variable.

We can write a parameterized function in make that transforms 1 word into a safe shell-quoted string:

clean.shellquote = '$(subst ','\'',$(1))'

The Make manual has a nice example of a map function. That's all we need: we transform each word from the clean variable w/ the map function that calls our clean.shellquote routine.

The complete example:

clean.map = $(foreach a,$(2),$(call $(1),$(a)))
clean.shellquote = '$(subst ','\'',$(1))'
# '# emacs font-lock

.PHONY: clean
clean:
      rm -rf $(call clean.map,clean.shellquote,$(clean))

Thursday, March 26, 2015

A Strategy of No Skill

I love this:

Russ: I get an email from a football predictor who says, 'I know who is going to win Monday night. I know which team you should bet on for Monday night football.'

And I get this email, and I think, well, these guys are just a bunch of hacks. I'm not going to pay any attention to it. But it turns out to be right; and of course who knows? It's got a 50-50 chance. But then, for the next 10 weeks he keeps sending me the picks, and I happen to notice that for 10 weeks in a row he gets it right every time. And I know that that can't be done by chance, 10 picks in a row.

He must be a genius. And of course, I'm a sucker. Why?

Guest: So, let's say after those 10 weeks in a row you actually subscribe to this person's predictions. And then they don't do so well, after the 10 weeks.

And the reason is that the original strategy was basically: Send an email to 100,000 people, and in 50,000 of those emails you say that Team A is going to win on Monday. And in 50,000 you say Team B is going to win on Monday.

And then, if Team A wins, the next week you only send to the people that got the correct prediction. So, the next week you do the same thing. 25,000 for Team A, 25,000 for Team B. And you continue doing this. And the size of the number of emails decreases every single week, until after that 10th week, there are 97 people that got 10 picks in a row correct. So you harvest 97 suckers out of this. (http://www.econtalk.org/archives/2015/03/campbell_harvey.html)

Or in other words:

$ irb
2.1.3 :001 > people = 100_000
100000
2.1.3 :002 > 10.times.map { people /= 2 }
[
  [0] 50000,
  [1] 25000,
  [2] 12500,
  [3] 6250,
  [4] 3125,
  [5] 1562,
  [6] 781,
  [7] 390,
  [8] 195,
  [9] 97
]

Saturday, February 21, 2015

A minimalistic node version manager

If you pay attention to the nodejs world & suddenly find yourself using 3 versions of node simultaneously, you may start thinking about a version manager.

There are some existing ones, like nvm & n. They are nice, but both are written in bash & may require a periodic update after a new node/iojs release.

What I want from the 'manager' is that it doesn't integrate itself w/ a shell & doesn't require a constant updating.

The 'non-updating' feature results in a drastic code simplification: if a version manager (VM) doesn't know how to install a new node version whatsoever, then you don't need to update its code (hopefully whatsoever too).

The non-bash requirement dates back to rvm, which has been redefining cd for us since 2009. It doesn't mean, of course, that a VM written in bash would necessarily modify built-in shell commands, but observing rvm's struggle w/ bash has discouraged me from sh-like solutions.

The VM should be fast, so writing it in Ruby (unfortunately) is not an option, due to the small (but noticeable) startup overhead that any ruby CLI util has. Ideally it also should have no dependencies.

This leaves us w/ several options. We can use mruby or plain C or, wait, there is Golang! In the past its selling point was a 'system' language feeling.

Well. I can tell that it's not as poignant as Ruby for sure, but it's hyper fast & quite consistent. It took me roughly a day to feel more or less comfortable w/ it, which is incomparable w/ a garbage like C++. Frankly I was surprised myself that it went so smooth.

Back to the YA version manager for node. It's called nodever, it uses a 'subshell' approach by installing system-wide wrappers & it's a tiny Go program.

Saturday, February 7, 2015

node.js 0.12, stdin & spawnSync

If you have in your code a quick hack like this:

stdin = fs.readFileSync('/dev/stdin').toString()

& it works fine & nothing bad really happens, you may one day start wondering why everyone considers it a temporary solution.

Node's readFileSync() uses stat(2) to get the size of the file it tries to read. By definition, you can't know the size of stdin ahead of time. As one dude put it on SO:

Imagine stdin is like a water tap. What you are asking is the same as "How much water is there in a tap?".

So by using stat(2), readFileSync() will read only up to whatever length value the kernel lies about/guesses for /dev/stdin.

Another issue comes w/ testing. If you have a CLI utility & want to write an acceptance test for it using the 'new' node 0.12 child_process.spawnSync() API, expect funny errors.

Suppose we have a node version of cat that's written in a dumb 'synchronous' way. Call it cat-1.js:

#!/usr/bin/env node

var rfs = require('fs').readFileSync

if (process.argv.length == 2) {
        process.stdout.write(rfs('/dev/stdin'))
} else {
        process.argv.slice(2).forEach(function(file) {
                process.stdout.write(rfs(file))
        })
}

Now we write a simple test for it:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-1.js', { input: 'hello' })
assert.equal('hello', r.stdout.toString())

& run:

$ node test-cat-1-1.js

assert.js:86
  throw new assert.AssertionError({
        ^
AssertionError: 'hello' == ''
    at Object.<anonymous> (/home/alex/lib/writing/gromnitsky.blogspot.co
m/posts/2015-02-07.1423330840/test-cat-1-1.js:5:8)

What just happened? (I've cut the irrelevant trace lines.) Why is the captured stdout empty? Let's change the test to:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-1.js', { input: 'hello' })
console.error(r.stderr.toString())

then run:

$ node test-cat-1-2.js
fs.js:502
  return binding.open(pathModule._makeLong(path), stringToFlags(flags),
mode);
                 ^
Error: ENXIO, no such device or address '/dev/stdin'
    at Error (native)
    at Object.fs.openSync (fs.js:502:18)
    at fs.readFileSync (fs.js:354:15)

At this point, unless you want to dive into libuv internals, the quick hack of explicitly reading /dev/stdin should be replaced with something else.

In the past node maintainers disdained the sync read of stdin & called it an antipattern. The recommended way was to use the streams API, employing process.stdin as a readable stream. Still, what if we really want a sync read?

The easiest way is to make a wrapper around readFileSync() that checks the filename argument & invokes the real readFileSync() when it's not equal to /dev/stdin. For example, let's create a simple readFileSync module:

var fs = require('fs')

module.exports = function(file, opt) {
        if ( !(file && file.trim() === '/dev/stdin'))
                return fs.readFileSync(file, opt)

        var BUFSIZ = 65536
        var chunks = []
        while (1) {
                try {
                        var buf = new Buffer(BUFSIZ)
                        var nbytes = fs.readSync(process.stdin.fd, buf, 0, BUFSIZ, null)
                } catch (err) {
                        if (err.code === 'EAGAIN') {
                                // node is funny
                                throw new Error("interactive mode isn't supported, use pipes")
                        }
                        if (err.code === 'EOF') break
                        throw err
                }

                if (nbytes === 0) break
                chunks.push(buf.slice(0, nbytes))
        }

        return Buffer.concat(chunks)
}

It's far from ideal, but at least it doesn't use stat(2) for determining stdin size.

We modify our cat version (cat-2.js) to use this module:

#!/usr/bin/env node

var rfs = require('./readFileSync')

if (process.argv.length == 2) {
        process.stdout.write(rfs('/dev/stdin'))
} else {
        process.argv.slice(2).forEach(function(file) {
                process.stdout.write(rfs(file))
        })
}

& modify the original version of the acceptance test to use it too:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-2.js', { input: 'hello' })
assert.equal('hello', r.stdout.toString())

& run:

$ node test-cat-2-1.js

Yay, it doesn't throw an error & apparently works!

To be sure, generate a big file, like 128MB:

$ head -c $((128*1024*1024)) < /dev/urandom > 128M

then run:

$ cat 128M | ./cat-2.js > 1
$ cmp 128M 1
$ echo $?
0

Which should return 0 if everything was fine & no bytes were lost.

Sunday, December 14, 2014

A Naive Benchmark of GnuPG 2.1 Symmetric Algorithms

Some symmetric algo benchmarks already exist, but they still don't answer a typical question for a typical setup:

I do a regular backup of N (or even K) gigabytes. I don't want the backup to be readable by a random hacker from Russia (if he breaks into my server). What algo should I use to encrypt the backup as fast as possible?

This rules out many existing benchmarks.

The typical setup also includes gpg2. I don't care about synthetic algo tests (like 'I read once that Rijndael is fast & 3DES is slow'), I'm interested in a particular implementation that runs on my machines.

(Note that the benchmarks below are not 'scientific' in any way; they are meant to be useful for 1 specific operation only: encrypting binary blobs through ruby-gpgme.)

gpg2 cli program

The first thing I did was to run

$ gpg2 --batch --passphrase 12345 -o out --compress-algo none \
    --cipher-algo '<ALGO>' -c < file.tar.gz

But I was quickly saddened because the results weren't consistent: the deviation between runs was too big.

What we needed here was to dissociate the crypto from the IO.

libgcrypt

'Modern' versions of GnuPG have detached a big chunk of the crypto magic into a separate low-level library libgcrypt. If we want to test symmetric ciphers w/o any additional overhead, we can write a nano version of gpg2.

It'll read some bytes from /dev/urandom, pad them (if the block cipher mode requires it), generate an IV, encrypt, prepend the IV to the encrypted text, append a MAC, & run that for all libgcrypt-supported ciphers. Then we can draw a pretty graph & brag about it to coworkers.

The problem is that there are no docs (at least I haven't found any) about the general format that gpg2 uses for block ciphers. And you need it, because a decrypter must be able to know which algo was used, its cipher mode, where to look for the stored IV, etc.

There is OpenPGP RFC 4880 of course:

The data is encrypted in CFB mode, with a CFB shift size equal to the cipher's block size. The Initial Vector (IV) is specified as all zeros. Instead of using an IV, OpenPGP prefixes a string of length equal to the block size of the cipher plus two to the data before it is encrypted.

That's better than nothing, but still leaves us w/ n hours of struggling to write & test code that will produce an encrypted stream suitable for gpg2.

GPGME

GnuPG has an official library that even has bindings for such languages as Ruby. It's the opposite of libgcrypt: it does all the work for you, whereas libgcrypt doesn't even provide auto padding.

The trouble w/ gpgme is that it was unusable for automated testing purposes until GnuPG hit version 2.1 this fall.

For instance,

  • Versions 2.0.x cannot read passwords w/o pinentry.
  • At the time of writing, 2.1 isn't available on any major Linux distribution (except Arch, but I'm not using it anywhere (maybe I should)).

Writing a Benchmark

ruby-gpgme has a nifty example for symmetric ciphers:

crypto = GPGME::Crypto.new password: '12345'
r = crypto.encrypt "Hello world!\n", symmetric: true

where r.read() will return an encrypted string.

We have 2 problems here:

  1. There is absolutely no way to change the symmetric cipher through the API. (The default one is CAST5.) This isn't a fault of ruby-gpgme, but of the very same gpgme library under it.

    GnuPG has a concept of a 'home' directory (it has nothing to do w/ the user's home directory, except that the default is ~/.gnupg). Each 'home' can have its own set of configuration files. We need a gpg.conf file there w/ the line:

    personal-cipher-preferences <algo>
    
  2. The modest password: '12345' option does nothing unless the archaic gpg1 is used. W/ gnupg 2.0.x an annoying pinentry window will pop up.

    I.e., installing 2.1 is the only option. Instead of overwriting the existing 2.0.x installation (and possibly breaking your system), install 2.1 under a separate prefix (for example, to ~/tmp/gnupg).

    Next, for each gpg 'home' directory we need to add to gpg.conf another line:

    pinentry-mode loopback
    

    & create a gpg-agent.conf file w/ a line:

    allow-loopback-pinentry
    

The benchmark works like this:

  1. Before running any crypto operations, for each cipher we create a 'home' directory & fill it w/ custom gpg.conf & gpg-agent.conf files.
  2. Start a bunch of copies of gpg-agent, each for a different 'home' dir.
  3. Add a bin directory of our fresh gnupg 2.1 installation to the PATH, for example ~/tmp/gnupg/bin.
  4. Set LD_LIBRARY_PATH to ~/tmp/gnupg/lib.
  5. Generate the 'plain text' as n bytes from /dev/urandom.
  6. Encrypt the 'plain text' w/ each of the supported symmetric ciphers.
  7. Print the results.

The Ruby script that does this can be cloned from https://github.com/gromnitsky/gpg-algo-speed. You'll need the gpgme & benchmark-ips gems. Run the benchmark file from the cloned dir.
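
Stripped of the setup & bookkeeping, the measuring part is roughly this (a sketch only: switching homedirs via GNUPGHOME is my shorthand here, & the real script prepares the 'home' dirs, agents & PATH as described above):

require 'gpgme'
require 'benchmark/ips'

GNUPGHOME_ROOT = File.expand_path('~/tmp/bench-homes')  # 1 'home' per cipher
CIPHERS = %w[3des cast5 blowfish aes aes192 aes256 twofish]
plain = File.binread('/dev/urandom', 256 * 1024 * 1024)

Benchmark.ips do |x|
  CIPHERS.each do |algo|
    x.report(algo) do
      # each 'home' has gpg.conf w/ personal-cipher-preferences <algo> &
      # pinentry-mode loopback, plus gpg-agent.conf w/ allow-loopback-pinentry
      ENV['GNUPGHOME'] = File.join(GNUPGHOME_ROOT, algo)
      crypto = GPGME::Crypto.new password: '12345'
      crypto.encrypt plain, symmetric: true
    end
  end
  x.compare!
end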

Results

AMD Sempron 145, Linux 3.11.7-200.fc19.x86_64

$ ./benchmark /opt/tmp/gnupg $((256*1024*1024))
Plain text size: 268,435,456B
Calculating -------------------------------------
                idea     1.000  i/100ms
                3des     1.000  i/100ms
               cast5     1.000  i/100ms
            blowfish     1.000  i/100ms
                 aes     1.000  i/100ms
              aes192     1.000  i/100ms
              aes256     1.000  i/100ms
             twofish     1.000  i/100ms
         camellia128     1.000  i/100ms
         camellia192     1.000  i/100ms
         camellia256     1.000  i/100ms
-------------------------------------------------
                idea      0.051  (± 0.0%) i/s -      1.000  in  19.443114s
                3des      0.037  (± 0.0%) i/s -      1.000  in  27.137538s
               cast5      0.059  (± 0.0%) i/s -      1.000  in  16.850647s
            blowfish      0.058  (± 0.0%) i/s -      1.000  in  17.183059s
                 aes      0.059  (± 0.0%) i/s -      1.000  in  17.080337s
              aes192      0.057  (± 0.0%) i/s -      1.000  in  17.516253s
              aes256      0.057  (± 0.0%) i/s -      1.000  in  17.673528s
             twofish      0.057  (± 0.0%) i/s -      1.000  in  17.533964s
         camellia128      0.054  (± 0.0%) i/s -      1.000  in  18.359755s
         camellia192      0.053  (± 0.0%) i/s -      1.000  in  18.712756s
         camellia256      0.054  (± 0.0%) i/s -      1.000  in  18.684303s

Comparison:
               cast5:        0.1 i/s
                 aes:        0.1 i/s - 1.01x slower
            blowfish:        0.1 i/s - 1.02x slower
              aes192:        0.1 i/s - 1.04x slower
             twofish:        0.1 i/s - 1.04x slower
              aes256:        0.1 i/s - 1.05x slower
         camellia128:        0.1 i/s - 1.09x slower
         camellia256:        0.1 i/s - 1.11x slower
         camellia192:        0.1 i/s - 1.11x slower
                idea:        0.1 i/s - 1.15x slower
                3des:        0.0 i/s - 1.61x slower

Algo         Total Iterations
       idea          2
       3des          2
      cast5          2
   blowfish          2
        aes          2
     aes192          2
     aes256          2
    twofish          2
camellia128          2
camellia192          2
camellia256          2

As we see, 3DES is indeed slower than Rijndael.

(The plot is written in Grap. It doesn't really matter but I wanted to show off that I was tinkering w/ a Bell Labs language from 1984 that nobody is using anymore.)

In the repo above there is also the result for a 3GB blob (w/ compression turned on), where the Ruby garbage collector ran amok.

Wednesday, December 3, 2014

hackernews2nntp

It has been almost 2 months since the YC folks announced their official Hacker News API & threatened us w/ an imminent HN design change. (When I say 'us' I mean authors of various Chrome extensions or web scrapers.)

Writing YA interface on top of a common backend is exciting only if you are 17 y.o. Instead of inventing a 'new' forum-like view I've decided to make a one-way HN to NNTP 'convertor', so that I can read HN in mutt. Like this:

https://raw.github.com/gromnitsky/hackernews2nntp/master/screenshot1.png

Why NNTP?

Because of the history of newsreader UIs, reading something that represents a newsgroup means:

  1. Being able not to read the same post (article) twice (the client software marks old articles).
  2. Local filtering. Highlighting favourite authors, hiding trolls, sorting by date, thread, etc.
  3. The offline mode (if you have a local NNTP server on your laptop).

Some time ago I wrote a Chrome extension specifically for items 1-2, but never implemented the custom thread sorting in it.

Moving your reading activities to mutt has its disadvantages:

  • No up-voting.

  • No score updates.

  • Once an article is fetched & posted, it's very cumbersome to post it again if its content changes. You have to check w/ the server whether it has an article w/ a particular message id, check for body differences, change the message id of the new article (otherwise the server will reject it as a duplicate), and possibly modify its References header to point to the old version.

    In short, I didn't do that. Once the article is posted it stays the same.

The original idea was to run some gateway as a daemon that would monitor for HN updates & would immediately convert new stories/comments. That turned out to be impractical because my laptop isn't on 24/365. Instead I took the old usenet path: download a bunch of articles & read them later.

The old way has 2 primary advantages:

  • There is no need to save the program state, because if we download an article twice (now & in the previous run), the NNTP server will reject the duplicate.
  • It can help w/ HN addiction. You run the convertor once a day & read all the interesting stuff in your scheduled 'HN time'.

Then, if we use a decent article injector, it'll spool undelivered articles (for example if the NNTP server isn't responding) & post them in the next run automatically.

In the end, I run

  $ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v | sudo rnews -N  

once a day & practically never visit the HN website.

You can read more about the convertor here: https://github.com/gromnitsky/hackernews2nntp

Saturday, September 13, 2014

Porting Code to MRuby

If you take a random library from Ruby stdlib & try to use it under mruby, expect failure. If everything seems to work out of the box it's either (a) a miracle or (b) (more likely) you haven't tested the library enough.

The 1st thing I tried to bring to minirake was FileList. It turned out that FileList uses Dir.glob (glob wasn't implemented in mruby-dir). It turned out that Dir.glob internally uses File.fnmatch (fnmatch wasn't implemented in mruby-io).

Have you ever used File.fnmatch in your code? You usually stumble across its pattern language only as sub-patterns of Dir.glob patterns. For example, Dir.glob adds ** & { } syntax.
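
For a quick refresher on where fnmatch's own pattern language ends & where the Dir.glob extensions begin (results checked against MRI):

File.fnmatch('*.rb',  'main.rb')                    #=> true
File.fnmatch('c?t',   'cat')                        #=> true
File.fnmatch('[a-c]', 'b')                          #=> true

# braces are a glob extension; plain fnmatch treats { } literally
File.fnmatch('{a,b}.rb', 'a.rb')                    #=> false
File.fnmatch('{a,b}.rb', 'a.rb', File::FNM_EXTGLOB) #=> true (MRI 2.0+)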

In MRI, File.fnmatch is implemented in C. Extracting it to a plain C library w/o a Ruby dependency is relatively quick & simple. This is how the Rubinius team ported it & so did I. There is nothing interesting about the library, except maybe that for some reason the MRI version returns 0 as a mark of a successful match & 1 otherwise.

Dir.glob is a more complex story. Again, in MRI it's implemented in C. At 1st I wanted to do for glob the same job as for fnmatch, but glob makes too many calls to the MRI API that have no direct equivalents in mruby. I was lucky not to have to mess with C, because Rubinius had its own version of Dir.glob written in Ruby.

It didn't go as smoothly as I hoped, because the code isn't 'pure' Ruby but a Rubinius version of it, with annoying calls like Rubinius::LRUCache, Regexp#match_from & String#byteslice. (The last one is from Ruby 1.9+ but mruby still lacks it.)

After the porting struggle I checked the result with the unit tests for Dir.glob from MRI & amazingly they worked fine, which was a pleasant surprise because I wasn't expecting a good outcome.

Then came FileList's turn

Like every library written by Jim Weirich, it (a) is very well documented, (b) uses metaprogramming a lot.

While changing class_eval calls with interpolated strings to class_eval with blocks & define_method was easy, bugs started to arrive from unexpected & funny areas. For example:

$ ruby -e "p File.join ['a', 'b', 'c']"
"a/b/c

vs.

$ mruby -e "p File.join ['a', 'b', 'c']"
["a", "b", "c"]

Or even better:

$ ruby -e 'p [nil] <=> [nil]'
0
$ mruby -e 'p [nil] <=> [nil]'
trace:
        [1] mrblib/array.rb:166:in Array.<=>
        [0] -e:1
mrblib/array.rb:166: undefined method '<=>' for nil (NoMethodError)

The same goes for NilClass & <=>. File.extname behaves differently, File.split is missing, etc.
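
Such gaps can be papered over w/ tiny shims written against MRI semantics; something along these lines (a sketch, assuming File.dirname & File.basename are present):

# nil <=> nil is 0 in MRI (via Object#<=>); give mruby the same behaviour
class NilClass
  def <=>(other)
    0 if other.nil?
  end
end

# MRI defines File.split as [dirname, basename]
unless File.respond_to?(:split)
  def File.split(path)
    [File.dirname(path), File.basename(path)]
  end
end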

In many cases it isn't mruby's fault but the mrbgem libraries', but the whole ecosystem is in a state that isn't suitable for people with weak nerves. Sometimes I thought that the 'm' in mruby actually means 'masochistic'.

After the porting struggle with Array methods like | & + I took the unit tests from Rake & amazingly they worked almost fine (there is no StringIO in mruby), which wasn't a pleasant surprise anymore, because at that point I had gotten angry.

__FILE__

Do you know that __FILE__ is a keyword & __dir__ is a method? You can monkey patch __dir__ at any moment, but can do nothing about __FILE__. I didn't know that.

Making an executable with mruby involves producing bytecode which can be statically linked into the executable & loaded via the mrb_read_irep function at runtime.

Bytecode can be generated with the mrbc CL utility that ships with mruby. It sets the value of __FILE__ according to its CL arguments. For example:

$ mrbc -b mycode foo/bar/main.rb

will set __FILE__, for the bytecoded main.rb, to foo/bar/main.rb. If you have an executable named foobar & use main.rb as an entry point in your Ruby code, the classic trick

do_something if __FILE__ == $0

won't give the result you've expected.

At 1st I thought of overriding __FILE__, but it turned out that that wasn't possible. Then I thought of setting __FILE__ after the bytecode was generated, but wasn't able to figure out how to do it w/o coredumping. In the end I patched mrbc to be able to pass the required value from the CL, which means that, to be compiled, minirake now requires a patched version of mruby. Great. :(

FileUtils

The last missing part of Rake I wanted to have was FileUtils. It may seem useless & superfluous, but we like Ruby for DSLs, thus it's more idiomatic to write

mkdir 'foo/bar'

than

sh "mkdir -p foo/bar"

or even

exit 1 unless system "mkdir -p foo/bar" # [1]

FileUtils has some nice properties like the ability to print on demand what is happening or turn on 'no write' mode. For example, if you

include FileUtils::NoWrite

any 'destructive' command like rm or touch will do nothing.

I looked into stdlib's fileutils.rb & quickly gave up. It's too much work to port it to mruby. Then I thought of making a thin wrapper around system commands with a FileUtils-compatible API.

The idea is to generate several sets of wrappers around simple methods in some FileUtilsSimple::Commands namespace, so that the user never executes them directly but only through pre-generated static wrappers that decide what to do with a command.
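
In code the idea looks roughly like this (a simplified sketch, not the real FileUtilsSimple implementation; note Commands.send() instead of grabbing a Method object):

module FileUtilsSimple
  class << self
    attr_accessor :verbose, :nowrite
  end

  # 'real' commands live here & are never called by the user directly
  module Commands
    def self.rm(*args)
      system "rm -rf #{args.map {|i| "'#{i}'" }.join(' ')}"   # naive quoting
    end

    def self.mkdir(*args)
      system "mkdir -p #{args.map {|i| "'#{i}'" }.join(' ')}"
    end
  end

  # pre-generate a static wrapper for each command; the wrapper decides
  # whether to print the command, skip it in 'no write' mode, etc.
  Commands.singleton_methods.each do |cmd|
    define_method(cmd) do |*args|
      puts "#{cmd} #{args.join(' ')}" if FileUtilsSimple.verbose
      Commands.send(cmd, *args) unless FileUtilsSimple.nowrite
    end
  end
end

Then include FileUtilsSimple gives you rm, mkdir & friends, & the 'no write' mode becomes a flag flip instead of a separate set of methods.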

Acquiring a list of singleton methods is easy, but mruby never makes your life easy enough. The next mruby present was the absence of the Kernel#method method. I don't even.

Unit Tests

Don't get tempted to test the ported code under MRI just because your favorite test framework runs only under cruby. I've bumped into several occasions where a test passes fine under cruby & fails miserably under mruby.

[1] Did I mention that Kernel#system just returns a boolean & doesn't set $?? (Take a random guess in which implementation.)

Saturday, August 23, 2014

Mruby & A Self-Contained Subset of Rake

Since the last time I checked mruby, many things have changed. The biggest one was the introduction of compile-time plugins that are confusingly called mrbgems. I have a completely different image in mind when I hear the words ruby & gem together.

Still no love for require from matz.

To get an interpreter that is useful IRL, it's possible to cherry-pick from a list of mrbgems. The mruby-require plugin, sorry, gem is the most confusing one. If you specify it before other plugins, sorry, gems, all other gems (below it) will be compiled as .so libs; to use them you would write require 'foo' & would immediately lose compatibility with MRI. After that, the helper

def mruby?
  RUBY_ENGINE == 'mruby'
end

& conditional checks are the only answer.

The mruby build system is interesting. It uses a nano-version of Rake called minirake. For an unknown reason it's incompatible with mruby. At that point I thought: "How cool would it be to have rake as a standalone executable that doesn't depend on Ruby at all?"

What does this have to do with mruby? It turns out mruby can produce an array of bytecode that can be compiled with your C program into 1 executable.

It sounds cool but has its limitations. Firstly, you'll need to inline all your require statements to have 1 .rb source file. Secondly, remember, there is no stdlib in mruby. Plugins, sorry, gems, that try to bring it to mruby are nice but incomplete (for example, Dir misses glob).

You'll find problems in areas you've never imagined. For example, the Ruby ISO standard doesn't mention ARGV & $0 (that's what I heard; the pdf paper is behind a 198 CHF paywall), which means, right, no ARGV & $0 by default--you'll need to look in the mirb src to guess how to inject them.

Btw, googling won't help much, because most blog posts about mruby were written in 2012 & the API is different now => old examples are mostly useless.

Back to rake. Porting the 'real' rake is a daunting task. I just took the minirake source, tweaked it a bit & wrote a tiny C wrapper with a couple of rakefiles: https://github.com/gromnitsky/minirake. Amazingly it seems to work. Glory to Japan!

Wednesday, April 30, 2014

Antislacker

Last weekend I wrote a small Chrome extension that helps me to avoid facebook & livejournal. I mean I allow myself to stare at them for 10 minutes max & then Antislacker (the extension) kicks in & blocks those 2 domains for the rest of the day.

The idea is this:

  1. In background.coffee we look for a domain name match. If the match is successful, we inject a chunk of JS code.
  2. The injected piece of code contains a counter that updates a record in localStorage every 5 seconds.
  3. When the time limit is reached, we move the user to an internal page within the Chrome extension that shows a random Dilbert comic.

The most tricky part was making a 'mutex' for the localStorage records, because the user can open several facebook pages & the counter (in the worst case) will count 2 times faster. It's actually a pity that we don't have any concurrency primitives in JS, so we have to invent poor man's busy waiting when using timers.

Thursday, October 24, 2013

Multi-Lingual Interface With Jekyll

Imagine you have a site in N languages, for example, in English & Ukrainian. The content of articles is different & cannot be auto-translated, but we can ensure that the GUI on each part of the site is localized. All menus, buttons, tooltips, etc can be edited or localized without modifying the app source code.

Let's start with the example.

$ jekyll --version
jekyll 1.2.1

$ jekyll new blog && cd blog && mkdir _plugins

$ for i in en uk
  do
  (mkdir $i \
  && cd $i && ln -s ../css ../_l* ../_plugins . \
  && cp -r ../*yml ../_posts ../*html .); \
  done

$ rm -rf _config.yml _posts index.html

For each section we copied _posts, _config.yml & index.html, because they are different for each site, and we symlinked the css, _layouts & _plugins directories because they will be the same for each site.

The site's structure looks like this:

_layouts/
|__ ..
|__ default.html
|__ post.html
_plugins/
|__ ..
css/
|__ ..
|__ main.css
|__ syntax.css
en/
|__ ..
|__ _layouts@ -> ../_layouts/
|__ _plugins@ -> ../_plugins/
|__ _posts/
|__ css@ -> ../css/
|__ _config.yml
|__ index.html
uk/
|__ ..
|__ _layouts@ -> ../_layouts/
|__ _plugins@ -> ../_plugins/
|__ _posts/
|__ css@ -> ../css/
|__ _config.yml
|__ index.html
.gitignore

Now, install jekyll-msgcat:

$ gem install jekyll-msgcat

Create _plugins/req.rb file & add to it 1 line:

require 'jekyll/msgcat'

Add to uk/_config.yml:

msgcat:
  locale: uk
  # may be 'domain' or 'nearby'
  deploy: nearby

Then open _layouts/default.html, find the line

<a class="extra" href="/">home</a>

& replace it with:

<a class="extra" href="/">{{ 'Return to Home' | mc }}</a>

(Quotes are essential.)

As you see, we are using some unknown Liquid filter 'mc'. If you check the uk site

$ (cd uk; jekyll serve)

& go to http://127.0.0.1:4000/, nothing will change; everything will be in English as before. To automatically substitute the 'Return to Home' string with something else we need to create a message catalog.

In our case, the message catalog is just a .yaml file:

$ cat uk/_msgcat.yaml
uk:
  'Return to Home': На головну сторiнку

What is handy about this is that if the string isn't provided in the message catalog, or if there is no _msgcat.yaml file at all, the default English string will be used. Kill jekyll's server & start it again to test.
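
For the curious, the gist of such a filter can be sketched like this (a simplified version, not the actual jekyll-msgcat source; the catalog file & config keys follow the examples above):

require 'yaml'

module McSketch
  def mc(input)
    site = @context.registers[:site]
    locale = (site.config['msgcat'] || {})['locale']
    return input unless locale

    # load & memoize the catalog, e.g. uk/_msgcat.yaml
    @mc_catalog ||= begin
      file = File.join(site.source, '_msgcat.yaml')
      File.exist?(file) ? YAML.load_file(file) : {}
    end

    # fall back to the default English string
    (@mc_catalog[locale] || {})[input] || input
  end
end

Liquid::Template.register_filter(McSketch)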

Links to Localized Versions

The other problem you may have is how to generate a link from the current page to the same page in another language.

If you choose to host each site on a separate subdomain, e.g. en.example.com & uk.example.com, set the value of the msgcat.deploy key in the site's _config.yml to domain. If you like a scheme without subdomains & prefer example.com/blog/en & example.com/blog/uk, set the key's value to nearby.

Make sure you have url & baseurl in _config.yml. In Liquid templates use cur_page_in_another_locale filter. For example, in _layouts/default.html:

{{ 'en' | cur_page_in_another_locale }}
{{ 'uk' | cur_page_in_another_locale }}

will generate in en site (msgcat.deploy == domain):

<a href='#' class='btn btn-primary btn-xs disabled'>en</a>
<a href='http://uk.example.com/index.html' class='btn btn-primary btn-xs '>uk</a>

or for msgcat.deploy == nearby:

<a href='#' class='btn btn-primary btn-xs disabled'>en</a>
<a href='/blog/uk/index.html' class='btn btn-primary btn-xs '>uk</a>

If you don't like injected names of Bootstrap's CSS classes, use the filter with an empty parameter:

{{ 'en' | cur_page_in_another_locale: "" }}
{{ 'uk' | cur_page_in_another_locale: "" }}

Or provide your own class name(s) instead of the empty string.

Friday, March 1, 2013

Creating Emacs Multi-file Packages

(This text assumes your familiarity with the difference between simple vs. multi-file packages in Emacs, how to create them, etc.)

After writing NAME-pkg.el, creating the tar file & successfully installing the package from your local test archive, you may notice a small problem: the package meta information (its version, name, etc.) appears in 2 or 3 places. Take, for example, the version number:

  • it's sitting somewhere in the code as a variable value;
  • it exists in NAME-pkg.el;
  • it's stored in Makefile because your target must be aware of the output file name (which must contain the version number).

Some even prefer to include it in README.

In other package systems like npm, this is a non-issue, because the package.json file that contains all the meta information is a first-class citizen in the libraries that npm delivers. It's trivial to parse & there are nice CLI tools like jsontool that can be used in Makefiles to extract any data from package.json.

Of course we can 'parse' our NAME-pkg.el file too. This snippet will read foobar-pkg.el file and return the version string from it:

(nth 2 (package-read-from-string
      (with-temp-buffer
        (insert-file-contents
         "foobar-pkg.el")
        (buffer-string))))

But it won't solve the problem with the Makefile. For instance, you'd need to write a custom CLI util just to grab the package's name & version from NAME-pkg.el.

meta.json

Instead we'll take another path & store all information about our package in a .json file. JSON can be easily parsed in elisp & with jsontool's help we can extract all data within Makefile.

meta.json may look like this:

{
    "name" : "foobar",
    "version" : "0.0.1",
    "docstring" : "Free variables and bound variables",
    "reqs" : {
        "emacs" : "24.3"
    },
    "repo" : {
        "type": "git",
        "url" : "git://example.com/foobar.git"
    },
    "homepage" : "http://example.com",
    "files" : [
        "*.el",
        "README",
        "meta.json"
    ]
}

If you're not familiar with jsontool, install it via npm install -g jsontool & play:

$ json name < meta.json
foobar
$ json files < meta.json | json -a
*.el
README
meta.json
$ json -a -d- name version < meta.json
foobar-0.0.1

It's very handy.

Getting Meta Into Elisp

That .json file can be parsed once while our package is loading into Emacs. We can wrap that in a library, for example, foo-metadata.el:

(require 'json)

(defvar foo-meta (json-read-file
                 (concat (file-name-directory load-file-name) "meta.json")))

(defconst foo-meta-version (cdr (assoc 'version foo-meta)))
(defconst foo-meta-name (cdr (assoc 'name foo-meta)))

(provide 'foo-metadata)

Then you just write (require 'foo-metadata) in your code.

Package Generation

Consider the minimal multi-file structure of some Foobar project:

foobar/
|__ ..
|__ bin/
|   |__ ..
|   |__ foo-make-pkg
|__ Makefile
|__ fb-bar.el
|__ fb-foo.el
|__ fb-foobar.el
|__ meta.json

Notice that the foobar-pkg.el file is missing. Instead we have a strange bin/foo-make-pkg utility that generates it. If we write it properly enough we can reuse it in another emacs project:

:; exec emacs -Q --script "$0" -- "$@" # -*- mode: emacs-lisp; lexical-binding: t -*-

(setq
 debug-on-error t                     ; show stack trace
 argv (cdr argv))                     ; remove '--' from CL arguments

(require 'json)

(when (not (= 2 (length argv)))
  (message "Usage: %s meta.json some-pkg.el" (file-name-base load-file-name))
  (kill-emacs 1))

(setq data (json-read-file (car argv)))

(setq reqs (cdr (assoc 'reqs data)))
(when reqs
  (let (rlist)
    (dolist (idx reqs)
      (push (list (car idx) (cdr idx)) rlist))
    (setq reqs `(quote ,rlist))
    ))

(with-temp-file
    (nth 1 argv)
  (insert (prin1-to-string
           (list 'define-package
                 (cdr (assoc 'name data))
                 (cdr (assoc 'version data))
                 (cdr (assoc 'docstring data))
                 reqs))))

Test it by running:

$ bin/foo-make-pkg meta.json foobar-pkg.el && cat !#:2
(define-package "foobar" "0.0.1" \
    "Free variables and bound variables" (quote ((emacs "24.3"))))

To bring it all together we need 2 targets in the Makefile: foobar-pkg.el, which generates that file, & a phony target package that creates an elpa-compatible tar.

.PHONY: clean package

JSON := json
TAR := tar
METADATA := meta.json
PKG_NAME := $(shell $(JSON) -a -d- name version < $(METADATA))

foobar-pkg.el: meta.json
    bin/foo-make-pkg $< $@

package: foobar-pkg.el
    $(TAR) --transform='s,^,$(PKG_NAME)/,S' -cf $(PKG_NAME).tar \
        `$(JSON) files < $(METADATA) | $(JSON) -a`

clean:
    rm foobar-pkg.el $(PKG_NAME).tar

Recall that with meta.json we have 1 definitive source of all project metadata, so when you need to update the version number, the project dependencies, the contents of the tar or whatever--you'll edit only 1 file.

There is, of course, another route--even without any file generation. For example, you can gently parse foobar-pkg.el in elisp & have a utility that produces JSON from the static foobar-pkg.el, which then goes to jsontool's input.

Thursday, February 28, 2013

Emacs, ERT & Structuring Unit Tests

The ERT framework, which everyone is using these days in Emacs, provides very little guidance on how to organize & structure unit tests.

Running tests in the Emacs instance you are working in is quite idiotic. Not only can you easily pollute the editor's global namespace in case of a typo, but unit tests in such a mode cannot be reliable at all, because it's possible to create unwanted dependencies on data structures that weren't properly destroyed in previous test invocations.

Emacs batch mode

The only right way to execute tests is to use emacs batch mode. The idea is: your Makefile contains a test target which goes through the test directory, which contains several test_*.el files. Each test_*.el file can be run independently & has a test selector (a regexp) that you may optionally provide as a command line parameter.

For example, consider some Foobar project:

foobar/
|__ ..
|__ test/
|   |__ ..
|   |__ test_bar.el
|   |__ test_foo.el
|   |__ test_utils.el
|__ Makefile
|__ foo-bar.el
|__ foo-foo.el
|__ foo-foobar.el
|__ foo-utils.el

To make this work, each test_* file must know where to find the foo-*.el libraries & how to run its tests. Ideally it should not depend on the current directory from which the user actually runs it.

test_utils.el script then looks like:

:; exec emacs -Q --script "$0" -- "$@"

(setq tdd-lib-dir (concat (file-name-directory load-file-name) "/.."))
(push tdd-lib-dir load-path)
(push (file-name-directory load-file-name) load-path)

(setq argv (cdr argv))

(require 'foo-utils)

(ert-deftest ignorance-is-strength()
  (should (equal (foo-utils-agenda) "war is peace")))

(ert-run-tests-batch-and-exit (car argv))

There is quite a header before the ert-deftest definition.

The 1st line is a way to tell your kernel & bash to run emacs with the current file as an argument. The -Q option forces Emacs not to read your ~/.emacs file, not to process X resources, etc. This helps (a) to start Emacs as quickly as possible & (b) to force your code not to depend on your local customizations.

The next 3 lines modify the load-path list, which Emacs uses to search for files when you 'require' or 'load' something. We add to that list the parent directory, where our *.el files are. Note that load-file-name contains the absolute path to the current test_utils.el file.

The next line removes the '--' cell from the argv list, so that (car argv) gives you the 1st command line parameter passed to the script.

The (require 'foo-utils) line loads the ../foo-utils.el file (if you have provided 'foo-utils in it, of course).

The next 2 lines are a usual ERT test definition with 1 assertion in this example.

The last line is the ERT command that runs your unit tests. Notice its argument--it allows you to optionally run the script as:

$ ./test_utils.el regexp

to filter out unmatched ert-deftest definitions.

Makefile

You can add 2 useful targets to it: test & compile. The latter transforms .el files into .elc & sometimes produces useful info about unused variables, etc.:

.PHONY: test compile clean

ELC := $(patsubst %.el,%.elc,$(wildcard *.el))

%.elc: %.el
    emacs -Q -batch -L `pwd` -f batch-byte-compile $<

test:
    @for idx in test/test_*; do \
        printf '* %s\n' $$idx ; \
        ./$$idx ; \
        [ $$? -ne 0 ] && exit 1 ; \
    done; :

compile: $(ELC)

clean:
    rm $(ELC)

Hints

Try to make every test non-interactive. For example, if your command asks the user for confirmation via (y-or-n-p), Emacs, even in batch mode, stops and waits for input from the terminal. If you need to answer "yes", just monkey patch the function:

(setq tdd-y-or-n nil) ;; by default say "no"
(defun y-or-n-p (prompt)
  tdd-y-or-n)

and then write an assert as:

(let ((tdd-y-or-n t))
  (should (freedom-is-slavery)))

You can monkey patch any elisp function except those which are compiled in (i.e. come from .c files & are 'primitive' in Emacs terminology).

Unfortunately, the famous (message) function is built-in & cannot be monkey patched. If you use it heavily in the code, your non-interactive tests will fill stderr with garbage that will distract you. It's better to use a flag, global to your project namespace, & a wrapper for (message):

(defconst foo-meta-name "foobar")
(defvar foo-verbose 1)

(defun foo-warn (level str &rest args)
  "Print a message via (message) according to LEVEL."
  (when (<= level foo-verbose)
    (if (/= 0 level) (setq str (concat foo-meta-name ": " str)))
    (message (apply 'format str args))))

Then use (foo-warn 1 "hi, mom") in the code instead of (message). In .el libraries foo-verbose variable can be equal to 1, but in your tests set it to -1 to prevent printing to stderr.

Friday, January 25, 2013

ssh command quoting hell

When you type

$ ssh user@host 'cat /tmp/foo.txt'

the cat /tmp/foo.txt part of that string is evaluated twice: 1) by your current shell as a single-quoted string, 2) by the shell on the remote host.

Let's assume you want to write a script that backs up some directory from a remote machine. A naive version:

$ cat mybackup.sh
#!/bin/sh

[ -z "$1" -o -z "$2" ] && exit 1

tcd=$1
tdir=$2
ssh user@host "tar cvf - -C $tcd $tdir | gzip" > foo.tar.gz

and if you run it like this:

$ ./mybackup.sh /home joe

And if everything goes OK, you'll get foo.tar.gz, which will contain joe's home directory files. But what if the $1 or $2 arguments contain spaces and/or quotes? I'll tell you:

$ ./mybackup.sh /home/joe 'tmp/foo "bar'
bash: -c: line 0: unexpected EOF while looking for matching `"'
bash: -c: line 1: syntax error: unexpected end of file

This is a bash error from a remote host because it tries to run

tar cvf - -C /home/joe tmp/foo "bar | gzip

and "bar contains an unmached quote. Obvously this is not the command you had in mind.

How can we fix that? Another naive approach would be to single-quote some variables in the script:

ssh user@host "tar cvf - -C '$tcd' '$tdir' | gzip" > foo.tar.gz

And this will work for our example, but will fail if the tmp/foo "bar directory were named tmp/foo 'bar instead (with a single quote instead of a double one).

To make it work regardless of such nuances, we need to somehow transform the $1 and $2 script arguments into quoted strings. Such transformed strings will be a safe choice for the substrings that represent the to-be-executed command on the remote host.

One nuance: the transformation must be done not by the rules of /bin/sh or your current local shell, but by the rules of the user's shell on the remote host. (See the do_child() function in session.c of the openssh source: it extracts the user's shell name from the user db on the remote machine & constructs the arguments for execve(2) as "/path/to/shell_name", "shell_name", "-c", "foo", "bar".)

If the remote shell is a sh-derived one, the transformation function can look like:

sq() {
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e 1s/^/\'/ -e \$s/\$/\'/
}

(Taken from http://unix.stackexchange.com/a/4774.)

Then, a final version of the 'backup' script would be:

#!/bin/sh

sq() {
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e 1s/^/\'/ -e \$s/\$/\'/
}

[ -z "$1" -o -z "$2" ] && exit 1

tcd=$1
tdir=$2
out=`basename "$tdir"`.tar.gz

cmd="tar cvf - -C `sq $tcd` `sq $tdir` | gzip"
echo "$cmd"
ssh user@host "$cmd" > "$out"

Hint: when in doubt, run (openssh) ssh with -v option and search for 'debug1: Sending command' string in the output.