Sunday, January 3, 2016

An Oral History of Unix as an epub

During the summer-fall of 1989, Professor Michael S. Mahoney (of Princeton University) recorded a series of interviews w/ Bell Labs people who were involved in the creation of Unix. For example, dmr or McIlroy (Alan Turing always wanted to win a McIlroy Award, but didn't qualify).

This interview project was called An Oral History of Unix. Until last week I had no idea of its existence. Judging from the text length (& comments in the transcriptions like "end of side A"), each conversation lasted an hour or more.

Unfortunately, the transcriptions are in an ancient version of the MS Word format & the html rendition of it contains these hilarious lines:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">

I don't know about you, but the last time I saw similarly crafted pages was more than 15 years ago.

Of course, as you may guess, the encoding in the Content-Type header doesn't match the encoding of the file:

$ curl -sI http://www.princeton.edu/~hos/mike/transcripts/weinberger.htm | grep Content-Type
Content-Type: text/html; charset=UTF-8

It's like 1999 all over again!

Ok, enough w/ that. We can't write to the Professor because he passed away in 2008. What we can do is fix the presentation of the pages or, what I chose to do, make them more readable on Kindle. I.e. if we generate a TOC & feed the (fixed) html to Calibre, it generates a valid epub file which we can then convert to .mobi or .azw3. The build scripts can be found here. The final result (epub, mobi, pdf): http://gromnitsky.users.sourceforge.net/lit/an-oral-history-of-unix/.

Enjoy the reading!

Sunday, December 20, 2015

Dynamic PATH in GNU Make

Sometimes you may have several targets, where the 1st one creates a new directory & puts some files in it, & the 2nd one expects the newly created directory to be added to PATH. For example:

$ make -v | head -1
GNU Make 4.0

$ cat example-01.mk
PATH := toolchain:$(PATH)

src/.configure: | toolchain src
        cd src && ./configure.sh
        touch $@

toolchain:
        mkdir $@
        printf "#!/bin/sh\necho foo" > $@/foo.sh
        chmod +x $@/foo.sh

src:
        mkdir $@
        cp configure.sh $@

The toolchain target here creates the directory w/ new executables. The src target emulates unpacking a tarball w/ a configure.sh script in it that runs foo.sh, expecting it to be in PATH:

$ cat configure.sh
#!/bin/sh

echo PATH: $PATH
echo
foo.sh

If we run this example, configure.sh will unfortunately fail:

$ make -f example-01.mk 2>&1 | cut -c -72
mkdir toolchain
printf "#!/bin/sh\necho foo" > toolchain/foo.sh
chmod +x toolchain/foo.sh
mkdir src
cp configure.sh src
cd src && ./configure.sh
PATH: toolchain:/home/alex/.rvm/gems/ruby-2.1.3/bin:/home/alex/.rvm/gems

./configure.sh: line 5: foo.sh: command not found
example-01.mk:4: recipe for target 'src/.configure' failed
make: *** [src/.configure] Error 127

The error is in the line where configure.sh is invoked:

cd src && ./configure.sh

As soon as we chdir to src, the toolchain directory in the PATH becomes unreachable. Trying $(realpath) won't help either, because at the moment the PATH variable is set there is no toolchain directory yet & $(realpath) expands to an empty string.

What if PATH were an old-school macro that was reevaluated every time it was accessed? If we change PATH := to:

path.orig := $(PATH)
PATH = $(warning $(shell echo PWD=`pwd`))$(realpath toolchain):$(path.orig)

Then PATH becomes a recursively expanded variable & the handy $(warning) function prints to stderr the current working directory at the exact moment PATH is evaluated (it won't mangle the PATH value, because $(warning) always expands to an empty string).

$ rm -rf toolchain src ; make -f example-02.mk 2>&1 | cut -c -100
mkdir toolchain
example-02.mk:2: PWD=/home/alex/lib/writing/gromnitsky.blogspot.com/posts/2015-12-20.1450641571
printf "#!/bin/sh\necho foo" > toolchain/foo.sh
chmod +x toolchain/foo.sh
mkdir src
example-02.mk:2: PWD=/home/alex/lib/writing/gromnitsky.blogspot.com/posts/2015-12-20.1450641571
cp configure.sh src
cd src && ./configure.sh
example-02.mk:2: PWD=/home/alex/lib/writing/gromnitsky.blogspot.com/posts/2015-12-20.1450641571
PATH: /home/alex/lib/writing/gromnitsky.blogspot.com/posts/2015-12-20.1450641571/toolchain:/home/ale

foo
touch src/.configure

As we see, PATH was accessed 3 times: before the printf & cp invocations & after the echo of the ./configure.sh line (the ./ prefix itself requires no PATH lookup).

Saturday, December 12, 2015

GTK3

While upgrading to Fedora 23, I've discovered New Horizons of Awesomeness in gtk3. (I think it should be the official slogan for all the new gtk apps in general.)

If you don't use a compositor & select an ubuntu-style theme:

  $ grep theme-name ~/.config/gtk-3.0/settings.ini
  gtk-theme-name = Ambiance  

modern apps start looking very indie in fvwm:

https://lh3.googleusercontent.com/-77JmxzfMAm8/Vmwu061FLJI/AAAAAAAAAiw/P0HrryUvhrI/s640-Ic42/gtk3-demo-ambiance.png

Granted, it's not 1997 anymore, we all have big displays w/ a lot of lilliputian pixels, but such a waste of screen real estate seems a little unnecessary to me.

Turns out it's an old problem that has no solution, except for the handy "use Gnome" advice. There is the https://github.com/PCMan/gtk3-nocsd hack, but I don't think I'm in such a desperate position as to employ it. A quote from the README:

  I use $LD_PRELOAD to override several gdk and glib/gobject APIs to
  intercept related calls gtk+ 3 uses to setup CSD.  

I have no words. All we can do to disable the gtk3 decoration is to preload a custom library that mocks some rather useful parts of the gtk3 api. All praise Gnome!

In search of a theme that has contrast (e.g. !gray text on gray backgrounds) I've found that (a) the old default theme looks worse than Motif apps from the 1990s:

  $ GTK_THEME=Raleigh gtk3-demo  
https://lh3.googleusercontent.com/-NT7umB_tPJc/Vmwu1GMn1jI/AAAAAAAAAi0/DdEE3LnBZyQ/s640-Ic42/gtk3-demo-raleigh.png

Which is a pity, because the gtk2 Raleigh theme was much prettier:

https://lh3.googleusercontent.com/-qyN_DZbufhA/Vmwu0sCcDNI/AAAAAAAAAio/CFabyRWEVvA/s800-Ic42/gtk2-demo-raleigh.png

& (b) my favourite GtkPaned widget renders equally horrifically everywhere. Even the highly voted Clearlooks-Phenix theme manages to make it practically imperceptible to the eye:

https://lh3.googleusercontent.com/-aRrShlI2Lmw/Vmwu0kfj-tI/AAAAAAAAAis/LSOBlcyTAKI/s640-Ic42/clearlooks-phenix-gtk3-theme.png

The moral of the story: don't write desktop apps (but all the kids know this already) & ditch the gtk apps you run today, for they will all become unusable tomorrow (but what do I know? I still use xv as a photo viewer).

Sunday, November 8, 2015

Why Johnny Still Can't Encrypt

Before reading "Why Johnny Still Can't Encrypt" I'd read "Why Johnny Can't Encrypt". Boy it was hilarious!

In the original paper they asked 12 people to send an encrypted message to 5 recipients. In the process the participants had to stumble over several traps, like the need to distinguish key algo types, because 1 of the recipients used an 'old' style RSA key.

The results were funny to read:

'One of the 12 participants (P4) was unable to figure out how to encrypt at all. He kept attempting to find a way to "turn on" encryption, and at one point believed that he had done so by modifying the settings in the Preferences dialog in PGPKeys.'

'P1, P7 and P11 appeared to develop an understanding that they needed the team members' public keys, but still did not succeed at correctly encrypting email. P2 never appeared to understand what was wrong, even after twice receiving feedback that the team members could not decrypt his email.'

'(P5) so completely misunderstood the model that he generated key pairs for each team member rather than for himself, and then attempted to send the secret in an email encrypted with the five public keys he had generated. Even after receiving feedback that the team members were unable to decrypt his email, he did not manage to recover from this error.'

'P6 generated a test key pair and then revoked it, without sending either the key pair or its revocation to the key server. He appeared to think he had successfully completed the task.'

'P11 expressed great distress over not knowing whether or not she should trust the keys, and got no further in the remaining ten minutes of her test session.'

The new paper "Why Johnny Still Can't Encrypt" is uninspiring. They used a JS OpenPGP implementation (Mailvelope), avalivible as a Chrome/Firefox plugin. Before reading the sequel I'd installed the plugin to judge it by myself.

Mailvelope is fine if you understand that it operates just on an arbitrary block of text; it doesn't (& cannot) 'hook' into GMail in any way, except for trying to parse encrypted text blocks & looking for editable DIVs. It can be confusing if you don't get that selecting the recipient in the GMail compose window has nothing to do with the encryption: it's easy to send a mail to bob@example.com where you encrypted the message with alice@example.com's PK.

In other respects I found Mailvelope pretty obvious.

Having 'achieved' the grandiose task of exchanging public keys between 2 emails & sending encrypted messages, I finally read the paper.

Boy it was disappointing.

In contrast w/ the original PGP study, they resorted to the simplest possible tasks: user A should generate a key pair, ask user B for his PK & send an encrypted email. They got 20 pairs of A-B users. Only 1 pair successfully sent/read a message.

The 1 pair.

This is why humanity is doomed.

Monday, September 14, 2015

wordnet & wordnut

Here is a tiny new Emacs major mode for browsing the local WordNet lexical database: https://github.com/gromnitsky/wordnut

I was very surprised not to find an abundance of similar modes in the wild.

https://raw.github.com/gromnitsky/wordnut/master/screenshot1.png

Its most useful features are:

  • Completions. For example, do M-x wordnut-search RET arc TAB.
  • Pressing Enter in the *WordNut* buffer on any word. In this way you can browse the WordNet db indefinitely.
  • History. The keybindings are the usual ones: `l' to go back, `r' to go forward.

Sunday, August 16, 2015

What is Ruby power_assert gem & why you may need it

After upgrading from Ruby 2.1.3 to 2.2.2 I noticed a new bundled gem called power_assert. It turned out that test-unit has required it for about a year now. That was the 2nd surprise, because I thought everyone had moved to minitest years ago & test-unit was kept around only for backward compatibility's sake.

A 'power assert' enabled test-unit has an enhanced version of assert() that can take a block & in case of failure print the value of each object in a method chain. If no block is given to the new assert(), the old version is invoked.

$ cat example-1.rb
require 'test/unit'

class Hello < Test::Unit::TestCase
  def test_smoke
    assert { 3.times.include? 10 }
  end
end

$ ruby example-1.rb | sed -n '/==/,/==/p'
===============================================================================
Failure:
      assert { 3.times.include? 10 }
                 |     |
                 |     false
                 #<Enumerator: 3:times>
test_smoke(Hello)
/home/alex/.rvm/gems/ruby-2.2.2@global/gems/power_assert-0.2.2/lib/power_assert.
rb:29:in `start'
example-1.rb:5:in `test_smoke'
     2:
     3: class Hello < Test::Unit::TestCase
     4:   def test_smoke
  => 5:     assert { 3.times.include? 10 }
     6:   end
     7: end
===============================================================================

As I understand it, Kazuki Tsujimoto (the author of the power_assert gem) got the idea of a pretty picture for a method chain from the Groovy language. Before power_assert we could only use Object.tap() for peeking into the chain:

> ('a'..'c').to_a.tap {|i| p i}.map {|i| i.upcase }
["a", "b", "c"]
[
  [0] "A",
  [1] "B",
  [2] "C"
]

Using power_assert we can write an enhanced version of Kernel.p() which, in the spirit of the new assert(), prints a fancy picture if the user provides a block for it:

$ cat super_duper_p.rb
require 'power_assert'

def p *args
  if block_given?
    PowerAssert.start(Proc.new, assertion_method: __callee__) do |pa|
      val = pa.yield
      str = pa.message_proc.call
      if str == "" then Kernel.p(val) else puts str end
      val
    end
  else
    Kernel.p(*args)
  end
end

$ cat example-2.rb
require './super_duper_p'

p {3.times.to_a.map {|i| "i=#{i}" }.include? 3}
p [1,2,3], [4,5,6], "7"
p { [1,2,3] }

$ ruby example-2.rb
p {3.times.to_a.map {|i| "i=#{i}" }.include? 3}
     |     |    |                   |
     |     |    |                   false
     |     |    ["i=0", "i=1", "i=2"]
     |     [0, 1, 2]
     #<Enumerator: 3:times>
[1, 2, 3]
[4, 5, 6]
"7"
[1, 2, 3]

Unfortunately, it won't work in irb.

If you're like the rest of us who prefer minitest to test-unit, you'll need a separate gem for it.

Thursday, July 16, 2015

iojs API docs in Texinfo format

I wanted to hold on until the day of node & iojs convergence, but sadly the convergence apparently ain't gonna happen this year.

So, for those who like to read docs in Emacs & not in a browser, I wrote a simple converter from the iojs .md files to the Texinfo format. As a byproduct, it's now possible to automatically check for broken cross-references in the iojs docs.

Why read docs in Emacs? We automatically get

  • Searching
  • Index

(Neither of which is available in the current md->html iojs tooling.)

To play w/ the index, go to the iojs Info node & press i. Using the index is unbelievably handy after you get used to it.

If you think that Texinfo is a complex, outdated & obscure thing, I have a quote for you from Eli Zaretskii:

What is it with you young people that you are so afraid of "barriers"? Did someone sell you a fairy tale that there are no barriers in life, except in Emacs and Texinfo? If you cannot negotiate these ridiculously low "barriers", how will you ever succeed in your life out there?

Tuesday, July 14, 2015

Firefox & Antialiasing

Firefox is the only browser that continually annoys me with its 'liberal' reading of my fontconfig configuration. I don't use Firefox as my primary browser, so when I need to run it to test some new API hotness I usually cry from frustration.

Take fonts for example. In ~/.config/fontconfig/fonts.conf I have this:

<fontconfig>
  [...]

  <!-- antialiasing is off for truetype fonts -->
  <match target="font">
    <test name="fontformat">
      <string>TrueType</string>
    </test>
    <edit mode="assign" name="antialias">
      <bool>false</bool>
    </edit>
  </match>

</fontconfig>

that allows me to have any local TT font rendered (by a program that abides by the fontconfig rules) w/o antialiasing. Webfonts that a browser downloads don't, in 99.(9)% of cases, come in TT format, so any web page that uses them renders w/ antialiasing as usual. This trick works flawlessly w/ Chrome but fails w/ Firefox.

A week ago a nightly version (what do they call it, 'mozilla-central'?) suddenly started to behave like Chrome, but the surprise didn't last very long: today, simultaneously w/ the never-ending Adobe Flash brouhaha, they broke the font rendering again.

Wednesday, April 8, 2015

GNU Make Shellquote

Sometimes you may have a filename that contains quotes & your usual makefile routines break. For example, if you generate

index.Ukrayins'ka.html from index.Ukrayins'ka.md

& add index.Ukrayins'ka.html to the clean variable, this classic pattern won't work anymore:

.PHONY: clean
clean:
      rm -rf $(clean)

because your shell will complain about a quote mismatch.

So you need to 'shellquote' the clean variable.

We can write a parameterized function in make that transforms 1 word into a safe shell-quoted string:

clean.shellquote = '$(subst ','\'',$(1))'

The Make manual has a nice example of a map function. That's all we need: we transform each word of the clean variable w/ the map function that calls our clean.shellquote routine.

The complete example:

clean.map = $(foreach a,$(2),$(call $(1),$(a)))
clean.shellquote = '$(subst ','\'',$(1))'
# '# emacs font-lock

.PHONY: clean
clean:
      rm -rf $(call clean.map,clean.shellquote,$(clean))

Thursday, March 26, 2015

A Strategy of No Skill

I love this:

Russ: I get an email from a football predictor who says, 'I know who is going to win Monday night. I know which team you should bet on for Monday night football.'

And I get this email, and I think, well, these guys are just a bunch of hacks. I'm not going to pay any attention to it. But it turns out to be right; and of course who knows? It's got a 50-50 chance. But then, for the next 10 weeks he keeps sending me the picks, and I happen to notice that for 10 weeks in a row he gets it right every time. And I know that that can't be done by chance, 10 picks in a row.

He must be a genius. And of course, I'm a sucker. Why?

Guest: So, let's say after those 10 weeks in a row you actually subscribe to this person's predictions. And then they don't do so well, after the 10 weeks.

And the reason is that the original strategy was basically: Send an email to 100,000 people, and in 50,000 of those emails you say that Team A is going to win on Monday. And in 50,000 you say Team B is going to win on Monday.

And then, if Team A wins, the next week you only send to the people that got the correct prediction. So, the next week you do the same thing. 25,000 for Team A, 25,000 for Team B. And you continue doing this. And the size of the number of emails decreases every single week, until after that 10th week, there are 97 people that got 10 picks in a row correct. So you harvest 97 suckers out of this. (http://www.econtalk.org/archives/2015/03/campbell_harvey.html)

Or in other words:

$ irb
2.1.3 :001 > people = 100_000
100000
2.1.3 :002 > 10.times.map { people /= 2 }
[
  [0] 50000,
  [1] 25000,
  [2] 12500,
  [3] 6250,
  [4] 3125,
  [5] 1562,
  [6] 781,
  [7] 390,
  [8] 195,
  [9] 97
]

Saturday, February 21, 2015

A minimalistic node version manager

If you pay attention to the nodejs world & suddenly find yourself using 3 versions of node simultaneously, you may start thinking about a version manager.

There are some existing ones, like nvm & n. They are nice, but both are written in bash & may require periodic updates after each new node/iojs release.

What I want from the 'manager' is that it doesn't integrate itself w/ the shell & doesn't require constant updating.

The 'non-updating' feature results in a drastic code simplification: if a version manager (VM) doesn't know how to install new node versions at all, then (hopefully) you never need to update its code either.

The non-bash requirement dates back to rvm, which has been redefining cd for us since 2009. It doesn't mean, of course, that a VM written in bash must necessarily modify built-in shell commands, but watching rvm struggle w/ bash has discouraged me from sh-like solutions.

The VM should be fast, so writing it in Ruby (unfortunately) is not an option, due to the small (but noticeable) startup overhead that any Ruby CLI util has. Ideally it should also have no dependencies.

This leaves us w/ several options. We can use mruby or plain C or, wait, there is Golang! In the past its selling point was a 'system' language feel.

Well. I can tell that it's not as poignant as Ruby, for sure, but it's hyper fast & quite consistent. It took me roughly a day to feel more or less comfortable w/ it, which is incomparable w/ garbage like C++. Frankly, I was surprised myself that it went so smoothly.

Back to the YA version manager for node. It's called nodever; it uses a 'subshell' approach by installing system-wide wrappers & it's a tiny Go program.

Saturday, February 7, 2015

node.js 0.12, stdin & spawnSync

If you have in your code a quick hack like this:

stdin = fs.readFileSync('/dev/stdin').toString()

& it works fine & nothing bad really happens, you may start wondering one day why everyone considers it a temporary solution.

Node's readFileSync() uses stat(2) to get the size of the file it tries to read. By definition, you can't know the size of stdin ahead of time. As one dude put it on SO:

Imagine stdin is like a water tap. What you are asking is the same as "How much water is there in a tap?".

By using stat(2), readFileSync() will read only up to whatever length value the kernel lies about/guesses for /dev/stdin.

Another issue comes w/ testing. If you have a CL utility & want to write an acceptance test for it using the 'new' node 0.12 child_process.spawnSync() API, expect funny errors.

Suppose we have a node version of cat that's written in a dumb 'synchronous' way. Call it cat-1.js:

#!/usr/bin/env node

var rfs = require('fs').readFileSync

if (process.argv.length == 2) {
        process.stdout.write(rfs('/dev/stdin'))
} else {
        process.argv.slice(2).forEach(function(file) {
                process.stdout.write(rfs(file))
        })
}

Now we write a simple test for it:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-1.js', { input: 'hello' })
assert.equal('hello', r.stdout.toString())

& run:

$ node test-cat-1-1.js

assert.js:86
  throw new assert.AssertionError({
        ^
AssertionError: 'hello' == ''
    at Object.<anonymous> (/home/alex/lib/writing/gromnitsky.blogspot.co
m/posts/2015-02-07.1423330840/test-cat-1-1.js:5:8)

What just happened? (I've cut the irrelevant trace lines.) Why is the captured stdout empty? Let's change the test to:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-1.js', { input: 'hello' })
console.error(r.stderr.toString())

then run:

$ node test-cat-1-2.js
fs.js:502
  return binding.open(pathModule._makeLong(path), stringToFlags(flags),
mode);
                 ^
Error: ENXIO, no such device or address '/dev/stdin'
    at Error (native)
    at Object.fs.openSync (fs.js:502:18)
    at fs.readFileSync (fs.js:354:15)

At this point, unless you want to dive into libuv internals, the quick hack of explicitly reading /dev/stdin should be changed to something else.

In the past, node maintainers disdained the sync read of stdin & called it an antipattern. The recommended way was to use the streams API, employing process.stdin as a readable stream. Still, what if we really want a sync read?

The easiest way is to make a wrapper around readFileSync() that checks the filename argument & invokes the real readFileSync() when it's not equal to /dev/stdin. For example, let's create a simple readFileSync module:

var fs = require('fs')

module.exports = function(file, opt) {
        if ( !(file && file.trim() === '/dev/stdin'))
                return fs.readFileSync(file, opt)

        var BUFSIZ = 65536
        var chunks = []
        while (1) {
                try {
                        var buf = new Buffer(BUFSIZ)
                        var nbytes = fs.readSync(process.stdin.fd, buf, 0, BUFSIZ, null)
                } catch (err) {
                        if (err.code === 'EAGAIN') {
                                // node is funny
                                throw new Error("interactive mode isn't supported, use pipes")
                        }
                        if (err.code === 'EOF') break
                        throw err
                }

                if (nbytes === 0) break
                chunks.push(buf.slice(0, nbytes))
        }

        return Buffer.concat(chunks)
}

It's far from ideal, but at least it doesn't use stat(2) to determine the stdin size.

We modify our cat version to use this module:

#!/usr/bin/env node

var rfs = require('./readFileSync')

if (process.argv.length == 2) {
        process.stdout.write(rfs('/dev/stdin'))
} else {
        process.argv.slice(2).forEach(function(file) {
                process.stdout.write(rfs(file))
        })
}

& modify the original version of the acceptance test to use it too:

var assert = require('assert')
var spawnSync = require('child_process').spawnSync

var r = spawnSync('./cat-2.js', { input: 'hello' })
assert.equal('hello', r.stdout.toString())

& run:

$ node test-cat-2-1.js

Yay, it doesn't throw an error & apparently works!

To be sure, generate a big file, like 128MB:

$ head -c $((128*1024*1024)) < /dev/urandom > 128M

then run:

$ cat 128M | ./cat-2.js > 1
$ cmp 128M 1
$ echo $?
0

cmp should return 0 if everything went fine & no bytes were lost.

Sunday, December 14, 2014

A Naive Benchmark of GnuPG 2.1 Symmetric Algorithms

Some symmetric algo benchmarks already exist, but they still don't answer a typical question for a typical setup:

I do a regular backup of N (or even K) gigabytes. I don't want the backup to be readable by a random hacker from Russia (if he breaks into my server). What algo should I use to encrypt the backup as fast as possible?

This rules out many existing benchmarks.

The typical setup also includes gpg2. I don't care about synthetic algo tests (like 'I read once that Rijndael is fast & 3DES is slow'), I'm interested in a particular implementation that runs on my machines.

(Note that the benchmarks below are not 'scientific' in any way; they are meant to be useful for 1 specific operation only: encrypting binary blobs through ruby-gpgme.)

gpg2 cli program

The first thing I did was to run

$ gpg2 --batch --passphrase 12345 -o out --compress-algo none \
    --cipher-algo '<ALGO>' -c < file.tar.gz

But I was quickly saddened, because the results weren't consistent: the deviation between runs was too big.

What we needed here was to dissociate the crypto from the IO.

libgcrypt

'Modern' versions of GnuPG have detached a big chunk of the crypto magic into a separate low-level library, libgcrypt. If we want to test symmetric ciphers w/o any additional overhead, we can write a nano version of gpg2.

It'll read some bytes from /dev/urandom, pad them (if the block cipher mode requires it), generate an IV, encrypt, prepend the IV to the encrypted text, append a MAC & run that for all libgcrypt-supported ciphers. Then we can draw a pretty graph & brag about it to coworkers.

The problem is that there are no docs (at least I haven't found any) about the general format that gpg2 uses for block ciphers. And you need it, because a decryptor must be able to tell what algo was used, its cipher mode, where to look for the stored IV, etc.

There is OpenPGP RFC 4880 of course:

The data is encrypted in CFB mode, with a CFB shift size equal to the cipher's block size. The Initial Vector (IV) is specified as all zeros. Instead of using an IV, OpenPGP prefixes a string of length equal to the block size of the cipher plus two to the data before it is encrypted.

That's better than nothing, but still leaves us w/ n hours of struggling to write & test code that will produce an encrypted stream suitable for gpg2.

GPGME

GnuPG has an official library that even has bindings for such languages as Ruby. It's the opposite of libgcrypt: it does all the work for you, whereas libgcrypt doesn't even provide auto padding.

The trouble w/ gpgme is that it was unusable for automated testing purposes until GnuPG hit version 2.1 this fall.

For instance,

  • Versions 2.0.x cannot read passwords w/o pinentry.
  • At the time of writing, 2.1 isn't available on any major Linux distribution (except Arch, but I'm not using it anywhere (maybe I should)).
Writing a Benchmark

ruby-gpgme has a nifty example for symmetric ciphers:

crypto = GPGME::Crypto.new password: '12345'
r = crypto.encrypt "Hello world!\n", symmetric: true

where r.read() will return the encrypted string.

We have 2 problems here:

  1. The symmetric cipher cannot be changed through the API at all. (The default one is CAST5.) This isn't a fault of ruby-gpgme, but of the gpgme library underneath it.

    GnuPG has a concept of a 'home' directory (it has nothing to do w/ the user's home directory, it just uses it as the default). Each 'home' can have its own set of configuration files. We need a gpg.conf file there w/ a line:

    personal-cipher-preferences <algo>
    
  2. The modest password: '12345' option does nothing unless the archaic gpg1 is used. W/ gnupg 2.0.x an annoying pinentry window will pop up.

    I.e. installing 2.1 is the only option. Instead of overwriting the existing 2.0.x installation (and possibly breaking your system), install 2.1 under a separate prefix (for example, ~/tmp/gnupg).

    Next, for each gpg 'home' directory we need to add to gpg.conf another line:

    pinentry-mode loopback
    

    & create a gpg-agent.conf file w/ a line:

    allow-loopback-pinentry
    

The benchmark works like this:

  1. Before running any crypto operations, for each cipher we create a 'home' directory & fill it w/ custom gpg.conf & gpg-agent.conf files.
  2. Start a bunch of copies of gpg-agent, each for a different 'home' dir.
  3. Add a bin directory of our fresh gnupg 2.1 installation to the PATH, for example ~/tmp/gnupg/bin.
  4. Set LD_LIBRARY_PATH to ~/tmp/gnupg/lib.
  5. Generate the 'plain text' as n bytes from /dev/urandom.
  6. Encrypt the 'plain text' w/ all the supported symmetric ciphers in turn.
  7. Print the results.
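
In essence, steps 5-6 boil down to something like this (a sketch only: the homes/<algo> layout & the cipher list below are illustrative, & it assumes the gpg engine picks up GNUPGHOME):

require 'gpgme'
require 'benchmark/ips'

ciphers = %w[3des cast5 aes aes256 twofish]
plain   = File.binread('/dev/urandom', 8 * 1024 * 1024)

Benchmark.ips do |x|
  ciphers.each do |algo|
    x.report(algo) do
      # each cipher gets its own pre-created 'home' whose gpg.conf says
      # `personal-cipher-preferences <algo>` & `pinentry-mode loopback`
      ENV['GNUPGHOME'] = File.expand_path("homes/#{algo}")
      crypto = GPGME::Crypto.new password: '12345'
      crypto.encrypt plain, symmetric: true
    end
  end
  x.compare!
end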

The Ruby script that does this can be cloned from https://github.com/gromnitsky/gpg-algo-speed. You'll need the gpgme & benchmark-ips gems. Run the benchmark file from the cloned dir.

Results

AMD Sempron 145, Linux 3.11.7-200.fc19.x86_64

$ ./benchmark /opt/tmp/gnupg $((256*1024*1024))
Plain text size: 268,435,456B
Calculating -------------------------------------
                idea     1.000  i/100ms
                3des     1.000  i/100ms
               cast5     1.000  i/100ms
            blowfish     1.000  i/100ms
                 aes     1.000  i/100ms
              aes192     1.000  i/100ms
              aes256     1.000  i/100ms
             twofish     1.000  i/100ms
         camellia128     1.000  i/100ms
         camellia192     1.000  i/100ms
         camellia256     1.000  i/100ms
-------------------------------------------------
                idea      0.051  (± 0.0%) i/s -      1.000  in  19.443114s
                3des      0.037  (± 0.0%) i/s -      1.000  in  27.137538s
               cast5      0.059  (± 0.0%) i/s -      1.000  in  16.850647s
            blowfish      0.058  (± 0.0%) i/s -      1.000  in  17.183059s
                 aes      0.059  (± 0.0%) i/s -      1.000  in  17.080337s
              aes192      0.057  (± 0.0%) i/s -      1.000  in  17.516253s
              aes256      0.057  (± 0.0%) i/s -      1.000  in  17.673528s
             twofish      0.057  (± 0.0%) i/s -      1.000  in  17.533964s
         camellia128      0.054  (± 0.0%) i/s -      1.000  in  18.359755s
         camellia192      0.053  (± 0.0%) i/s -      1.000  in  18.712756s
         camellia256      0.054  (± 0.0%) i/s -      1.000  in  18.684303s

Comparison:
               cast5:        0.1 i/s
                 aes:        0.1 i/s - 1.01x slower
            blowfish:        0.1 i/s - 1.02x slower
              aes192:        0.1 i/s - 1.04x slower
             twofish:        0.1 i/s - 1.04x slower
              aes256:        0.1 i/s - 1.05x slower
         camellia128:        0.1 i/s - 1.09x slower
         camellia256:        0.1 i/s - 1.11x slower
         camellia192:        0.1 i/s - 1.11x slower
                idea:        0.1 i/s - 1.15x slower
                3des:        0.0 i/s - 1.61x slower

Algo         Total Iterations
       idea          2
       3des          2
      cast5          2
   blowfish          2
        aes          2
     aes192          2
     aes256          2
    twofish          2
camellia128          2
camellia192          2
camellia256          2

As we see, 3DES is indeed slower than Rijndael.

(The plot is written in Grap. It doesn't really matter, but I wanted to show off that I was tinkering w/ a Bell Labs language from 1984 that nobody uses anymore.)

In the repo above there is also the result for a 3G blob (w/ compression turned on), where the Ruby garbage collector ran amok.

Wednesday, December 3, 2014

hackernews2nntp

It has been almost 2 months since the YC folks announced their official Hacker News API & threatened us w/ an imminent HN design change. (When I say 'us' I mean the authors of various Chrome extensions & web scrapers.)

Writing YA interface on top of a common backend is exciting only if you are 17 y.o. Instead of inventing a 'new' forum-like view, I've decided to make a one-way HN-to-NNTP 'convertor', so that I can read HN in mutt. Like this:

https://raw.github.com/gromnitsky/hackernews2nntp/master/screenshot1.png

Why NNTP?

Because of the history of newsreader UIs, reading something that represents a newsgroup means:

  1. Being able not to read the same post (article) twice (the client software marks old articles).
  2. Local filtering. Highlighting favourite authors, hiding trolls, sorting by date, thread, etc.
  3. The offline mode (if you have a local NNTP server on your laptop).

Some time ago I wrote a Chrome extension specifically for items 1-2, but never implemented the custom thread sorting in it.

Moving your reading activities to mutt has its disadvantages:

  • No up-voting.

  • No score updates.

  • Once an article is fetched & posted, it's very cumbersome to post it again if its content changes. You have to check whether the server has an article w/ that particular message id, check for body differences, change the message id of the new article (otherwise the server will reject it as a duplicate), and possibly modify its References header to point to the old version.

    In short, I didn't do that. Once the article is posted it stays the same.

The original idea was to run some gateway as a daemon that would monitor for HN updates & immediately convert new stories/comments. That turned out to be impractical, because my laptop isn't on 24/365. Instead I took the old usenet path: download a bunch of articles & read them later.

The old way has 2 primary advantages:

  • There is no need to save the program state, because if we download an article twice (now & in a previous run), the NNTP server will reject the duplicate.
  • It can help w/ HN addiction. You run the convertor once a day & read all the interesting stuff in your scheduled 'HN time'.

Then, if we use a decent article injector, it'll spool undelivered articles (for example, if the NNTP server isn't responding) & post them automatically in the next run.

In the end, I run

  $ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v | sudo rnews -N  

once a day & practically never visit the HN website.

You can read more about the convertor here: https://github.com/gromnitsky/hackernews2nntp

Saturday, September 13, 2014

Porting Code to MRuby

If you take a random library from the Ruby stdlib & try to use it under mruby, expect failure. If everything seems to work out of the box, it's either (a) a miracle or (b) (more likely) you haven't tested the library enough.

The 1st thing I tried to bring to minirake was FileList. It turned out that FileList uses Dir.glob (glob wasn't implemented in mruby-dir). It turned out that Dir.glob internally uses File.fnmatch (fnmatch wasn't implemented in mruby-io).

Have you ever used File.fnmatch in your code? You usually stumble across its pattern language only as sub-patterns of Dir.glob patterns. For example, Dir.glob adds the ** & { } syntax on top of it.
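
For the record, this is how the 2 pattern languages differ in MRI (a cruby illustration; the lib/ tree is made up):

# braces are not part of fnmatch's language unless explicitly enabled
File.fnmatch '*.{rb,c}', 'foo.rb'                      # => false
File.fnmatch '*.{rb,c}', 'foo.rb', File::FNM_EXTGLOB   # => true

# Dir.glob understands the extended dialect (** & {}) out of the box
Dir.glob 'lib/**/*.{rb,c}'    # recursively finds .rb & .c files under lib/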

In MRI, File.fnmatch is implemented in C. Extracting it into a plain C library w/o a Ruby dependency is relatively quick & simple. This is how the Rubinius team ported it & so did I. There is nothing interesting about the library, except maybe the fact that for some reason the MRI version returns 0 as the mark of a successful match & 1 otherwise.

Dir.glob is a more complex story. Again, in MRI it's implemented in C. At 1st I wanted to do for glob the same job as for fnmatch, but glob has too many calls to MRI API that have no direct equivalents in mruby. I was lucky not to have to mess with C, because Rubinius has its own version of Dir.glob written in Ruby.

It didn't go as smoothly as I hoped, because the code isn't 'pure' Ruby but a Rubinius flavour of it, with annoying calls like Rubinius::LRUCache, Regexp.match_from & String.byteslice. (The last one is from Ruby 1.9+, but mruby still lacks it.)
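
The byteslice gap, for instance, can be paper-clipped w/ a rough pure-Ruby shim like the one below (illustrative only: it ignores encodings & assumes #bytes & Integer#chr are present, which in mruby come from extension gems):

unless String.method_defined? :byteslice
  class String
    def byteslice(start, len = 1)
      b = bytes                     # integer byte values
      start += b.size if start < 0
      return nil if start < 0 || start > b.size
      (b[start, len] || []).map(&:chr).join
    end
  end
end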

After the porting struggle I checked the result with the unit tests for Dir.glob from MRI & amazingly they worked fine, which was a pleasant surprise because I wasn't expecting a good outcome.

Then came FileList's turn

Like every library written by Jim Weirich, it's (a) very well documented & (b) uses metaprogramming a lot.

While changing class_eval calls with interpolated strings to class_eval with blocks & define_method was easy, bugs started to arrive from unexpected & funny areas. For example:

$ ruby -e "p File.join ['a', 'b', 'c']"
"a/b/c

vs.

$ mruby -e "p File.join ['a', 'b', 'c']"
["a", "b", "c"]

Or even better:

$ ruby -e 'p [nil] <=> [nil]'
0
$ mruby -e 'p [nil] <=> [nil]'
trace:
        [1] mrblib/array.rb:166:in Array.<=>
        [0] -e:1
mrblib/array.rb:166: undefined method '<=>' for nil (NoMethodError)

The same goes for NilClass & <=>. File.extname behaves differently, File.split is missing, etc.

In many cases it isn't mruby's fault but the mrbgem libraries', but the whole ecosystem is in a state that isn't suitable for people with weak nerves. Sometimes I thought that the 'm' in mruby actually stands for 'masochistic'.

After the porting struggle with Array methods like | & +, I took the unit tests from Rake & amazingly they worked almost fine (there is no StringIO in mruby), which wasn't a pleasant surprise, because at that point I had already gotten angry.

__FILE__

Do you know that __FILE__ is a keyword & __dir__ is a method? You can monkey-patch __dir__ at any moment, but can do nothing to __FILE__. I didn't know that.

Making an executable with mruby involves producing bytecode, which can be statically linked into the executable & loaded via the mrb_read_irep function at runtime.

Bytecode can be generated with the mrbc CL utility that ships with mruby. It sets the value of __FILE__ according to its CL arguments. For example:

$ mrbc -b mycode foo/bar/main.rb

will set __FILE__, for the bytecoded main.rb, to foo/bar/main.rb. If you have an executable named foobar & use main.rb as the entry point of your Ruby code, the classic trick

do_something if __FILE__ == $0

won't give the result you expect.

At 1st I thought of overriding __FILE__, but it turned out that wasn't possible. Then I thought of setting __FILE__ after the bytecode was generated, but wasn't able to figure out how to do it w/o coredumping. In the end I patched mrbc to be able to pass the required value from the CL, which means that, to be compiled, minirake now requires a patched version of mruby. Great. :(

FileUtils

The last missing part of Rake I wanted to have was FileUtils. It may seem useless & superfluous, but we like Ruby for DSLs, thus it's more idiomatic to write

mkdir 'foo/bar'

than

sh "mkdir -p foo/bar"

or even

exit 1 unless system "mkdir -p foo/bar" # [1]

FileUtils has some nice properties, like the ability to print on demand what is happening or to turn on a 'no write' mode. For example, if you

include FileUtils::NoWrite

any 'destructive' command like rm or touch will do nothing.

I looked into the stdlib fileutils.rb & quickly gave up. It's too much work to port it to mruby. Then I thought of making a thin wrapper around system commands with a FileUtils-compatible API.

The idea is to generate several sets of wrappers around simple methods in some FileUtilsSimple::Commands namespace, so that the user never executes them directly but only through a pre-generated static wrapper that decides what to do with a command.

Acquiring a list of singleton methods is easy, but mruby never makes your life easy enough: the next mruby present was the absence of the Kernel.method method. I don't even.
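
Something along these lines (a sketch of the idea only; the FileUtilsSimple name & its 2 commands here are made up, not the real gem's API):

module FileUtilsSimple
  module Commands
    def self.mkdir(dir); system "mkdir -p #{dir}"; end
    def self.rm(file);   system "rm -rf #{file}";  end
  end

  class << self; attr_accessor :nowrite; end

  # 1 static wrapper per command: it narrates & decides whether to run
  # anything at all (plain send, no Kernel.method required)
  Commands.singleton_methods.each do |cmd|
    define_method(cmd) do |*args|
      puts "#{cmd} #{args.join ' '}"
      Commands.send(cmd, *args) unless FileUtilsSimple.nowrite
    end
  end
end

include FileUtilsSimple
FileUtilsSimple.nowrite = true
mkdir 'foo/bar'    # prints "mkdir foo/bar", touches nothing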

Unit Tests

Don't be tempted to test the ported code only under MRI just because your favorite test framework runs only under cruby. I've bumped into several occasions where a test passed fine under cruby & failed miserably under mruby.

[1] Did I mention that Kernel.system just returns a boolean & doesn't set $?? (Make a random guess in which implementation.)

Saturday, August 23, 2014

Mruby & A Self-Contained Subset of Rake

Since the last time I checked mruby, many things have changed. The biggest one was the introduction of compile-time plugins that are confusingly called mrbgems. I have a completely different image in mind when I hear the words ruby & gem together.

Still no love for require from matz.

To get an interpreter that is useful IRL, it's possible to cherry-pick from a list of mrbgems. The mruby-require plugin, sorry, gem is the most confusing one. If you specify it before other plugins, sorry, gems, all the gems below it will be compiled as .so libs & to use them you would write require 'foo' & would immediately lose compatibility with MRI. After that, the helper

def mruby?
  RUBY_ENGINE == 'mruby'
end

& conditional checks are the only answer.

The mruby build system is interesting. It uses a nano-version of Rake called minirake. For some unknown reason it's incompatible with mruby. At that point I thought: how cool would it be to have rake as a standalone executable that doesn't depend on Ruby at all?

What does this have to do with mruby? It turns out mruby can produce an array of bytecode that can be compiled together with your C program into 1 executable.

It sounds cool but has its limitations. Firstly, you'll need to inline all your require statements to get 1 .rb source file. Secondly, remember, there is no stdlib in mruby. The plugins, sorry, gems, that try to bring it to mruby are nice but incomplete (for example, Dir lacks glob).

You'll find problems in areas you've never imagined. For example, the Ruby ISO standard doesn't mention ARGV & $0 (that's what I heard, the pdf is behind a 198 CHF paywall), which means, right, no ARGV & $0 by default--you'll need to look in the mirb src to guess how to inject them.

Btw, googling won't help much, because most blog posts about mruby were written in 2012 & the API is different now => old examples are mostly useless.

Back to rake. Porting 'real' rake is a daunting task. I just took the minirake source, tweaked it a bit & wrote a tiny C wrapper with a couple of rakefiles: https://github.com/gromnitsky/minirake. Amazingly, it seems to work. Glory to Japan!