Sunday, December 14, 2014

A Naive Benchmark of GnuPG 2.1 Symmetric Algorithms

Some symmetric algo benchmarks already exist, but they still don't answer a typical question for a typical setup:

I do a regular backup of N (or even K) gigabytes. I don't want the backup to be readable by a random hacker from Russia (if he breaks into my server). What algo should I use to encrypt the backup as fast as possible?

This rules out many existing benchmarks.

The typical setup also includes gpg2. I don't care about synthetic algo tests (like 'I read once that Rijndael is fast & 3DES is slow'), I'm interested in a particular implementation that runs on my machines.

(Note that the benchmarks below are not 'scientific' in any way; they are meant to be useful for 1 specific operation only: encrypting binary blobs through ruby-gpgme.)

gpg2 cli program

The first thing I did was to run

$ gpg2 --batch --passphrase 12345 -o out --compress-algo none \
    --cipher-algo '<ALGO>' -c < file.tar.gz

But I was quickly saddened because the results weren't consistent: the deviation between runs was too big.

What we needed here was to dissociate the crypto from the IO.

libgcrypt

'Modern' versions of GnuPG have detached a big chunk of the crypto magic into a separate low-level library libgcrypt. If we want to test symmetric ciphers w/o any additional overhead, we can write a nano version of gpg2.

It'll read some bytes from /dev/urandom, pad them (if the block cipher mode requires it), generate an IV, encrypt, prepend the IV to the encrypted text, append a MAC, & run that for all libgcrypt-supported ciphers. Then we can draw a pretty graph & brag about it to coworkers.
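
To make the scheme concrete, here is a rough Ruby sketch of the same idea, using the openssl stdlib in place of libgcrypt (the sizes & key handling are purely illustrative; in real code you wouldn't reuse the encryption key for the MAC):

require 'openssl'

plain = File.binread '/dev/urandom', 1024 * 1024   # 1MB of 'plain text'

cipher = OpenSSL::Cipher.new 'AES-256-CBC'         # CBC requires padding; OpenSSL adds it
cipher.encrypt
key = cipher.random_key
iv  = cipher.random_iv                             # generate an IV
encrypted = cipher.update(plain) + cipher.final

mac  = OpenSSL::HMAC.digest OpenSSL::Digest.new('SHA256'), key, iv + encrypted
blob = iv + encrypted + mac                        # IV in front, MAC appended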

The problem is that there are no docs (at least I haven't found any) about the general format that gpg2 uses for block ciphers. And you need it, because the decrypting side must know which algo was used, its cipher mode, where to look for the stored IV, etc.

There is OpenPGP RFC 4880 of course:

The data is encrypted in CFB mode, with a CFB shift size equal to the cipher's block size. The Initial Vector (IV) is specified as all zeros. Instead of using an IV, OpenPGP prefixes a string of length equal to the block size of the cipher plus two to the data before it is encrypted.

That's better than nothing, but still leaves us w/ n hours of struggling to write & test code that will produce an encrypted stream suitable for gpg2.

GPGME

GnuPG has an official library that even has bindings for such languages as Ruby. It's the opposite of libgcrypt: it does all the work for you, whereas libgcrypt doesn't even provide auto padding.

The trouble w/ gpgme is that it was unusable for automated testing purposes until GnuPG hit version 2.1 this fall.

For instance,

  • Versions 2.0.x cannot read passwords w/o pinentry.
  • At the time of writing, 2.1 isn't available on any major Linux distribution (except Arch, but I'm not using it anywhere (maybe I should)).

Writing a Benchmark

ruby-gpgme has a nifty example for symmetric ciphers:

crypto = GPGME::Crypto.new password: '12345'
r = crypto.encrypt "Hello world!\n", symmetric: true

where r.read() will return an encrypted string.
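
For completeness, the round trip looks like this (assuming the GnuPG 2.1 + loopback setup described below; on 2.0.x the decryption would trigger pinentry):

crypto = GPGME::Crypto.new password: '12345'
encrypted = crypto.encrypt("Hello world!\n", symmetric: true).read
crypto.decrypt(encrypted).read    # => "Hello world!\n"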

We have 2 problems here:

  1. There is absolutely no way to change the symmetric cipher through the API. (The default one is CAST5.) This isn't a fault of ruby-gpgme, but of the very same gpgme library under it.

    GnuPG has a concept of a 'home' directory (it has nothing to do w/ the user's home directory, it just uses it as a default). Each 'home' can have its own set of configuration files. We need a gpg.conf file there w/ the line (see the sketch after this list):

    personal-cipher-preferences <algo>
    
  2. The modest password: '12345' option does nothing unless the archaic gpg1 is used. W/ gnupg 2.0.x an annoying pinentry window will pop up.

    I.e. installing 2.1 is the only option. Instead of overwriting the existing 2.0.x installation (and possibly breaking your system), install 2.1 under a separate prefix (for example, ~/tmp/gnupg).

    Next, for each gpg 'home' directory we need to add another line to gpg.conf:

    pinentry-mode loopback
    

    & create a gpg-agent.conf file w/ a line:

    allow-loopback-pinentry
    

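A sketch of preparing such a per-cipher 'home' (the ~/tmp/gpg-bench layout is my own invention, not something gpgme requires):

require 'fileutils'

def make_home algo, root = File.expand_path('~/tmp/gpg-bench')
  home = File.join root, algo
  FileUtils.mkdir_p home
  FileUtils.chmod 0700, home        # gpg dislikes group/world-readable homes
  File.write File.join(home, 'gpg.conf'),
             "personal-cipher-preferences #{algo}\npinentry-mode loopback\n"
  File.write File.join(home, 'gpg-agent.conf'), "allow-loopback-pinentry\n"
  home
end

make_home 'twofish'    # => ".../gpg-bench/twofish"
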
The benchmark works like this:

  1. Before running any crypto operations, for each cipher we create a 'home' directory & fill it w/ custom gpg.conf & gpg-agent.conf files.
  2. Start a bunch of copies of gpg-agent, each for a different 'home' dir.
  3. Add a bin directory of our fresh gnupg 2.1 installation to the PATH, for example ~/tmp/gnupg/bin.
  4. Set LD_LIBRARY_PATH to ~/tmp/gnupg/lib.
  5. Generate the 'plain text' as n bytes from /dev/urandom.
  6. Encrypt the 'plain text' w/ each of the supported symmetric ciphers (a stripped-down sketch of steps 5-7 follows the list).
  7. Print the results.
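
Something like this (cipher list shortened, paths made up; steps 3-4 -- PATH & LD_LIBRARY_PATH -- are assumed to be already exported):

require 'gpgme'
require 'benchmark/ips'

plain = File.binread '/dev/urandom', 1024 * 1024      # step 5

Benchmark.ips do |x|                                  # steps 6-7
  %w(cast5 aes256 twofish).each do |algo|
    x.report(algo) do
      ENV['GNUPGHOME'] = File.expand_path "~/tmp/gpg-bench/#{algo}"   # per-cipher 'home'
      GPGME::Crypto.new(password: '12345').encrypt plain, symmetric: true
    end
  end
  x.compare!
end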

The Ruby script that does this can be cloned from https://github.com/gromnitsky/gpg-algo-speed. You'll need the gpgme & benchmark-ips gems. Run the file benchmark from the cloned dir.

Results

AMD Sempron 145, Linux 3.11.7-200.fc19.x86_64

$ ./benchmark /opt/tmp/gnupg $((256*1024*1024))
Plain text size: 268,435,456B
Calculating -------------------------------------
                idea     1.000  i/100ms
                3des     1.000  i/100ms
               cast5     1.000  i/100ms
            blowfish     1.000  i/100ms
                 aes     1.000  i/100ms
              aes192     1.000  i/100ms
              aes256     1.000  i/100ms
             twofish     1.000  i/100ms
         camellia128     1.000  i/100ms
         camellia192     1.000  i/100ms
         camellia256     1.000  i/100ms
-------------------------------------------------
                idea      0.051  (± 0.0%) i/s -      1.000  in  19.443114s
                3des      0.037  (± 0.0%) i/s -      1.000  in  27.137538s
               cast5      0.059  (± 0.0%) i/s -      1.000  in  16.850647s
            blowfish      0.058  (± 0.0%) i/s -      1.000  in  17.183059s
                 aes      0.059  (± 0.0%) i/s -      1.000  in  17.080337s
              aes192      0.057  (± 0.0%) i/s -      1.000  in  17.516253s
              aes256      0.057  (± 0.0%) i/s -      1.000  in  17.673528s
             twofish      0.057  (± 0.0%) i/s -      1.000  in  17.533964s
         camellia128      0.054  (± 0.0%) i/s -      1.000  in  18.359755s
         camellia192      0.053  (± 0.0%) i/s -      1.000  in  18.712756s
         camellia256      0.054  (± 0.0%) i/s -      1.000  in  18.684303s

Comparison:
               cast5:        0.1 i/s
                 aes:        0.1 i/s - 1.01x slower
            blowfish:        0.1 i/s - 1.02x slower
              aes192:        0.1 i/s - 1.04x slower
             twofish:        0.1 i/s - 1.04x slower
              aes256:        0.1 i/s - 1.05x slower
         camellia128:        0.1 i/s - 1.09x slower
         camellia256:        0.1 i/s - 1.11x slower
         camellia192:        0.1 i/s - 1.11x slower
                idea:        0.1 i/s - 1.15x slower
                3des:        0.0 i/s - 1.61x slower

Algo         Total Iterations
       idea          2
       3des          2
      cast5          2
   blowfish          2
        aes          2
     aes192          2
     aes256          2
    twofish          2
camellia128          2
camellia192          2
camellia256          2

As we see, 3DES is indeed slower than Rijndael.

(The plot is written in Grap. It doesn't really matter but I wanted to show off that I was tinkering w/ a Bell Labs language from 1984 that nobody is using anymore.)

In the repo above there is the result for a 3G blob (w/ compression turned on), where the Ruby garbage collector ran amok.

Wednesday, December 3, 2014

hackernews2nntp

It has been almost 2 months since YC folks announced their official Hacker News API & threatened us w/ an imminent HN design change. (When I say 'us' I mean authors of various Chrome extensions or web scrapers.)

Writing YA interface on top of a common backend is exciting only if you are 17 y.o. Instead of inventing a 'new' forum-like view I've decided to make a one-way HN to NNTP converter, so that I can read HN in mutt. Like this:

https://raw.github.com/gromnitsky/hackernews2nntp/master/screenshot1.png

Why NNTP?

Because of the history of newsreader UIs, reading something that represents a newsgroup means:

  1. Not having to read the same post (article) twice (the client software marks old articles as read).
  2. Local filtering. Highlighting favourite authors, hiding trolls, sorting by date, thread, etc.
  3. The offline mode (if you have a local NNTP server on your laptop).

Some time ago I wrote a Chrome extension specifically for items 1-2, but never implemented the custom thread sorting in it.

Moving your reading activities to mutt has its disadvantages:

  • No up-voting.

  • No score updates.

  • Once an article is fetched & posted, it's very cumbersome to post it again if its content changes. You have to check w/ the server whether it has the article w/ a particular message id, check for body differences, change the message id of the new article (otherwise the server will reject it as a duplicate), and possibly modify its References header to point to the old version.

    In short, I didn't do that. Once the article is posted it stays the same.

The original idea was to run some-gateway as a daemon that would monitor for HN updates & immediately convert new stories/comments. That turned out to be impractical because my laptop isn't on 24/365. Instead I took the old usenet path: download a bunch of articles & read them later.

The old way has 2 primary advantages:

  • There is no need to save the program state, because if we download an article twice (now & in a previous run), the NNTP server will reject the duplicate.
  • It can help w/ HN addiction. You run the converter once a day & read all the interesting stuff in your scheduled 'HN time'.

Then, if we use a decent article injector, it'll spool undelivered articles (for example if the NNTP server isn't responding) & post them in the next run automatically.

In the end, I run

  $ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v | sudo rnews -N  

once a day & practically never visit the HN website.

You can read more about the converter here: https://github.com/gromnitsky/hackernews2nntp

Saturday, September 13, 2014

Porting Code to MRuby

If you take a random library from Ruby stdlib & try to use it under mruby, expect failure. If everything seems to work out of the box it's either (a) a miracle or (b) (more likely) you haven't tested the library enough.

The 1st thing I tried to bring to minirake was FileList. It turned out that FileList uses Dir.glob (glob wasn't implemented in mruby-dir). It turned out that Dir.glob internally uses File.fnmatch (fnmatch wasn't implemented in mruby-io).

Have you ever used File.fnmatch in your code? You usually stumble across its pattern language only as sub-patterns of Dir.glob patterns. For example, Dir.glob adds ** & { } syntax.
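
A quick reminder of what that pattern language looks like (under MRI):

File.fnmatch '*.c', 'main.c'                            # => true
File.fnmatch '*.c', 'src/main.c', File::FNM_PATHNAME    # => false, '*' won't cross '/'
File.fnmatch '{ab,cd}.c', 'ab.c'                         # => false, fnmatch knows nothing about { }
Dir.glob '{ab,cd}.c'                                     # ...but glob does, & it adds ** as well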

In MRI, File.fnmatch is implemented in C. Extracting it into a plain C library w/o a Ruby dependency is relatively quick & simple. This is how the Rubinius team ported it & so did I. There is nothing interesting about the library, except maybe the notion that for some reason the MRI version returns 0 as a mark of a successful match & 1 otherwise.

Dir.glob is a more complex story. Again, in MRI it's implemented in C. At 1st I wanted to do for glob the same job as for fnmatch, but glob has too many calls to the MRI API that have no direct equivalents in mruby. I was lucky not to have to mess with C, because Rubinius had its own version of Dir.glob written in Ruby.

It didn't go as smoothly as I hoped, because the code isn't 'pure' Ruby but a Rubinius version of it, with annoying calls like Rubinius::LRUCache, Regexp.match_from, String.byteslice. (The last one is from Ruby 1.9+ but mruby still lacks it.)

After the porting struggle I checked the result with the unit tests for Dir.glob from MRI &, amazingly, they worked fine, which was a pleasant surprise because I wasn't expecting a good outcome.

Then came FileList's turn

Like every library written by Jim Weirich, it's (a) very well documented, (b) uses metaprogramming a lot.

Changing class_eval calls with interpolated strings to class_eval with blocks & define_method was the easy part.
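
A toy before/after of that rewrite (not the real FileList code; the class & its methods are made up):

class Wrapper
  def initialize(items); @items = items; end

  # was: class_eval "def #{meth}(*a, &b) @items.#{meth}(*a, &b) end"
  %w(sort uniq first).each do |meth|
    define_method(meth) { |*a, &b| @items.send(meth, *a, &b) }
  end
end

Wrapper.new([3, 1, 3]).sort    # => [1, 3, 3]

The bugs, though, started to arrive from unexpected & funny areas. For example: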

$ ruby -e "p File.join ['a', 'b', 'c']"
"a/b/c"

vs.

$ mruby -e "p File.join ['a', 'b', 'c']"
["a", "b", "c"]

Or even better:

$ ruby -e 'p [nil] <=> [nil]'
0
$ mruby -e 'p [nil] <=> [nil]'
trace:
        [1] mrblib/array.rb:166:in Array.<=>
        [0] -e:1
mrblib/array.rb:166: undefined method '<=>' for nil (NoMethodError)

The same goes for NilClass & <=>. File.extname behaves differently, File.split is missing, etc.

In many cases it isn't mruby's fault but the mrbgem libraries', but the whole ecosystem is in a state that isn't suitable for people with weak nerves. Sometimes I thought that the 'm' in mruby actually means 'masochistic'.

After the porting struggle with Array methods like | & +, I took the unit tests from Rake &, amazingly, they worked almost fine (there is no StringIO in mruby), which wasn't a pleasant surprise, because at that point I was already angry.

__FILE__

Do you know that __FILE__ is a keyword & __dir__ is a method? You can monkey patch __dir__ at any moment, but can do nothing to __FILE__. I didn't know that.

Making an executable with mruby involves producing the bytecode, which can be statically linked into the executable & loaded via the mrb_read_irep function at runtime.

Bytecode can be generated with the mrbc CL utility that ships with mruby. It sets the value of __FILE__ according to its CL arguments. For example:

$ mrbc -b mycode foo/bar/main.rb

will set __FILE__, for the bytecoded main.rb, to foo/bar/main.rb. If you have an executable named foobar & use main.rb as an entry point to your Ruby code, the classic trick

do_something if __FILE__ == $0

won't give the result you expect.

At 1st I thought of overriding __FILE__, but it turned out that wasn't possible. Then I thought of setting __FILE__ after the bytecode was generated, but I wasn't able to figure out how to do it w/o coredumping. In the end I patched mrbc to be able to pass the required value from the CL, which means that, to be compiled, minirake now requires a patched version of mruby. Great. :(

FileUtils

The last missing part of Rake I wanted to have was FileUtils. It may seem useless & superfluous, but we like Ruby for DSLs, thus it's more idiomatic to write

mkdir 'foo/bar'

than

sh "mkdir -p foo/bar"

or even

exit 1 unless system "mkdir -p foo/bar" # [1]

FileUtils has some nice properties like the ability to print on demand what is happening or turn on 'no write' mode. For example, if you

include FileUtils::NoWrite

any 'destructive' command like rm or touch will do nothing.
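
A tiny demo under plain MRI (the paths are made up):

require 'fileutils'
include FileUtils::NoWrite

rm_rf 'build'          # does nothing
touch 'build/stamp'    # neither does this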

I've looked into the stdlib fileutils.rb & quickly gave up. It's too much work to port it to mruby. Then I thought of making a thin wrapper around system commands with a FileUtils-compatible API.

The idea is to generate several sets of wrappers around simple methods in some FileUtilsSimple::Commands namespace, so that the user never executes them directly but only through pre-generated static wrappers that decide what to do with a command.

Acquiring a list of singleton methods is easy, but mruby never makes your life easy enough. The next mruby present was the absence of the Kernel.method method. I don't even.
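
Here's a guess-level sketch of that shape (illustrative names, not the actual FileUtilsSimple code; send stands in for the missing Kernel.method):

module FileUtilsSimple
  module Commands
    def self.mkdir(dir);  system "mkdir -p #{dir}"; end
    def self.rm_rf(path); system "rm -rf #{path}";  end
  end

  class << self; attr_accessor :dry_run; end

  # 1 pre-generated static wrapper per command
  Commands.singleton_methods.each do |cmd|
    define_method(cmd) do |*args|
      puts "#{cmd} #{args.join ' '}"
      Commands.send(cmd, *args) unless FileUtilsSimple.dry_run
    end
  end
end

include FileUtilsSimple
mkdir 'foo/bar'    # echoes the command, then runs it (unless dry_run is set)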

Unit Tests

Don't be tempted to test the ported code under MRI just because your favorite test framework runs only under cruby. I've bumped into several occasions where a test passes fine under cruby & fails miserably under mruby.

[1] Did I mention that Kernel.system just returns a boolean & doesn't set $?? (Make a random guess in which implementation.)

Saturday, August 23, 2014

Mruby & A Self-Contained Subset of Rake

Since the last time I checked mruby, many things have changed. The biggest one was the introduction of compile-time plugins confusingly called mrbgems. I have a completely different image in mind when I hear the words ruby & gem together.

Still no love for require from matz.

To get an interpreter that is useful IRL, it's possible to cherry-pick from a list of mrbgems. The mruby-require plugin, sorry, gem, is the most confusing one. If you specify it before other plugins, sorry, gems, all other gems (below it) will be compiled as .so libs & to use them you would write require 'foo' & would immediately lose compatibility with MRI. After that, the helper

def mruby?
  RUBY_ENGINE == 'mruby'
end

& conditional checks are the only answer.

The mruby build system is interesting. It uses a nano-version of Rake called minirake. For an unknown reason it's incompatible with mruby itself. At that point I thought, "How cool would it be to have rake as a standalone executable that doesn't depend on Ruby at all?"

What does that have to do with mruby? It turns out mruby can produce an array of bytecode that can be compiled with your C program into 1 executable.

It sounds cool but has its limitations. Firstly, you'll need to inline all your require statements to get 1 .rb source file. Secondly, remember, there is no stdlib in mruby. The plugins, sorry, gems, that try to bring it to mruby are nice but incomplete (for example, Dir lacks glob).

You'll find problems in areas you've never imagined. For example, the Ruby ISO standard doesn't mention ARGV & $0 (that's what I heard, the PDF is behind a 198 CHF paywall), which means, right, no ARGV & $0 by default--you'll need to look in the mirb src to guess how to inject them.

Btw, googling won't help much, because most blog posts about mruby were written in 2012 & the API is different now, so old examples are mostly useless.

Back to rake. Porting 'real' rake is a daunting task. I just took the minirake source, tweaked it a bit & wrote a tiny C wrapper with a couple of rakefiles: https://github.com/gromnitsky/minirake. Amazingly, it seems to work. Glory to Japan!

Wednesday, April 30, 2014

Antislacker

Last weekend I wrote a small Chrome extension that helps me avoid facebook & livejournal. I mean, I allow myself to stare at them for 10 minutes max & then Antislacker (the extension) kicks in & blocks those 2 domains for the rest of the day.

The idea is this:

  1. In background.coffee we look for a domain name match. If the match is successful, we inject a chunk of JS code.
  2. The injected piece of code contains a counter that every 5 seconds updates a record in localStorage.
  3. When the time limit is reached, we move the user to an internal page within the Chrome extension that shows a random Dilbert comic.

The most tricky part was making a 'mutex' for localStorage records, because the user can open several facebook pages & the counter (in the worst case) will count 2 times faster. It's actually a pity that we don't have any concurrency primitives in JS, so we have to invent poor man's busy waiting when using timers.