Friday, August 5, 2011

Why pkg_version or portmaster -L Are Slow

Long time no see.

If you have ever wondered why, in FreeBSD, any program that compares installed packages against the updates available in the Ports tree is so freaking slow, I have an answer for you, and maybe even a solution.

tl;dr: install the pkg_noisrev gem. [1]

Why Is It So Freaking Slow?

Just grabbing the list of installed packages is simple: you read the /var/db/pkg directory and you are done. The interesting part is how to check this data against the Ports tree.

Every correctly installed package records its origin in its /var/db/pkg/package-name-1.2.3/+CONTENTS file. An origin is a path relative to the /usr/ports directory; for example, for the zip-3.0 package the origin is archivers/zip.
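To make the origin lookup concrete, here is a minimal Python sketch. It assumes the origin is stored on a line of the form "@comment ORIGIN:archivers/zip", which is how the old pkg_install format records it as far as I know; the function name read_origin and the sample text are made up for illustration.

```python
from typing import Optional


def read_origin(contents_text: str) -> Optional[str]:
    """Return the origin recorded in the text of a +CONTENTS file, or None."""
    prefix = "@comment ORIGIN:"  # assumed marker for the origin line
    for line in contents_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None  # package was registered without an origin


# Hypothetical excerpt of /var/db/pkg/zip-3.0/+CONTENTS
sample = """@comment PKG_FORMAT_REVISION:1.1
@name zip-3.0
@comment ORIGIN:archivers/zip
@cwd /usr/local
bin/zip
"""

print(read_origin(sample))
```

Joining the result with /usr/ports gives the port directory to inspect.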

Having that, we can theoretically read the corresponding Makefile (/usr/ports/archivers/zip/Makefile in our example) and compute the version number of that particular port. The problem is that 'the version' is a string that can contain 3 components (3 different Makefile variables): port version, revision and epoch. Sometimes there is no port version, but there is a so-called vendor version instead. Sometimes the Makefile doesn't contain any version information at all but includes (via the BSD make .include directive) another Makefile, which can include yet another, and so on.
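As a rough illustration of how the three components end up in one string, the sketch below follows the usual Ports convention as I understand it: the revision is appended after an underscore and the epoch after a comma, and each suffix is dropped when its value is 0. The function name compose_pkgversion is hypothetical, and the vendor version case is deliberately left out.

```python
def compose_pkgversion(portversion: str, portrevision: int = 0, portepoch: int = 0) -> str:
    """Build a package version string from port version, revision and epoch."""
    version = portversion
    if portrevision:  # a revision of 0 is not shown
        version += "_%d" % portrevision
    if portepoch:  # an epoch of 0 is not shown
        version += ",%d" % portepoch
    return version


print(compose_pkgversion("3.0"))
print(compose_pkgversion("4.13.20110529", portrevision=1))
print(compose_pkgversion("1.2", portrevision=3, portepoch=1))
```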

So, to extract that information you need to either:

  • Properly parse the Makefile, recursively expand all its variables, read all the includes, etc., i.e. write your own mini make utility.
  • Run the "make -V PKGVERSION" command in the port directory.
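The second option is easy to sketch in Python. This is only an illustration of the approach, not code taken from pkg_version or portmaster; the function name port_version and the injectable run parameter are my own, the latter added so the function can be exercised without a Ports tree.

```python
import os
import subprocess


def port_version(origin, ports_dir="/usr/ports", run=subprocess.check_output):
    """Ask make for the version of the port at ports_dir/origin."""
    port_dir = os.path.join(ports_dir, origin)
    # One make process per port: this is exactly the expensive part.
    output = run(["make", "-V", "PKGVERSION"], cwd=port_dir)
    return output.decode().strip()


# On a FreeBSD box with a Ports tree:
#   port_version("archivers/zip")
```

Every call forks a new make process that has to read the port's Makefile and the includes it pulls in, which is where the time goes.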

You can guess which path was chosen by the authors of the system pkg_version program and of the famous portmaster manager.

Think about it: make is run once for every installed package; if you have 2,000 packages installed, make will run exactly 2,000 times.

To make things worse, this is not the end of The Problem. The next quiz after obtaining the version number is how to compare 2 versions: the installed one and the one from the Ports tree. They are not just simple numbers as in lovely rubygems. For example, which is newer: 4.13a or 4.13.20110529_1? Is there a difference between 0.3.9.p1.20080208_7 and 0.3.9-pre1.20080208_7?

The system pkg_version utility contains a tedious comparison algorithm (/usr/src/usr.sbin/pkg_install/lib/version.c), and reimplementing it is very boring. [2] So boring that portmaster just calls "pkg_version -t v1 v2" to compare a pair of version strings. Yes, if you have 2,000 packages installed, portmaster will execute make 2,000 times plus pkg_version another 2,000 times.
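For illustration, delegating the comparison to pkg_version looks roughly like this in Python. It relies on "pkg_version -t" printing a single character: "<" when the second version is newer, "=" when they are equal, ">" when the first is newer. The function names and the injectable run parameter are assumptions of this sketch.

```python
import subprocess


def compare_versions(installed, available, run=subprocess.check_output):
    """Return '<', '=' or '>' as printed by `pkg_version -t`."""
    output = run(["pkg_version", "-t", installed, available])
    return output.decode().strip()


def needs_update(installed, available, run=subprocess.check_output):
    """True when the version in the Ports tree is newer than the installed one."""
    return compare_versions(installed, available, run) == "<"


# On FreeBSD: needs_update("4.13a", "4.13.20110529_1")
```

Like the make call, this costs one extra process per installed package.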

The last bit of slowness in applications such as pkg_version or portmaster is their sequential model. They read the whole package list and process the items one after another, with 0 attempts to do things in parallel.

Can we do all that faster?

Yes, we can. A simple pkg_noisrev utility does that 4-5 times faster.

It first tries a primitive parse of the Makefile and executes make only if that fails. It ships with the comparator extracted from pkg_version, built as a shared library that is loaded once via Ruby's dlopen method. It creates several threads and does things in parallel.
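The first and last ideas can be sketched together in Python. This is not the actual pkg_noisrev code (which is Ruby), just a minimal illustration: quick_version gives up whenever PORTVERSION is missing or a value is empty or still contains a make variable, and latest_versions resolves several ports on a thread pool, calling a caller-supplied fallback (standing in for running make) only when the quick parse fails. All names here are hypothetical, and conditionals and includes are ignored.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Matches lines such as "PORTVERSION=\t3.0" or "PORTREVISION?= 2".
ASSIGNMENT = re.compile(
    r"^(PORTVERSION|PORTREVISION|PORTEPOCH)[ \t]*[?:]?=[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def quick_version(makefile_text: str) -> Optional[str]:
    """Cheap guess at the version; None means 'ask make instead'."""
    values = dict(ASSIGNMENT.findall(makefile_text))
    version = values.get("PORTVERSION")
    if not version:
        return None  # e.g. only a vendor version, or set in an include
    if any(v == "" or "$" in v for v in values.values()):
        return None  # an unexpanded variable or empty value: too hard for us
    revision = values.get("PORTREVISION", "0")
    epoch = values.get("PORTEPOCH", "0")
    if revision != "0":
        version += "_" + revision
    if epoch != "0":
        version += "," + epoch
    return version


def latest_versions(
    makefiles: Dict[str, str],
    fallback: Callable[[str], str],
    workers: int = 4,
) -> Dict[str, str]:
    """Map each origin to its version, resolving the ports in parallel."""
    def resolve(origin: str) -> str:
        return quick_version(makefiles[origin]) or fallback(origin)

    origins = list(makefiles)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(origins, pool.map(resolve, origins)))
```

Threads are enough here because the expensive fallback spends its time waiting on an external make process rather than in the interpreter.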

So, if you were running "portmaster -L" in the past, run "pkg_noisrev --likeportmaster" now.

[1] It requires Ruby 1.9.2.
[2] And, I dare say, unsafe: pkg_version doesn't come with any tests you can reuse, so you can easily introduce some nasty bugs in your own implementation.
