Sunday, January 8, 2023

Moving an EFI partition to a thumb drive

This is how one can "secure" a Windows 11 desktop machine: transfer the EFI system partition from its internal drive to an external one. The machine will then work only while the thumb drive is physically connected to it.

The closest analogy would be a smart card authentication, but without a PIN.

(This is a comical technique I've accidentally bumped into while trying to recover Windows partitions after a botched Ubuntu installation.)


  • Windows 11/22h2;
  • UEFI & GPT;
  • a bootable live Linux iso with gparted;
  • a thumb drive.

Modus operandi:

  • A user turns a PC on, but its UEFI can't find any EFI partition on the connected drives: no boot. Oh noes, something is broken!

  • The user inserts a specially prepared USB flash drive into the PC. UEFI either automatically finds an EFI partition on it, or asks the user to select a device to boot from. If the EFI partition contains a correct entry for a Windows partition, the PC boots into Windows. If the user ejects the flash drive, Windows throws a BSOD within a second. (The last one is an interesting bonus I didn't anticipate.)

Here's an example of the partition table of a working Windows installation before we do anything to it:

$ lsblk -o name,pttype,fstype,size,tran,partlabel | grep nvme
nvme0n1 gpt 60G nvme
├─nvme0n1p1 gpt vfat 100M nvme EFI system partition
├─nvme0n1p2 gpt 16M nvme Microsoft reserved partition
├─nvme0n1p3 gpt ntfs 59.3G nvme Basic data partition
└─nvme0n1p4 gpt ntfs 625M nvme

/dev/nvme0n1p1 was automatically made by the Windows setup. We are going to recreate it on a USB flash drive & reformat the original afterwards.

(In Windows) Insert a thumb drive. If it contains a fs Windows doesn't recognise–format it to fat32 & make sure Windows assigns a drive letter to it. In our example it's the letter D.

Under an administrator account run diskpart, type list disk & observe what number is assigned to the flash drive. (1 in our example.)

The thumb drive must be of GPT layout:

> cat efi.diskpart
select disk 1
clean all
convert gpt noerr
create partition efi

The following steps are destructive:

> diskpart /s efi.diskpart
> format /q /v:elf_boot /fs:fat32 /y d:

Create EFI/Microsoft/Boot directory and copy all required boot-environment files to it:

> bcdboot $env:systemroot /s d: /f uefi

Now you can examine the D: drive, reboot, & instruct UEFI to boot from the USB drive. If all goes well, you'll need to boot from any Linux live .iso & reformat /dev/nvme0n1p1. The filesystem type doesn't matter (fat32 is fine); the only important detail is that you don't remove this partition, otherwise Windows throws an IO1_INITIALIZATION_FAILED BSOD.

Finally, here's an adorable error that appears when you eject the thumb drive after a successful boot:

Thursday, June 16, 2022

gdk-pixbuf loaders from mingw

I was mildly rejoicing after a successful cross-compilation of a Linux gtk3 app to Windows, when a warning from a resulting .exe appeared:

Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/bullet-symbolic.svg. This may indicate that pixbuf loaders or the mime database could not be found.

To even get to this warning, one has to copy all the required dlls & icons (5007, in my case) to a directory structure (called installation folder) like so:

$ tree -L 3
├── bin
│   ├── app.exe
│   ├── libgdk-3-0.dll
│   ├── libgdk_pixbuf-2.0-0.dll
│   ├── libgio-2.0-0.dll
│   ├── libglib-2.0-0.dll
│   ├── libgmodule-2.0-0.dll
│   ├── libgobject-2.0-0.dll
│   ├── libgtk-3-0.dll
│   ├── ...
├── lib
│   └── gdk-pixbuf-2.0
│       └── 2.10.0
└── share
    ├── glib-2.0
    │   └── schemas
    └── icons
        ├── Adwaita
        └── hicolor

Here, gdk-pixbuf dlls gave me all the grief, particularly

$ peldd lib/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-svg.dll

Without gdk-pixbuf libraries loaded you get the aforementioned warning, that drove me nuts, & subtle rendering glitches, like no visible GtkSpinner.

Turns out, all you need is to

  1. cd to the root of the installation folder;
  2. run gdk-pixbuf-query-loaders.exe --update-cache.

This generates a correct loaders.cache file.

Saturday, April 2, 2022

In search of a decent offline android Polish dictionary

Dime a dozen Play Store apps

As always, none of them are any good, for no one who wrote those apps is using them. Most of them are adware &/or junkware that crashes on random input.

One curious app is PWN-Oxford Dictionary, with what seems to be a bespoke dictionary from a legit Polish book publisher, Wydawnictwo Naukowe PWN. The Play Store says it's "not available for your device", though. At first, I thought the app had never been updated for Android 10+, but after opening the listing on a tablet with Android 8.x, I got a message saying the app isn't available in my region (Ukraine). Great.

A generic dictionary app + a separate Polish dictionary

Due to the peculiarities of Polish orthography, such an app should support diacritic-insensitive lookups. I settled on aard2.
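To make "diacritic-insensitive" concrete, here is a rough sketch of the idea. It is not how aard2 implements it; `fold` & `lookup` are made-up names. The trick is to decompose each string with NFD, drop the combining marks, & map by hand the letters that have no decomposition (ł is the Polish one).

```javascript
// Fold a string to a diacritic-free, lower-case key.
// NFD splits "ż" into "z" + a combining dot; the regex drops such marks.
// "ł" has no canonical decomposition, so it is mapped by hand.
function fold(s) {
  return s.normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/ł/g, 'l').replace(/Ł/g, 'L')
    .toLowerCase()
}

// Hypothetical lookup over an array of headwords.
function lookup(words, query) {
  const key = fold(query)
  return words.filter(w => fold(w) === key)
}

console.log(lookup(['żółw', 'zolw', 'łoś'], 'zolw')) // matches "żółw" & "zolw"
```

A real dictionary app would precompute the folded keys in an index instead of folding every headword per query.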

The next step was to find a dictionary in the slob format. + wiktionary

Some Good Samaritan has web-scraped (słownik języka polskiego, a crowdsourced dictionary) & augmented the definitions from Wikisłownika.

The dictionary comes pre-formatted for Kindle, but there is also a .txt version in a simple word\tdefinition form. I grabbed & fed it to pyglossary, filtering out entries without useful content:

$ unzip -p | sed -E 's/\t[^<]+(<h1)/\t\1/' \
| grep -v '</h1>$' \
| grep -v '</h1><p><b<p class="s">Wikipedia</p>$' > sjp.txt
$ pyglossary/ sjp.txt sjp.slob

$ du *slob
27820K sjp.slob


The same company that made the geolocked app above was publishing desktop dictionaries for Windows in the 2000s. Someone has uploaded Słownik języka polskiego PWN as an .iso. I'm not sure it falls under the category of abandonware, thus make of it what you will. Out of curiosity I tried to run it on w11 & then on a w7 vm, but the program's installer wouldn't even start. It did successfully run on a w2k vm.

Anyway, this desktop program contains a file named (59M). We can do a 2-step conversion: .win → .txt → slob, where .txt means a so called tabfile format, using the parlance of pyglossary.
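For those who haven't met the tabfile format: every line is a headword, a tab, then the definition. A minimal sketch of reading such text & dropping the entries with an empty definition follows; the function names are illustrative, this isn't pyglossary's code.

```javascript
// Parse tabfile text: one "word<TAB>definition" entry per line.
// Lines without a tab are silently skipped.
function parseTabfile(text) {
  return text.split('\n')
    .filter(line => line.includes('\t'))
    .map(line => {
      const i = line.indexOf('\t')
      return { word: line.slice(0, i), definition: line.slice(i + 1) }
    })
}

// Keep only entries whose definition has some non-whitespace content.
const useful = entries => entries.filter(e => e.definition.trim() !== '')

const sample = 'kot\t<p>cat</p>\npies\t\nbroken line\n'
console.log(useful(parseTabfile(sample)))
```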


  1. # dnf install unshield bsdtar
  2. Clone pwn2dict & pyglossary repos.

Save this Makefile

i := Slownik.iso
pwn2dict := ~/Downloads/pwn2dict/
pyglossary := ~/Downloads/pyglossary/
out := _out
cache := $(out)/cache

$(out)/pwn_słownik.slob: $(cache)/pwn_słownik.txt
	$(pyglossary) $< $@

$(cache)/%.txt: $(cache)/Tekst/Data/
	$(pwn2dict) -t $< $@

$(cache)/ $(cache)/setup/ $(cache)/setup/data1.hdr
	unshield -d $(cache) -g Tekst x $<

$(cache)/setup/%: $(i)
	@mkdir -p $(dir $@)
	bsdtar -xf $< -C $(cache) setup/$*

in the same directory as Slownik.iso, correct the values of pwn2dict & pyglossary variables if necessary, then run make. The result

$ du _out/*slob
9892K _out/pwn_słownik.slob

is 6 times smaller than the original .win file.


The following has nothing to do with Android, but it brought a smile to my face. The .win file can be reused on a regular machine with a neat little ncurses program, a viewer of various PWN dictionaries. Compile it, then run as

$ pwnsjp -f /path/to/

Wednesday, February 9, 2022

GPU passthrough in Fedora with plain QEMU

QEMU       6.1.0
Host OS    Fedora 35
Guest OS   Windows 10 21H1
CPU        AMD Ryzen 5 PRO 4650G (Zen 2, Renoir)
Host GPU   Radeon Vega 7
Guest GPU  Radeon HD 8570 (R7 240, Oland)


# dnf install qemu qemu-img libvirt-daemon

Enable virtualisation in your BIOS. For AMD, the option is usually called SVM. If a motherboard has a separate IOMMU option, leave it in auto state.


$ virt-host-validate | grep hardware | tr -s \\040
QEMU: Checking for hardware virtualization : PASS

(You may uninstall libvirt-daemon package, for we are using plain QEMU without libvirt.)

A guest GPU ROM must support UEFI. The R7 240 in the table above is perhaps the cheapest GPU valid for a passthrough that money can buy, yet it is still miles ahead of the emulated graphics cards in QEMU or VMware.

Connect an HDMI/DP cable from your 2nd GPU to a 2nd monitor (I actually use 1 monitor with 2 inputs).

Connect a spare USB keyboard & a mouse, i.e. your machine must have 2 physical keyboard+mouse pairs. After you successfully install a guest OS, software like Barrier makes the 2nd pair unnecessary (unless you play games).

IOMMU Groups

IOMMU is a feature that allows us to pass PCI devices through to a guest. In Fedora it's off by default. Simultaneously with turning it on, we are going to unhook our 2nd GPU from its current driver & hook it to a driver called vfio-pci.

PCI devices are mapped into IOMMU groups. You can only pass through a group of devices as a unit. That means if your main GPU is in the same group as a secondary one, you're screwed.
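The group layout is plain sysfs: each device shows up as /sys/kernel/iommu_groups/<group>/devices/<address>. Here is a sketch that turns a list of such paths into a group map & checks whether a device shares its group with anything but its own functions. `iommuGroups` & `isolated` are made-up names, the sample paths are invented, & the check is a simplification (it knows nothing about PCIe bridges, for instance).

```javascript
// Build a Map of IOMMU group number -> PCI addresses from sysfs paths
// shaped like /sys/kernel/iommu_groups/<group>/devices/<address>.
function iommuGroups(paths) {
  const groups = new Map()
  for (const p of paths) {
    const m = p.match(/iommu_groups\/(\d+)\/devices\/([^/]+)$/)
    if (!m) continue
    const group = Number(m[1])
    if (!groups.has(group)) groups.set(group, [])
    groups.get(group).push(m[2])
  }
  return groups
}

// True if the group of `address` holds only functions of the same
// device, i.e. addresses with the same domain:bus:device prefix.
function isolated(groups, address) {
  const prefix = address.replace(/\.\d+$/, '')
  for (const devices of groups.values()) {
    if (devices.includes(address)) return devices.every(d => d.startsWith(prefix))
  }
  return false
}

const sample = [
  '/sys/kernel/iommu_groups/8/devices/0000:10:00.0',
  '/sys/kernel/iommu_groups/8/devices/0000:10:00.1',
  '/sys/kernel/iommu_groups/9/devices/0000:11:00.0',
]
console.log(isolated(iommuGroups(sample), '0000:10:00.0')) // true for this sample
```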

This (unfortunately long) script prints all IOMMU groups:

$ cat iommu-groups
#!/bin/sh
# shellcheck disable=2012

# With an argument, print only the groups that match it.
[ -z "$1" ] || {
    $0 | awk 'BEGIN { RS = "" } /'"$1"'/ {print; m++} END { exit (m == 0) }'
    exit $?
}

device_info() {
    lspci -s "$1" -nnk | sed -E 's/^[0-9].+/printf "* %s" "\0" | fmt -t/e'
}

ls -d /sys/kernel/iommu_groups/* | sort -V | while read -r grp; do
    echo "${grp##*/}"
    for dev in "$grp/devices/"*; do device_info "${dev##*/}"; done
    echo ''
done

Check a particular group & what drivers it uses:

$ ./iommu-groups Oland
* 10:00.0 VGA compatible controller [0300]: Advanced Micro Devices,
Inc. [AMD/ATI] Oland [Radeon HD 8570 / R5 430 OEM / R7 240/340 /
Radeon 520 OEM] [1002:6611]
Subsystem: Dell Radeon R5 240 OEM [1028:210b]
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu
* 10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series]
Subsystem: Dell Device [1028:aab0]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

Group #8 contains only 2 devices: 2nd GPU & its audio interface.

To turn IOMMU support on & hook the devices we're interested in to vfio-pci, we need to modify kernel parameters & rebuild initramfs.

$ grep CMDLINE /etc/default/grub | fmt -t
GRUB_CMDLINE_LINUX="resume=/dev/sdb6 iommu=1 amd_iommu=on
rd.driver.pre=vfio_pci vfio-pci.ids=1002:6611,1002:aab0"

Parameter vfio-pci.ids=1002:6611,1002:aab0 binds an array of devices to a vfio driver that prevents the host OS from interacting with them.

1002:6611, for example, is a PCI vendor:device pair from the snippet above.
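If you'd rather not copy the IDs by hand, the pair can be pulled out of an `lspci -nn` device line mechanically. A small sketch under that assumption (`pciId` & `vfioIds` are made-up helpers; the audio line below has the [1002:aab0] ID appended to match the kernel parameter above):

```javascript
// Pull the [vendor:device] pair out of an `lspci -nn` device line.
// The class code, e.g. [0300], has no colon, so it is not matched.
function pciId(line) {
  const m = line.match(/\[([0-9a-f]{4}:[0-9a-f]{4})\]/i)
  return m ? m[1] : null
}

// Join the IDs of several devices into a kernel parameter.
const vfioIds = lines => 'vfio-pci.ids=' + lines.map(pciId).filter(Boolean).join(',')

const gpu = '10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R5 430 OEM / R7 240/340 / Radeon 520 OEM] [1002:6611]'
const audio = '10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1002:aab0]'
console.log(vfioIds([gpu, audio])) // vfio-pci.ids=1002:6611,1002:aab0
```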

To commit changes, type

# dracut -fv
# grub2-mkconfig -o /boot/grub2/grub.cfg

& reboot.

$ ./iommu-groups Oland | grep 'Kernel driver'
Kernel driver in use: vfio-pci
Kernel driver in use: vfio-pci

QEMU calls

This is the most annoying part. If you do it 'incorrectly', you may end up with a purportedly successfully installed guest OS, but then Windows updates GPU drivers and–boom!–the monitor goes black.

This (unfortunately big) makefile will get you started:

$ cat Makefile
iso := Win10_21H1_English_x64.iso
gpu := 10:00.0
audio := 10:00.1
usb_kbd := 24ae 1001
mem := 3G
cpu_cores := 2
hda := c.qcow2
hda.size := 64G

$(if $(SUDO_USER),,$(error Run this under sudo))
uefi_firmware := /usr/share/edk2/ovmf/OVMF_CODE.fd

all: uefi-vars.fd $(hda)
	modprobe kvm_amd
	qemu-system-x86_64 \
	 -enable-kvm \
	 -machine q35,accel=kvm,kernel_irqchip=on \
	 -cpu host,kvm=off,hv-vendor-id=1234567890ab \
	 -smp cores=$(cpu_cores) \
	 -m $(mem) \
	 -cdrom $(iso) \
	 -boot order=d \
	 -drive if=pflash,format=raw,readonly=on,file=$(uefi_firmware) \
	 -drive if=pflash,format=raw,file=$< \
	 -usb \
	 -device usb-host,vendorid=0x$(word 1, $(usb_kbd)),productid=0x$(word 2, $(usb_kbd)) \
	 -device ioh3420,id=hub1,chassis=0,slot=0,bus=pcie.0 \
	 -device vfio-pci,bus=hub1,addr=00.0,host=$(gpu),multifunction=on \
	 -device vfio-pci,bus=hub1,addr=00.1,host=$(audio) \
	 -vga none \
	 $(if $(offline),-nic none,-device e1000,netdev=n0 -netdev user,id=n0)

uefi-vars.fd: /usr/share/edk2/ovmf/OVMF_VARS.fd
	cp $< $@

$(hda):
	qemu-img create -f qcow2 $@ $(hda.size)

Variables of interest:

gpu := 10:00.0
audio := 10:00.1
usb_kbd := 24ae 1001

gpu & audio are PCI addresses (bus:device.function) that you can obtain from the iommu-groups script. usb_kbd here contains IDs (vendor product) for a wireless keyboard (that also includes a touchpad); you can look them up with the lsusb command.

The makefile creates an empty disk drive, copies required files for a UEFI instance & runs qemu.

The holy grail is

-device ioh3420,id=hub1,chassis=0,slot=0,bus=pcie.0 \
-device vfio-pci,bus=hub1,addr=00.0,host=$(gpu),multifunction=on \
-device vfio-pci,bus=hub1,addr=00.1,host=$(audio) \
-vga none \

Without the ioh3420 'north bridge' device I had the aforementioned issue with a black screen after Windows had auto-updated the AMD drivers.

When you run the makefile for the first time, qemu may try to boot from the network & if that takes too long to fail, rerun the command as

$ sudo make offline=1

It may also drop you into a UEFI interactive shell. Type exit there:

Next, choose Boot Manager menu item:

Finally, select 'DVD-ROM' option, press Return, then quickly press Return on your 2nd keyboard (connected to the VM) as well, & your 2nd monitor should display Windows installation screen.

Monday, December 13, 2021

Renaming Devices in PipeWire

To rename a device description (for example, for a USB headset) you need to tweak your current PipeWire's session manager.

At the time of writing, there are 2 of them available: Media Session & WirePlumber. Fedora 35 uses the latter one.

To see which one you're running:

$ systemctl --user show pipewire-session-manager | egrep Id\|SubState

While PipeWire daemon uses a superset of JSON for its configuration, WirePlumber uses Lua tables.

$ mkdir -p ~/.config/wireplumber/main.lua.d
$ cd !$
$ cp /usr/share/wireplumber/main.lua.d/50-alsa-config.lua .

In 50-alsa-config.lua file we need to add a proper entry to alsa_monitor.rules array. For example, to rename a default built-in sink from a too generic "Built-in Audio Analog Stereo" to "Desktop Speakers":

{
  matches = {
    {{"node.name","matches","alsa_output.pci-0000_00_14.2.analog-stereo" }}
  },
  apply_properties = { ["node.description"] = "Desktop Speakers" }
},

(To check syntax of a .lua file, run luac -p file.)

alsa_output.pci-0000_00_14.2.analog-stereo string is a node name. You can get it either from

$ pw-cli ls Node | awk 'BEGIN {RS="\n\tid "; ORS="\n\n"} /Sink/'

or

$ pw-dump | json -c '["media.class"] === "Audio/Sink"' -a info.props
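The same filtering can be sketched in plain node. This assumes that pw-dump prints a JSON array whose node objects keep their properties under info.props; `sinks` is a made-up name & the sample below is trimmed & invented, apart from the node name.

```javascript
// List sink nodes from parsed `pw-dump` output (an array of objects).
// Optional chaining skips objects that carry no info.props.
function sinks(dump) {
  return dump
    .filter(o => o.info?.props?.['media.class'] === 'Audio/Sink')
    .map(o => ({
      name: o.info.props['node.name'],
      description: o.info.props['node.description'],
    }))
}

// Trimmed, made-up sample in the assumed shape.
const dump = [
  { id: 30, type: 'PipeWire:Interface:Client', info: { props: {} } },
  { id: 45, type: 'PipeWire:Interface:Node', info: { props: {
    'media.class': 'Audio/Sink',
    'node.name': 'alsa_output.pci-0000_00_14.2.analog-stereo',
    'node.description': 'Built-in Audio Analog Stereo',
  } } },
  { id: 50, type: 'PipeWire:Interface:Link' },
]
console.log(sinks(dump))
```

To feed it real data, replace the sample with `JSON.parse(require('fs').readFileSync(0, 'utf8'))` & pipe pw-dump into the script.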

In any case, after you modify files in ~/.config/wireplumber/ directory, restart PipeWire server:

$ systemctl --user restart pipewire

Monday, November 1, 2021


After reading about a storm in a teacup with the which(1) utility in Debian, I decided to play code golf with myself: write a couple of minimalistic which(1) implementations in different languages. Before starting, I thought a shell script would be the cleanest solution, but that prediction turned out to be wrong.

The spec:

  1. The util should stop immediately after the first non-existing executable, e.g.

     $ ./my-which ls BOGUS cat
    ./my-which: BOGUS not found in PATH
  2. It should report an error to stderr & return with a code > 0 in case of the error.

The programs below are sorted by terseness.

GNU Make

The Make manual contains a neat example of a pathsearch function that abuses the internal wildcard function in a macro. We can use it with a 'match-anything' target:

#!/usr/bin/make -f
f = $(firstword $(wildcard $(addsuffix /$1,$(subst :, ,$(PATH)))))
%:;@echo $(or $(call f,$@),$(error $@ not found in PATH))

It works like this:

$ ./ ls BOGUS cat
/usr/bin/ls *** BOGUS not found in PATH. Stop.
$ echo $?
2

That's it. 2 lines + a shebang. If you're unfamiliar with the Make language, I advise you to try it.


Ruby

A slightly bigger example that still fits in several lines:

#!/usr/bin/env ruby
def f e; (ENV['PATH'] || '').split(?:).map{|d| d+'/'+e}.filter{|p| File.executable?(p)}[0]; end
ARGV.each {|e| puts(f(e) || abort("#{$0}: #{e} not found in PATH")) }

We cheated here a little: there's no check if a file is a directory. Nothing stops you from adding one, but that increases the length of such a toy program by 18 bytes!


Shell

I thought it would be shorter:


#!/bin/sh

f() {
    IFS=:    # split $PATH on colons; note that this leaks out of f()
    for e in $PATH; do
        [ -x "$e/$1" ] && { echo "$e/$1"; return; }
    done
    return 1
}

for d in "$@"; do
    f "$d" || { echo "$0: $d not found in PATH" 1>&2; exit 1; }
done

If you decide to use f() in your scripts, a cut-&-paste won't do: you'll need to save & restore the value of IFS variable & mark e as the local one.

node: callbacks

Async IO doesn't always make life easier.
A philosopher

#!/usr/bin/env node
let fs = require('fs')

let f = (ok, error) => {
    let dirs = (process.env.PATH || '').split(':')
    return function dive(e) {
        let dir = dirs.shift() || error(e); if (!dir) return
        let file = dir+'/'+e
        fs.access(file, fs.constants.X_OK, err => err ? dive(e) : ok(file))
    }
}

let args = process.argv.slice(2)
let main = exe => exe && f( e => (console.log(e), main(args.shift())), e => {
    console.error(`${process.argv[1]}: ${e} not found in PATH`)
    process.exitCode = 1
})(exe)

main(args.shift())

Again, no checks whether a file is a directory.

We could've avoided callbacks, of course–node has fs.accessSync(), but it throws an exception. Also, just to make this slightly more challenging, I decided to avoid process.exit().
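For comparison, here is roughly what the synchronous road not taken could look like: the exception from fs.accessSync() is simply swallowed, & a directory check is thrown in. This is a sketch, not one of the golfed entries; `whichSync` is a made-up name, & empty PATH entries are skipped rather than treated as the current directory.

```javascript
const fs = require('fs')
const path = require('path')

// Return the first executable regular file called `name` in `pathVar`
// (a colon-separated list of directories), or null if there is none.
function whichSync(name, pathVar = process.env.PATH || '') {
  for (const dir of pathVar.split(':')) {
    if (!dir) continue // skip empty entries
    const file = path.join(dir, name)
    try {
      fs.accessSync(file, fs.constants.X_OK) // throws if not executable
      if (fs.statSync(file).isFile()) return file // skip directories
    } catch (_) {
      // not in this directory, try the next one
    }
  }
  return null
}

console.log(whichSync('sh'))
```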

node: FP runs amok

sassa_nf didn't like the example above, mainly because of Array.prototype.shift(), & provided an enhanced version:

#!/usr/bin/env node
const fs = require('fs')
const dirs = (process.env.PATH || '').split(':')

const f = (e, cont) => dirs.map(d => d + '/' + e)
.reduce((p, d) => g => p(f => f ? g(f):
fs.access(d, fs.constants.X_OK, err => g(!err && d))),
f => f())(f => f ? (console.log(f), cont()):
(console.error(`${process.argv[1]}: ${e} not found in PATH`), process.exitCode = 1))

process.argv.slice(2).reduce((p, c) => g => p(_ => f(c, g)), f => f())(_ => _)

To understand how it works, you'll need to reformat the arrow function expressions. Nevertheless, I think it serves an artistic purpose as is.

Node, async/await

Certainly, callbacks were an unfortunate chain of events. Thankfully, we've had promises for a long time now.

#!/usr/bin/env node

let {access} = require('fs/promises')

let afilter = async (arr, predicate) => {
    return (await Promise.allSettled(arr.map(predicate)))
        .filter( v => v.status === 'fulfilled').map( v => v.value)
}

let f = e => afilter((process.env.PATH || '').split(':'), async p => {
    await access(p+'/'+e, 1)    // 1 is X_OK
    return p+'/'+e
})

async function main() {
    let args = process.argv.slice(2).map( async p => {
        return {exe: p, location: await f(p)}
    })

    for await (let r of args) {
        if (!r.location.length) {
            console.error(`${process.argv[1]}: ${r.exe} not found in PATH`)
            process.exitCode = 1
            break
        }
        console.log(r.location[0])
    }
}

main()

This was tested with node v17.0.1.

I leave it up to you to judge which one of the node variants is more idiotic.


C

It was impossible to leave it out. It's the longest one, but I consider all the node examples much worse.

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <err.h>
#include <string.h>
#include <limits.h>
#include <stdbool.h>

bool is_exe(const char *name) {
  struct stat s;
  if (stat(name, &s)) return false;
  return (s.st_mode & S_IFMT) == S_IFREG && (s.st_mode & S_IXUSR);
}

bool exe(const char *dir, const char *e, void *result) {
  char *r = (char*)result;
  snprintf(r, PATH_MAX, "%s/%s", dir, e);
  if (!is_exe(r)) {
    r[0] = '\0';
    return false;
  }
  return true;
}

void f(const char *e, bool (*callback)(const char *, const char *, void *), void *result) {
  char *path = strdup(getenv("PATH") ? getenv("PATH") : "");
  char *PATH = path;
  char *dir, *saveptr;
  while ( (dir = strtok_r((char*)PATH, ":", &saveptr))) {
    if (callback(dir, e, result)) break;
    PATH = NULL; /* subsequent strtok_r() calls continue from saveptr */
  }
  free(path);
}

int main(int argc, char **argv) {
  for (int idx = 1; idx < argc; idx++) {
    char e[PATH_MAX] = ""; /* stays empty if PATH has no entries */
    f(argv[idx], exe, e);
    strlen(e) ? (void)printf("%s\n", e) : errx(1, "%s not found in PATH", argv[idx]);
  }
}

Coincidentally, this version is the most correct one: it won't confuse a directory with an executable.

Thursday, September 9, 2021

Basic Latin, Diacritical Marks & IMDB

This is a story of not placing trust in public libraries.

The IMDB website has an auto-complete input element. While its mechanism isn't documented anywhere, you can easily explore it with curl:

$ alias labels='json d | json -a l'
$ imdb=

$ curl -s $imdb/a/ameli.json | labels
Amelia Warner (I)
Austin Amelio
Amelia Clarkson
Amelia Rose Blaire
Amelia Heinle
Amelia Bullmore
Amelia Eve

The endpoint understands acute accents & strokes:

$ curl -s $imdb/b/boże+ciało.json | labels
Corpus Christi
Corpus Christi
Olecia Obarianyk
Alecia Orsini Lebeda
Zwartboek: The Special
The Cult: Edie (Ciao Baby)
Anne-Marie: Ciao Adios
The C.I.A.: Oblivion

(Corpus Christi is the translation of Boże Ciało.)

The funny part starts when you try to enter the same string (boże ciało) in the input field on the IMDB website:

Where's the movie? Turns out, the actual query that a page makes looks like

boe_ciao? Apparently, it tried to convert the string to the Basic Latin set, replacing spaces with an underscore along the way. It's not terribly hard to spot a little problem here.

This is the actual function that does the conversion:

var ae = /[àÀáÁâÂãÃäÄåÅæÆçÇèÈéÉêÊëËìÍíÍîÎïÏðÐñÑòÒóÓôÔõÕöÖøØùÙúÚûÛüÜýÝÿþÞß]/
, oe = /[àÀáÁâÂãÃäÄåÅæÆ]/g
, ie = /[èÈéÉêÊëË]/g
, le = /[ìÍíÍîÎïÏ]/g
, se = /[òÒóÓôÔõÕöÖøØ]/g
, ce = /[ùÙúÚûÛüÜ]/g
, ue = /[ýÝÿ]/g
, de = /[çÇ]/g
, me = /[ðÐ]/g
, pe = /[ñÑ]/g
, fe = /[þÞ]/g
, be = /[ß]/g;

function ve(e) {
    if (e) {
        var t = e.toLowerCase();
        return t.length > 20 && (t = t.substr(0, 20)),
        t = t.replace(/^\s*/, "").replace(/[ ]+/g, "_"),
        ae.test(t) && (t = t.replace(oe, "a").replace(ie, "e")
                       .replace(le, "i").replace(se, "o")
                       .replace(ce, "u").replace(ue, "y")
                       .replace(de, "c").replace(me, "d")
                       .replace(pe, "n").replace(fe, "t").replace(be, "ss")),
        t = t.replace(/[\W]/g, "")
    }
    return ""
}

(It took me some pains to extract it from the god-awful obfuscated mess that IMDB returns to browsers.)

It's not only the Polish folks whose alphabet gets mangled. The Turks are out of luck too:

ve('Ruşen Eşref Ünaydın')     // => ruen_eref_unaydn

I say the function above sometimes does its job rather wrong:

ve('ąśćńżółıźćę')             // => o
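The culprit is the last step: without the u flag, \W means "anything but A-Z, a-z, 0-9 & underscore", so every letter that the hand-written tables missed is silently deleted. Below is a sketch of a more forgiving variant. It is not IMDB's code; `slug` & the `extra` table are made up for illustration & the 20-character truncation is left out. It decomposes with NFD, strips the combining marks, & consults a small table for letters that have no decomposition.

```javascript
// Letters with no canonical decomposition need an explicit table.
// This one is deliberately tiny; a real table would be much longer.
const extra = { 'ł': 'l', 'ı': 'i', 'ø': 'o', 'đ': 'd', 'ß': 'ss', 'æ': 'ae', 'þ': 'th', 'ð': 'd' }

function slug(s) {
  return s.toLowerCase().trim()
    .normalize('NFD').replace(/[\u0300-\u036f]/g, '') // ż -> z, ş -> s, ü -> u
    .replace(/[^\x00-\x7f]/g, c => extra[c] || '')    // ł -> l, ı -> i
    .replace(/\s+/g, '_')
    .replace(/\W/g, '')                               // now only strips punctuation
}

console.log(slug('Boże Ciało'))           // boze_cialo
console.log(slug('Ruşen Eşref Ünaydın'))  // rusen_esref_unaydin
```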

deburr() from lodash has been publicly available since February 5, 2015 &, unlike the forlorn IMDB attempt, works fine:

deburr('Boże Ciało')          // => Boze Cialo
deburr('Ruşen Eşref Ünaydın') // => Rusen Esref Unaydin
deburr('ąśćńżółıźćę') // => ascnzolizce

Why not use it?

Tuesday, May 25, 2021

Missing charsets in String to FontSet conversion

After upgrading to Fedora 34 I started to get a strange warning when running vintage X11 apps:

$ xclock
Warning: Missing charsets in String to FontSet conversion

With gv(1) it was much worse–multi-line errors, all related to misconfigured fonts. Some errors I was able to fix via

# dnf reinstall xorg-x11-fonts\*

Why exactly rpm post-install scripts have miscarried during the distro upgrade, remains unknown. Still, the main warning about charsets persisted.

Most classic x11 apps (gv included) are written with the (now ancient) libXt library. By grepping through the libXt code, I found a function that emits the warning in question. It calls XCreateFontSet(3) & dutifully reports the error, but fails to describe which of the charsets weren't found for a particular font.

A simple patch to libXt:

--- libXt-1.2.0/src/   2021-05-22 00:18:36.359273335 +0300
+++ libXt-1.2.0/src/Converters.c 2021-05-22 00:21:08.550340341 +0300
@@ -973,6 +973,10 @@
"Missing charsets in String to FontSet conversion",
+ fprintf(stderr, "XFontSet fonts: %s\n", fromVal->addr);
+ for (int i = 0; i < missing_charset_count; i++) {
+ fprintf(stderr, " missing charset: %s\n", missing_charset_list[i]);
+ }
if (f != NULL) {
@@ -1006,6 +1009,10 @@
"Missing charsets in String to FontSet conversion",
+ fprintf(stderr, "XFontSet fonts: %s\n", value.addr);
+ for (int i = 0; i < missing_charset_count; i++) {
+ fprintf(stderr, " missing charset: %s\n", missing_charset_list[i]);
+ }
if (f != NULL)
@@ -1030,6 +1036,10 @@
"Missing charsets in String to FontSet conversion",
+ fprintf(stderr, "XFontSet fonts: %s\n", "-*-*-*-R-*-*-*-120-*-*-*-*,*");
+ for (int i = 0; i < missing_charset_count; i++) {
+ fprintf(stderr, " missing charset: %s\n", missing_charset_list[i]);
+ }
if (f != NULL)

gave me some clue:

$ gv
Warning: Missing charsets in String to FontSet conversion
... missing charset: KSC5601.1987-0

What is KSC5601.1987-0? Looks Korean. Why can't XCreateFontSet(3) suddenly find it? I didn't uninstall any fonts during the distro upgrade.

Turns out, the only bitmap font that provided KSC5601.1987-0 charset, daewoo-misc, was removed from xorg-x11-fonts-misc package due to licensing concerns. This is very rude.

It forced me to make a custom rpm package for daewoo-misc fonts. The spec file is here. Notice that I didn't bother to provide a fontconfig configuration (hence the installed font is invisible to Xft), for all I cared was to silence the annoying gv warning.

Saturday, January 30, 2021

Fixing “30 seconds of code”

In the past, the JS portion of 30 seconds of code was a single, big README in a github repo. You can still browse an old revision, of course. It was near perfect for a cursory inspection or a quick search.

In full conformance with the "all that's bright must fade" adage, the README was scrapped for an alternative version that looks like this:

Why, why did they do that?

Thankfully, they put each code "snippet" into a separate .md file (there are 511 of them), which means we can concatenate them in 1 gargantuan file & create a TOC. I thought about an absolute minimum amount of code one would need for that & came up with this:

$ cat Makefile
$(if $(i),,$(error i= param is missing))
out := _out

$(out)/%.html: $(i)/%.md
	@mkdir -p $(dir $@)
	echo '<h2 id="$(title)">$(title)</h2>' > $@
	pandoc $< -t html --no-highlight >> $@

title = $(notdir $(basename $@))

$(out)/30-seconds-of-code.html: template.html $(patsubst $(i)/%.md, $(out)/%.html, $(sort $(wildcard $(i)/*.md)))
	cat $^ > $@
	echo '</main>' >> $@


(i should be a path to a repo directory with .md files, e.g. make -j4 i=~/Downloads/30-seconds-of-code/snippets)

This converts each .md file to its .html counterpart & prepends template.html to the result:

What's in the template file?

  1. a TOC generator that runs once after DOM is ready;
  2. a handler for the <input> element that filters the TOC according to user's input;
  3. CSS for a 2-column layout.

There is nothing interesting about #3, hence I'm skipping it.

Items 1-2 could be accomplished using 3 trivial functions (look Ma, no React!):

$ sed -n '/script/,$p' template.html
<script>
document.addEventListener('DOMContentLoaded', main)

function main() {
    let list = mk_list()
    document.querySelector('#toc input').oninput = evt => {
        render(list, evt.target.value)
    }
    render(list)
}

function render(list, filter) {
    document.querySelector('#toc__list').innerHTML = list(filter).map( v => {
        return `<li><a href="#${v}">${v}</a></li>`
    }).join('\n')
}

function mk_list() {
    let h2s = [...document.querySelectorAll('h2')].map( v => v.innerText)
    return query => {
        return query ? h2s.filter( v => v.toLowerCase().indexOf(query.toLowerCase()) !== -1) : h2s
    }
}
</script>

<nav id="toc"><div><input type="search"><ul id="toc__list"></ul></div></nav>
<main id="doc">

This is all fine & dandy, but 30 seconds of code has many more interesting repos, like snippets of css or reactjs code. They share the same lamentable fate with the js one–once being in a single readme, they have converged lately on a single, badly-searchable website, that displays 1 recipe per user’s query.

The difference between the css/react snippets & the plain js ones is in a necessity of a preview: if you see a tasty recipe for a “Donut spinner”, you’d like to see how the donut spins, before copying the example into your editor.

In such cases, people oft resort to pasting code into one of the “Online IDE”s & embedding the result into their tutorial. CodePen, for example, has an even more convenient feature: you create a form (with a POST request) that holds a field with a json-formatted string which contains html/css/js assets. That way you can easily make a button “check this out on codepen”. The downside is that a user leaves your page to play with the code.
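A sketch of generating such a form. It assumes CodePen's prefill endpoint https://codepen.io/pen/define & a form field named data, as described in CodePen's own documentation; check both before relying on them. `codepenForm` & `escapeAttr` are made-up helpers.

```javascript
// Escape a string for use inside a double-quoted HTML attribute.
const escapeAttr = s => s.replace(/&/g, '&amp;').replace(/"/g, '&quot;')
  .replace(/</g, '&lt;').replace(/>/g, '&gt;')

// Build a form whose button opens the snippet on CodePen in a new tab.
function codepenForm({ title, html = '', css = '', js = '' }) {
  const data = JSON.stringify({ title, html, css, js })
  return [
    '<form action="https://codepen.io/pen/define" method="POST" target="_blank">',
    `  <input type="hidden" name="data" value="${escapeAttr(data)}">`,
    '  <button>Check this out on CodePen</button>',
    '</form>',
  ].join('\n')
}

console.log(codepenForm({ title: 'Donut spinner', html: '<div class="donut"></div>' }))
```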

Another way to show previews alongside the docs is to create an iframe & inject all assets from a snippet into it. In this implementation you don’t rely on 3rd parties & the docs stay fully usable in off-line scenarios (nobody actually needs that, but it sounds useful to have as an option).
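The injection part can be as small as building one HTML string per snippet & assigning it to the iframe's srcdoc property. A minimal sketch; `previewDoc` is a made-up name & this is not necessarily how the repo mentioned below does it.

```javascript
// Assemble a standalone HTML document from a snippet's assets.
// In a browser the result goes into an iframe:
//   iframe.srcdoc = previewDoc(snippet)
function previewDoc({ html = '', css = '', js = '' }) {
  return [
    '<!doctype html>',
    `<style>${css}</style>`,
    html,
    // Escape "</script" so snippet code can't close the tag early.
    `<script>${js.replace(/<\/script/gi, '<\\/script')}</script>`,
  ].join('\n')
}

console.log(previewDoc({
  html: '<div class="donut"></div>',
  css: '.donut { border-radius: 50%; }',
}))
```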

This requires greatly expanding the examples above: either we need 3 separate templates (one for js snippets, another for css recipes & a disheartening one for reactjs chunks), or we force a single template to act differently depending on the payload content.

For the latter approach, see this repo.

Wednesday, January 6, 2021

Twitter stats using gnuplot, json & make

Twitter lets you download a subset of a user's activities as a zip archive. Unfortunately, there are no useful visualisations of the provided data, except for a simple list of tweets with date filtering.

For example, here is what I expected to find, but there was no sign of it:

  1. a graph of activities over time;
  2. a list of:
    1. the most popular tweets;
    2. users to whom I reply the most.

Inside the archive there is data/tweet.js file that contains an array (assigned to a global variable) of "tweet" objects:

window.YTD.tweet.part0 = [ {
  "tweet" : {
    "retweeted" : false,
    "source" : "<a href=\"\" rel=\"nofollow\">Twitter Web Client</a>",
    "favorite_count" : "2",
    "id" : "12345",
    "created_at" : "Sat Jun 23 16:52:42 +0000 2012",
    "full_text" : "hello",
    "lang" : "en"
  }
}, ...]

The array is already json-formatted, hence it's trivial to convert it to a proper json for filtering with json(1) tool.
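If you'd rather do the conversion in node than with sed, dropping everything before the first "[" is enough. A sketch; `parseYtd` is a made-up name & the sample is a shortened version of the object above.

```javascript
// Turn the contents of data/tweet.js into an array of objects by
// dropping everything before the first "[" (the variable assignment).
function parseYtd(source) {
  return JSON.parse(source.slice(source.indexOf('[')))
}

const sample = `window.YTD.tweet.part0 = [ {
  "tweet" : { "id" : "12345", "lang" : "en", "favorite_count" : "2" }
} ]`
console.log(parseYtd(sample).map(t => t.tweet.lang)) // [ 'en' ]
```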

Say we want a list of the top 5 languages in which tweets were written. A small makefile:

$ cat
lang: tweets.json
	json -a tweet.lang < $< | $(aggregate) | $(sort)
tweets.json: $(i)
	unzip -qc $< data/tweet.js | sed 1d | cat <(echo [{) - > $@

aggregate = awk '{r[$$0] += 1} END {for (k in r) print k, r[k]}'
sort = sort -k2 -n | column -t
SHELL := bash -o pipefail

yields:

$ make -f | tail -5
cs 16
und 286
ru 333
en 460
uk 1075

( is the archive that Twitter permits us to download.)

To draw activity bars, the same technique is applied: we extract a date from each tweet object & aggregate the results by day:

2020-12-31 5
2021-01-03 10
2021-01-04 5
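The aggregation itself is a plain counting exercise. Here is a sketch of it in node, without the awk step; `perDay` is a made-up name. Unlike the makefile, it uses UTC dates to keep the output independent of the machine's time zone, & it relies on node's Date parsing the created_at format.

```javascript
// Count tweets per calendar day (UTC) and return "YYYY-MM-DD count"
// lines sorted by date, ready to be fed to gnuplot.
function perDay(tweets) {
  const pad = n => String(n).padStart(2, '0')
  const counts = new Map()
  for (const { tweet } of tweets) {
    const d = new Date(tweet.created_at)
    const day = [d.getUTCFullYear(), pad(d.getUTCMonth() + 1), pad(d.getUTCDate())].join('-')
    counts.set(day, (counts.get(day) || 0) + 1)
  }
  return [...counts]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([day, n]) => `${day} ${n}`)
}

console.log(perDay([
  { tweet: { created_at: 'Sun Jan 03 09:00:00 +0000 2021' } },
  { tweet: { created_at: 'Thu Dec 31 10:00:00 +0000 2020' } },
  { tweet: { created_at: 'Thu Dec 31 23:59:59 +0000 2020' } },
]).join('\n'))
```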

This can be fed to gnuplot:

$ make -f activity.svg

This makefile has an embedded gnuplot script:

$ cat

%.svg: dates.txt
	cat <(echo "$$plotscript") $< | gnuplot - > $@

dates.txt: tweets.json
	json -e 'd = new Date(this.tweet.created_at); p = s => ("0"+s).slice(-2); = [d.getFullYear(), p(d.getMonth()+1), p(d.getDate())].join`-`' -a < $< | $(aggregate) > $@

export define plotscript =
set term svg background "white"
set grid

set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m"

set xtics rotate by 60 right

set style fill solid
set boxwidth 1

plot "-" using 1:2 with boxes title ""
endef

Listing the users to whom one replies the most is quite simple:

$ cat
users: tweets.json
json -e 'this.users = v => v.screen_name).join`\n`' -a users < $< | $(aggregate) | $(sort)


I'm not much of a tweeter:

$ make -f | tail -5
<redacted> 41
<redacted> 49
<redacted> 60
<redacted> 210
<redacted> 656

Printing the most popular tweets is more cumbersome. We need to:

  1. calculate the rating of each tweet (by such a complex formula as favorite_count + retweet_count);
  2. sort all the tweet objects;
  3. slice N tweet objects.

A Make recipe for it is a little too long to show here, but you can grab a makefile that contains the recipe + all the recipes shown above.
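For the impatient, the three steps can be sketched in a few lines of node instead of Make. `topTweets` is a made-up name & this is not the recipe from the makefile. favorite_count is a string in the sample above; retweet_count is assumed to be stored the same way, hence Number().

```javascript
// Rank tweets by favorite_count + retweet_count and keep the top n.
function topTweets(tweets, n = 5) {
  const rating = t => Number(t.favorite_count) + Number(t.retweet_count)
  return tweets
    .map(({ tweet }) => ({ rating: rating(tweet), text: tweet.full_text }))
    .sort((a, b) => b.rating - a.rating)
    .slice(0, n)
}

console.log(topTweets([
  { tweet: { favorite_count: '2', retweet_count: '0', full_text: 'hello' } },
  { tweet: { favorite_count: '1', retweet_count: '5', full_text: 'world' } },
], 1))
```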

Friday, December 11, 2020

Making high-resolution screenshots of Emacs frames

Emacs 27.1 can utilise Cairo drawing backend to take screenshots of itself via x-export-frames function. Unfortunately, the bare bone function is all we have here–there's no UI to it. Moreover, it doesn't support bitmap fonts, which means if you still use, say, Terminus, you get garbage in the output.

I wanted to share a screenshot of an Emacs frame on twitter. Twitter doesn't accept SVGs, for net income of $1.47bn isn't enough to support such a complex thing. The best way to obtain an arbitrarily high-resolution png is to get it from a vector image. I found that postscript->png gives the best results & requires only ghostscript installed.

(defun my--screenshot-png(out)
"Save a screenshot of the current frame as a png file. Requires ghostscript."
(let ((ps (concat out ".tmp")))
(my--screenshot ps 'postscript)
(call-process "gs" nil (get-buffer-create "*Shell Command Output*") nil
"-sDEVICE=png16m" "-dBATCH" "-dNOPAUSE"
"-r300" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4"
(concat "-sOutputFile=" out) ps)
(delete-file ps)))

We use 300 dpi here to render a png. The my--screenshot function below temporarily changes the frame font to Inconsolata:

(defun my--screenshot(out format)
(let ((fontdef (face-attribute 'default :font)))
(set-frame-font "Inconsolata 10")
(with-temp-file out
(insert (x-export-frames nil format)))
(set-frame-font fontdef)))

The last bit left is to provide a prompt for a user where to save the screenshot:

(defun my-ss()
"Save a screenshot of the current frame in a file"
(interactive)
(let* ((out (expand-file-name (read-file-name "Output file name: ")))
(ext (file-name-extension out)))
(cond
((equal "png" ext)
(my--screenshot-png out))
((equal "ps" ext)
(my--screenshot out 'postscript))
(t
(my--screenshot out (intern ext))))))


M-x my-ss<RET>
Output file name: ~/Downloads/1.png<RET>

The physical image size here is 3133x3642.

How I read newsgroups in mutt

It took me a long time, but I've finally removed inn+newsstar from my machine. I don't use patches that add NNTP support to mutt any more. Yet, I still cannot put gmane away, for it's much more convenient to read mailing lists as newsgroups.

What if I just fetch the last N posts from newsgroup foo.a & save them in an mbox file for viewing in mutt later on? Then I can do the same for newsgroups foo.b, foo.c, & so on.

How do I fetch? Turns out, there is a nice CLI NNTP client already, called sinntp. The following command downloads fresh articles from comp.lang.c into a conspicuously named mbox file comp.lang.c:

$ sinntp pull --server comp.lang.c

If you run it again, it won't re-download the same articles, for it saves the reported high water mark in ~/.local/share/sinntp/ file.

This only solves a problem for 1 news server & 1 newsgroup. I read multitudes of them; should I write a simple shell script then? If you follow this blog, you may have noticed I try not to employ shell scripts but write makefiles instead.

$ cat ~/.config/nntp2mbox/

This states that I want to grab articles from comp.lang.c newsgroup from news server.

$ cat ~/.config/nntp2mbox/

In this example, the server name is & 1 of the newsgroups is commented out.

There's no more configuration; everything else is done by the nntp2mbox makefile:

#!/usr/bin/make -f

# the number of articles to pull
limit := 500
g :=

conf := $(or $(XDG_CONFIG_HOME),~/.config)/nntp2mbox
servers := $(wildcard $(conf)/*.conf)
self := $(lastword $(MAKEFILE_LIST))

all: $(servers:%.conf=%.server)

# read a list of newsgroups & run Make for each newsgroup
%.server: %.conf
awk '!/^#/ {print $$1 ".newsgroup"}' $< | grep $(call se,$(g)) | xargs -r $(make) -Bk server=$(notdir $(basename $<))

%.newsgroup:
sinntp pull --server $(server) --limit $(limit) $*

make = $(MAKE) --no-print-directory -f $(self)
se = '$(subst ','\'',$1)'
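What the awk | grep part of the %.server recipe computes can be sketched in Ruby. The sketch assumes that a .conf file lists 1 newsgroup per line with # starting a comment, which is what the awk pattern suggests; newsgroup_targets is a hypothetical name. Unlike awk, it drops blank lines, & it matches with a Ruby regexp instead of a grep one.

```ruby
# Mirrors: awk '!/^#/ {print $1 ".newsgroup"}' file.conf | grep pattern
def newsgroup_targets(conf_text, pattern = '')
  conf_text.each_line
           .reject { |line| line.start_with?('#') } # skip commented-out newsgroups
           .map { |line| line.split.first }         # the 1st field, like awk's $1
           .compact                                 # blank lines have no 1st field
           .map { |group| "#{group}.newsgroup" }    # a Make target name
           .grep(Regexp.new(pattern))               # the g= filter; '' matches all
end
```

Each returned name is a target that the recursive Make invocation would then build.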

The following command downloads fresh articles from all the newsgroups (from all the news servers above) to the current directory:

$ nntp2mbox
awk '!/^#/ {print $1 ".newsgroup"}' /home/alex/.config/nntp2mbox/localhost.conf | grep '' | xargs -r /usr/bin/make --no-print-directory -f /home/alex/bin/nntp2mbox -Bk server=localhost
awk '!/^#/ {print $1 ".newsgroup"}' /home/alex/.config/nntp2mbox/ | grep '' | xargs -r /usr/bin/make --no-print-directory -f /home/alex/bin/nntp2mbox -Bk
sinntp pull --server --limit 500 comp.lang.c
awk '!/^#/ {print $1 ".newsgroup"}' /home/alex/.config/nntp2mbox/ | grep '' | xargs -r /usr/bin/make --no-print-directory -f /home/alex/bin/nntp2mbox -Bk
sinntp pull --server --limit 500 gmane.comp.gnu.make.devel
sinntp pull --server --limit 500 gmane.comp.gnu.make.general
sinntp pull --server --limit 500 gmane.comp.window-managers.fvwm

(Yes, it invokes Make recursively, which is a big no-no in many Make circles.)

$ ls
comp.lang.c gmane.comp.gnu.make.general
gmane.comp.gnu.make.devel gmane.comp.window-managers.fvwm

It even supports filtering by a newsgroup name:

$ nntp2mbox g=fvwm

I don't actually read comp.lang.c. If there's anything sane left in the comp.* hierarchy, please let me know.

Thursday, December 10, 2020

Reading the Emacs User Survey 2020 Results

More than a month ago some guy made a survey of emacs users. A couple of days ago, he released the results alongside the raw data.

After importing Emacs-User-Survey-2020-clean.csv into sqlite (7,344 rows), the first thing I checked was if someone had mentioned any of my emacs packages &, I kid you not, I got 9 hits for wordnut! Yipee!

Then I started filtering by the "For how many years have you been using Emacs?" column. The number of matching old-timers was staggering (I expected to find next to none):

  • >= 20 years: 1,497 rows
  • >= 15: 2,058
  • >= 10: 2,975
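The thresholds above amount to a simple count over 1 column. Here's a minimal sketch of such a count that uses Ruby's csv library instead of sqlite; count_old_timers is a hypothetical name, & the sketch assumes that the column holds plain numbers (anything else is skipped).

```ruby
require 'csv'

YEARS_COLUMN = 'For how many years have you been using Emacs?'

# Counts the rows whose years-of-use value is a number >= min_years.
def count_old_timers(csv_text, min_years)
  CSV.parse(csv_text, headers: true).count do |row|
    years = Float(row[YEARS_COLUMN], exception: false) # nil for empty/non-numeric
    years && years >= min_years
  end
end
```

Usage would be along the lines of count_old_timers(File.read('Emacs-User-Survey-2020-clean.csv'), 20).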

Here's a tiny portion of interesting/hilarious entries:

while learning
42 Finnish Ispell usenet No cursor keys on ADM video terminal in 1978

God knows what the ADM terminal is & where he got one in Finland.

while learning
41 When I got booted into TECO, I was like WTF is this?? Did my modem disconnect? brain transplant
Be written in Common Lisp
41 I often got stuck inside multiple ^R recursive edits but once I understood it was because of mini buffer exits I was ok. Nowadays it's less of an issue.
40 Get rid of "kill" from nomenclature, commands, etc. Why "kill-emacs" instead of "exit-emacs"? (I remapped that one decades ago; maybe it's changed in the district but my command still works). I dislike violence and wish kill buffers were no so named. This may seem minor, but if you don't think that language matters, perhaps you haven't heard the rants of the USA's current president.
40 Stop changing behaviors of next-line, search, etc.
39 Emacs' TECO macros were impenetrable. Nothing comes to mind. I drank the Kool-Aid a loong time ago.
35 stop styling my text with weird font lock crap. gets increasingly hard to turn off
like to see vector drawing and variable width fonts in core
35 not really… the only difficulty is it wasn't on all the machines I used. I first used it on a VAX 11/780 having access to a bootleg hacked version crafted to run on the VAX under VMS. The guys that created that version were cad guys in DEC (AI CAD group) and I was lucky to have it… rather than being stuck with EDT.
But then I started using an Apollo and lost it. So I tried to write my own Emacs. from scratch. I was sort of successful but who has that much extra time… then came the Sun and unix and the HP 9000 and then came Linux (finally) and I had Emacs most everywhere.
35 The learning curve is steep, but quick. I got nuthin.
34 my init file quickly became scrambled up because I pasted in code that I didn't understand nor know how to organize why did I have to figure out how to correctly compile emacs 27 on the latest ubuntu? Why wasn't a package immediately available on all OSs when 27.1 was released?

You'd think that after 34 years of using Emacs, one would be able to discern Emacs maintainers from maintainers of the emacs package in their Linux distro, but no.

while learning
30 I liked it in the old days when, when you ran a repeating macro you would see all the changes zipping through on the screen.
29 Coming from Glosling Emacs, I didn't understand why ^T was so broken in GNU Emacs. In all honesty, when there's a new version of emacs I mostly spend time figuring out how to turn new abominations off or put them back to how they should be.
It would be a big improvement for me if Emacs stuck to text and didn't try to do things with images, tables etc. When I paste anything into Emacs, it should either turn into text or fail.
And more speed is always welcome.
27 This is 27 years ago and at that time everybody considered it cool to know emacs. So, no. Not really… I am professor in a computer science department and teach students. I provide for colleagues and friends a heavily adapted version of emacs that (I believe) is more user-friendly. Nevertheless, it is sad to see that students are not even interested anymore to learn emacs. So, I think the most pressing need is to have a simplified user interface that adheres to the usual standards (similar to what ergoemacs is trying to achieve). All basic functionality should be on menus, keyboard shortcuts should only be an add-on for power users.
26 Everything was hard. Copy paste. Saving files. The UI sort of sucks
25 Terrible, useless documentation
25 I don't understand elisp Responsiveness of the interface: Emacs sometimes feel slow.
23 More than 23 years ago, it was mandatory at my university. Key bindings and the like felt very alien.

What some people think about poor Richard Stallman:

  • RMS should resign so that politics stops guiding Emacs development. It is a tragedy that a great editor continues to be crippled because technical decisions are made for outdated ideological reasons. I would love to contribute but Emacs development is extremely hostile to any non-purist views.
  • Stop letting RMS block good ideas.
  • There appears to be a split between the core developers and the "package" developers. I am confused by the role that RMS still plays in Emacs stewardship, and puzzled that he is not familiar with org-mode.
  • RMS's computing habits are so completely beyond what's normal that he has no idea what modern users want in an editor. If you want emacs to be popular you have to ACTUALLY LISTEN TO FEEDBACK FROM NEW USERS instead of a bunch of greybeards going "oh well emacs is fine for me".
  • I've considered dropping emacs altogether a few times because of RMS's behavior. The one thing I would like emacs to do is to stop having any affiliation with him.
  • Ignore RMS's opinions going forward.
  • … from reading the exchanges on the mailing list and especially RMS' opposition to anything "newfangled" has discouraged me from even trying to contribute to the core.

Comments about the survey itself (I feel sorry for the guy who organized it):

  • I've refused to answer surveys that require proprietary JavaScript before. It's unacceptable for a community survey to demand cooperation with a corporation. I wouldn't've answered this survey were mailing this not an option. Of course, I'd issues sending this response, as I learned what server lay underneath the EMACSSURVEY.ORG domain MX records. It would be better if the SMTP servers were run differently, even by another business than that.
  • The last question on this page [What is the default keybinding to find a file?] stinks to high heaven. I don't know the answer, because my fingers do. But the real reason the last question stinks is that some doofus decided that answers placed there can only be some short number of characters, so I have to put my "I donoknow but mynfingers do" answer here instead of in that questions answer field. So I put a "nonsensical yet accurate" answer there.
  • Death to vi!
  • You are not enough experienced and whole survey is a joke with already set purposes, which we will find later.
    Would you be experienced you would know HTML, no Javascript is required. Would you be experienced, you would know who to hire, and not just linking to third party servers, thus exposing free software users to proprietary Javascript.
    Finally you are exposing their information to third party server which cannot be trusted.
    It is easy to edit few HTML elements and it would be to accept it over CGI and store in the database. I have rewritten the basic Perl form.cgi so many times for myself before 15+ years, and later wrote it for myself in Common Lisp, and I just wait for few free time to rewrite it in Emacs Lisp. All what you need is emacs CGI package and Emacs to prepare HTML.
    But I guess you are not getting what I am speaking about.
  • RMS did nothing wrong.
  • Is JotForm Free Software?
  • I disagree in the way the survey has been released without the emacs mantainers.
  • I had to disable no-script, so I'm angry.

On a serious note, if you'd like to read what newbies really think of Emacs, filter "For how many years have you been using Emacs?" by 0, although it'll take a great deal of time (533 rows to examine).

Tuesday, October 27, 2020


What do you do when you need to add a formula to an epub? Most epub readers don't support MathML yet, hence you resort to making SVGs via mathjax-node-cli. Then you test the epub in several über-popular readers to discover that only Kindle & Google Play Books render such SVGs correctly, the rest either lose all the characters in equations (KOReader) or just draw sad little boxes in place of the images (Moon+ Reader).

How do you produce PNGs then? In the past, mathjax-node had an option of a png export, but it has been deprecated.

There's a way to do it w/ pdflatex: (1) generate a pdf w/ the help of texlive-standalone package, (2) convert the pdf to a png.

This doesn't sound complicated & it's not complicated, but there are no helpful wrappers available and if you want to integrate the tex→png process into your build pipeline, prepare to deal w/ the usual aux/log rubbish that any TeX program leaves around.

Here's a makefile that does the conversion:

#!/usr/bin/make -f

$(if $(f),,$(error "Usage: tex2png f='E=mc^2' output.png"))
dpi := 600
devnull := $(if $(findstring s, $(MAKEFLAGS)),> /dev/null)

%.pdf:
pdflatex -jobname "$(basename $@)" -interaction=batchmode '\nofiles\documentclass[border=0.2pt]{standalone}\usepackage{amsmath}\usepackage{varwidth}\begin{document}\begin{varwidth}{\linewidth}\[ '$(call se,$(f))' \]\end{varwidth}\end{document}' $(devnull)
@rm -f "$(basename $@).log"

%.png: %.pdf
gs -sDEVICE=pngalpha -dQUIET -dBATCH -dNOPAUSE -r$(dpi) -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -sOutputFile="$@" "$<"

se = '$(subst ','\'',$1)'

It automatically removes all intermediate files; iff you mistype a formula, it saves a .log file to peruse.

For example, render Parkinson's coefficient of inefficiency (published in ~1957):

$ ./tex2png -s output.png f='x = \frac{m^{o}(a-d)}{y + p\sqrt{b}}'

(x = the number of members effectively present at the moment when the efficient working of a committee has become manifestly impossible; m = the average number of members actually present; o = the number of members influenced by outside pressure groups; a = the average age of the members; d = the distance in cm between the two members who are seated farthest from each other; y = the number of years since the cabinet or committee was first formed; p = the patience of the chairman, as measured on the Peabody scale; b = the average blood pressure of the three oldest members, taken shortly before the time of meeting.)
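The se helper in the makefile wraps its argument in single quotes & replaces every embedded single quote with '\'', so that a formula can't break out of the shell command. A minimal Ruby sketch of the same quoting rule follows; sh_quote is a hypothetical name. (Ruby's standard library has Shellwords.escape for this job, but it escapes with backslashes instead.)

```ruby
# Same idea as the makefile's  se = '$(subst ','\'',$1)'
def sh_quote(str)
  # close the quote, emit an escaped ', reopen the quote
  "'" + str.gsub("'") { "'\\''" } + "'"
end
```

The block form of gsub is used so that the backslash in the replacement is taken literally.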

Thursday, August 27, 2020

Steganography with zip archives

The elegance of CVE-2020-1464 comes from the internal structure of the Zip file format. While many other archive formats, like Microsoft Cab, put an index of the compressed files at the beginning of an archive, zip archivers place it at the end of a file.

The reason is historical: apparently, in 1989 disk drives were so slow that adding a new blob to an existing file & appending a new index to it was cheaper than copying chunks of the original archive to a new file.

The CVE reminded me of an old joke of hiding a .zip in a .jpg. When you append a .zip to an image file, the recipient of the jpeg doesn't necessarily notice any junk in the image, but if you know about such a 'hidden' part, any ordinary unzip tool is able to extract it.

This got me thinking: can we hide a file inside of a .zip? BlackHat Europe 2010 had a talk about steganography in popular archive formats. In one of the described tricks, carefully inserting a blob before a zip index makes it invisible to all common unpackers.

To verify this claim, I wrote a couple of small Ruby scripts, that inject & extract a 'hidden' blob. The approach works: Windows Explorer, 7-Zip, WinRAR, bsdtar(1), unzip(1) didn't see anything unusual. Even in the extreme cases like:

$ du -h

$ bsdtar ftv
-rw-r--r-- 0 1000 100 1 Aug 25 21:58 q

that certainly may look unusual to an innocent user–a 4 gigabyte archive that unpacks into an exactly 1 byte file! The opposite of a zip bomb.

A Zip index is formally termed the central directory. It consists of 2 main parts: ① central directory headers (CHDs) & ② end of central directory (EOCD) record. A CHD contains metadata about a particular file, EOCD–metadata about the index itself 1:

class Eocd < BinData::Record
endian :little

uint32 :signature, asserted_value: 0x06054b50
uint16 :disk
uint16 :disk_cd_start
uint16 :cd_entries_disk
uint16 :cd_entries_total
uint32 :cd_size
uint32 :cd_offset_start
uint16 :comment_len
string :comment, :read_length => :comment_len,
onlyif: -> { comment_len.nonzero? }
end

The thing of interest here is cd_offset_start (officially called offset of start of central directory 2), a 4-byte value that gives the position of the first central directory header, counted in bytes from the start of the archive (for an ordinary single-file archive).

Therefore, after inserting a blob, we need to update cd_offset_start, otherwise the zip file becomes broken.
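A minimal sketch of that update, written with plain String#unpack instead of BinData, is below. It is not the zipography-inject script: the helper names are hypothetical, & it assumes an ordinary single-file archive w/o ZIP64 records & w/o an EOCD signature lurking inside the archive comment.

```ruby
EOCD_SIGNATURE = [0x06054b50].pack('V') # "PK\x05\x06", little-endian

# Byte position of the EOCD record, found by scanning backwards for its signature.
def eocd_position(zip)
  zip.b.rindex(EOCD_SIGNATURE) or raise ArgumentError, 'no EOCD record found'
end

# cd_offset_start sits 16 bytes into the EOCD record (after 4+2+2+2+2+4 bytes).
def cd_offset_start(zip)
  zip.b[eocd_position(zip) + 16, 4].unpack1('V')
end

# Returns a copy of the archive with blob placed right before the central directory.
def inject_blob(zip, blob)
  zip = zip.b
  blob = blob.b
  offset = cd_offset_start(zip)
  out = zip[0, offset] + blob + zip[offset..]
  # The central directory has moved by blob.bytesize, so record its new offset.
  out[eocd_position(out) + 16, 4] = [offset + blob.bytesize].pack('V')
  out
end
```

Given the old & the new value of cd_offset_start, the blob can be cut out again as the bytes between them, provided the size of the blob is known or stored somewhere.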

Just because a user has no clue about the hidden blob whatsoever, doesn't mean specialized tools won't notice it. Say, we have an archive w/ 2 text files:

$ bsdtar ft
The Celebrated Jumping Frog of Calaveras County.txt
What You Want.txt

We inject a .png image to it:

$ zipography-inject blob1.png >

Whilst bsdtar is still none the wiser:

$ bsdtar ft
The Celebrated Jumping Frog of Calaveras County.txt
What You Want.txt

Hachoir correctly recognises it as an unparsed block:

  1. This is a DSL from BinData package that provides a declarative way to read/write structured binary data in Ruby.↩︎

  2. Field names in PKWARE's spec are quite verbose.↩︎

Wednesday, July 22, 2020

How to build Ruby in Windows natively without WSL, MSYS2 or Cygwin

Every Ruby release tarball contains the file win32/README.win32. If you decide to distribute Ruby alongside your Windows app, you can either struggle with the instructions from that file or use MSYS2 (== the modern RubyInstaller). In the past there was the Ruby-mswin32 project with an uninspiring motto The forever war against Windows ;-( [sic], but it has died of neglect.

When you ask anyone knowledgeable about compiling Ruby under Windows, they oft (always?) say it's unbearably difficult to get right. On hearing that, I, of course, knew I was destined to repeat the endeavour.

If you're going to do that blindly by installing VS2019 (the 'Community' edition, which is supposedly free) & by following the steps in win32/README.win32, you'll most probably come through, but end up with a crippled Ruby variant that has no openssl support whatsoever & hence you cannot run the gem command. Smashing.

After wasting time on that I searched for a binary version of openssl suitable for the VS, recompiled Ruby to make sure rubygems was working & decided that the process was indeed getting mighty wearisome.

Turns out, it can be simplified.

At the time of writing, there's exactly 1 post on the interwebs about this topic by some Japanese guy on a Japanese knowledge community platform in Japanese. Instead of building/finding dependencies manually (we need at least 3 of them: openssl, readline & zlib) we can employ vcpkg for that job.


  1. Install VS2019.

  2. Clone the vcpkg repo (say, to D:\opt\s\vcpkg).

  3. Open x64 Native Tools Command Prompt for VS 2019.

  4. Run bootstrap-vcpkg.bat inside the cloned vcpkg repo directory.

  5. Download & compile the dependencies:

     > vcpkg --triplet x64-windows install libxml2 libxslt openssl readline zlib
  6. Set 3 env variables:

     > set PATH=%PATH%;D:\opt\s\vcpkg\installed\x64-windows\bin
     > set INCLUDE=%INCLUDE%;D:\opt\s\vcpkg\installed\x64-windows\include
     > set LIB=%LIB%;D:\opt\s\vcpkg\installed\x64-windows\lib
  7. cd to the unpacked Ruby src directory & type:

     > win32\configure.bat --prefix=d:\opt\s\ruby

    then nmake & nmake install.

If you did everything correctly, even irb should work:

> irb -rfiddle -rfiddle/import
irb(main):001:1* module User32
irb(main):002:1* extend Fiddle::Importer
irb(main):003:1* dlload 'user32'
irb(main):004:1* extern 'int MessageBoxA(int, char*, char*, int)'
irb(main):005:0> end
=> #<Fiddle::Function:0x000000000676fbc8 ...>
irb(main):006:0> User32::MessageBoxA 0, RUBY_DESCRIPTION, "", 0
=> 1

By today's standards, the resulting full Ruby installation is pleasantly small:

$ du -shc vcpkg/installed/x64-windows/bin ruby/{bin,lib} --exclude '*.pdb'
7.0M    vcpkg/installed/x64-windows/bin
2.5M    ruby/bin
35M     ruby/lib
45M     total