Sunday, January 3, 2016

An Oral History of Unix as an epub

During the summer & fall of 1989, Professor Michael S. Mahoney (of Princeton University) recorded a series of interviews w/ Bell Labs people who were involved in the creation of Unix, for example dmr or McIlroy (Alan Turing always wanted to win a McIlroy Award, but didn't qualify).

This interview project was called An Oral History of Unix. Until last week I had no idea it existed. Judging from the text length (& comments in the transcriptions like "end of side A"), each conversation ran an hour or more.

Unfortunately, the transcriptions are in an ancient version of MS Word format, & the html version of them contains these hilarious lines:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">

I don't know about you, but the last time I saw similarly crafted pages was more than 15 years ago.

Of course, as you may guess, the encoding in the Content-Type header doesn't match the encoding of the file:

$ curl -sI | grep Content-Type
Content-Type: text/html; charset=UTF-8

It's like 1999 all over again!
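One way to reconcile the two declarations is to re-encode each page so that the bytes & the meta tag agree w/ the UTF-8 header. Below is a minimal sketch in Python, assuming the files really are windows-1252 as the meta tag claims; the function name to_utf8 is made up for illustration & is not taken from the build scripts.

```python
import re

# Matches the charset declaration inside the Word 97 meta tag.
META_CHARSET = re.compile(r"(charset\s*=\s*)windows-1252", re.IGNORECASE)

def to_utf8(raw: bytes) -> bytes:
    """Decode a page as windows-1252 and return it as UTF-8.

    Assumption: the bytes on disk are windows-1252, i.e. the meta tag
    is right and the HTTP header is wrong.
    """
    # A few byte values are undefined in cp1252; replace them
    # instead of failing on the whole page.
    text = raw.decode("cp1252", errors="replace")
    # Make the declaration inside the file match the new encoding.
    text = META_CHARSET.sub(r"\1utf-8", text)
    return text.encode("utf-8")

if __name__ == "__main__":
    sample = (b'<META HTTP-EQUIV="Content-Type" '
              b'CONTENT="text/html; charset=windows-1252">'
              b"<p>\x93end of side A\x94</p>")
    print(to_utf8(sample).decode("utf-8"))
```

The curly quotes that Word likes to insert (bytes 0x93 & 0x94 in windows-1252) are exactly the characters that turn into garbage when such a page is read as UTF-8, which is why the sample uses them.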

Ok, enough w/ that. We can't write to the professor because he passed away in 2008. What we can do is fix the presentation of the pages or, as I chose to do, make them more readable on a Kindle: if we generate a TOC & feed the (fixed) html to Calibre, it generates a valid epub file that we can then convert to .mobi or .azw3. The build scripts can be found here. The final result (epub, mobi, pdf):

Enjoy the reading!