Monday, November 5, 2018

Elisp history for trivia buffs

From the majestic Evolution of Emacs Lisp paper (if you haven't read it yet, you're missing out):

  • 'lambda was not technically part of the Elisp language until around 1991 when it was added as a macro, early during the development of Emacs-19. In Emacs-18, anonymous functions were written as quoted values of the form:

      '(lambda (..ARGS..) ..BODY..)

    While the lambda macro has made this quote unnecessary for almost 30 years now, many instances of this practice still occur in Elisp code, even though it prevents byte-compilation of the body.'

  • 'The old TECO version of Emacs also allowed attaching hooks to variable changes, but this feature was not provided in Elisp because Richard Stallman considered it a misfeature, which could make it difficult to debug the code. Yet this very feature was finally added to Elisp in 2018 in the form of variable watchers, though they are ironically mostly meant to be used as debugging aides.'

  • 'Elisp does not optimize away tail calls. With Scheme being familiar to many Elisp developers, this is a disappointment for many. In 1991, Jamie Zawinski added an unbind all instruction to the Lucid Emacs byte-code engine (which appears in both Emacs and XEmacs to this day) that was intended to support tail-call optimization, but never implemented the optimization itself.'

  • 'During the early years of Emacs, the main complaints from users about the simple mark&sweep algorithm were the GC pauses. These were solved very simply in Emacs-19.31 by removing the messages that indicated when GC was in progress. Since then complaints about the performance of the GC have been rare.'

  • 'Richard Stallman refused to incorporate XEmacs's FFI into Emacs for fear that it would open up a backdoor with which developers would be able to legally circumvent the GNU General Public License (GPL) and thus link Emacs's own code with code that does not abide by these licensing terms.

    After many years of pressure on this issue (not just within the Emacs project, since this affected several other GNU projects, most notably GCC), a solution was agreed to, which was to implement an FFI that would only accept to load libraries that came with a special symbol attesting that this library is compatible with the GPL.

    As a result, after a very long wait, 2016 finally saw the release of Emacs-25.1 with an FFI comparable in functionality to that of XEmacs. So far, we do not know of any publicly available package which makes use of this new functionality, sadly.'

Tuesday, September 18, 2018

3 Ways to Use ES6 Modules with Mocha

The problem: Mocha was written long before es6 modules became common within browsers. It's a classic Node program that internally uses the require() fn to load each test file. Therefore, running mocha under the --experimental-modules node flag does no good, for in the es6 modules mode there is no require() fn.

Say we have 2 modules:

$ cat foo.mjs
export default function() { return 'foo' }

$ cat bar.mjs
export default function() { return 'bar' }

&

$ cat package.json
{
  "devDependencies": {
    "mocha": "5.2.0"
  }
}

How can we test them using the celebrated Mocha that doesn't know anything about es6 modules?

1. Babel 7

Cons
Slow
Depends on Babel with a 3rd party plugin
Cumbersome to set up

This is the most famous method & the most sluggish one. It works like this: we tell Mocha to load @babel/register, which hooks Babel into Node's require(); Babel then converts import statements into require() calls & compiles each required file on the fly. Though it caches the compilation results (see the node_modules/.cache dir), it's nevertheless noticeably slower than running the equivalent tests written in the commonjs style.

If you like your node_modules directory fat, this option is for you!

We write tests for each module in a separate file. For foo module we use the usual static imports:

$ cat foo_test.mjs
import assert from 'assert'
import foo from './foo.mjs'

suite('Foo', function() {
  test('smoke', function() {
    assert.equal(foo(), 'foo')
  })
})

but for bar module we employ a dynamic import to make the test less ordinary:

$ cat bar_test.mjs
import assert from 'assert'

suite('Bar', function() {
  test('smoke', function() {
    return import('./bar.mjs').then( module => {
      let bar = module.default
      assert.equal(bar(), 'bar')
    })
  })
})

Now we need to specify a proper list of dependencies, then write a configuration for Babel. For simplicity's sake, we confine everything to package.json:

{
  "devDependencies": {
    "@babel/core": "7.1.0",
    "@babel/preset-env": "7.1.0",
    "@babel/register": "7.0.0",
    "babel-plugin-dynamic-import-node": "2.1.0",
    "mocha": "5.2.0"
  },
  "babel": {
    "plugins": [
      [
        "dynamic-import-node"
      ]
    ],
    "presets": [
      [
        "@babel/preset-env",
        {
          "targets": {
            "node": "current"
          }
        }
      ]
    ]
  }
}

This is an absolute minimum (which is a sad state of affairs). You may omit babel-plugin-dynamic-import-node if you don't use dynamic imports (but we do here). Btw, don't be tempted by the official @babel/plugin-syntax-dynamic-import plugin: it doesn't do any transformations at all, for it's written mainly for tools like Webpack that handle the conversion themselves instead of Babel.

Run the tests:

$ npm i
...
$ node_modules/.bin/mocha -R list --require @babel/register -u tdd *test.mjs

✓ Bar smoke: 4ms
✓ Foo smoke: 0ms

2 passing (21ms)

The size of the dependency tree is of course depressing:

$ rm -rf node_modules/.cache
$ du -h --max-depth=0 node_modules
16M node_modules
$ find node_modules -type f | wc -l
4870
$ find node_modules -name package.json | wc -l
161

2. In the browser

Pros                   Cons
No compilation step    Chai dependency
                       Every test file must be listed explicitly

The version of Mocha for the browser obviously doesn't contain any require() fn calls & doesn't know about any module systems.

Our package.json looks much simpler:

{
  "devDependencies": {
    "mocha": "5.2.0",
    "chai": "4.1.2"
  }
}

Unfortunately we need to edit the *_test.mjs files: the browser has no built-in assert module, nor is there a way to inline a wrapper around Chai to mock a global es6 module. Thus, foo_test.mjs becomes:

import foo from './foo.mjs'

suite('Foo', function() {
  test('smoke', function() {
    // assert is global & comes from Chai
    assert.equal(foo(), 'foo')
  })
})

(The same goes for bar_test.mjs.)

The test runner is an html page:

<!doctype html>
<link rel="stylesheet" href="node_modules/mocha/mocha.css">

<script src="node_modules/chai/chai.js"></script>
<script src="node_modules/mocha/mocha.js"></script>

<div id="mocha"></div>

<script type="module">
window.assert = chai.assert
mocha.setup('tdd')
</script>

<script type="module" src="foo_test.mjs"></script>
<script type="module" src="bar_test.mjs"></script>

<script type="module">
mocha.run()
</script>

Nothing should be surprising here, except that the "setup" parts must be marked as "modules" too: module scripts are deferred & execute in order, hence mocha.run() fires only after the test modules have registered their suites.

3. Monkey Patching

Pros                   Cons
No compilation step    A custom wrapper instead of the mocha executable
0 dependencies

Mocha has an API. It expects commonjs usage, but we can always overwrite 2 methods of the Mocha class: loadFiles() & run():

$ cat mocha.mjs
import path from 'path'
import fs from 'fs'
import Mocha from './node_modules/mocha/index.js'

Mocha.prototype.loadFiles = async function(fn) {
  var self = this;
  var suite = this.suite;
  for await (let file of this.files) {
    file = path.resolve(file);
    suite.emit('pre-require', global, file, self);
    suite.emit('require', await import(file), file, self);
    suite.emit('post-require', global, file, self);
  }
  fn && fn();
}

Mocha.prototype.run = async function(fn) {
  if (this.files.length) await this.loadFiles();

  var suite = this.suite;
  var options = this.options;
  options.files = this.files;
  var runner = new Mocha.Runner(suite, options.delay);
  var reporter = new this._reporter(runner, options);

  function done(failures) {
    if (reporter.done) {
      reporter.done(failures, fn);
    } else {
      fn && fn(failures);
    }
  }

  runner.run(done);
};

let mocha = new Mocha({ui: 'tdd', reporter: 'list'})
process.argv.slice(2).forEach(mocha.addFile.bind(mocha))
mocha.run( failures => { process.exitCode = failures ? -1 : 0 })

Using node v10.10.0:

$ node --experimental-modules mocha.mjs *test.mjs
(node:100618) ExperimentalWarning: The ESM module loader is experimental.

✓ Bar smoke: 2ms
✓ Foo smoke: 0ms

2 passing (16ms)

The wrapper is intentionally bare-bones. The loadFiles() fn here is very similar to its original version, only it became async, for the es6 dynamic import() returns a promise. The run() fn, on the other hand, is significantly shorter than the original, for we don't support any CL options & assume every CL argument to be a file name.

You can add at least a -g option to the wrapper as homework; a possible sketch follows.
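For instance, a -g (grep) filter could be bolted onto the tail of the wrapper like so (a sketch only: the flag parsing is deliberately primitive & the grep key is simply Mocha's constructor option):

let args = process.argv.slice(2)
let grep
let at = args.indexOf('-g')
// pull the '-g <pattern>' pair out of argv before treating the rest as file names
if (at !== -1) [, grep] = args.splice(at, 2)

let mocha = new Mocha({ui: 'tdd', reporter: 'list', grep})
args.forEach(mocha.addFile.bind(mocha))
mocha.run( failures => { process.exitCode = failures ? -1 : 0 })

Then node --experimental-modules mocha.mjs -g smoke *test.mjs should run only the matching tests.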

Saturday, July 14, 2018

A peek at the old win32 APIs

I wanted to write a simple GUI for HKCU\Control Panel\Desktop\WindowMetrics key. Until now Windows had a dialog that allowed fonts/colours/etc tweaking for the current 'theme', but as soon as the name of w10 became synonymous with all that is terrible, the dialog was removed in r1703.

What's the easiest way to write a similar dialog? I don't mind the default colours but I don't like the default font sizes. What if we could draw fake controls using html & provide a simple form to modify the appearance of the elements in real time? It feels like a trivial task.

Obviously we would need a way to do get/set operations on the registry. What options do we have?

I didn't want to employ Electron. That one time I tried to write a commercial piece of software with it, it brought me nothing but irritation. It's also a great way to become a mockery on HN/reddit: every time some poor soul posts about their little app written in JS+Electron, they receive so much shit from the 'real' programmers for not writing it in Qt or something native to the platform that most readers immediately start feeling sorry for the OP.

If not Electron then what? The 'new' Microsoft way is to use the UWP API, which strongly reminds me of the Electron concept, only done in a vapid style. I didn't want to touch it with a 3.048 m pole.

What about plain Node + a browser? We can write a server in Node that communicates with the registry & binds to a localhost address, to which a user connects using their favourite browser. Except that it could be too tedious for the user to run 2 programs consecutively, + Node itself isn't installed on Windows by default.

A 32bit node.exe binary is < 10MB zipped, so shipping it with the app isn't a big deal. To simplify the program startup we could write a C wrapper that starts the node server & opens the default browser, except that there's no need to, for w10 still comes with Windows Scripting Host! God almighty, WSH is awful: its JScript is a subset of ES3 with interesting constructs like env("foo") = "bar" (this is not a joke) & virtually every WSH resource describes it using VBScript for reasons unbeknownst to mankind. Nevertheless, for a tiny wrapper it's an acceptable choice, especially if you manage not to puke while reading the antediluvian VBScript examples.
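For illustration, such a wrapper in WSH JScript might be as small as this (a sketch: the paths, the port & the 1 second pause are assumptions, not tested values):

// run.js (WSH JScript): start the bundled node server, then open the default browser
var sh = new ActiveXObject("WScript.Shell");
sh.Run('node\\node.exe server.js', 0, false);               // 0 = hidden window, don't wait
WScript.Sleep(1000);                                        // crude: let the server bind first
sh.Run('cmd /c start "" http://127.0.0.1:8080/', 0, false); // open the default browser

A double-click on run.js is then all the user has to do.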

Fonts

The values inside HKCU\Control Panel\Desktop\WindowMetrics key that correspond to font options, like MenuFont, are binary blobs:

$ reg query "HKCU\Control Panel\Desktop\WindowMetrics" /v MenuFont

HKEY_CURRENT_USER\Control Panel\Desktop\WindowMetrics
    MenuFont    REG_BINARY    F1FFFFFF0000000000000000000000009001000000000001000000005300650067006F006500200055004900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

People say the blobs are serialized Logfont structures. Apparently, MS guys have decided it's a safe bet, for the structure doesn't contain pointers.

Can we decode it properly in a browser using JS? I leave such an endeavour to another enterprising soul. But we can write a tiny C program that displays some 'standard' Windows font dialog. The idea is: a user clicks on a button in the browser, the browser sends a request to the node server.js, which runs my-font-dialog.exe, which shows the font dialog & prints to the stdout a deciphered Logfont structure + the hex-encoded Logfont. Then the server sends the obtained data to the browser, which, in turn, updates the fake html controls.
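Wired together, the server.js route behind that button might look roughly like so (a sketch only: the route name, the port & the exe path are made up & the error handling is minimal):

let http = require('http')
let { execFile, execFileSync } = require('child_process')

// read the current hex blob of a value like MenuFont via the stock reg.exe
function reg_read(value) {
    let out = execFileSync('reg', ['query',
        'HKCU\\Control Panel\\Desktop\\WindowMetrics', '/v', value]).toString()
    return out.trim().split(/\s+/).pop()    // the last column is the REG_BINARY payload
}

http.createServer( (req, res) => {
    if (req.url !== '/choosefont') { res.statusCode = 404; return res.end() }

    let exe = execFile('cgi-bin/choosefont.exe', (err, stdout) => {
        if (err) { res.statusCode = 500; return res.end(err.message) }
        res.setHeader('Content-Type', 'text/plain')
        res.end(stdout)     // "<hex blob>,Ink Free,400,0,11.250000,15" or the like
    })
    exe.stdin.end(reg_read('MenuFont'))     // the util reads the current blob from its stdin
}).listen(8080, '127.0.0.1')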

It's not actually a logfont

If you don't know any winapi, how hard is it to write my-font-dialog.exe? Can a non-console Windows program write to the stdout? Or should I use some kind of IPC if it cannot? The more I thought about downloading the Windows SDK, the more I regarded the whole endeavour with aversion.

Then it dawned on me that I could take a shortcut & utilise Cygwin! It has w32api-runtime & w32api-headers packages, & its C library has all the friendly helpers like readline(). Its 64bit build even has a 32bit version of gcc (cygwin32-gcc-core, cygwin32-binutils, etc), thus we can create a 32bit executable on a 64bit machine. E.g., having a simple Makefile:

target := i686-pc-cygwin
CC := $(target)-gcc
LD := $(target)-ld
LDFLAGS := -mwindows -mconsole

& a foo.c file, typing

$ make foo

produces a foo executable that can invoke Windows gui routines & simultaneously access the stdin/stdout. The resulting exe depends on a 32bit cygwin1.dll, but as long as we ship it with the app, nobody cares.

To display an ancient Windows font dialog box, we ought to call ChooseFont() fn that fills a special CHOOSEFONT structure. One field of that structure is the desired Logfont.

$ echo F4FFFFFF0000000000000000000000009001000000000001000005005300650067006F006500200055004900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 | _out.i686-pc-cygwin/app/cgi-bin/choosefont.exe

If a user clicks OK, the choosefont.exe util prints:

F1FFFFFF00000000000000000000000090010000000000010302014249006E006B0020004600720065006500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,Ink Free,400,0,11.250000,15

When I converted the Logfont value into a hex string I noticed that although it looked decidedly similar to the values in the registry, it wasn't the same: the blob was shorter. It was either an obscure Unicode issue with the toolchain or I didn't know how to computer or both. Have I mondegreened about the Logfont? Turns out that not everything you read on the web is accurate: it's not Logfont, it's Logfontw! I don't see a point of describing the reason why or what macro you should define to get Anythingw automatically, for Windows programmers are already laughing at me. Evidently, this is what you get for avoiding Visual Studio.

System logger

Being naïve, I thought of sending errors from server.js to the Windows event log. Ain't logs often useful? What is their syslog(3) here? ReportEvent()? Alrighty, we'll write a small C util to aid server.js!

facepalm.jpg

I was going to painstakingly describe the steps one must take to obtain the permissions for proper writing to the event log, but ultimately deleted all the corresponding code & decided to stay with console.error. To get a sense of the gallant defence Windows mounts before permitting anyone to write to its hallowed log storage, read these 2 SO answers:

Your pixels are not my pixels

At some point I wanted to show the current DPI to the user. There are ways to do it in a browser without external help, but they are unreliable in edge cases. Winapi has GetDpiForMonitor().

You think it's a piece of cake until you play with the w10 'custom scaling' feature. For some reason it only allows specifying a relative value.

In

p = (d/c) * 100

p is the %-value in the Windows custom scaling input field, c is the current dpi & d is the desired one.

Say your initial dpi is 96 & you want to upgrade to 120, then p = 125 = (120/96) * 100. For the desired 144 dpi, p = 150.

This all breaks if d fails to hit one of the 'standard' Windows DPIs (120, 144). In such cases GetDpiForMonitor() fn yields 96.
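The arithmetic, for what it's worth, as a one-liner (scale is a made-up name):

// what to type into the w10 custom scaling field to go from dpi c to dpi d
let scale = (c, d) => d / c * 100

console.log(scale(96, 120))   // 125
console.log(scale(96, 144))   // 150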

The result

Download the full, pre-compiled app from the github page.

Friday, June 1, 2018

Emacs & gpg files: use the minibuffer for password prompts

In the past, Emacs communicated w/ gnupg directly & hence was responsible for reading/sending/catching passwords. In contrast, Emacs 26.1, by default, fully delegates the password handling to gpg2.

The code for the interoperation w/ gpg1 is still present in the Emacs core, but it's no longer advertised in favour of gpg2 + elpa pinentry.

If you don't want an additional overhead or a special gpg-agent setup, it's still possible to use gpg1 for (en|de)crypting ops.

Say we have a text file we want to encrypt & then transparently edit in Emacs afterwards. The editor should remember the correct pw for the file & not bother us w/ the pw during the file saving op.

$ rpm -qf `which gpg gpg2`
gnupg-1.4.22-6.fc28.x86_64
gnupg2-2.2.6-1.fc28.x86_64

$ echo rain raine goe away, little Johnny wants to play | gpg -c > nr.gpg
$ file nr.gpg
nr.gpg: GPG symmetrically encrypted data (AES cipher)

If you have both gpg1 & gpg2 installed, Emacs ignores gpg1 completely. E.g., run 'emacs -Q' & open the nr.gpg file: gpg2 promptly contacts gpg-agent, which, in turn, runs the pinentry app.

Although it may look as if everything is alright, try to edit the decrypted file & then save it. The pinentry window will reappear & you'll be forced to enter the pw twice.

The Emacs mode that handles the gnupg dispatch is called EasyPG Assistant. To check its current state, use epg-find-configuration fn:

ELISP> (car (epg-find-configuration 'OpenPGP))
(program . "/usr/bin/gpg2")

We can force EasyPG to use gpg1, even though this is not documented anywhere.

The actual config data is located in epg-config--program-alist var:

ELISP> epg-config--program-alist
((OpenPGP epg-gpg-program
  ("gpg2" . "2.1.6")
  ("gpg" . "1.4.3"))
 (CMS epg-gpgsm-program
  ("gpgsm" . "2.0.4")))

Here, if we shadow the gpg2 entry in the alist, EasyPG would regenerate a new config for all the (en|de)crypting ops on the fly:

(require 'epg-config)
;; prepend a gpg1 entry that shadows the default gpg2 one
(add-to-list 'epg-config--program-alist `(OpenPGP epg-gpg-program ("gpg" . ,epg-gpg-minimum-version)))
;; let Emacs cache the passphrase for symmetrically encrypted files
(setq epa-file-cache-passphrase-for-symmetric-encryption t)
;; drop the cached config so that EasyPG regenerates it using the alist above
(setq epg--configurations nil)

Now, if you open nr.gpg afresh, Emacs should neither use the gpg-agent any more:

Nor should it ask for the pw when you edit & save later on.

To clear the internal pw cache, type

ELISP> (setq epa-file-passphrase-alist nil)

Friday, May 11, 2018

Writing a podcast client in GNU Make

Why? First, because I wanted parallel downloads & my old podcacher didn't support that. Second, because it sounded like a joke.

The result is gmakepod.

Evidently, some ingredients for such a client are practically impossible to write using plain Make (like an xml parser). Would the client then be considered a true Make program? E.g., there is a clever bash json parser, but when you look in its src you see that it doesn't shy away from awk & grep.

At least we can try to write in Make as many components as possible, even if (when) they become a bottleneck. Again, why? 1

Overview

Using Make means constructing a proper DAG. gmakepod uses 6 vertices: run, .download.mk, .files.new, .files, .enclosures & .feeds. All except the 1st one are file targets.

target         desc
.feeds         (phony) parse a config file to extract feed names & urls
.enclosures    fetch & parse each feed to extract enclosure urls
.files         generate a proper output file name for each url
.files.new     check if we've already downloaded a url in the past & filter it out
.download.mk   generate a makefile, where we list all the rules for all the enclosures
run            (default) run the makefile

Every time a user runs gmakepod, it remakes all those files anew.

Config file

A user needs to keep a list of feed subscriptions somewhere. The 1st thing that comes to mind is to use a list of newline-separated urls, but what if we want to have diff options for each feed? E.g., an enclosure filter of some sort? We could just add 'options' to the end of the line (Idk, like url=http://example.com!filter.type=audio), but then we'd need to choose a record sep that isn't a space, which means either escaping the record sep in urls or living w/ the notion 'no ! char is allowed' or similar nonsense.

The next question is: how does a makefile process a single record? It turns out, we can eval foo=bar inside of a recipe, so if we pass

make -f feed_parse.mk 'url=http://example.com!filter.type=audio'

where feed_parse.mk looks like

parse-record = ... # replace ! w/ a newline
%:
	$(eval $(call parse-record,$*))
	@echo $(url)
	@echo $(filter.type)

then every 'option' becomes a variable! This sounds good but is actually a gimcrack.

Make will think that url=http://example.com!filter.type=audio is a variable override & complain about missing targets. To ameliorate that we can prefix the line w/ # or :. Sounds easy, but then we need to slice the line in the parse-record macro. This is the easiest job in any lang except Make--you won't do it correctly w/o invoking awk or any other external tool.

If we use an external tool for parsing a mere config line, why use a self-inflicted parody of the config file instead of a human-readable one?

The ini format would fit perfectly. E.g.,

[JS Party]
url = https://changelog.com/jsparty/feed

lines are self-explanatory to anyone who has seen a computer once. We can use Ruby, for example, to convert the lines to :name=JS_Party!url=http://changelog.com... or even better to

:\{\".name\":\"JS_Party\",\".url\":\"https://changelog.com/jsparty/feed\"\}

(notice the amount of shell escaping) & use Ruby again in the makefile to transform that json into name=val pairs, which we eval in the recipe later on.

How do we pass them to the makefile? If we escape each line correctly, xargs will suffice:

ruby ini-parse.rb subs.ini | xargs make -f feed-parse.mk

Parsing xml

Ruby, of course, has a full-fledged rss parser in its stdlib, but do we need it? A fancy podcast client (that tracks your every inhalation & exhalation) would display all the metadata from an rss it can obtain, but I don't want a fancy podcast client; what I want, what's most important to me, is a program that reliably downloads the last N enclosures from a list of feeds.

Thus the minimal parser looks like

$ curl -s https://emacsel.com/mp3.xml | \
nokogiri -e 'puts $_.css("enclosure,link[rel=\"enclosure\"]").\
map{|e| e["url"] || e["href"]}' \
| head -2
https://cdn.emacsel.com/episodes/emacsel-ep7.mp3
https://cdn.emacsel.com/episodes/emacsel-ep6.mp3

Options

One of the obviously helpful user options is the number of enclosures the user wants to download. E.g., when the user types

$ gmakepod g=emacs e=5

the client produces a .files file that has a list of 5 shell-escaped json 'records'. The e=5 option could also appear in an .ini file. To distinguish options passed from the CL from options read from the .ini, we prefix options from the .ini w/ a dot. The opt macro is used to get the final value:

opt = $(or $($1),$(.$1),$2)

E.g.: $(call opt,e,2) checks the CL opt first, then the .ini opt, &, as a last resort, returns the def value 2.

Output file names

Not every enclosure url has a nice path name. What file name should we assign to an .mp3 from the url below?

https://play.podtrac.com/npr-510289/npr.mc.tritondigital.com/NPR_510289/media/anon.npr-mp3/npr/pmoney/2018/05/20180504_pmoney_pmpod839v2.mp3?orgId=1&d=1606&p=510289&story=608577210&t=podcast&e=608577210&ft=pod&f=510289

Maybe we can use the URI path, 20180504_pmoney_pmpod839v2.mp3 in this case. Is it possible to extract it in pure Make?

In the most extreme case, the uri path may not even be unique. Say a feed has 2 entries & each article has 1 enclosure, then they both may have the same path name:

<entry>
  <title>Foo</title>
  <link rel="enclosure" type="audio/mpeg" length="1234"
        href="http://example.com/podcast?episode=2"/>
  <id>2ba2a6ee-52fb-11e8-9176-000c2945132f</id>
</entry>

<entry>
  <title>Bar</title>
  <link rel="enclosure" type="audio/mpeg" length="5678"
        href="http://example.com/podcast?episode=1"/>
  <id>3f66b198-52fb-11e8-a2a8-000c2945132f</id>
</entry>

In addition, the output name must be 'safe' in terms of Make. This means no spaces or $, :, %, ?, *, [, ~, \, # chars.

All of this leads us to another use of Ruby in Make's stead. We extract the uri path from the url, strip out the extension, prefix the path w/ the name of the feed (listed in the .ini) & append a random string + an extension, so the output file for the above url looks similar to:

media/NPR_Planet_Money/20180504_pmoney_pmpod839v2.84a8b172.mp3
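gmakepod does this bit in Ruby; purely as an illustration of the same transformation, a Node version might look like this (output_name is a made-up helper, the url below is fake & the 'safe for Make' character filtering is left out):

let crypto = require('crypto')
let path = require('path')
let { URL } = require('url')

function output_name(feed, url) {
    let uri_path = new URL(url).pathname
    let ext = path.extname(uri_path)                 // e.g. '.mp3'
    let base = path.basename(uri_path, ext)          // strip the dirs & the extension
    let rnd = crypto.randomBytes(4).toString('hex')  // 8 hex chars, e.g. '84a8b172'
    return `media/${feed.replace(/\s+/g, '_')}/${base}.${rnd}${ext}`
}

console.log(output_name('NPR Planet Money',
    'http://example.com/pmoney/2018/05/20180504_pmoney_pmpod839v2.mp3?orgId=1'))
// media/NPR_Planet_Money/20180504_pmoney_pmpod839v2.<random>.mp3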

A homework question: what should we do if a uri path lacks a file extension?

History

If we successfully downloaded an enclosure, there is rarely a need to download it again. A 'real' podcast client would look at id/guid (the date is usually useless) to determine if the entry has any updated enclosures; our Mickey Mouse parser relies on urls only.

Make certainly doesn't have any key/value store. We could try employing the sqlite CL interface or dig out gdbm or just append a url+'\n' to some history.txt file.

The last one is tempting, for grep is uber-fast. As the history file becomes a shared resource, though, we might get ourselves in trouble during parallel downloads. The lockfile rubygem provides a CL wrapper around a user-specified command, hence it can protect our 'db':

rlock history.lock -- ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"' 'http://example.com/file.mp3'

It works similarly to flock(1), but supposedly is more portable.

Makefile generation

The last but one step is to generate a makefile named .download.mk. After we have collected all the enclosure urls, we write to the .mk file a set of rules like

media/Foobar_Podcast/file.84a8b172.mp3:
	@mkdir -p $(dir $@)
	curl 'http://example.com/file.mp3' > $@
	@rlock history.lock -- ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"' 'http://example.com/file.mp3'

Our last step is to run

make -f .download.mk -k -j1 -Oline

The number of jobs is 1 by default, but is controllable via the CL param (gmakepod j=4).


  1. Image src: Fried-Tomato 

Monday, April 2, 2018

Node FeedParser & Transform streams

FeedParser is itself a Transform stream that operates in object mode. Nevertheless, in the majority of examples it appears at the end of a pipeline, e.g.:

let fp = new FeedParser()
fp.on('readable', () => {
    // get the data
})
my_readable_stream.pipe(fp)

Say we want to get the first 2 headlines from an rss:

$ curl -s https://emacsel.com/mp3.xml | node headlines1.js | head -2
Episode 7 - Jorgen Schäfer
Episode 6 - Charles Lowell

Let's use FeedParser as god hath intended: as a transform stream that reads the rss from the stdin & writes the articles to another transform stream that grabs only the headlines &, in turn, pipes them to the stdout:

$ cat headlines1.js
let Transform = require('stream').Transform
let FeedParser = require('feedparser')

class Filter extends Transform {
    constructor() {
      super()
      this._writableState.objectMode = true // we can eat objects
    }
    _transform(input, encoding, done) {
      this.push(input.title + '\n')
      done()
    }
}

process.stdin.pipe(new FeedParser()).pipe(new Filter()).pipe(process.stdout)

(Type npm i feedparser before running the example.)

This works, although it throws an EPIPE error, because the head command abruptly closes the stdout descriptor while the script is still trying to write to it.

You may add smthg like

process.stdout.on('error', (e) => e.code !== 'EPIPE' && console.error(e))

to silence it, but it's better to use pump module (npm i pump) & catch all errors from all the streams. There's a chance that the module will be added to the node core, so get used to it already.

$ diff -u1 headlines1.js headlines3.js | tail -n+4
 let FeedParser = require('feedparser')
+let pump = require('pump')

@@ -14,2 +15,3 @@

-process.stdin.pipe(new FeedParser()).pipe(new Filter()).pipe(process.stdout)
+pump(process.stdin, new FeedParser(), new Filter(), process.stdout,
+     err => err && err.code !== 'EPIPE' && console.error(err))

Now, what if we want to control the exact number of articles our Filter stream receives? I.e., if an rss is many MBs long & we want only n articles from it? First, we add a CL parameter to our script:

$ cat headlines4.js
let Transform = require('stream').Transform
let FeedParser = require('feedparser')
let pump = require('pump')

class Filter extends Transform {
    constructor(articles_max) {
      super()
      this._writableState.objectMode = true // we can eat objects
      this.articles_max = articles_max
      this.articles_count = 0
    }
    _transform(input, encoding, done) {
      if (this.articles_count++ < this.articles_max) {
          this.push(input.title + '\n')
      } else {
          console.error('ignore', this.articles_count)
      }
      done()
    }
}

let articles_max = Number(process.argv.slice(2)) || 1
pump(process.stdin, new FeedParser(), new Filter(articles_max), process.stdout,
     err => err && err.code !== 'EPIPE' && console.error(err))

Although this works too, it still downloads & parses the articles we don't want:

$ curl -s https://emacsel.com/mp3.xml | node headlines4.js 2
Episode 7 - Jorgen Schäfer
Episode 6 - Charles Lowell
ignore 3
ignore 4
ignore 5
ignore 6
ignore 7

Unfortunately, to be able to 'unpipe' a readable stream (from the Filter standpoint it's the FeedParser instance) we have to have a ref to it, & I don't know a way to get such a ref from within a Transform stream, except via explicitly passing a pointer:

$ diff -u headlines4.js headlines5.js | tail -n+4
 let pump = require('pump')

 class Filter extends Transform {
-    constructor(articles_max) {
+    constructor(articles_max, feedparser) {
      super()
      this._writableState.objectMode = true // we can eat objects
      this.articles_max = articles_max
      this.articles_count = 0
+
+     if (feedparser) {
+         this.once('unpipe', () => {
+             this.end()      // ensure 'finish' event gets emited
+         })
+     }
+     this.feedparser = feedparser
     }
     _transform(input, encoding, done) {
      if (this.articles_count++ < this.articles_max) {
          this.push(input.title + '\n')
      } else {
-         console.error('ignore', this.articles_count)
+         console.error('stop on', this.articles_count)
+         if (this.feedparser) this.feedparser.unpipe(this)
      }
      done()
     }
 }

 let articles_max = Number(process.argv.slice(2)) || 1
-pump(process.stdin, new FeedParser(), new Filter(articles_max), process.stdout,
+let fp = new FeedParser()
+pump(process.stdin, fp, new Filter(articles_max, fp), process.stdout,
      err => err && err.code !== 'EPIPE' && console.error(err))

Test:

$ curl -s https://emacsel.com/mp3.xml | node headlines5.js 2
Episode 7 - Jorgen Schäfer
Episode 6 - Charles Lowell
stop on 3

Grab the gist w/ a final version here.

Friday, March 9, 2018

YA introduction to GNU Make

I don't want to convert this blog to a Make-propaganda outlet, but here's another take of mine on that versatile tool + a bunch of HN comments. Enjoy.

Saturday, February 24, 2018

A shopping hours calculator

Say you have a small mom&pop online shop that sells widgets.

Suppose you have (a) dedicated personnel that calls customers on the phone to confirm an order w/ its shipping details, & (b) such a 'department' works regular hours & isn't available 24/7.

(This is the exact scheme to which the vast majority of Ukrainian online shops still adhere to.)

This is how a shop gets its 1st bad review: it's Friday evening, 7 o'clock; a client places an order for a widget & waits for a call that doesn't come until Monday morning. The enraged client may even try to call the shop during the weekend, not realising it's closed.

One of the possible solutions to this is to have a note (on a page where customers review their orders) that at this very moment the person that can confirm/discuss their order is offline.

How do you calculate that? It's an easy task if you work the same hours every day of the year, but for small online shops it's often not the case. Not only are there a handful of official gov holidays, some holidays have moveable dates (Easter), & sometimes a holiday falls on a weekend & must be transferred to the next working day. What if your customer department has a lunch break like everyone else? A client should be able to see that the call they're so eagerly waiting for is going to come in an hour, not in 2 minutes.

So I wrote a small JS library to help w/ that: https://github.com/gromnitsky/shopping-hours. You can use it on either server or client side. The basic idea is: we fetch a .txt file (a calendar) & check for a status "is the shop open" that simultaneously tells us when the shop will be open/closed. We fetch the cal only once & do the checking however often we care.

The calendar DSL looks like this:

-/-                 9:00-13:00,14:00-18:00
1/1                 0:0-0:0                   o   new year
easter_orthodox     0:0-0:0                   o
fri.4/11            6:30-23:00                -   black friday
sat/-               10:30-17:00
sun/-               0:0-0:0

On the client side, you can test it by adding to the .html: <script src="shopping_hours.min.js"></script> & to the .js:

async function getcal(url) {
    let cal = await fetch(url).then( r => r.text())
    return shopping_hours(cal)  // parse the calendar
}

getcal('calendar1.txt').then(sh => {
    console.log(sh.business())
})

which outputs smthg like {status: "open", next: Sat Feb 24 2018 17:00:00 GMT+0200 (EET)}. See the github page for the details.
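On the server side, the same check might look roughly like this (a sketch, assuming the bundled shopping_hours.min.js also works as a commonjs module & the calendar sits next to the script):

let fs = require('fs')
let shopping_hours = require('./shopping_hours.min.js')

// read the calendar from disk instead of fetch()ing it
let sh = shopping_hours(fs.readFileSync('calendar1.txt').toString())
console.log(sh.business())      // {status: "open"|"closed", next: <Date>}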

Thursday, January 18, 2018

There is no price for good advice

From The Design & Evolution of C++ by Bjarne Stroustrup:

'In 1982 when I first planned Cfront, I wanted to use a recursive descent parser because I had experience writing and maintaining such a beast, because I liked such parsers' ability to produce good error messages, and because I liked the idea of having the full power of a general-purpose programming language available when decisions had to be made in the parser.

However, being a conscientious young computer scientist I asked the experts. Al Aho and Steve Johnson were in the Computer Science Research Center and they, primarily Steve, convinced me that writing a parser by hand was most old-fashioned, would be an inefficient use of my time, would almost certainly result in a hard-to-understand and hard-to-maintain parser, and would be prone to unsystematic and therefore unreliable error recovery. The right way was to use an LALR(1) parser generator, so I used Al and Steve's YACC.

For most projects, it would have been the right choice. For almost every project writing an experimental language from scratch, it would have been the right choice. For most people, it would have been the right choice. In retrospect, for me and C++ it was a bad mistake.

C++ was not a new experimental language, it was an almost compatible superset of C - and at the time nobody had been able to write an LALR(1) grammar for C. The LALR(1) grammar used by ANSI C was constructed by Tom Pennello about a year and a half later - far too late to benefit me and C++. Even Steve Johnson's PCC, which was the preeminent C compiler at the time, cheated at details that were to prove troublesome to C++ parser writers. For example, PCC didn't handle redundant parentheses correctly so that int(x); wasn't accepted as a declaration of x.

Worse, it seems that some people have a natural affinity to some parser strategies and others work much better with other strategies. My bias towards topdown parsing has shown itself many times over the years in the form of constructs that are hard to fit into a YACC grammar. To this day [1993], Cfront has a YACC parser supplemented by much lexical trickery relying on recursive descent techniques. On the other hand, it is possible to write an efficient and reasonably nice recursive descent parser for C++. Several modern C++ compilers use recursive descent.'