Friday, May 11, 2018

Writing a podcast client in GNU Make

Why? First, because I wanted parallel downloads & my old podcacher didn't support that. Second, because it sounded like a joke.

The result is gmakepod.

Evidently, some ingredients for such a client are practically impossible to write using plain Make (like xml parser). Would the client then be considered a truly Make program? E.g, there is a clever bash json parser, but when you look in its src you see that it uses awk & grep.

At least we can try to write in Make as many components as possible, even if (when) they become a bottleneck. Again, why? 1


Using Make means constructing a proper DAG. gmakepod uses 6 vertices: All except the 1st one are file targets.

target desc
.feeds (phony) parse a config file to extract feeds names & urls
.enclosures fetch & parse each feed to extract enclosures urls
.files generate a proper output file name for each url check if we've already downloaded a url in the past, filter out generate a makefile, where we list all the rules for all the enclosures
run (default) run the makefile

Every time a user runs gmakepod, it remakes all those files anew.

Config file

A user needs to keep a list of feed subscriptions somewhere. The 1st thing that comes to mind is to use a list of newline-separated urls, but what if we want to have diff options for each feed? E.g., a enclosures filter of some sort? We can just add 'options' to the end of line (Idk, like url=!filer.type=audio) but then we need to choose a record sep that isn't a space, which means the escaping of the record sep in urls or living w/ the notion 'no ! char is allowed' or similar nonsense.

The next question is: how would a makefile process a single record? It turns out, we can eval foo=bar inside of a recipe, so if we pass

make -f 'url=!filer.type=audio'

where looks like

parse-record = ... # replace ! w/ a newline
$(eval $(call parse-record,$*))
@echo $(url)
@echo $(filter.type)

then every 'option' becomes a variable! This sounds good but is actually a gimcrack.

Make will think that url=!filer.type=audio is a variable override & complain about missing targets. To ameliorate that we can prefix the line w/ # or :. Sounds easy, but than we need to slice the line in parse-record macro. This is the easiest job in any lang except Make--you won't do it correctly w/o invoking awk or any other external tool.

If we use an external tool for parsing a mere config line, why use a self-inflicted parody of the config file instead of a human-readable one?

Ini format would perfectly fit. E.g.,

[JS Party]
url =

lines are self-explanatory to anyone who has seen a computer once. We can use Ruby, for example, to convert the lines to :name=JS_Party!url= or even better to


(notice the amount of shell escaping) & use Ruby again in makefile to transform that json to name=val pairs, that we eval in the recipe later on.

How do we pass them to the makefile? If we escape each line correctly, xargs will suffice:

ruby ini-parse.rb subs.ini | xargs make -f

Parsing xml

Ruby, of course, has a full fledged rss parser in its stdlib, but do we need it? A fancy podcast client (that tracks your every inhalation & exhalation) would display all metadata from an rss it can obtain, but I don't want the fancy podcast client, what I want what's most important to me, is that I have a guarantee is a program that reliably downloads the last N enclosures from a list of feeds.

Thus the minimal parser looks like

$ curl -s | \
nokogiri -e 'puts $_.css("enclosure,link[rel=\"enclosure\"]").\
map{|e| e["url"] || e["href"]}' \
| head -2


One of the obviously helpful user options is the number of enclosures he wants to download. E.g, when the user types

$ gmakepod g=emacs e=5

the client produces .files file that has a list of 5 shell-escaped json 'records'. e=5 option could also appear in an .ini file. To distinguish options passed from the CL from options read from the .ini, we prefix options from the .ini w/ a dot. The opt macro is used to get the final value:

opt = $(or $($1),$(.$1),$2)

E.g.: $(call opt,e,2) checks the CL opt first, then the .ini opt, &, as a last resort, returns the def value 2.

Output file names

Not every enclosure url has a nice path name. What file name should we assign to an .mp3 from the url below?

Maybe we can use the URI path, 20180504_pmoney_pmpod839v2.mp3 in this case. Is it possible to extract it in pure Make?

In the most extreme case, the uri path may not be even unique. Say a feed has 2 entries & each article has 1 enclosure, than they both may have the same path name:

<link rel="enclosure" type="audio/mpeg" length="1234"

<link rel="enclosure" type="audio/mpeg" length="5678"

In addition, the output name must be 'safe' in terms of Make. This means no spaces or $, :, %, ?, *, [, ~, \, # chars.

All of this leads us to another use of Ruby in Make stead. We extract the uri path from the url, strip out the extension, prefix the path w/ a name of a feed (listed in the .ini), append a random string + an extension name, so the output file from the above url looks similar to:


A homework question: what should we do if a uri path lacks a file extension?


If we successfully downloaded an enclosure, there is rarely a need to download it again. A 'real' podcast client would look at id/guid (the date is usually useless) to determine if the entry has any updated enclosures; our Mickey Mouse parser relies on urls only.

Make certainly doesn't have any key/value store. We could try employing the sqlite CL interface or dig out gdbm or just append a url+'\n' to some history.txt file.

The last one is a tempting one, for grep is uber-fast. As the history file becomes a shared resource, we might get ourselves in trouble during parallel downloads, though. lockfile rubygem provides a CL wrapper around a user specified command, hence can protect our 'db':

rlock history.lock -- ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"' ''

It works similarly to flock(1), but supposedly is more portable.

Makefile generation

The last but one step is to generate a makefile named After we collected all enclosure urls, we write to the .mk file a set of rules like

@mkdir -p $(dir $@)
curl '' > $@
@rlock history.lock -- ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"' ''

Our last step is to run

make -f -k -j1 -Oline

The number of jobs is 1 by default, but is controllable via the CL param (gmakepod j=4).

  1. Image src: Fried-Tomato 

No comments:

Post a Comment