Sunday, October 2, 2016

How GNU Make's patsubst Function Really Works

Abstract

$(patsubst) is a GNU Make internal function that deals with text processing such as file names transformations. Despite of having a very simple idea behind it, the peculiar way of its implementation leads to confusion & uncertainty for novice Make users. The function doesn’t return any errors or signal any warnings. It uses its own wildcard mechanism that doesn’t have any resemblance with the usual glob or regexp patterns.

For example, why this transformation doesn’t work?

$(patsubst src/%.js, build/%.js, ./src/foo.js)

We expect ./src/foo.js to be converted to build/foo.js, but patsubst leaves the file name untouched.

Extract method

Before we begin, we need a quick way of inspecting the results of patsubst evaluations. GNU Make doesn’t have a REPL. There are primitive hacks around it like ims:

$ rlwrap -S 'ims> ' ./ims
ims> . $(words will cost ten thousand lives this day)
7

that allow you to play with Make functions interactively, but they won’t help you to examine Make’s internals, for there is no way to view the source code of a particular function like you do it in irb + method_source gem, for example.

I’ve extracted patsubst function from Make 4.2.90 into a separate command gmake-patsubst. After you compile it, just run it from the shell as:

$ ./gmake-patsubst src/%.js build/%.js ./src/foo.js
./src/foo.js

providing exactly 3 arguments as you would do in makefiles, only using the shell quoting/splitting rules instead of Make’s (i.e., use a space as an argument separator instead of a comma).

(A side note about the extract: it’s ≈ 520 lines of an imperative code! This is what you get when you program in C.)

If you want to read the algo itself, start from patsubst_expand_pat().

patsubst explained

Let’s recap first what patsubst does.

$(patsubst PATTERN, REPLACEMENT, TEXT)

The majority of its use is to tranform a list of file names. It operates like a map() on an iterable in JavaScript:

TEXT
    .split(/\s+/)
    .map( (file) => magic_transform(PATTERN, REPLACEMENT, file) )
    .join(' '))

It’s a pure function that returns a new result, leaving its arguments untouched. It works with supplied file names in TEXT as strings–it doesn’t do any IO.

The first thing to remember is that it splits TEXT into chunks before doing any substantial work further. All transforming is being done by individually applying PATTERN to each chunk.

For example, we have a list of .jsx file that we want to tranform into the list of .js files. You may think that the simplest way of doing it with patsubst would look like this:

$ ./gmake-patsubst .jsx .js "foo.jsx bar.jsx"
foo.jsx bar.jsx

Well, that didn’t work!

The problem here is that in this case patsubst checks if each chunk matches PATTERN exacly as a full word byte-to-byte. In regex terms this would look as ^\.jsx$. To prove this, we modify our pattern to be exactly foo.jsx:

$ ./gmake-patsubst foo.jsx .js "foo.jsx bar.jsx"
.js bar.jsx

Which works as we described but isn’t much of a help in real makefiles.

Thus patsubst has a wildcard support. It is similar to the character % in Make pattern rules, that mathes any non-empty string. For example, % in %.jsx pattern could match foo against foo.jsx text. The substring that % matches (foo in the example) is called a stem1.

There could be only one % in a pattern. If you have several of them, only the first one would be the wildcard, all others would be treated as regular characters.

To return to our example with .jsx files, using % in both PATTERN & REPLACEMENT arguments yields to desired result:

$ ./gmake-patsubst %.jsx %.js "foo.jsx bar.jsx"
foo.js bar.js

When REPLACEMENT contains a % character, it is replaced by the stem that matched the % in PATTERN.

Using the character % only in patterns is rarely useful, unless you want to replicate Make’s $(filter-out) function:

$ ./gmake-patsubst %.jsx "" "foo.jsx bar.js"
bar.js

Which is the equivalent of

$(filter-out %.jsx, foo.jsx bar.js)

If there is no % in PATTERN but there is % in REPLACEMENT, patsubst resorts to the case of a simple, exact substitution that we saw before.

$ ./gmake-patsubst foo.jsx % "foo.jsx bar.jsx"
% bar.jsx

Now, to return to our first example from Abstract:

$(patsubst src/%.js, build/%.js, ./src/foo.js)

Why didn’t it work out?

Putting together all we’ve learned so far, here is the high-level algorithm of what patsubst does:

  1. It searches for the % in PATTERN & REPLACEMENT. If found, it cuts off everything before %. Let’s call such a cut-out part pattern-prefix (src/) & replacement-prefix (build/). It leaves us with .js & (again) .js correspondingly. Let’s call those parts pattern-suffix & replacement-suffix.

  2. Splits TEXT into chunks. In our case there is nothing to split, for we have only 1 file name (a string w/o spaces): ./src/foo.js.

  3. If there is no % in PATTERN it does a simple substitution for each chunk & returns the result.

  4. If there indeed was % in PATTERN, it (for each chunk):

    4.1. (a) Makes sure that pattern-prefix is a substring of the chunk. In JavaScript it would look like:

     CHUNK.slice(0, PATTERN_PREFIX.length) === PATTERN_PREFIX
    

    It’s false in our example, for src/ != ./src/.

    (b) Makes sure that pattern-suffix is a substring of the chunk. In JavaScript it would look like:

     CHUNK.slice(-PATTERN_SUFFIX.length) === PATTERN_SUFFIX
    

    It’s true in our example, for .js == .js.

    4.2. If the subitem #4.1 is false (our case!) it just returns an unmodified copy of the original chunk.

    4.3. Iff2 both (a) & (b) in the subitem #4.1 were indeed true, it cuts-out pattern-prefix & pattern-suffix from the chunk, transforming it to a stem.

    4.4. Concatenates replacement-prefix + stem + replacement-suffix.

  5. Joins all the chunks (modified of unmodified) with a space & returns the result.

As you see, the algo is simple enough, but probably is not exactly similar to what you may have imagined after reading the Make documentation.

In conclusion, hopefully now you can explain the result of patsubst evaluation below (why only src/baz.js was transformed correctly):

$ ./gmake-patsubst src/%.js build/%.js "./src/foo.js src/bar.jsx src/baz.js"
./src/foo.js src/bar.jsx build/baz.js

The nodejs version of the patsubst can be found here. Note that it’s a simple example & it must not be held as a reference.

PS. Here is an alternate version of this post that can be more readable on your phone.

  1. (For non-English speakers like yours trully) The noun stem means several things: 1) (in linguistics) a form of a word after all affixes are removed; 2) (in botany) a slender structure that supports a plant.

  2. A quote from the Emacs manual: ‘“Iff” means “if and only if”. […] Try to avoid using this term in documentation, since many are unfamiliar with it and mistake it for a typo.

No comments:

Post a Comment