Tuesday, September 12, 2017

'indent-region' in the Emacs batch mode

I had an old project of mine that was using an archaic tab-based indentation w/ the assumed tab width of 4. The mere opening of its files in editors other than Emacs was causing such pain that I decided to do what you should rarely do--reindent the whole project.

(A side note: if you're setting the tab width to a value other than 8 & simultaneously is using tabs for indentation, you're a bad person.)

The project had ~40 files. Manually using Emacs indent-region for each file would have been too wearisome. Then I remembered that Emacs has the batch mode.

A quick googling gave me a recipe akin to:

$ emacs -Q -batch FILE --eval '(indent-region (point-min) (point-max))' \
  -f save-buffer

It would have worked but it had several drawbacks as a general solution, for it:

  1. modifies a file in-place;
  2. can't read from the stdin;
  3. doesn't work w/ multiple files forcing a user to use xargs(1) of smthg;
  4. if FILE doesn't exist, Emacs quietly creates an empty file, whereas a user probably expects to see an error message.

Whereas an ideal little script should:

  1. never overwrite the source file;
  2. read the input from the stdin or from its arguments;
  3. spit out the result to the stdout;
  4. handle the missing input reasonably well.

Everything else should be accommodated by the shell, including the item #2 from the 'drawbacks' list above.

Using Emacs for such a task is tempting, for it supports a big number of programming modes, giving us the ability to employ the editor as a universal reindenting tool, for example, in makefiles or Git hooks.

Writing to the stdout

We can slightly modify the above recipe to:

$ emacs -Q -batch FILE --eval '(indent-region (point-min) (point-max))' \
  --eval '(princ (buffer-string))'

In -f stead we are using --eval twice. buffer-string function does exactly what it says: returns the contents of the current buffer as a string.

$ cat 1.html
<p>
hey, <i>
what's up</i>?
</p>

$ emacs -Q -batch 1.html --eval '(indent-region (point-min) (point-max))' \
  --eval '(princ (buffer-string))'
Indenting region...
Indenting region...done
<p>
  hey, <i>
    what's up</i>?
</p>

The "Indenting region" lines come from Emacs message function (which the progress reporter uses). In the batch mode message prints the lines to the stderr.

The solution also address the item #3 from the "drawbacks" list--Emacs doesn't create an empty file on the invalid input, although it doesn't indicate the error properly, i.e., it fails w/ the exit code 0.

Processing all files at once

If you know what are you doing & the files you're going to process are under Git, overwriting the source it not a big deal. Perhaps for a quick hack this script will do:

:; exec emacs -Q --script "$0" -- "$@" # -*- emacs-lisp -*-

(defun indent(file)
  (set-buffer "*scratch*")
  (if (not (file-directory-p file))
      (when (and (file-exists-p file) (file-writable-p file))
        (message "Indenting `%s`" file)

        (find-file file)
        (indent-region (point-min) (point-max))
        (save-buffer))))

(setq args (cdr argv))                  ; rm --
(dolist (val args)
  (indent val))

If you put the src above into a file named emacs-indent-overwrite & add executable bits to it, then the shell thinks it's a usual sh-script that doesn't have a shebang line. A colon in sh is a noop, but on stumbling upon exec the shell replaces itself with the command

emacs -Q --script emacs-indent-overwrite -- arg1 arg2 ...

When Emacs reads the script, it doesn't expect it to be a sh one, but hopefully the file masks itself as a true Elisp, for in Elisp a symbol whose name starts with a colon is called a keyword symbol. : is a symbol w/o a name (a more usual constant would be :omg) that passes the check because it satisfies keywordp predicate:

ELISP> (keywordp :omg)
t
ELISP> (keywordp :)
t

Everything else in the file is usual Elisp code. Here's how it works:

$ emacs-indent-overwrite src/*
Indenting ‘src/1.html‘
Indenting region...
Indenting region...done
Indenting ‘src/2.html‘
Indenting region...
Indenting region...done

The only thing worth mentioning about the script is why indent procedure has (set-buffer "*scratch*") call. It's an easy way to switch the current directory to the directory from which the script was started. This is required, for find-file modifies the default directory of the current buffer (via modifying a buffer-local default-directory variable). The other way is to modify args w/ smthg like

(setq args (mapcar 'expand-file-name args))

& just pass the file argument as a full path.

Reading from the stdin

There is next to none info about the stdin in the Emacs Elisp manual. The section about minibuffers hints us that for reading from the standard input in the batch mode we ought to seek out for functions that normally read kbd input from the minibuffer.

One of such functions is read-from-minibuffer.

$ cat read-from-minibuffer
:; exec emacs -Q --script "$0"
(message "-%s-" (read-from-minibuffer ""))

$ printf hello | ./read-from-minibuffer
-hello-
$ printf 'hello\nworld\n' | ./read-from-minibuffer
-hello-

The function only read the input up to the 1st newline which it also impertinently ate. This leaves us w/ a dilemma: if we read multiple lines in a loop, should we unequivocally assume that the last line contained the newline?

read-from-minibuffer has another peculiarity: on receiving the EOF character it raises an unhelpful error:

$ ./read-from-minibuffer
^D
Error reading from stdin
$ echo $?
255

The simplest Elisp cat program must watch out for that:

$ ./cat.el < cat.el
:; exec emacs -Q --script "$0"
(while (setq line (ignore-errors (read-from-minibuffer "")))
  (princ (concat line "\n")))

Next, if we read the stdin & print the result to the stdout, our "ideal" reindent script cannot rely on find-file anymore, unless we save the input in a tmp file. Or we can leave out "heavy" find-file altogether & just create a temp buffer & copy the input there for the further processing. The latter implies we must manually set the proper major mode for the buffer, otherwise indent-region won't do anything good.

set-auto-mode function does the major mode auto-detection. One of the 1st hints it looks for is the file extension, but the stdin has none. We can ask the user to provide one in the arguments of the script.

:; exec emacs -Q --script "$0" -- "$@" # -*- emacs-lisp -*-

(defun indent(mode text)
  (with-temp-buffer
    (set-visited-file-name mode)
    (insert text)
    (set-auto-mode t)
    (message "`%s` major mode: %s" mode major-mode)

    (indent-region (point-min) (point-max))
    (buffer-string)))

(defun read-file(file)
  (with-temp-buffer
    (insert-file-contents file)
    (buffer-string)))

(defun read-stdin()
  (let (line lines)
    (while (setq line (ignore-errors (read-from-minibuffer "")))
      (push line lines))
    (push "" lines)
    (mapconcat 'identity (reverse lines) "\n")
    ))

 

(setq args (cdr argv))                  ; rm --

(setq mode (car args))
(if (equal "-" mode)
    (progn
      (setq mode (nth 1 args))
      (if (not mode) (error "No filename hint argument, like .js"))
      (setq text (read-stdin)))
  (setq text (read-file mode)))

(princ (indent mode text))

If the 1st argument to the script is '-', we look for the hint in the next argument & then start reading the stdin. It the 1st arg isn't '-' we assume it's a file that we insert into a temp buffer & return its contents as a string.

The

(mapconcat 'identity (reverse lines) "\n")

line in read-stdin procedure is an equivalent of

['world', 'hello'].reverse().join("\n")

in JavaScript.

Some examples:

$ printf "<p>\nhey, what's up?\n</p>" | ./emacs-indent - .html
‘.html‘ major mode: html-mode
Indenting region...
Indenting region...done
<p>
  hey, what's up?
</p>

$ printf "<p>\nhey, what's up?\n</p>" | ./emacs-indent - .txt
‘.txt‘ major mode: text-mode
Indenting region...
Indenting region...done
<p>
hey, what's up?
</p>

$ ./emacs-indent src/2.html 2>/dev/null
<p>
  not much
</p>