Saturday, October 29, 2016

BOM & exec


Recently I stumbled upon a post about an accidental BOM in a shell script file. tl;dr for those who don’t read Ukrainian:

  1. A guy had a typical shell script that some Windows editor corrupted by prefixing the first line of the file (the shebang line) with the BOM.
  2. The shell still tried to execute the script.
  3. Everybody got upset.

I got curious why bash tries to run scripts w/ a BOM in the first place, so I looked into the latest bash-4.3 & tcsh-6.19.00 on Fedora 24. Everywhere in the text below we draw the BOM w/ the replacement character (codepoint U+FFFD): �.

Some findings:

  • I was wrong about the bloody shebang lines, for I thought that no shell ever reads them.
  • bash & tcsh don’t use libc properly & both invent their own rigmarole instead of using the provided routine.
  • bash is a mess! (Which is hardly a discovery.)

With shebang

If a file contains a valid shebang line, everything is easy: when you pass the file name to any function of the exec family (execv, execve, execlp, etc.), the kernel steps in, reads the shebang line & executes the interpreter named there with the file in question as its argument.

This picture falls to pieces when the file contains the petty BOM, for the kernel fails to recognize that �#!/omg/lol should be (in our naïve mind) equivalent to #!/omg/lol.
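To see why the BOM is fatal, here is a minimal sketch of such a check, assuming it boils down to a byte-wise comparison of the first 2 bytes of the file with #!. The helper name looks_like_script is made up for illustration; this is not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not kernel code): returns 1 if the
   buffer starts with the two bytes '#' and '!', 0 otherwise.
   The comparison is byte-wise, so anything in front of '#', including
   the 3-byte UTF-8 BOM (EF BB BF), makes it fail. */
int looks_like_script(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == '#' && buf[1] == '!';
}
```

Fed the bytes of #!/omg/lol it returns 1; fed the same line w/ the BOM in front it returns 0, & the file is then treated as having no shebang at all.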

Both tcsh & bash have a backup plan for systems w/o shebang support in the kernel. Besides the obvious win32 candidate, tcsh lists 2 other systems: os390 & bs2000 (I wonder who on earth still has them). bash uses autoconf & therefore doesn’t have a hard-coded set of build configurations. Unfortunately, I believe the autoconf test for the shebang line support is bogus:

$ cat ac_sys_interpreter
#! /bin/cat
exit 69

Presumably, the thinking was: if you run it on any modern system, the kernel will run /bin/cat ac_sys_interpreter, which will just print the file & exit with 0, but on prehistoric time-sharing machines a simple-minded /bin/sh will execute it as a shell script & then you can test if the exit code == 69. (For why it would do so, read the next section.) The trouble is that the old system may very well have a /bin/sh that does its own shebang processing in case the kernel doesn’t. That renders the test useless: it reports kernel support that isn’t there, & bash gets compiled w/o its own shebang handling.

Without shebang

Since the kernel flops at the invalid first line, the whole commotion reduces to the case of a file w/o a shebang.

This is how we were all taught about interpreter files back in the day:

“the shell reads the command and tries to execlp the filename. Because the shell script is an executable file but isn’t a machine executable, an error is returned and execlp assumes that the file is a shell script (which it is). Then /bin/sh is executed with the pathname of the shell script as its argument.”

(from APUE, 3rd ed.)

E.g. suppose we have

$ cat
echo Діти, це їжачок!
ps -p $$                # print the shell the script is running under

If we run it, the shell

  1. checks if the script has executable bits (suppose it has)
  2. tries to exec the file
  3. which fails with ENOEXEC, for it’s not an ELF
  4. [a tcsh/bash dance]
  5. execs again, but this time it’s /bin/sh with the script as an argument
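The steps above, minus the shell-specific dance in step 4, can be sketched in C. This is only an illustration under my own assumptions: the function name exec_or_sh is made up, & neither bash nor tcsh is written this way.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (an assumption, not taken from any shell):
   try to exec `path` directly; if the kernel refuses with ENOEXEC,
   exec /bin/sh with the file as its argument instead.
   Returns -1 (errno set by execve) only if no exec succeeded. */
int exec_or_sh(const char *path, char *const envp[])
{
    char *const direct[] = { (char *)path, NULL };

    execve(path, direct, envp);          /* steps 2-3: may fail w/ ENOEXEC */
    if (errno != ENOEXEC)
        return -1;                       /* some other failure: give up */

    char *const via_sh[] = { "sh", (char *)path, NULL };
    execve("/bin/sh", via_sh, envp);     /* step 5: let sh interpret it */
    return -1;
}
```

A real shell would call something like this in a forked child, so that a successful exec replaces the child & not the shell itself.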

The last item is important & may not be quite apparent, for if you have a csh-script

$ cp demo2.csh

you may expect that tcsh will not run it as an sh one:

$ tcsh -f
> ./demo2.csh
Діти, це їжачок!
   PID TTY          TIME CMD
102213 pts/21   00:00:00 sh

That expectation is false: tcsh follows the standards here & hands the script to sh.

Expectations vs. reality

APUE says a shell ought to use execlp, which in turn is supposed to do all the dirty work for us. As it happens, execlp does exactly that, at least in Linux glibc. Of course, both bash & tcsh ignore the advice & use their own scheme.
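To watch execlp do the dirty work, a small wrapper is enough. The sketch below is my own illustration (the name run_via_execlp is made up): it runs a file in a child process via execlp & hands back the exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` in a child via execlp() and return its
   exit status, 127 if the exec failed, or -1 on fork/wait trouble. */
int run_via_execlp(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execlp(path, path, (char *)NULL);
        _exit(127);                  /* only reached if execlp() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Pointed at an executable script w/o a shebang line, it should return the script's own exit status on glibc, since execlp reacts to ENOEXEC by running /bin/sh on the file; the caller never sees the error.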

tcsh does a plain execv; then, after the failure, it peeks into the first 2 bytes of the file to see (w/ the help of iswprint(3)) if they are “printable”. If tcsh (a) finds the file “acceptable” & (b) is running a script with a shebang line on a system w/o kernel support for such a line, it processes that line by itself.
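A rough sketch of such a 2-byte peek follows. It is my reading of the description above, not tcsh's code: the name first_two_bytes_ok is made up, plain isprint(3) stands in for iswprint(3), & treating files shorter than 2 bytes as unacceptable is an arbitrary choice.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not tcsh's code): accept the file only if each of
   its first two bytes is printable or is a tab/newline.
   Assumption: buffers shorter than two bytes are rejected. */
int first_two_bytes_ok(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return 0;

    for (size_t i = 0; i < 2; i++) {
        unsigned char c = buf[i];
        if (!isprint(c) && c != '\n' && c != '\t')
            return 0;                /* looks like binary junk */
    }
    return 1;
}
```

In the default C locale the first 2 bytes of a UTF-8 BOM (0xEF 0xBB) are not printable, so a BOM-prefixed script is rejected, while a script that starts with plain echo passes.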

If we poison our script with the BOM:

$ uconv --add-signature >
$ chmod +x !$
$ head -c 37 !$ | hexdump -c
0000000 357 273 277   e   c   h   o     320 224 321 226 321 202 320 270
0000010   ,     321 206 320 265     321 227 320 266 320 260 321 207 320
0000020 276 320 272   !  \n                                            

tcsh doesn’t try to re-execv & aborts:

> ./
./ Exec format error. Wrong Architecture.

bash, on the other hand, tries to be more clever & fails spectacularly. After execve it goes on a journey of figuring out why the exec has failed. It:

  1. opens the file & analyses the shebang line! In the example above we didn’t have one, but if we had one that names a missing interpreter, bash would produce a message:

    $ cat demo3.invalid.awk
    #!/usr/bin/awwwwwwwk -f
    BEGIN { print "this is awk" }
    $ ./demo3.invalid.awk
    sh: ./demo3.invalid.awk: /usr/bin/awwwwwwwk: bad interpreter: No such file or directory

    tcsh won’t do anything like that & will print ./demo3.invalid.awk: Command not found..

  2. checks if the file has an ELF header & tries to find out what is wrong w/ it;

  3. reports the “success” of the execution if the file has a length of 0;

  4. checks if the file is “binary”. I use quotes here, for this is an example of how good intentions don’t always turn into reality. Instead of a simple 2-byte check, like the one in tcsh, bash reads 80 bytes & calls a certain check_binary_file() function, which is a good example of why you should not blindly trust the comments in the code:

    /* Return non-zero if the characters from SAMPLE are not all valid
       characters to be found in the first line of a shell script.  We
       check up to the first newline, or SAMPLE_LEN, whichever comes first.
       All of the characters must be printable or whitespace. */
    int
    check_binary_file (sample, sample_len)
         char *sample;
         int sample_len;
    {
      register int i;
      unsigned char c;

      for (i = 0; i < sample_len; i++)
        {
          c = sample[i];
          if (c == '\n')
            return (0);
          if (c == '\0')
            return (1);
        }

      return (0);
    }

    Despite the promise that “all of the characters must be printable or whitespace”, the function returns 1 only when sample contains the NUL character before the first newline. Our BOM-example doesn’t have one, thus the script runs, albeit with a somewhat cryptic error if you have no idea about the existence of the BOM in the file:

    $ ./
    ./ line 1: �echo: command not found
       PID TTY          TIME CMD
    115569 pts/26   00:00:00 sh

    What if we do have the NUL character?

    $ hexdump -c
    0000000   e   c   h   o      \0  \n   e   c   h   o     320 224 321 226
    0000010 321 202 320 270   ,     321 206 320 265     321 227 320 266 320
    0000020 260 321 207 320 276 320 272   !  \n   p   s       -   p       $
    0000030   $  \n                                                        

    Here NUL is an argument to the echo command, which should be totally legal, but not w/ bash!

    $ ./
    sh: ./ cannot execute binary file: Exec format error

    Which, of course, wouldn’t be an issue if the file had a shebang line.

  5. If bash finds the file “acceptable” & the file does contain a shebang line, then on a system w/o kernel support for it bash does the same thing tcsh does: it tries to process the line by itself.
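For comparison, here is a sketch of what check_binary_file() would look like if it did what its comment promises. This is a hypothetical variant of mine (the name check_binary_file_strict is made up), not a patch from bash.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical variant, not bash's code: return non-zero if SAMPLE holds
   a byte that is neither printable nor whitespace before the first
   newline or SAMPLE_LEN, whichever comes first. */
int check_binary_file_strict(const char *sample, size_t sample_len)
{
    for (size_t i = 0; i < sample_len; i++) {
        unsigned char c = (unsigned char)sample[i];

        if (c == '\n')
            return 0;                /* first line is clean */
        if (!isprint(c) && !isspace(c))
            return 1;                /* NUL, BOM bytes, control chars... */
    }
    return 0;
}
```

Such a check would flag the BOM, but mind the catch: w/ plain isprint(3) in the C locale it also flags any non-ASCII text in the first line, e.g. a first line of Cyrillic like the echo line in our demo script.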


The most popular shells are too bloated & bizarre, & have many undocumented features.

Some hints:

  • The shebang line isn’t necessary if you target /bin/sh, but the shell does less work if you provide it.
  • To view BOMs, use less(1) or hexdump(1).
  • To test for the BOM, use file(1).
  • To remove the BOM manually, use M-x find-file-literally in Emacs.
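If you’d rather test for the BOM in your own code, the check is tiny. A minimal sketch, assuming the file is UTF-8 (where the BOM is the 3 bytes EF BB BF, as seen in the hexdump above); the name utf8_bom_length is made up:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 3 if the buffer starts with the UTF-8 BOM
   (EF BB BF), otherwise 0. The result is the number of bytes to skip. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    return (len >= 3 && memcmp(buf, bom, 3) == 0) ? 3 : 0;
}
```

Skipping that many bytes from the start of the buffer gives you the text w/o the BOM.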
