Shell Scripting in C++

Do you like C++?
Do you like the convenience of shell scripts where you just execute the source code without needing a separate build/compile step?
Do you like both so much that you wish you could write shell scripts in C++?

If you’re still nodding, here’s one possible way to do it:

#!/bin/sh
CXX=g++ ; CXXFLAGS="-O2"
fname=$(mktemp --tmpdir cscript.XXXX) ; exec 999<"$fname" ; rm "$fname"
sed '1c#if 0' "$0" | $CXX $CXXFLAGS -xc++ - -o /dev/fd/999 && exec -a "$0" /dev/fd/999 "$@" ; exit 1
#endif

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf ("%d arguments:\n", argc);
    for (int i=0; i < argc; i++) {
        printf ("%d: %s\n", i, argv[i]);
    }
    return 42;
}

This is a shell script with embedded C++ source code (starting at line 6). It is mostly POSIX, but note that exec -a is a bash extension and mktemp --tmpdir is specific to GNU coreutils, so /bin/sh has to be a shell that supports them. The shell script runs the C++ compiler on itself to produce an executable, then runs the executable. (This isn’t a new idea, though examples of it aren’t easy to find. The same technique is also used to make self-extracting archives, with shell commands at the beginning to untar the data contained later in the same file.)

Line 2 sets two variables to choose the compiler and compiler flags.
Line 3 creates a temporary file, opens it as file descriptor 999 (that hopefully doesn’t collide with a file descriptor already in use), then unlinks (rm) the file. Unlinking the file here is one way to ensure the temporary file gets deleted even if compilation or execution fails. The file gets deleted once there are no more open file handles, but remains accessible by file descriptor (and /dev/fd/999 and /proc/self/fd/999) while the file is still open.
Line 4 uses sed to replace the first line of the current script with “#if 0”, which matches the “#endif” on line 5, hiding the shell code from the C++ compiler. This is then piped to g++, which writes the executable to the temporary file. The -xc++ flag is needed because the input file name does not have a .cc/.cpp extension, so we need to explicitly declare the input language as C++ source code. Then we exec the temporary file (the temporary file is accessed via /dev/fd/999 rather than file name, since we already unlinked the file).
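The unlink-while-open behaviour described for line 3 can be tried in isolation. The following is only a minimal sketch: descriptor 9, the variable names, and the sample text are arbitrary choices for illustration, not part of the script above.

```shell
#!/bin/sh
# Sketch: data in an unlinked file stays reachable through an open descriptor.
# Descriptor 9 and the variable names are arbitrary choices for this example.
tmp=$(mktemp)                 # create a temporary file
echo "still here" > "$tmp"    # put some data in it
exec 9<"$tmp"                 # open it for reading as descriptor 9
rm "$tmp"                     # unlink the name; no path refers to the file now
content=$(cat <&9)            # the data is still readable through descriptor 9
exec 9<&-                     # closing the last descriptor frees the file
echo "$content"
```

Because the name is removed right after the descriptor is opened, nothing is left behind in the temporary directory even if a later step fails.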

Using sed to replace the first line has the advantage of not changing the number of lines of input (compared to just cutting off the first few lines with tail -n +6), so that any C++ compiler warning/error messages still have line numbers that match the source file.
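A quick way to check the line-count claim is to compare both approaches on a small file. This is a sketch under assumptions: the four-line sample file is made up, and 1s/.*/#if 0/ is used as a portable stand-in for the GNU one-liner 1c#if 0 from the script.

```shell
#!/bin/sh
# Sketch: compare line counts after rewriting line 1 versus cutting lines off.
# The four-line sample file is made up for illustration.
sample=$(mktemp)
printf '%s\n' '#!/bin/sh' 'echo shell part' '#endif' 'int main() { return 0; }' > "$sample"

orig_lines=$(($(wc -l < "$sample")))
# 1s/.*/#if 0/ is a portable stand-in for the GNU one-liner 1c#if 0
sed_lines=$(($(sed '1s/.*/#if 0/' "$sample" | wc -l)))
first_line=$(sed '1s/.*/#if 0/' "$sample" | head -n 1)
# tail -n +4 drops the first three lines, so later lines are renumbered
tail_lines=$(($(tail -n +4 "$sample" | wc -l)))
rm "$sample"

echo "original=$orig_lines sed=$sed_lines tail=$tail_lines first='$first_line'"
```

With sed the compiler still sees the C++ code on its original line, while the tail variant shifts every remaining line up by the number of lines removed.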

The C++ code begins on line 6.

If we put the above into a file named test.sh and run it, we get this:

$ ./test.sh hello world
3 arguments:
0: ./test.sh
1: hello
2: world
$ echo $?
42

A C++ shell script that receives command-line arguments and produces an exit status.

Compile time overhead

One of the disadvantages of a compiled language is that the compiler itself takes a non-negligible amount of time to run. For the example above when run with no arguments, it takes about 52ms for the C++ version (~50ms compile + 2ms run), but only 10ms for an equivalent Bash version. The Bash version is below:

#!/bin/bash
echo "$(($# + 1)) arguments:"
I=1
echo "0: $0"
for a in "$@"; do
    echo "$I: $a"
    I=$((I+1))
done
exit 42

One way to mitigate the compile time is to use a compiler cache (ccache) so that repeated execution of the same script does not need a full recompile. The above script needs to be modified for ccache, because ccache can only cache the compile step (not the link step), and it requires the compiler input to come from a file rather than a pipe.

#!/bin/sh
CXX="g++" ; CXXFLAGS="-O2" ; CCACHE=`which ccache 2>/dev/null`
fname=$(mktemp --tmpdir cscriptXXXX) ; exec 998<"$fname" ; rm "$fname"
fname=$(mktemp --tmpdir cscriptXXXX) ; exec 999<"$fname" ; rm "$fname"
sed '1c#if 0' "$0" > /dev/fd/998
$CCACHE $CXX $CXXFLAGS -xc++ -c /dev/fd/998 -o /dev/fd/999 && $CXX $CXXFLAGS /dev/fd/999 -o /dev/fd/998 && exec -a "$0" /dev/fd/998 "$@" ; exit 1
#endif

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf ("%d arguments:\n", argc);
    for (int i=0; i < argc; i++) {
        printf ("%d: %s\n", i, argv[i]);
    }
    return 42;
}

This time, we use two temporary files (because both the compiler input and output must be files), and we use ccache for the compile phase (ccache cannot cache the link phase). This cuts the total (ccache hit + link + execution) time to around 40ms. This is still slower than the Bash version. However, I expect the savings to be greater with longer programs that take longer to compile.
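One detail of the script above is that $CCACHE is expanded unquoted, so when ccache is not found and the variable is empty, the compile command simply runs without a wrapper. The sketch below illustrates that shell behaviour only; env stands in for ccache and echo compiled stands in for the compiler invocation, both of which are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: an unquoted empty variable vanishes from the command line,
# so the same command works with or without a wrapper program.
# "env" stands in for ccache and "echo compiled" for the compiler call.
WRAPPER=""
without_wrapper=$($WRAPPER echo compiled)   # runs: echo compiled
WRAPPER="env"
with_wrapper=$($WRAPPER echo compiled)      # runs: env echo compiled
echo "$without_wrapper / $with_wrapper"
```

Quoting the variable ("$WRAPPER") would break the empty case, because the shell would then try to run a command with an empty name.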

For this trivial program, the C++ version outperforms the Bash version when the number of input arguments exceeds around 12000. The faster execution time of the C++ code eventually pays for its overhead.

But this script isn’t C++ anymore…

That’s true. If you tried to compile this script with a C++ compiler, it would complain that the #!/bin/sh on the first line isn’t a legal C preprocessor directive. I don’t see a way to make the shebang (#!) legal C code, but it is possible to remove it entirely and replace it with #if 0, and hope that the shell used to execute the script is compatible. This works because any line starting with # is a comment in POSIX shell, so the shell ignores #if 0 while the C++ compiler treats it as the start of a skipped block, turning this file into both legal C++ code and a shell script. Removing the shebang is an option if being able to feed the script unmodified through a C++ compiler is more important to you than being certain which shell executes the script.

#if 0
CXX="g++" ; CXXFLAGS="-O2" ; CCACHE=`which ccache 2>/dev/null`
fname=$(mktemp --tmpdir cscriptXXXX) ; exec 998<"$fname" ; rm "$fname"
fname=$(mktemp --tmpdir cscriptXXXX) ; exec 999<"$fname" ; rm "$fname"
sed '1c#if 0' "$0" > /dev/fd/998
$CCACHE $CXX $CXXFLAGS -xc++ -c /dev/fd/998 -o /dev/fd/999 && $CXX $CXXFLAGS /dev/fd/999 -o /dev/fd/998 && exec -a "$0" /dev/fd/998 "$@" ; exit 1
#endif

// ... C++ code
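To see the shell side of this in isolation, here is a minimal sketch showing that the shell skips the two preprocessor lines as comments and still runs what lies between them. The variable name and message are made up for the example.

```shell
#if 0
# Sketch: to the shell, "#if 0" and "#endif" are ordinary comment lines,
# so the assignment between them runs normally.
message="the shell ran the lines between the directives"
#endif
echo "$message"
```

A C++ compiler reading the same text would discard everything from #if 0 up to #endif, which is what lets one file serve both tools.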
