Austin suggested a while ago that the corpus size be printed in the
header. In the end it seems the corpus will be fixed per test script,
so this suggestion indeed makes sense.
The tabbed output was wrapping on my usual 80-column terminal, so I
joined the input and output columns together.
Unlike in the correctness tests, the most common cause of non-zero
return seems to be the user interrupting, so killing the run seems
like the friendly thing to do.
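
A minimal sketch of that behaviour, assuming a hypothetical wrapper
named run_or_abort (the name and the message are illustrative, not
taken from the actual test library):

```sh
# Hypothetical helper: run one command and abort the whole run if it
# returns non-zero (most likely because the user interrupted it).
run_or_abort () {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command exited with status $status, aborting run" >&2
        exit "$status"
    fi
}
```

Exiting with the command's own status lets a calling script or
Makefile still see why the run stopped.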
This is not nearly as fancy as the unit tests, on the theory that
the code should typically not be crashing when performance tuning.
Nonetheless, there is plenty of room for improvement. Several more of
the pieces of the test infrastructure (e.g. the option parsing) could
be factored out into test/test-lib-common.sh.