scons: Work around very long command lines and arg size limits.

In the background, scons invokes commands using what is essentially
'sh -c ${CMD}'. It's a little more sophisticated than that, but the net
effect is that all of the arguments of a given command are collected
into a single argument for sh. While Linux has a very generous limit
on the number of command line arguments you can have, and a relatively
generous limit on the total size of the arguments and environment
variables which can be read with "getconf ARG_MAX" (about 2MB only my
system), the limit on an individual argument, defined in the little
documented constant MAX_ARG_STRLEN, is much smaller, 32 pages or
131072 bytes from what I've read.

Unfortunately, when building gem5, especially without partial linking,
the command line for the final linking step can be very long, and with
some extra files brought in with EXTRAS can relatively easily exceed
this limit.

To work around this problem, this change adds a new scons "tool" which
replaces the SPAWN and PSPAWN construction environment variables (not
the shell environment, scon's env, aka main) with wrappes. These
wrappers create a temporary file, write the command into it, and then
tell the original SPAWN/PSPAWN implementation to run that file as the
actual command.

This works, but it has two potential drawbacks. First, it creates
extra temporary files which may add some overhead to the build. The
overhead is likely small, but is non-zero.

Second, while this restores the original breaks between command line
arguments, it doesn't completely avoid the total argument/environment
size limit which would be harder to hit, but does still exist.

Another option would be to use the same sort of technique, but to use
gcc's (and I assume clang's) ability to read options from a file using
an @file argument, where file is replaced with the name of the
arguments.

The upside of this approach is that it avoids all of these limits
since gcc is just reading a file, and the only limits are ones it self
imposes. Also, this would only apply to invocations of gcc, and so
would be worth plumbing in to only affect, say, the linking step.
There would be no hypothetical overhead from creating temp files.

One downside though, is that this is not a universal approach and
would break anything which doesn't understand @file style arguments.
Also, when intercepting calls to the shell, the arguments we're
getting have been made appropriate for that environment (escaping,
quoting, etc). The rules for escaping arguments in an @file may not be
quite the same as for the shell, and so there may be weird cases where
a command line gets garbled this way.

Given the tradeoffs, I think always putting the commands into temp
files and running them as scripts is the way to go.

Change-Id: I0a5288aed745a432ed72ffd990ceded2b9422585
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41273
Reviewed-by: Earl Ou <shunhsingou@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2 files changed