Discussion:
env: add -S option (split string for shebang lines in scripts)
Assaf Gordon
2018-04-22 10:25:51 UTC
Permalink
Hello,

Following recent discussion about shebang lines [1][2],
attached is a patch to add "env -S", with the same syntax as freebsd's
env(1).

[1] https://lists.gnu.org/r/coreutils/2017-05/msg00020.html
[2] https://lists.gnu.org/r/bug-coreutils/2018-04/msg00031.html

Also included a small patch adding "--debug/-v" option.

It includes tests with close to %100 coverage, but no documentation yet.
If you think this is a good addition, I'll update the patch with
documentation.

Comments welcomed,
- assaf
Pádraig Brady
2018-04-24 03:38:27 UTC
Permalink
Post by Assaf Gordon
Hello,
Following recent discussion about shebang lines [1][2],
attached is a patch to add "env -S", with the same syntax as freebsd's
env(1).
[1] https://lists.gnu.org/r/coreutils/2017-05/msg00020.html
[2] https://lists.gnu.org/r/bug-coreutils/2018-04/msg00031.html
Also included a small patch adding "--debug/-v" option.
It includes tests with close to %100 coverage, but no documentation yet.
If you think this is a good addition, I'll update the patch with
documentation.
This is quite useful functionality to have in a standard tool.
Thanks a lot for implementing it.

I'd rather be consistent with other utils and just use the long --debug form.
Oh I see, the FreeBSD version supports -v. So in that case I agree with
supporting the short option also.

Please add an extra space after "--split-string=S " so that the man page
will be formatted appropriately.

There is no need to specify "inline" on functions generally
(unless defining in headers). Best let the compiler sort it out.

The error() for --debug may need translations or use
a debug() wrapper to avoid the syntax check

In scan_varname(), I think we should we restrict to:
s/isalpha/c_isalpha/
s/isalnum/c_isalnum/

I see the test script identifies the case where the OS does split
and handles spaces in the src dir. Excellent.

I noticed some tweaks for the test which are attached.

------ some general notes -----

I see that if one wants to use other options in combo with -S
they need to be part of the -S option. I.E. after -S.
It would be good to illustrate that in the docs,
or perhaps issue a warning if -S comes after other options?

For example it looks buggy that -u TERM
is just ignored in this example:

src/env -v -u TERM -S sh -c "echo \$TERM"

We should at least warn about this case, but maybe error out.

Another gotcha is that portably specifying -v
with -S (i.e. with only one option) requires:
src/env '-vS sh -c "echo \$TERM"'

So I would document two forms in the man page/--help like:

env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
env -[v]S '[OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]'

and make it clear usage() that -S is only needed with shebang lines.
The quotes above are important to document too,
as otherwise consistent processing (like interpolation etc.) wouldn't
be done on operating systems that did split shebangs
(or when used in a standard command invocation).

thanks!
Pádraig
Assaf Gordon
2018-04-24 22:57:47 UTC
Permalink
Hello Pádraig,

Thank you for the quick review.
Post by Assaf Gordon
Following recent discussion about shebang lines [1][2],
attached is a patch to add "env -S", with the same syntax as freebsd's
env(1).
Please add an extra space after "--split-string=S " [...]
There is no need to specify "inline" on functions generally [..]
The error() for --debug may need translations [...]
In scan_varname(), I think we should we restrict to: [...]
Will fix these and send update soon.
I noticed some tweaks for the test which are attached.
Thank you, I'll add them as well.
------ some general notes -----
I see that if one wants to use other options in combo with -S
they need to be part of the -S option. I.E. after -S.
It would be good to illustrate that in the docs,
or perhaps issue a warning if -S comes after other options?
Good catch!
This slipped by me, as I did test that "-i"
and "-C" do work the same inside and outside "-S".
For example it looks buggy that -u TERM
src/env -v -u TERM -S sh -c "echo \$TERM"
We should at least warn about this case, but maybe error out.
It is a bug, but slightly more intricate then first seems.

Here's what FreeBSD does:

$ TERM=AAA env -uTERM -S'sh -c "echo x${TERM}x =\$TERM="'
xx ==
$ TERM=AAA env -S'-uTERM sh -c "echo x${TERM}x =\$TERM="'
xAAAx ==

Note that "${TERM}" is expanded by 'env' *before* executing sh,
while '\$TERM' is passed to 'sh' and expanded by it.

This is the current (wrong) implementation:

$ TERM=AAA ./src/env -uTERM -S'sh -c "echo x${TERM}x =\$TERM="'
xAAAx =AAA=
$ TERM=AAA ./src/env -S'-uTERM sh -c "echo x${TERM}x =\$TERM="'
xAAAx ==

I am going to assume that FreeBSD's behaviour is the "correct" one,
whether or not it makes sense, because it has been so since FreeBSD 6 (ca. 2005).

And so I will update the patch to behave like FreeBSD, unless there
are objections or different opinions.
Another gotcha is that portably specifying -v
src/env '-vS sh -c "echo \$TERM"'
In this case it's actually the same as FreeBSD:
"-v" affects only what comes after it:

FreeBSD (needs multiple -v to get max verbosity):

$ env -vvv -S"A=B uname"
#env verbosity now at 2
#env verbosity now at 3
#env split -S: 'A=B uname'
#env into: 'A=B'
#env & 'uname'
#env setenv: A=B
#env executing: uname
#env arg[0]= 'uname'
FreeBSD

$ env -S"-vvv A=B uname"
#env verbosity now at 2
#env verbosity now at 3
#env setenv: A=B
#env executing: uname
#env arg[0]= 'uname'
FreeBSD


Coreutils:

$ ./src/env -v -S"A=B uname"
./src/env: split -S: ‘A=B uname’
./src/env: into: ‘A=B’
./src/env: & ‘uname’
./src/env: setenv: ‘A=B’
./src/env: executing: ‘uname’
./src/env: arg[0]= ‘uname’
Linux

$ ./src/env -S"-v A=B uname"
./src/env: setenv: ‘A=B’
./src/env: executing: ‘uname’
./src/env: arg[0]= ‘uname’
Linux


In both implementation, if "-v" appears before "-S",
the debug messages will include the "-S". Otherwise,
they will include only what's after it.
env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
env -[v]S '[OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]'
Based on the above, do we still want to explicitly show it in usage(),
or is it sufficient to mention the subtlely in the documentation?
and make it clear usage() that -S is only needed with shebang lines.
The quotes above are important to document too,
as otherwise consistent processing (like interpolation etc.) wouldn't
be done on operating systems that did split shebangs
(or when used in a standard command invocation).
Agreed. Should this go in usage(), or man-page, or only the texinfo ?


regards,
- assaf
Pádraig Brady
2018-04-25 05:09:41 UTC
Permalink
Post by Assaf Gordon
Hello Pádraig,
Thank you for the quick review.
Post by Assaf Gordon
Following recent discussion about shebang lines [1][2],
attached is a patch to add "env -S", with the same syntax as freebsd's
env(1).
Please add an extra space after "--split-string=S " [...]
There is no need to specify "inline" on functions generally [..]
The error() for --debug may need translations [...]
In scan_varname(), I think we should we restrict to: [...]
Will fix these and send update soon.
I noticed some tweaks for the test which are attached.
Thank you, I'll add them as well.
------ some general notes -----
I see that if one wants to use other options in combo with -S
they need to be part of the -S option. I.E. after -S.
It would be good to illustrate that in the docs,
or perhaps issue a warning if -S comes after other options?
Good catch!
This slipped by me, as I did test that "-i"
and "-C" do work the same inside and outside "-S".
For example it looks buggy that -u TERM
src/env -v -u TERM -S sh -c "echo \$TERM"
We should at least warn about this case, but maybe error out.
It is a bug, but slightly more intricate then first seems.
$ TERM=AAA env -uTERM -S'sh -c "echo x${TERM}x =\$TERM="'
xx ==
$ TERM=AAA env -S'-uTERM sh -c "echo x${TERM}x =\$TERM="'
xAAAx ==
Better, but a pity it's inconsistent.
Post by Assaf Gordon
Note that "${TERM}" is expanded by 'env' *before* executing sh,
while '\$TERM' is passed to 'sh' and expanded by it.
$ TERM=AAA ./src/env -uTERM -S'sh -c "echo x${TERM}x =\$TERM="'
xAAAx =AAA=
$ TERM=AAA ./src/env -S'-uTERM sh -c "echo x${TERM}x =\$TERM="'
xAAAx ==
I am going to assume that FreeBSD's behaviour is the "correct" one,
whether or not it makes sense, because it has been so since FreeBSD 6 (ca. 2005).
And so I will update the patch to behave like FreeBSD, unless there
are objections or different opinions.
Well according to the FreeBSD docs it should output 'xAAAx ==' in both cases.
I'd have a slight preference for being consistent there rather than being
bug for bug compat with FreeBSD here.
Post by Assaf Gordon
Another gotcha is that portably specifying -v
src/env '-vS sh -c "echo \$TERM"'
$ env -vvv -S"A=B uname"
#env verbosity now at 2
#env verbosity now at 3
#env split -S: 'A=B uname'
#env into: 'A=B'
#env & 'uname'
#env setenv: A=B
#env executing: uname
#env arg[0]= 'uname'
FreeBSD
$ env -S"-vvv A=B uname"
#env verbosity now at 2
#env verbosity now at 3
#env setenv: A=B
#env executing: uname
#env arg[0]= 'uname'
FreeBSD
$ ./src/env -v -S"A=B uname"
./src/env: split -S: ‘A=B uname’
./src/env: into: ‘A=B’
./src/env: & ‘uname’
./src/env: setenv: ‘A=B’
./src/env: executing: ‘uname’
./src/env: arg[0]= ‘uname’
Linux
$ ./src/env -S"-v A=B uname"
./src/env: setenv: ‘A=B’
./src/env: executing: ‘uname’
./src/env: arg[0]= ‘uname’
Linux
In both implementation, if "-v" appears before "-S",
the debug messages will include the "-S". Otherwise,
they will include only what's after it.
Sure, but the main point is that for shebang lines you only get
one argument, so if you typed the first coreutils form in a shebang
you would get effectively:

$ src/env '-v -S A=B uname'
src/env: invalid option -- ' '

That subtlety would be documented/avoided by explicitly
Post by Assaf Gordon
env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
env -[v]S '[OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]'
Based on the above, do we still want to explicitly show it in usage(),
or is it sufficient to mention the subtlely in the documentation?
I still think the above two forms should be used in the synopsis
for the env man page (propagated from --help).
It also makes it clear that the position of -S is important.
Post by Assaf Gordon
and make it clear usage() that -S is only needed with shebang lines.
The quotes above are important to document too,
as otherwise consistent processing (like interpolation etc.) wouldn't
be done on operating systems that did split shebangs
(or when used in a standard command invocation).
Agreed. Should this go in usage(), or man-page, or only the texinfo ?
I was thinking that the explanation of -S in usage() would say something like:

-S, --split-string=S process and split S into separate arguments
used to pass multiple arguments on shebang lines

cheers,
Pádraig
Kaz Kylheku (Coreutils)
2018-04-26 03:19:32 UTC
Permalink
Post by Pádraig Brady
I was thinking that the explanation of -S in usage() would say
-S, --split-string=S process and split S into separate arguments
used to pass multiple arguments on shebang lines
One little problem with this FreeBSD design is that in spite of
supporting variable substitution, it lacks the flexibility of
specifying where among those arguments the script name is inserted!

Say we have a "#!/usr/bin/env -S lang ..." script called "foo.lang".

Suppose we want it so that "lang -f foo.lang -x" is invoked, where
-f is the option to the lang interpreter telling it to read the
script file foo.lang, and -x is an option to the foo.lang script
itself.

It would be useful if the -S mechanism could indicate this insertion
of foo.lang into the middle of the arguments.

The variable expansion mechanism could do it. Suppose that a certain
reserved variable name like ${ENV_COMMAND} (not my actual suggestion)
expands to the name of the last argument received by env.

Furthermore, when env is asked to expands this variable one or more
times, it makes a note of this and then sets a flag which suppresses
the script name from appearing in its usual position at the end.

Then this is possible:

#!/usr/bin/env -S lang -f ${ENV_COMMAND} -x
Pádraig Brady
2018-04-26 06:04:29 UTC
Permalink
Post by Kaz Kylheku (Coreutils)
Post by Pádraig Brady
-S, --split-string=S process and split S into separate arguments
used to pass multiple arguments on shebang lines
One little problem with this FreeBSD design is that in spite of
supporting variable substitution, it lacks the flexibility of
specifying where among those arguments the script name is inserted!
Say we have a "#!/usr/bin/env -S lang ..." script called "foo.lang".
Suppose we want it so that "lang -f foo.lang -x" is invoked, where
-f is the option to the lang interpreter telling it to read the
script file foo.lang, and -x is an option to the foo.lang script
itself.
It would be useful if the -S mechanism could indicate this insertion
of foo.lang into the middle of the arguments.
The variable expansion mechanism could do it. Suppose that a certain
reserved variable name like ${ENV_COMMAND} (not my actual suggestion)
expands to the name of the last argument received by env.
Furthermore, when env is asked to expands this variable one or more
times, it makes a note of this and then sets a flag which suppresses
the script name from appearing in its usual position at the end.
#!/usr/bin/env -S lang -f ${ENV_COMMAND} -x
This is a very good point. Thanks for bringing it up.
If we're taking the time to do this, we should make it generally usable.
This would be useful with for example:

#!/usr/bin/env -S awk -f

Now that does work if awk would ignore any options after -f ...,
but it only ignores options it doesn't recognise :/,
so if you wanted to pass -f to the awk script for example you'd need this support.
Note one could hack it using ${_} which is generally set to the last arg, i.e.

#!/usr/bin/env -S awk -F: -f ${_} --
But then the awk script would need to ignore the first argument which is messy,
and also ${_} may not be set in all contexts.

I suppose we could support the bash/ksh syntax of ${@:-1}
and once that variable is referenced it's not passed on.

cheers,
Pádraig
Assaf Gordon
2018-04-26 07:17:01 UTC
Permalink
Hello,

Attached an updated patch, hopefully addressing all the issues below.
Initial documentation also included (seperated into several commits to ease review).

A renderd html page is available here:
http://housegordon.org/files//tmp/env-invocation.html

It's rather long, but I hope it is helpful and justified, to address
all sorts of subtleties with examples.

I also added a short blurb to the man page, mentioning the
"-S" option and directing to the full documentation for details.
Post by Pádraig Brady
Post by Pádraig Brady
For example it looks buggy that -u TERM
src/env -v -u TERM -S sh -c "echo \$TERM"
This is fixed now. It requried a minor change to the 'unset' flow,
which is done in the first commit. Review welcomed.
Post by Pádraig Brady
Well according to the FreeBSD docs it should output 'xAAAx ==' in both cases.
I'd have a slight preference for being consistent there rather than being
bug for bug compat with FreeBSD here.
The code now behaves in the above consistent way, and thus deviate from
FreeBSD in this respect.
Post by Pádraig Brady
Post by Pádraig Brady
Another gotcha is that portably specifying -v
src/env '-vS sh -c "echo \$TERM"'
[...]
Post by Pádraig Brady
Post by Pádraig Brady
env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
env -[v]S '[OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]'
The 'usage' has been updated, and I also added explicit
section about it in the documentation.
Post by Pádraig Brady
Post by Pádraig Brady
and make it clear usage() that -S is only needed with shebang lines.
The quotes above are important to document too,
as otherwise consistent processing (like interpolation etc.) wouldn't
be done on operating systems that did split shebangs
(or when used in a standard command invocation).
The quotes are tricky: they are needed when running 'env -S' on the command line,
but should not be used on a shebang line.

For example, the following results in an infloop (both on FreeBSD and in my implementation):

$ cat bad.sh
#!/usr/bin/env -S'A=B uname'

This is because the kernel already passes -S'A=B uname' as a single argument,
and then env sees the single-quotes. It then sets the envar "A" to the value
of "B uname". lastly, the kernel adds "./bad.sh" as the last parameter.
The result is that 'env' sets envvar "A" and runs "bad.sh" again... forever.

So when using 'env -S' on the command line - use quotes.
When using 'env -S' in a shebang line - don't use quotes (unless
when explicitly setting values that contain spaces).

I have added an explicit section about this in the documentation.

---

Lastly, this update does not yet address Kaz's suggestion
of having a placeholder for the command-line arguments.

I understand this might be useful in some cases, but I'm
not sure we want to create yet another GNU extension (compared to FreeBSD).

Are there concrete cases where the interpreter (like awk)
requires argument in specific order, and the script filename
can't be last?

For example in this case:

#!/usr/bin/env -S lang -f ${ENV_COMMAND} -x

Would 'lang' fail to execute if we just reorder it like:

#!/usr/bin/env -S lang -x -f


---

comments very welcomed,
- assaf
Pádraig Brady
2018-04-27 05:19:56 UTC
Permalink
Post by Assaf Gordon
Hello,
Attached an updated patch, hopefully addressing all the issues below.
Initial documentation also included (seperated into several commits to ease review).
http://housegordon.org/files//tmp/env-invocation.html
It's rather long, but I hope it is helpful and justified, to address
all sorts of subtleties with examples.
Excellent.
Post by Assaf Gordon
I also added a short blurb to the man page, mentioning the
"-S" option and directing to the full documentation for details.
Post by Pádraig Brady
Post by Pádraig Brady
For example it looks buggy that -u TERM
src/env -v -u TERM -S sh -c "echo \$TERM"
This is fixed now. It requried a minor change to the 'unset' flow,
which is done in the first commit. Review welcomed.
Post by Pádraig Brady
Well according to the FreeBSD docs it should output 'xAAAx ==' in both cases.
I'd have a slight preference for being consistent there rather than being
bug for bug compat with FreeBSD here.
The code now behaves in the above consistent way, and thus deviate from
FreeBSD in this respect.
Post by Pádraig Brady
Post by Pádraig Brady
Another gotcha is that portably specifying -v
src/env '-vS sh -c "echo \$TERM"'
[...]
Post by Pádraig Brady
Post by Pádraig Brady
env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
env -[v]S '[OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]'
The 'usage' has been updated, and I also added explicit
section about it in the documentation.
Post by Pádraig Brady
Post by Pádraig Brady
and make it clear usage() that -S is only needed with shebang lines.
The quotes above are important to document too,
as otherwise consistent processing (like interpolation etc.) wouldn't
be done on operating systems that did split shebangs
(or when used in a standard command invocation).
The quotes are tricky: they are needed when running 'env -S' on the command line,
but should not be used on a shebang line.
$ cat bad.sh
#!/usr/bin/env -S'A=B uname'
This is because the kernel already passes -S'A=B uname' as a single argument,
and then env sees the single-quotes. It then sets the envar "A" to the value
of "B uname". lastly, the kernel adds "./bad.sh" as the last parameter.
The result is that 'env' sets envvar "A" and runs "bad.sh" again... forever.
So when using 'env -S' on the command line - use quotes.
When using 'env -S' in a shebang line - don't use quotes (unless
when explicitly setting values that contain spaces).
I have added an explicit section about this in the documentation.
OK all this is very subtle.
Thanks for documenting and matching FreeBSD.
We might consider in future having -S process subsequent parameters
so that scripts would be portable to systems that presplit shebang lines.
Let's not over engineer for now.
Scripts wouldn't be portable when using /usr/bin/env on these systems anyway
as MacOS currently doesn't support -S.

Given -S'with quotes' is only used for debugging
outside the shebang context, and that debugging mode is
adequately documented in the info docs, let's remove the
separate -S mode from the synopsis. I've done that
and also tweaked the -S explanation a little wrt punctuation.

I also removed exdent from the info as my texinfo version
warns about it, and added static to the two new vars
as indicated by make syntax-check.

All tweaks are attached.
Post by Assaf Gordon
Lastly, this update does not yet address Kaz's suggestion
of having a placeholder for the command-line arguments.
I understand this might be useful in some cases, but I'm
not sure we want to create yet another GNU extension (compared to FreeBSD).
If we do we should probably do it in a separate patch anyway.
Post by Assaf Gordon
Are there concrete cases where the interpreter (like awk)
requires argument in specific order, and the script filename
can't be last?
#!/usr/bin/env -S lang -f ${ENV_COMMAND} -x
#!/usr/bin/env -S lang -x -f
Yes, lang may fail with -x not supported,
or if lang did support -x it would not pass it on to the script.
Though as mentioned previously one can use ${_} to
get awk to behave appropriately.

anyway for future consideration.

thank you!
Pádraig
Bernhard Voelker
2018-04-27 06:31:54 UTC
Permalink
Post by Assaf Gordon
Attached an updated patch, hopefully addressing all the issues below.
Great work, thanks!

One nit: env -v shows a confusing error diagnostic when it is
separated from the -S option on the shebang line:

$ cat xxx
#!src/env -v -S cat -n
hello

$ ./xxx
src/env: invalid option -- ' '
Try 'src/env --help' for more information.

or on the command line:

$ src/env '-v -S cat -n' <<<'a'
src/env: invalid option -- ' '
Try 'src/env --help' for more information.

Well, env doesn't know at that point that -S is coming later ... still
the error message is really confusing. Any idea?

Have a nice day,
Berny
Eric Blake
2018-04-27 13:13:00 UTC
Permalink
Post by Bernhard Voelker
Post by Assaf Gordon
Attached an updated patch, hopefully addressing all the issues below.
Great work, thanks!
One nit: env -v shows a confusing error diagnostic when it is
Well, env doesn't know at that point that -S is coming later ... still
the error message is really confusing. Any idea?
We could include ' ' (and maybe '\t') as part of the short-option
optstring accepted in getopt_long(), as an undocumented silent no-op.
That would make '-v ' behave the same as '-v'. But I'm not sure if it
is ever possible to coax getopt_long() into parsing '-v-' as accepting a
short option named '-' (since usually, the string '--' is treated as
end-of-options). Worth a try, to see if it works?
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Eric Blake
2018-04-27 14:22:21 UTC
Permalink
Post by Eric Blake
Post by Bernhard Voelker
Well, env doesn't know at that point that -S is coming later ... still
the error message is really confusing. Any idea?
We could include ' ' (and maybe '\t') as part of the short-option
optstring accepted in getopt_long(), as an undocumented silent no-op.
That would make '-v ' behave the same as '-v'. But I'm not sure if it
is ever possible to coax getopt_long() into parsing '-v-' as accepting a
short option named '-' (since usually, the string '--' is treated as
end-of-options). Worth a try, to see if it works?
I tested it, and it DOES seem to work:

Pre-patch:

$ env "$(printf -- '-0\t -0')" | grep nosuch
env: invalid option -- ' '
Try 'env --help' for more information.
$

Post-patch:
$ ./src/env "$(printf -- '-0\t -0')" | grep nosuch
$

(Admittedly, my experiment was done without the -S patch in place, but
you can figure out how to incorporate it)

diff --git i/src/env.c w/src/env.c
index bacef9b02..db2085034 100644
--- i/src/env.c
+++ w/src/env.c
@@ -96,10 +96,15 @@ main (int argc, char **argv)
initialize_exit_failure (EXIT_CANCELED);
atexit (close_stdout);

- while ((optc = getopt_long (argc, argv, "+C:iu:0", longopts, NULL))
!= -1)
+ while ((optc = getopt_long (argc, argv, "+C:iu:0 \t-", longopts,
NULL)) != -1)
{
switch (optc)
{
+ case ' ':
+ case '\t':
+ case '-':
+ /* Undocumented no-ops, to allow '-v -S' to behave like '-vS' */
+ break;
case 'i':
ignore_environment = true;
break;
@@ -128,7 +133,7 @@ main (int argc, char **argv)
}

optind = 0; /* Force GNU getopt to re-initialize. */
- while ((optc = getopt_long (argc, argv, "+C:iu:0", longopts, NULL))
!= -1)
+ while ((optc = getopt_long (argc, argv, "+C:iu:0 \t-", longopts,
NULL)) != -1)
if (optc == 'u' && unsetenv (optarg))
die (EXIT_CANCELED, errno, _("cannot unset %s"), quote (optarg));
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Eric Blake
2018-04-27 18:57:38 UTC
Permalink
Post by Assaf Gordon
Hello,
Attached an updated patch, hopefully addressing all the issues below.
Initial documentation also included (seperated into several commits to ease review).
http://housegordon.org/files//tmp/env-invocation.html
This is missing support for -P, which is one of the essential features
of FreeBSD env, per their man page:
https://www.freebsd.org/cgi/man.cgi?query=env


Probably the most common use of env is to find the correct interpreter
for a script, when the interpreter may be in different directories on
different systems. The following example will find the `perl' inter-
preter by searching through the directories specified by PATH.

#!/usr/bin/env perl

One limitation of that example is that it assumes the user's value for
PATH is set to a value which will find the interpreter you want to exe-
cute. The -P option can be used to make sure a specific list of
directo-
ries is used in the search for utility. Note that the -S option is
also
required for this example to work correctly.

#!/usr/bin/env -S -P/usr/local/bin:/usr/bin perl

The above finds `perl' only if it is in /usr/local/bin or /usr/bin.
That
could be combined with the present value of PATH, to provide more
flexi-
bility. Note that spaces are not required between the -S and -P
options:

#!/usr/bin/env -S-P/usr/local/bin:/usr/bin:${PATH} perl
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Eric Blake
2018-04-27 19:42:13 UTC
Permalink
Post by Assaf Gordon
Hello,
Attached an updated patch, hopefully addressing all the issues below.
Initial documentation also included (seperated into several commits to ease review).
Question - do we want to use 'S::' instead of 'S:' in the optstring?
Right now, your patches made -S take a mandatory argument, making:

#!/usr/bin/env -S

try to parse the script name as a string to be split (and unless the
script name has unusual characters, this results in an infloop of
treating the script name as the interpreter, for another round of trying
to exec the same command line). Although this is probably not what was
intended, and the shebang line is already suspicious for not providing
more text after -S, making the argument optional and then issuing an
error message if optarg is NULL would at least make this not an easy
infloop.

I'm also trying to think what happens if we want to support platforms
where the OS splits strings passed to shebang. (The BSD implementation
didn't have to worry quite as much about their code being run on a
different OS, like we do). Consider:

#!/usr/bin/env -S interpreter 'arg with space'

where it already sees "-S", "interpreter", "'arg", "with", "space'",
"script", "args..." as separate arguments. If we use 'S:' to
getopt_long, then "interpreter" will be subject to -S handling but
nothing else will; if we use 'S::', then none of the subsequent
arguments will be subject to -S handling (but then we have to revisit
whether a NULL optarg would be treated as an error on a shebang line
that ends in -S). But either way, it would be nice if we could
reconstruct "arg with space" as a single argument to hand to
"interpreter", rather than three separate arguments where two of them
include a lone "'".

I'm wondering if we need yet another magic environment variable for
portably marking the demarcation between the arguments to -S and the
script name, whether the script is run on a platform that hands -S a
single string, or run on a platform that splits arguments, as in:

#!/usr/bin/env -S interpreter 'arg with spaces' ${_ENV_END}

which on Linux calls "/usr/bin/env" "-S interpreter 'arg with spaces'
${ENV_END}" "script", but elsewhere calls "/usr/bin/env" "-S"
"interpreter" "'arg" "with" "spaces" "${_ENV_END}" "script". With the
magic marker in place, -S can be used as a toggle mode that says to look
if ANY later argument is the magic marker ${ENV_END}, or maybe do this
look forward only if optarg for -S contains no spaces, because if a
space is present at all, we know the kernel did not split the shebang
line. If no spaces are present in optarg, but ${_ENV_END} is present as
a later argument, then then attempt to reconstruct the same command line
as if all arguments in between had been a single string (so that quote
and escape processing is performed on the remaining arguments of the
shebang line, but not on the script name or arguments). If ${_ENV_END}
is not present, we can't make any assumptions, so we only perform string
splitting on optarg, rather than trying to reconstruct a string from a
subset of the remaining arguments.

Of course, reconstructing a single string can't tell what whitespace the
kernel ate in providing multiple arguments, so it will corrupt multiple
spaces and/or tabs down to a single space; perhaps the existing \_
escape sequence can be used to overcome the worst effects of that. We'd
probably want to document that the expansion of ${_ENV_END} is always
empty, even if someone defines that variable in the environment?

Another question: Does the BSD implementation have any way to pass empty
strings as explicit arguments? The code you posted turns:

#!/usr/bin/env -S sh -c '' echo

into "sh" "-c" "echo" "script", which did NOT preserve the empty string.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Assaf Gordon
2018-04-27 22:58:36 UTC
Permalink
Hello Eric, Bernhard,

Thank you for commenting, you raise many good point.
Below are some ideas regarding them (combining replies to the last 4 emails).
Post by Bernhard Voelker
One nit: env -v shows a confusing error diagnostic when it is
$ cat xxx
#!src/env -v -S cat -n
hello
$ ./xxx
src/env: invalid option -- ' '
Try 'src/env --help' for more information.
Agree, very confusing and unhelpful message.

For comparison, FreeBSD behaves the same:

$ cat xxx
#!/usr/bin/env -v -S cat -n
hello
$ ./xxx
env: illegal option --
usage: env [-iv] [-P utilpath] [-S string] [-u name]
[name=value ...] [utility [argument ...]]


But of course we can and should do better.
Post by Bernhard Voelker
We could include ' ' (and maybe '\t') as part of the short-option
optstring accepted in getopt_long(), as an undocumented silent no-op.
[...]
Post by Bernhard Voelker
switch (optc)
{
+ /* Undocumented no-ops, to allow '-v -S' to behave like '-vS' */
+ break;
Good solution, thanks for testing it.


I wonder - would it be better to detect this issue
and report an informative error message instead of silently accepting it?

If GNU env accepts it, we create yet another (very subtle) difference
between FreeBSD and GNU.
If we reject it and explain why, we create a better user experience,
but also promote portable scripting...
Post by Bernhard Voelker
This is missing support for -P, which is one of the essential features
I can certainly add support "-P" (I'll do it in a separate patch though).

Is "-P" (alternate path) something that is often requested?
I do see a lot of questions about passing multiple arguments with "#!/usr/bin/env",
but I haven't noticed people asking about setting per-script non-standard $PATH
(but without changing the actual $PATH).
Post by Bernhard Voelker
Question - do we want to use 'S::' instead of 'S:' in the optstring?
#!/usr/bin/env -S
try to parse the script name as a string to be split (and unless the
script name has unusual characters, this results in an infloop of
treating the script name as the interpreter, for another round of trying
to exec the same command line).
For reference, the same behaviour (infloop) happens on FreeBSD:

$ cat yyy
#!/usr/bin/env -vS

$ ./yyy 2>&1 | head
#env executing: ./yyy
#env arg[0]= './yyy'
#env executing: ./yyy
#env arg[0]= './yyy'
#env executing: ./yyy
#env arg[0]= './yyy'
#env executing: ./yyy
#env arg[0]= './yyy'
#env executing: ./yyy
#env arg[0]= './yyy'

And indeed if the file name contains recognizable escape sequences,
FreeBSD's env also processes them ('\_' is escape sequence for space):

$ cat uname\\_-l
#!/usr/bin/env -vS

$ ./uname\\_-l
#env executing: ./uname
#env arg[0]= './uname'
#env arg[1]= '-l'
env: ./uname: No such file or directory


Several more of these kind of edge cases can probably be found.
I wonder if FreeBSD has been living with them since 2005,
perhaps there's no need to over-engineer protection against them?

Otherwise, Perhaps we can detect it and provide an informative error message?
Post by Bernhard Voelker
I'm also trying to think what happens if we want to support platforms
where the OS splits strings passed to shebang. (The BSD implementation
didn't have to worry quite as much about their code being run on a
#!/usr/bin/env -S interpreter 'arg with space'
where it already sees "-S", "interpreter", "'arg", "with", "space'",
"script", "args..." as separate arguments. If we use 'S:' to
getopt_long, then "interpreter" will be subject to -S handling but
nothing else will; if we use 'S::', then none of the subsequent
arguments will be subject to -S handling (but then we have to revisit
whether a NULL optarg would be treated as an error on a shebang line
that ends in -S). But either way, it would be nice if we could
reconstruct "arg with space" as a single argument to hand to
"interpreter", rather than three separate arguments where two of them
include a lone "'".
Currently, I only found one operating system (MINIX3) which splits
shebang lines by spaces, and it ignores quotes (so quoting a space does nothing).
Linux, BSDs, HURD, Cygwin - pass everything as one parameter including spaces.
AIX, Solaris - pass only the second argument, ignores all other.
I haven't tested HP-UX (if someone with HP-UX access can test - will be much appreciated).

Technically:

$ cat showargs.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char*argv[])
{
for (int i=0;i<argc;++i)
printf("argv[%d] = '%s'\n",i,argv[i]);
return 0;
}

$ cat 1
#!showargs printf xx%%sxx\n a b c

On Linux, *BSDs, HURD:

$ ./1
argv[0] = '/home/miles/showargs'
argv[1] = 'printf xx%sxx\n a b c'
argv[2] = './1'

On AIX, Solaris:

$ ./1
argv[0] = '/home/agn/showargs'
argv[1] = 'printf'
argv[2] = './1'

On Minix3:

$ ./1
argv[0] = '/home/miles/showargs'
argv[1] = 'printf'
argv[2] = 'xx%sxx\n'
argv[3] = 'a'
argv[4] = 'b'
argv[5] = 'c'
argv[6] = './1'

And then on MINIX3 with quotes:

$ cat 2
#!/home/miles/showargs printf xx%sxx\n 'a b c'
$ ./2
argv[0] = '/home/miles/showargs'
argv[1] = 'printf'
argv[2] = 'xx%sxx\n'
argv[3] = ''a'
argv[4] = 'b'
argv[5] = 'c''
argv[6] = './2'


So for Linux, BSDs, HURD - no problem with splitting spaces.

For AIX, Solaris - anything after space is ignored, which is
why '\_' is an escape sequence for space.
I'll add an explicit note about this in the documentation.

For Minix3 - that's indeed an issue - but is it worth adding much complexity
to address it? I'm not sure (no offence to MINIX3 fans). Since minix3 natively
splits arguments, they have never needed the "env -S" hack anyhow.
opinions welcomed.
Post by Bernhard Voelker
I'm wondering if we need yet another magic environment variable for
portably marking the demarcation between the arguments to -S and the
script name, whether the script is run on a platform that hands -S a
#!/usr/bin/env -S interpreter 'arg with spaces' ${_ENV_END}
Given the above, is this marker still an important in practice?
(if so, I recommend using a new escape sequence, e.g. "\q" instead
of an environment variable).
Post by Bernhard Voelker
Another question: Does the BSD implementation have any way to pass empty
#!/usr/bin/env -S sh -c '' echo
into "sh" "-c" "echo" "script", which did NOT preserve the empty string.
Good catch!

Yes, FreeBSD does preserve empty strings:

$ cat www
#!/usr/bin/env -vS sh -c '' echo

$ ./www
#env executing: sh
#env arg[0]= 'sh'
#env arg[1]= '-c'
#env arg[2]= ''
#env arg[3]= 'echo'
#env arg[4]= './www'


This is a bug and I'll fix it.



regards,
- assaf
Bernhard Voelker
2018-05-01 15:47:45 UTC
Permalink
Post by Assaf Gordon
Hello Eric, Bernhard,
Thank you for commenting, you raise many good point.
Below are some ideas regarding them (combining replies to the last 4 emails).
Post by Bernhard Voelker
One nit: env -v shows a confusing error diagnostic when it is
$ cat xxx
#!src/env -v -S cat -n
hello
$ ./xxx
src/env: invalid option -- ' '
Try 'src/env --help' for more information.
Agree, very confusing and unhelpful message.
$ cat xxx
#!/usr/bin/env -v -S cat -n
hello
$ ./xxx
env: illegal option --
usage: env [-iv] [-P utilpath] [-S string] [-u name]
[name=value ...] [utility [argument ...]]
But of course we can and should do better.
Post by Bernhard Voelker
We could include ' ' (and maybe '\t') as part of the short-option
optstring accepted in getopt_long(), as an undocumented silent no-op.
[...]
Post by Bernhard Voelker
switch (optc)
{
+ /* Undocumented no-ops, to allow '-v -S' to behave like '-vS' */
+ break;
Good solution, thanks for testing it.
I wonder - would it be better to detect this issue
and report an informative error message instead of silently accepting it?
If GNU env accepts it, we create yet another (very subtle) difference
between FreeBSD and GNU.
If we reject it and explain why, we create a better user experience,
but also promote portable scripting...
Good point.
After some playing with it, I also think it's better to err with a nice
diagnostic. But even that isn't that easy. Although ignoring ' ', '\t',
and '-' is a nice solution, it seems to get more ugly to handle other
cases.
Post by Assaf Gordon
Post by Bernhard Voelker
This is missing support for -P, which is one of the essential features
I can certainly add support "-P" (I'll do it in a separate patch though).
Is "-P" (alternate path) something that is often requested?
I do see a lot of questions about passing multiple arguments with "#!/usr/bin/env",
but I haven't noticed people asking about setting per-script non-standard $PATH
(but without changing the actual $PATH).
Isn't this the regular case with specifying the PATH variable?

env PATH=/some:/path prg

Have a nice day,
Berny
Eric Blake
2018-05-01 16:40:31 UTC
Permalink
Post by Bernhard Voelker
Post by Assaf Gordon
Post by Eric Blake
This is missing support for -P, which is one of the essential features
I can certainly add support "-P" (I'll do it in a separate patch though).
Is "-P" (alternate path) something that is often requested?
It makes reasonable sense: the FreeBSD argument is that if you do

#!/usr/bin/env perl

you are at the mercy of finding perl on the user's current PATH, but if
you do:

#!/usr/bin/env -S -P /usr/bin:/opt/bin:${PATH} perl

you can favor /usr/bin/perl or /opt/bin/perl first regardless of whether
the user looks there normally, and without corrupting the user's own
choice of perl for any child processes within the script.
Post by Bernhard Voelker
Post by Assaf Gordon
I do see a lot of questions about passing multiple arguments with "#!/usr/bin/env",
but I haven't noticed people asking about setting per-script non-standard $PATH
(but without changing the actual $PATH).
Isn't this the regular case with specifying the PATH variable?
env PATH=/some:/path prg
Close, but not quite the same. -P can affect the lookup path without
also leaking that changed lookup to the script itself. You're right
that if you call 'env PATH=...', that env has to look up prg within that
specified PATH rather than the PATH in effect pre-env, but I can see
cases where you want to find the interpreter at a known location but NOT
change the PATH handed off to the final script.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Bernhard Voelker
2018-05-01 22:48:26 UTC
Permalink
Post by Eric Blake
It makes reasonable sense: the FreeBSD argument is that if you do
#!/usr/bin/env perl
you are at the mercy of finding perl on the user's current PATH, but if
#!/usr/bin/env -S -P /usr/bin:/opt/bin:${PATH} perl
you can favor /usr/bin/perl or /opt/bin/perl first regardless of whether
the user looks there normally, and without corrupting the user's own
choice of perl for any child processes within the script.
I see, thanks.

So regarding -S, I'm currently a bit worried about whether a +413 lines
change to the 185 lines env.c source is really warranted ... but I have
to admit that I don't see how -S could be supported in an easier way).

Have a nice day,
Berny
Assaf Gordon
2018-05-05 09:20:27 UTC
Permalink
Hello,

Attached updated patch, two main improvements:
1. preserves empty arguments, e.g.
env -S echo "" bar
2. prints an error message on incorrect shebang usage:

$ cat xxx
#!src/env -v -S cat -n

$ ./xxx
src/env: possible incorrect usage of env in a script
Try 'src/env --help' for more information.

Item 2 is implemented in a separate patch (number 9) to ease review.

Of course the error message could be improved, as can the detection
if the exact circumstances, but I'm not sure it's warranted.
Post by Bernhard Voelker
Post by Eric Blake
#!/usr/bin/env -S -P /usr/bin:/opt/bin:${PATH} perl
"env -Ppath" is not implemented in this patch, but if we think it's
worth while I'll add it in next patch.
Post by Bernhard Voelker
So regarding -S, I'm currently a bit worried about whether a +413 lines
change to the 185 lines env.c source is really warranted ... but I have
to admit that I don't see how -S could be supported in an easier way).
This is certainly a judgment call (whether this feature is bloat or
useful enough).

The amount of code is the same ball-park number as the BSD
implementation (mine is slightly more verbose and with more comments):
https://github.com/freebsd/freebsd/blob/master/usr.bin/env/envopts.c

Binary-wise, these are the size differences with --debug and with -S:

$ size src/env.orig src/env.debug src/env.debug.s
text data bss dec hex filename
26521 1424 448 28393 6ee9 src/env.orig
27313 1456 480 29249 7241 src/env.debug
31027 1512 480 33019 80fb src/env.debug.s



regards,
- assaf
Pádraig Brady
2018-05-05 23:28:27 UTC
Permalink
Post by Assaf Gordon
Hello,
1. preserves empty arguments, e.g.
env -S echo "" bar
$ cat xxx
#!src/env -v -S cat -n
$ ./xxx
src/env: possible incorrect usage of env in a script
Try 'src/env --help' for more information.
Item 2 is implemented in a separate patch (number 9) to ease review.
Of course the error message could be improved, as can the detection
if the exact circumstances, but I'm not sure it's warranted.
I've attached some changes on top to indicate the existing error to users
plus additional info on how to address with the -S option.
I've also done this for the case of passing options to commands.
That results in:

$ cat xxx
#!src/env -i ls -l

$ ./xxx
src/env: invalid option -- ' '
src/env: Use -[v]S to pass options in shebang lines
Try 'src/env --help' for more information.

and

$ cat xxx
#!src/env ls -l

$ ./xxx
src/env: ‘ls -l’: No such file or directory
src/env: Use -[v]S to pass options in shebang lines
Post by Assaf Gordon
Post by Eric Blake
#!/usr/bin/env -S -P /usr/bin:/opt/bin:${PATH} perl
"env -Ppath" is not implemented in this patch, but if we think it's
worth while I'll add it in next patch.
It's best done in a separate patch.

thanks!
Pádraig
Pádraig Brady
2018-06-18 00:43:09 UTC
Permalink
After contacting Assaf off list,
I'm going to squash Assaf's latest patches and the attached
to a single commit, and push for the upcoming release.

thanks,
Pádraig.

Loading...