History log of /netbsd-src/bin/sh/expand.c (Results 1 – 25 of 146)
Revision Date Author Comments
# a7e8b4a5 21-Oct-2024 kre <kre@NetBSD.org>

Fix processing of unknown variable expansion types.

Our shell is (was) one of the last not to do this correctly.

Expansions are supposed to happen only when the command in which
they occur is being

Fix processing of unknown variable expansion types.

Our shell is (was) one of the last not to do this correctly.

Expansions are supposed to happen only when the command in which
they occur is being executed, not while it is being parsed.
If the expansion only happens them, errors should only be
detected then.

Make it work like that (I saw after I fixed this that FreeBSD
had done it, long ago, almost the same way - it is kind of an
obvious thing to do).

This will allow code like

if test it is shell X
then
commands using shell X specific expansion ops
else if it is shell Y
then
commands using shell Y specific expansion ops
else ...
fi

Previously expansion errors were detected while parsing, so
if we're not shell X, and don't implement something that it
does (some extension to the standard) that would have generated
a parser syntax error, and the script could not be executed
(despite the line with the error never being executed).

Note that this change does not handle all such possible
extensions, just this one. Others are much harder.

One side effect of this change is that sh will now continue
reading a variable expansion until it locates the terminating
'}' (in ${var} forms) regardless of how broken it obviously
is (to our shell) whereas previously it would have bailed out
as soon as an oddity was spotted.

show more ...


# d47295cc 03-Oct-2024 rillig <rillig@NetBSD.org>

bin: fix lint warning "effectively discards 'const'"

For example: src/bin/ed/io.c(339): warning: call to 'strchr' effectively
discards 'const' from argument [346]

No binary change.


# 6654ff1c 29-Dec-2023 kre <kre@NetBSD.org>

PR bin/57773

Fix another bug reported by Jarle Fredrik Greipsland and added
to PR bin/57773, which relates to calculating the length of a
positional parameter which contains CTL chars -- yes, this o

PR bin/57773

Fix another bug reported by Jarle Fredrik Greipsland and added
to PR bin/57773, which relates to calculating the length of a
positional parameter which contains CTL chars -- yes, this one
really is that specific, though it would also affect the special
param $0 if it were to contain CTL chars, and its length was
requested - that is fixed with the same change. And note: $0
is not affected because it looks like a positional param (it
isn't, ${00} would be, but is always unset, ${0} isn't) all
special parame would be affected the same way, but the only one
that can ever contain a CTL char is $0 I believe. ($@ and $*
were affected, but just because they're expanding the positional
params ... ${#@} and ${#*} are both technically unspecified
expansions - and different shells produce different results.

See the PR for the details of this one (and the previous).

Thanks for the PR.

XXX pullup to everything.

show more ...


# 391d4540 25-Dec-2023 kre <kre@NetBSD.org>

Correct a bizarre piece of source formatting that crept in by
accident several years ago (change a space into newline tab).

NFC


# 726d188a 06-Mar-2023 kre <kre@NetBSD.org>

Adjust tilde expansion as will be documented in the forthcoming
version of the POSIX standard (Issue 8). I believe we were already
compliant with what is to be required, but POSIX is now encouragin

Adjust tilde expansion as will be documented in the forthcoming
version of the POSIX standard (Issue 8). I believe we were already
compliant with what is to be required, but POSIX is now encouraging
(and will likely require in a later version) that if a tilde expansion
produces a string which ends in a '/' and the '~' that was expanded
is immediately followed by a '/' in the input word, that one of those
two slashes be omitted. The worst (current) example of this is
when HOME=/ and we expand ~/foo - previously producing //foo which is
(in POSIX) a path with implementation defined semantics, and so not
what we should be generating by accident. Change that, so now if
the ~ prefix expansion ends in a '/' and there is a '/' following
immediately after, the resulting word contains only one of those
chars (in the example just given, we will now produce /foo instead).

POSIX is also making it clear that the expansion that results from
the tilde expansion is treated as quoted (not subject to pathname
expansion, or field splitting, or any var/arith/command substitutions)
and that if HOME="" the expansion of ~ must generate "" (not nothing).
Our implementation did all of that already (though older versions
used to treat an empty expansion of HOME the same as if HOME was
unset - that was fixed some time ago).

The actual modification made here is probably smaller than this log entry,
and without added comments, certainly is!

show more ...


# 16d85571 22-Nov-2021 kre <kre@NetBSD.org>

PR bin/53550

Here we go again... One more time to redo how here docs are
processed (it has been a few years since the last time!)

This is actually a relatively minor change, mostly to timimg
(to

PR bin/53550

Here we go again... One more time to redo how here docs are
processed (it has been a few years since the last time!)

This is actually a relatively minor change, mostly to timimg
(to just when things happen). Now here docs are expanded at the
same time the "filename" word in a redirect is expanded, rather than
later when the heredoc was being sent to its process. This actually
makes things more consistent - but does break one of the ATF tests
which was testing that we were (effectively) internally inconsistent
in this area.

Not all shells agree on the context in which redirection expansions
should happen, some make any side effects visible to the parent shell
(the majority do) others do the redirection expansions in a subshell
so any side effcts are lost. We used to have a foot in each camp,
with the majority for everything but here docs, and the minority for
here docs. Now we're all the way with LBJ ... (or something like that).

show more ...


# b8bee70d 10-Nov-2021 kre <kre@NetBSD.org>

DEBUG mode changes only. NFC (NC) for any normally compiled shell.

Mostly adding DEBUG mode tracing (when appropriate verbose tracing
is enabled generally) whenever a shell (including sushell) pro

DEBUG mode changes only. NFC (NC) for any normally compiled shell.

Mostly adding DEBUG mode tracing (when appropriate verbose tracing
is enabled generally) whenever a shell (including sushell) process
exits, so shells that the tracing should indicate why ehslls that
vanish did that.

Note for future investigators: if the relevant tracing is enabled,
and a (sub-)shell still simply seems to have vanished without trace,
the likely cause is that it was killed by a signal - and of those,
the most common that occurs is SIGPIPE.

show more ...


# 4cb87529 10-Sep-2021 rillig <rillig@NetBSD.org>

bin: remove unnecessary lint comment CONSTCOND

Since 2021-01-31, lint no longer warns about 'do ... while (0)'.

No functional change.


# b95d46c2 01-Aug-2020 kre <kre@NetBSD.org>

Remove a redundant set of parentheses that were added (along with a
extra && or || or something ... forgotten now) as part a failed attempt
to fix an earlier bug (later fixed a better way) - when the

Remove a redundant set of parentheses that were added (along with a
extra && or || or something ... forgotten now) as part a failed attempt
to fix an earlier bug (later fixed a better way) - when the extra
test (never committed) was removed, the now-redundant parentheses got
forgotten...

NFC.

show more ...


# c69ada4c 13-Feb-2020 kre <kre@NetBSD.org>

When expanding a here-doc (NXHERE - the type with an unquoted end delim)
the output will not be further processed (at all) so there is no need
to escape magic chars in the output, and doing so leaves

When expanding a here-doc (NXHERE - the type with an unquoted end delim)
the output will not be further processed (at all) so there is no need
to escape magic chars in the output, and doing so leaves stray CTLESC
chars in the here doc text. Not good. So don't do that...

To save a strlen() of the result, to determine the size of the here doc,
make rmescapes() return the length of the resulting string (this isn't
needed for other uses, so didn't happen previously).

Reported on current-users@ (2020-02-06) by Jun Ebihara

XXX pullup -9

show more ...


# d08d589d 14-Oct-2019 christos <christos@NetBSD.org>

remove masking and cast (requested by kre@)


# 7a3a738c 13-Oct-2019 christos <christos@NetBSD.org>

prevent sign extension from making expression always false.


# e291c05e 08-Oct-2019 kre <kre@NetBSD.org>

Remove a (completely harmless) duplicate assignment introduced in a
code merge from FreeBSD in 2017. NFC.

Pointed out by Roland Illig.


# 7dca2b7e 08-Oct-2019 kre <kre@NetBSD.org>

Open code the validity test & copy of the character class name in
a bracket expression in a pattern (ie: [[:THISNAME:]]). Previously
the code used strspn() to look for invalid chars in the name, an

Open code the validity test & copy of the character class name in
a bracket expression in a pattern (ie: [[:THISNAME:]]). Previously
the code used strspn() to look for invalid chars in the name, and
then memcpy(), now we do the test and copy a character at a time.
This might, or might not, be faster, but it now correctly handles
\ quoted characters in the name (' and " quoting were already
dealt with, \ was too in an earlier version, but when the \ handling
changes were made, this piece of code broke).

Not exactly a vital bug fix (who writes [[:\alpha:]] or similar?)
but it should work correctly regardless of how obscure the usage is.

Problem noted by Harald van Dijk

XXX pullup -9

show more ...


# 265b0617 10-Apr-2019 kre <kre@NetBSD.org>

PR bin/54112

Fix handling of "$@" (that is, double quoted dollar at), when it
appears in a string which will be subject to field splitting.

Eg:
${0+"$@" }

More common usages, like the simple "$@"

PR bin/54112

Fix handling of "$@" (that is, double quoted dollar at), when it
appears in a string which will be subject to field splitting.

Eg:
${0+"$@" }

More common usages, like the simple "$@" or ${0+"$@"} end up
being entirely quoted, so no field splitting happens, and the
problem was avoided.

See the PR for more details.

This ends up making a bunch of old hack code (and some that was
relatively new) vanish - for now it is just #if 0'd or commented out.
Cleanups of that stuff will happen later.

That some of the worst $@ hacks are now gone does not mean that processing
of "$@" does not retain a very special place in every hackers heart.
RIP extreme ugliness - long live the merely ordinary ugly.

Added a new bin/sh ATF test case to verify that all this remains fixed.

show more ...


# 256d645d 27-Feb-2019 kre <kre@NetBSD.org>

Finish the fixes from Feb 4 for handling of random data that
matches the internal CTL* chars.

The earlier fixes handled CTL* char values in var expansions,
but not in various other places they can o

Finish the fixes from Feb 4 for handling of random data that
matches the internal CTL* chars.

The earlier fixes handled CTL* char values in var expansions,
but not in various other places they can occur (positional
parameters, $@ $* -- even potentially $0 and ~ expansions,
as well as byte strings generated from a \u in a $'' string).

These should all be correctly handled now. There is a new
ISCTL() macro to make the test, rather than using the old
BASESYNTAX[c]==CCTL form (which us still a viable alternative)
as the new way allows compiler optimisations, and less mem
references, so it should be smaller and faster.

Also, be sure in all cases to remove any CTLESC (or other)
CTL* chars from all strings before they are made available
for any external use (there was one case missed - which didn't
matter when we weren't bothering to escape the CTL* chars at
all.)

XXX pullup-8 (will need to be via a patch) along with the Feb 4 fixes.

show more ...


# 58e34de6 04-Feb-2019 kre <kre@NetBSD.org>

Fix an old bug (very old) that was made worse in 1.128 (the "${1+$@}"
fixes) where a variable containing a CTL char (the only possibility used
to be CTLESC (0x81)) would lose that character if the va

Fix an old bug (very old) that was made worse in 1.128 (the "${1+$@}"
fixes) where a variable containing a CTL char (the only possibility used
to be CTLESC (0x81)) would lose that character if the variable was expanded
when "set -f" (noglob) was in effect.

1.128 made this worse by adding more 0x8z values (a couple more) which would
see the same behaviour, and one of those was noticed by Martijn Dekker.

The reasoning was that when noglob is on, when a var is expanded, there are
no magic chars, so (apparently) no need to escape anything. Hence nothing
was escaped .. including any CTL chars that happened to be present. When
we later rmescapes() the CTL chars that we expect might occur are summarily
removed - even if they weren't really CTL chars, but just data masquerading.

We must *always* escape any CTL char clones that are in the var value,
no matter what other conditions apply, and what we expect to happen next.

While here, fix rmescapes() (and its $(()) clone, rmescapes_nl()) to
be more robust, less likely to forget to delete anything (which was
not the issue here, just the reverse) and in a DEBUG shell, have the
shell abort() if it encounters something in rmescapes() it is not
anticipating, so the code can be made to handle it, or if it should
not happen, we can find out why it did.

XXX pullup -8 (but will need to be via patch, code is quite different).

show more ...


# c9f333ad 03-Dec-2018 kre <kre@NetBSD.org>

Yet another foray into the mysterious world of $@ -- this time
to fix the (unusual) idiom "${1+$@}" (the quotes are part of it).
This seems to have broken about 5 or 6 years ago (somewhere
between -

Yet another foray into the mysterious world of $@ -- this time
to fix the (unusual) idiom "${1+$@}" (the quotes are part of it).
This seems to have broken about 5 or 6 years ago (somewhere
between -6 and -7), I believe.

Note this is not the same as "$@" and also not the same as ${1+"$@"}
(much more common idioms) which both worked.

Also attempt to deal with "" more correctly, especially when it
appears adjacent to "$@" (or one of the similar constructs.)

This stuff is still all as ugly and hackish (and fragile) as is
possible to imagine, but in an effort to allow some of the weirdness
to eventually go away, the parser output has been made more
regular and all quoted (parts of) words always now start with
CTLQUOTEMARK and end with CTLQUOTEEND regardless of where the
quotes appear.

This allows us to tell the difference between """$@" and "$@"
which was impossible before - yet they are required to generate
different output when there are no args (when "$@" simply vanishes).

Needless to say that change had ramifications all over the place.
To simplify any similar change in the future, there are some new
macros that can generally be used to detect the "noise" data when
processing words, rather than open coding that every time (which
meant that there would *always* be one which missed getting
updated...)

Several other bugs (of my making, and older ones) are also fixed.

The aim is that (aside from anything that is detecting the cases
that were broken before - which were all unlikely uses of sh
syntax) these changes should have no external visible impact.

Sure...

show more ...


# df073671 18-Nov-2018 kre <kre@NetBSD.org>

Rationalise (slightly) the way that expansions are processed
to hide meta-characters in the result when the expansion was
in (double) quotes, and so should not be further processed.

Most of this has

Rationalise (slightly) the way that expansions are processed
to hide meta-characters in the result when the expansion was
in (double) quotes, and so should not be further processed.

Most of this has been OK for a long while, but \ needs hiding
as well, which complicates things, as \ cannot simply be hidden
in the syntax tables as one of the group of random special characters.

This was fixed earlier for simple variable expansions, but
every variety has its own code path ($var uses different code
than $n which is different than $(...), which is different
again from ~ expansions, and also from what $'...' produces).

This could be fixed by moving them all to a common code path,
but that's harder than it seems. The form in which the data
is made available differs, so one common routine would need
a whole bunch of different "get the next char or indicate end"
methods - probably via passing in an accessor function.
That's all a lot of churn, and would probably slow the shell.

Instead, just make macros for doing the standard tests, and
use those instead of open coding (differently) each time.
This way some of the code paths don't end up forgetting to
handle '\' (which is different than all the others).

This removes one optimisation ... when no escaping is needed
(like just $var (unquoted) where magic chars (think '*') in
the value are intended to remain magic), the code avoided doing
two tests for each char ("do we need escapes" and "is this char
one that needs escaping") by choosing two different syntax
tables (choice made outside the loop) - one of which never
returns the magic "needs escaping" result, and the other does
when appropriate, and then just avoiding the "do we need escapes"
test for each character processed. Then when '\' was fixed,
there needed to be another test for it, as it cannot (for other
reasons) be the same as all the others for which "this char
need escaping" is true. So that added a 2nd test for each char...
Not all the code paths were updated. Hence the bugs...

nb: this is all rarely seen in the wild, so it is no big
surprised that no-one ever noticed.

Now the "use two different syntax tables" is gone (the two
returned the same for '\' which is why '\' needed special
processing) - and in order to avoid two tests for each
char (plus the \ test) we duplicate the loops, one of which
tests each char to see if it needs an escape, the 2nd just
copies them. This should be faster in the "no escapes"
code path (though that is not the point) and perhaps also
in the "escapes needed" path (no indirect reference to
the syntax table - though that would probably be in a
register) but makes the code slightly bigger. For /bin/sh
the text segment (on amd64) has grown by 48 bytes. But
it still uses the same number of 512 byte pages (and hence
also any bigger page size). The resulting file size
(/bin/sh) is identical before and after. So is /rescue/sh
(or /rescue/anything-else).

show more ...


# 14482abc 22-Jul-2018 kre <kre@NetBSD.org>

Part 2 of pattern matching (glob etc) fixes.

Attempt to correctly deal with \ (both when it is a literal,
in appropriate cases, and when it appears as CTLESC when it was
detected as a quoting charac

Part 2 of pattern matching (glob etc) fixes.

Attempt to correctly deal with \ (both when it is a literal,
in appropriate cases, and when it appears as CTLESC when it was
detected as a quoting character during parsing).

In a pattern, in sh, no quoted character can ever be anything other
than a literal character. This is quite different than regular
expressions, and even different than other uses of glob matching,
where shell quoting is not an issue.

In something like

ls ?\*.c

the ? is a meta-character, the * is a literal (it was quoted). This
is nothing new, sh has handled that properly for ever.

But the same happens with
VAR='?\*.c'
and
ls $VAR

which has not always been handled correctly. Of course, in

ls "$VAR"

nothing in VAR is a meta-character (the entire expansion is quoted)
so even the '\' must match literally (or more accurately, no matching
happens - VAR simply contains an "unusual" filename). But if it had
been

ls *"$VAR"

then we would be looking for filenames that end with the literal 5
characters that make up $VAR.

The same kinds of things are requires of matching patterns in case
statements, and sub-strings with the % and # operators in variable
expansions.

While here, the final remnant of the ancient !! pattern matching
hack has been removed (the code that actually implemented it was
long gone, but one small piece remained, not doing any real harm,
but potentially wasting time - if someone gave a pattern which would
once have invoked that hack.)

show more ...


# d211c89f 22-Jul-2018 kre <kre@NetBSD.org>

NFC: Whitespace cleanups


# bcacfd9a 22-Jul-2018 kre <kre@NetBSD.org>

DEBUG mode only change (ie: no effect to any normal shell).

Add tracing of pattern matching (aid in debugging various issues.)


# c83568a7 20-Jul-2018 kre <kre@NetBSD.org>

First pass at fixing some of the more arcane pattern matching
possibilities that we do not currently handle all that well.

This mostly means (for now) making sure that quoted pattern
magic character

First pass at fixing some of the more arcane pattern matching
possibilities that we do not currently handle all that well.

This mostly means (for now) making sure that quoted pattern
magic characters (as well as quoted sh syntax magic chars)
are properly marked, so they remain known as being quoted,
and do not turn into pattern magic. Also, make sure that an
unquoted \ in a pattern always quotes whatever comes next
(which, unlike in regular expressions, includes inside []
matches),

show more ...


# b81009ce 22-Jun-2018 kre <kre@NetBSD.org>

When processing character classes ([:xxx:] inside []), treat a class name
that is longer than we can handle the same way we treat an unknown
class name (as a valid char class which contains nothing,

When processing character classes ([:xxx:] inside []), treat a class name
that is longer than we can handle the same way we treat an unknown
class name (as a valid char class which contains nothing, so never
matches). Previously a "too long" class name invalidated the
class, so [:very-long-name:] would match any of '[' ':' 'v' ...
(note: "very-long-name" is not long enough to trigger this, but you
get the idea!)

However, the name itself has a restricted syntax ([[:***:]] is not a
character class, it is a match for one of a '[' ':' or '*', followed by
a ']') which we did not implement - check the syntax of the name before
treating it as a character class (but we do add '_' to alphanumerics
as legal class name characters).

show more ...


# 829cc62a 22-Jun-2018 kre <kre@NetBSD.org>

When matching a char class ([[:name:]]) in a pattern (for filename
expansion, case patterrns, etc) do not force '[' to be a member of
every class.

Before this fix, try:
case [ in [[:alpha:]]) echo

When matching a char class ([[:name:]]) in a pattern (for filename
expansion, case patterrns, etc) do not force '[' to be a member of
every class.

Before this fix, try:
case [ in [[:alpha:]]) echo Huh\?;; esac

XXX pullup-8 (Perhaps -7 as well, though that shell version has
much more relevant bugs than this one.) This bug is not in -6 as
that has no charclass support.

show more ...


123456