| #
a7e8b4a5 |
| 21-Oct-2024 |
kre <kre@NetBSD.org> |
Fix processing of unknown variable expansion types.
Our shell is (was) one of the last not to do this correctly.
Expansions are supposed to happen only when the command in which they occur is being executed, not while it is being parsed. If the expansion only happens then, errors should only be detected then.
Make it work like that (I saw after I fixed this that FreeBSD had done it, long ago, almost the same way - it is kind of an obvious thing to do).
This will allow code like
    if test it is shell X
    then
        commands using shell X specific expansion ops
    else if it is shell Y
    then
        commands using shell Y specific expansion ops
    else ...
    fi
Previously expansion errors were detected while parsing, so if we're not shell X, and don't implement something that it does (some extension to the standard) that would have generated a parser syntax error, and the script could not be executed (despite the line with the error never being executed).
Note that this change does not handle all such possible extensions, just this one. Others are much harder.
One side effect of this change is that sh will now continue reading a variable expansion until it locates the terminating '}' (in ${var} forms) regardless of how broken it obviously is (to our shell) whereas previously it would have bailed out as soon as an oddity was spotted.
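A small probe, offered as a sketch rather than as part of the change, can show which behaviour a given sh has. The operator `@Z` is a made-up stand-in for any expansion type the shell under test does not implement, and `sh` is assumed to be the shell of interest found on PATH.

```shell
# Hypothetical probe: the unknown expansion sits in a branch that never runs.
script='if false; then : "${x@Z}"; fi; echo reached'

# If parsing alone rejects the script, "reached" is never printed.
if out=$(sh -c "$script" 2>/dev/null) && [ "$out" = reached ]; then
    result=deferred
else
    result=parse-time
fi
echo "$result"
```

A shell with this fix is expected to report `deferred`; one that still checks expansion types in the parser would report `parse-time`.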
|
| #
d47295cc |
| 03-Oct-2024 |
rillig <rillig@NetBSD.org> |
bin: fix lint warning "effectively discards 'const'"
For example: src/bin/ed/io.c(339): warning: call to 'strchr' effectively discards 'const' from argument [346]
No binary change.
|
| #
6654ff1c |
| 29-Dec-2023 |
kre <kre@NetBSD.org> |
PR bin/57773
Fix another bug reported by Jarle Fredrik Greipsland and added to PR bin/57773, which relates to calculating the length of a positional parameter which contains CTL chars -- yes, this one really is that specific, though it would also affect the special param $0 if it were to contain CTL chars, and its length was requested - that is fixed with the same change. And note: $0 is not affected because it looks like a positional param (it isn't, ${00} would be, but is always unset, ${0} isn't); all special params would be affected the same way, but the only one that can ever contain a CTL char is $0 I believe. ($@ and $* were affected, but just because they're expanding the positional params ... ${#@} and ${#*} are both technically unspecified expansions - and different shells produce different results.)
See the PR for the details of this one (and the previous).
Thanks for the PR.
XXX pullup to everything.
|
| #
391d4540 |
| 25-Dec-2023 |
kre <kre@NetBSD.org> |
Correct a bizarre piece of source formatting that crept in by accident several years ago (change a space into newline tab).
NFC
|
| #
726d188a |
| 06-Mar-2023 |
kre <kre@NetBSD.org> |
Adjust tilde expansion as will be documented in the forthcoming version of the POSIX standard (Issue 8). I believe we were already compliant with what is to be required, but POSIX is now encouraging (and will likely require in a later version) that if a tilde expansion produces a string which ends in a '/' and the '~' that was expanded is immediately followed by a '/' in the input word, that one of those two slashes be omitted. The worst (current) example of this is when HOME=/ and we expand ~/foo - previously producing //foo which is (in POSIX) a path with implementation defined semantics, and so not what we should be generating by accident. Change that, so now if the ~ prefix expansion ends in a '/' and there is a '/' following immediately after, the resulting word contains only one of those chars (in the example just given, we will now produce /foo instead).
POSIX is also making it clear that the expansion that results from the tilde expansion is treated as quoted (not subject to pathname expansion, or field splitting, or any var/arith/command substitutions) and that if HOME="" the expansion of ~ must generate "" (not nothing). Our implementation did all of that already (though older versions used to treat an empty expansion of HOME the same as if HOME was unset - that was fixed some time ago).
The actual modification made here is probably smaller than this log entry, and without added comments, certainly is!
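The slash-joining rule described above can be sketched in shell itself. This is only an illustration of the rule, not the code in sh (which is C); `join_tilde` and its two arguments are hypothetical names, standing for the expanded tilde prefix and the remainder of the word.

```shell
# join_tilde PREFIX REST: hypothetical helper mimicking the rule above.
join_tilde() {
    prefix=$1
    rest=$2
    case $prefix in
    */)
        # The prefix already ends in '/': drop one leading '/' from the rest.
        rest=${rest#/}
        ;;
    esac
    printf '%s\n' "$prefix$rest"
}

join_tilde / /foo            # models HOME=/ with the word ~/foo
join_tilde /home/user /foo   # ordinary case, nothing to drop
```

Only one slash is ever removed, so a remainder that itself begins with several slashes keeps all but the first.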
|
| #
16d85571 |
| 22-Nov-2021 |
kre <kre@NetBSD.org> |
PR bin/53550
Here we go again... One more time to redo how here docs are processed (it has been a few years since the last time!)
This is actually a relatively minor change, mostly to timing (to just when things happen). Now here docs are expanded at the same time the "filename" word in a redirect is expanded, rather than later when the heredoc was being sent to its process. This actually makes things more consistent - but does break one of the ATF tests which was testing that we were (effectively) internally inconsistent in this area.
Not all shells agree on the context in which redirection expansions should happen, some make any side effects visible to the parent shell (the majority do) others do the redirection expansions in a subshell so any side effects are lost. We used to have a foot in each camp, with the majority for everything but here docs, and the minority for here docs. Now we're all the way with LBJ ... (or something like that).
|
| #
b8bee70d |
| 10-Nov-2021 |
kre <kre@NetBSD.org> |
DEBUG mode changes only. NFC (NC) for any normally compiled shell.
Mostly adding DEBUG mode tracing (when appropriate verbose tracing is enabled generally) whenever a shell (including subshell) process exits, so that the tracing should indicate why shells that vanish did that.
Note for future investigators: if the relevant tracing is enabled, and a (sub-)shell still simply seems to have vanished without trace, the likely cause is that it was killed by a signal - and of those, the most common that occurs is SIGPIPE.
|
| #
4cb87529 |
| 10-Sep-2021 |
rillig <rillig@NetBSD.org> |
bin: remove unnecessary lint comment CONSTCOND
Since 2021-01-31, lint no longer warns about 'do ... while (0)'.
No functional change.
|
| #
b95d46c2 |
| 01-Aug-2020 |
kre <kre@NetBSD.org> |
Remove a redundant set of parentheses that were added (along with an extra && or || or something ... forgotten now) as part of a failed attempt to fix an earlier bug (later fixed a better way) - when the extra test (never committed) was removed, the now-redundant parentheses got forgotten...
NFC.
|
| #
c69ada4c |
| 13-Feb-2020 |
kre <kre@NetBSD.org> |
When expanding a here-doc (NXHERE - the type with an unquoted end delim) the output will not be further processed (at all) so there is no need to escape magic chars in the output, and doing so leaves stray CTLESC chars in the here doc text. Not good. So don't do that...
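As a rough check of the visible behaviour (not of the internal CTLESC handling, which cannot be observed directly from a script), the sketch below expands a variable holding pattern and quoting characters inside a here-doc with an unquoted delimiter. The variable names are illustrative only.

```shell
# Glob and quoting characters that arrive via an expansion inside an
# unquoted-delimiter here-doc should reach the reader unchanged.
v='*?[a]\x'
out=$(cat <<EOF
$v
EOF
)
printf '%s\n' "$out"
```

The text read back should be identical to the value of `v`, with nothing added or removed.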
To save a strlen() of the result, to determine the size of the here doc, make rmescapes() return the length of the resulting string (this isn't needed for other uses, so didn't happen previously).
Reported on current-users@ (2020-02-06) by Jun Ebihara
XXX pullup -9
|
| #
d08d589d |
| 14-Oct-2019 |
christos <christos@NetBSD.org> |
remove masking and cast (requested by kre@)
|
| #
7a3a738c |
| 13-Oct-2019 |
christos <christos@NetBSD.org> |
prevent sign extension from making expression always false.
|
| #
e291c05e |
| 08-Oct-2019 |
kre <kre@NetBSD.org> |
Remove a (completely harmless) duplicate assignment introduced in a code merge from FreeBSD in 2017. NFC.
Pointed out by Roland Illig.
|
| #
7dca2b7e |
| 08-Oct-2019 |
kre <kre@NetBSD.org> |
Open code the validity test & copy of the character class name in a bracket expression in a pattern (ie: [[:THISNAME:]]). Previously the code used strspn() to look for invalid chars in the name, and then memcpy(), now we do the test and copy a character at a time. This might, or might not, be faster, but it now correctly handles \ quoted characters in the name (' and " quoting were already dealt with, \ was too in an earlier version, but when the \ handling changes were made, this piece of code broke).
Not exactly a vital bug fix (who writes [[:\alpha:]] or similar?) but it should work correctly regardless of how obscure the usage is.
Problem noted by Harald van Dijk
XXX pullup -9
|
| #
265b0617 |
| 10-Apr-2019 |
kre <kre@NetBSD.org> |
PR bin/54112
Fix handling of "$@" (that is, double quoted dollar at), when it appears in a string which will be subject to field splitting.
Eg: ${0+"$@" }
More common usages, like the simple "$@" or ${0+"$@"} end up being entirely quoted, so no field splitting happens, and the problem was avoided.
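A minimal sketch of the case from the example, assuming a shell that follows the POSIX rules: the word after `+` is a quoted "$@" followed by an unquoted space, so the result goes through field splitting, yet each positional parameter should survive as a single field. The variable names are illustrative.

```shell
set -- 'a b' c
# "$@" followed by an unquoted space: the expansion is field split, but
# the quoted parameters themselves must stay intact.
set -- ${0+"$@" }
count=$#
first=$1
second=$2
printf '%s\n' "$count" "$first" "$second"
```

With the fix, two fields should come back, the first still containing its embedded space.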
See the PR for more details.
This ends up making a bunch of old hack code (and some that was relatively new) vanish - for now it is just #if 0'd or commented out. Cleanups of that stuff will happen later.
That some of the worst $@ hacks are now gone does not mean that processing of "$@" does not retain a very special place in every hacker's heart. RIP extreme ugliness - long live the merely ordinary ugly.
Added a new bin/sh ATF test case to verify that all this remains fixed.
|
| #
256d645d |
| 27-Feb-2019 |
kre <kre@NetBSD.org> |
Finish the fixes from Feb 4 for handling of random data that matches the internal CTL* chars.
The earlier fixes handled CTL* char values in var expansions, but not in various other places they can occur (positional parameters, $@ $* -- even potentially $0 and ~ expansions, as well as byte strings generated from a \u in a $'' string).
These should all be correctly handled now. There is a new ISCTL() macro to make the test, rather than using the old BASESYNTAX[c]==CCTL form (which is still a viable alternative) as the new way allows compiler optimisations, and fewer mem references, so it should be smaller and faster.
Also, be sure in all cases to remove any CTLESC (or other) CTL* chars from all strings before they are made available for any external use (there was one case missed - which didn't matter when we weren't bothering to escape the CTL* chars at all.)
XXX pullup-8 (will need to be via a patch) along with the Feb 4 fixes.
|
| #
58e34de6 |
| 04-Feb-2019 |
kre <kre@NetBSD.org> |
Fix an old bug (very old) that was made worse in 1.128 (the "${1+$@}" fixes) where a variable containing a CTL char (the only possibility used to be CTLESC (0x81)) would lose that character if the variable was expanded when "set -f" (noglob) was in effect.
1.128 made this worse by adding more 0x8z values (a couple more) which would see the same behaviour, and one of those was noticed by Martijn Dekker.
The reasoning was that when noglob is on, when a var is expanded, there are no magic chars, so (apparently) no need to escape anything. Hence nothing was escaped .. including any CTL chars that happened to be present. When we later rmescapes() the CTL chars that we expect might occur are summarily removed - even if they weren't really CTL chars, but just data masquerading.
We must *always* escape any CTL char clones that are in the var value, no matter what other conditions apply, and what we expect to happen next.
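The symptom can be checked from a script with a sketch like the following, which puts the byte 0x81 (the CTLESC value mentioned above) into a variable as ordinary data and expands it unquoted while noglob is in effect. The variable names are illustrative, and `wc -c` is used to count bytes so the result does not depend on the locale.

```shell
# 0x81 (octal 201) is the CTLESC value mentioned above; here it is plain data.
v=$(printf 'a\201b')
set -f                      # noglob, the condition that triggered the bug
set -- $v                   # unquoted expansion of the variable
set +f
# Count the bytes that survived the expansion.
bytes=$(( $(printf '%s' "$1" | wc -c) ))
echo "$bytes"
```

All three bytes should survive; a shell with the old bug would lose the middle one.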
While here, fix rmescapes() (and its $(()) clone, rmescapes_nl()) to be more robust, less likely to forget to delete anything (which was not the issue here, just the reverse) and in a DEBUG shell, have the shell abort() if it encounters something in rmescapes() it is not anticipating, so the code can be made to handle it, or if it should not happen, we can find out why it did.
XXX pullup -8 (but will need to be via patch, code is quite different).
|
| #
c9f333ad |
| 03-Dec-2018 |
kre <kre@NetBSD.org> |
Yet another foray into the mysterious world of $@ -- this time to fix the (unusual) idiom "${1+$@}" (the quotes are part of it). This seems to have broken about 5 or 6 years ago (somewhere between -6 and -7), I believe.
Note this is not the same as "$@" and also not the same as ${1+"$@"} (much more common idioms) which both worked.
Also attempt to deal with "" more correctly, especially when it appears adjacent to "$@" (or one of the similar constructs.)
This stuff is still all as ugly and hackish (and fragile) as is possible to imagine, but in an effort to allow some of the weirdness to eventually go away, the parser output has been made more regular and all quoted (parts of) words always now start with CTLQUOTEMARK and end with CTLQUOTEEND regardless of where the quotes appear.
This allows us to tell the difference between """$@" and "$@" which was impossible before - yet they are required to generate different output when there are no args (when "$@" simply vanishes).
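A short sketch of the difference, assuming a shell that implements the required behaviour; `plain` and `prefixed` are illustrative variable names.

```shell
set --                      # no positional parameters
set -- "$@"                 # "$@" alone simply vanishes
plain=$#

set --
set -- """$@"               # the leading "" is expected to leave one empty field
prefixed=$#

printf '%s %s\n' "$plain" "$prefixed"
```

The first count should be zero and the second one, since the quoted null string still produces a field even though "$@" contributes nothing.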
Needless to say that change had ramifications all over the place. To simplify any similar change in the future, there are some new macros that can generally be used to detect the "noise" data when processing words, rather than open coding that every time (which meant that there would *always* be one which missed getting updated...)
Several other bugs (of my making, and older ones) are also fixed.
The aim is that (aside from anything that is detecting the cases that were broken before - which were all unlikely uses of sh syntax) these changes should have no external visible impact.
Sure...
|
| #
df073671 |
| 18-Nov-2018 |
kre <kre@NetBSD.org> |
Rationalise (slightly) the way that expansions are processed to hide meta-characters in the result when the expansion was in (double) quotes, and so should not be further processed.
Most of this has been OK for a long while, but \ needs hiding as well, which complicates things, as \ cannot simply be hidden in the syntax tables as one of the group of random special characters.
This was fixed earlier for simple variable expansions, but every variety has its own code path ($var uses different code than $n which is different than $(...), which is different again from ~ expansions, and also from what $'...' produces).
This could be fixed by moving them all to a common code path, but that's harder than it seems. The form in which the data is made available differs, so one common routine would need a whole bunch of different "get the next char or indicate end" methods - probably via passing in an accessor function. That's all a lot of churn, and would probably slow the shell.
Instead, just make macros for doing the standard tests, and use those instead of open coding (differently) each time. This way some of the code paths don't end up forgetting to handle '\' (which is different than all the others).
This removes one optimisation ... when no escaping is needed (like just $var (unquoted) where magic chars (think '*') in the value are intended to remain magic), the code avoided doing two tests for each char ("do we need escapes" and "is this char one that needs escaping") by choosing two different syntax tables (choice made outside the loop) - one of which never returns the magic "needs escaping" result, and the other does when appropriate, and then just avoiding the "do we need escapes" test for each character processed. Then when '\' was fixed, there needed to be another test for it, as it cannot (for other reasons) be the same as all the others for which "this char need escaping" is true. So that added a 2nd test for each char... Not all the code paths were updated. Hence the bugs...
nb: this is all rarely seen in the wild, so it is no big surprise that no-one ever noticed.
Now the "use two different syntax tables" is gone (the two returned the same for '\' which is why '\' needed special processing) - and in order to avoid two tests for each char (plus the \ test) we duplicate the loops, one of which tests each char to see if it needs an escape, the 2nd just copies them. This should be faster in the "no escapes" code path (though that is not the point) and perhaps also in the "escapes needed" path (no indirect reference to the syntax table - though that would probably be in a register) but makes the code slightly bigger. For /bin/sh the text segment (on amd64) has grown by 48 bytes. But it still uses the same number of 512 byte pages (and hence also any bigger page size). The resulting file size (/bin/sh) is identical before and after. So is /rescue/sh (or /rescue/anything-else).
|
| #
14482abc |
| 22-Jul-2018 |
kre <kre@NetBSD.org> |
Part 2 of pattern matching (glob etc) fixes.
Attempt to correctly deal with \ (both when it is a literal, in appropriate cases, and when it appears as CTLESC when it was detected as a quoting character during parsing).
In a pattern, in sh, no quoted character can ever be anything other than a literal character. This is quite different than regular expressions, and even different than other uses of glob matching, where shell quoting is not an issue.
In something like
ls ?\*.c
the ? is a meta-character, the * is a literal (it was quoted). This is nothing new, sh has handled that properly for ever.
But the same happens with VAR='?\*.c' and ls $VAR
which has not always been handled correctly. Of course, in
ls "$VAR"
nothing in VAR is a meta-character (the entire expansion is quoted) so even the '\' must match literally (or more accurately, no matching happens - VAR simply contains an "unusual" filename). But if it had been
ls *"$VAR"
then we would be looking for filenames that end with the literal 5 characters that make up $VAR.
The same kinds of things are required of matching patterns in case statements, and sub-strings with the % and # operators in variable expansions.
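The same rules can be tried without touching the filesystem by using case patterns. This is a minimal sketch; the three function names are hypothetical, and the results noted are what a shell that handles these cases correctly should give.

```shell
VAR='?\*.c'

# Unquoted: '?' is a meta-character, '\*' is a literal '*'.
unquoted() { case $1 in $VAR) echo match ;; *) echo no-match ;; esac; }

# Quoted: nothing in VAR is special, so only the identical string matches.
quoted() { case $1 in "$VAR") echo match ;; *) echo no-match ;; esac; }

# Unquoted '*' followed by the quoted value: any string ending in the
# literal characters of VAR.
suffix() { case $1 in *"$VAR") echo match ;; *) echo no-match ;; esac; }

unquoted 'a*.c'      # should match: one char, then a literal '*', then .c
unquoted 'ab.c'      # should not match: 'b' is not a literal '*'
quoted 'a*.c'        # should not match
quoted '?\*.c'       # should match: identical string
suffix 'x?\*.c'      # should match: ends with the literal value of VAR
```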
While here, the final remnant of the ancient !! pattern matching hack has been removed (the code that actually implemented it was long gone, but one small piece remained, not doing any real harm, but potentially wasting time - if someone gave a pattern which would once have invoked that hack.)
|
| #
d211c89f |
| 22-Jul-2018 |
kre <kre@NetBSD.org> |
NFC: Whitespace cleanups
|
| #
bcacfd9a |
| 22-Jul-2018 |
kre <kre@NetBSD.org> |
DEBUG mode only change (ie: no effect to any normal shell).
Add tracing of pattern matching (aid in debugging various issues.)
|
| #
c83568a7 |
| 20-Jul-2018 |
kre <kre@NetBSD.org> |
First pass at fixing some of the more arcane pattern matching possibilities that we do not currently handle all that well.
This mostly means (for now) making sure that quoted pattern magic characters (as well as quoted sh syntax magic chars) are properly marked, so they remain known as being quoted, and do not turn into pattern magic. Also, make sure that an unquoted \ in a pattern always quotes whatever comes next (which, unlike in regular expressions, includes inside [] matches).
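As an illustration of a backslash quoting a character inside a bracket expression, the sketch below uses a hypothetical function `in_set` whose pattern would be a range from a to z if the '-' were not quoted.

```shell
in_set() {
    # [a\-z]: the backslash quotes '-', so this is the set {a, -, z},
    # not the range a through z.
    case $1 in
    [a\-z]) echo yes ;;
    *)      echo no ;;
    esac
}

in_set -     # '-' is a member of the set
in_set m     # 'm' would only match if a-z were treated as a range
```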
|
| #
b81009ce |
| 22-Jun-2018 |
kre <kre@NetBSD.org> |
When processing character classes ([:xxx:] inside []), treat a class name that is longer than we can handle the same way we treat an unknown class name (as a valid char class which contains nothing, so never matches). Previously a "too long" class name invalidated the class, so [:very-long-name:] would match any of '[' ':' 'v' ... (note: "very-long-name" is not long enough to trigger this, but you get the idea!)
However, the name itself has a restricted syntax ([[:***:]] is not a character class, it is a match for one of a '[' ':' or '*', followed by a ']') which we did not implement - check the syntax of the name before treating it as a character class (but we do add '_' to alphanumerics as legal class name characters).
|
| #
829cc62a |
| 22-Jun-2018 |
kre <kre@NetBSD.org> |
When matching a char class ([[:name:]]) in a pattern (for filename expansion, case patterns, etc) do not force '[' to be a member of every class.
Before this fix, try: case [ in [[:alpha:]]) echo Huh\?;; esac
XXX pullup-8 (Perhaps -7 as well, though that shell version has much more relevant bugs than this one.) This bug is not in -6 as that has no charclass support.
|