expand.c - OpenGrok history log for /netbsd-src/bin/sh/expand.c

Revision	Date	Author	Comments
# a7e8b4a5	21-Oct-2024	kre <kre@NetBSD.org>	Fix processing of unknown variable expansion types. Our shell is (was) one of the last not to do this correctly. Expansions are supposed to happen only when the command in which they occur is being executed, not while it is being parsed. If the expansion only happens then, errors should only be detected then. Make it work like that (I saw after I fixed this that FreeBSD had done it, long ago, almost the same way - it is kind of an obvious thing to do). This will allow code like: if test it is shell X then commands using shell X specific expansion ops else if it is shell Y then commands using shell Y specific expansion ops else ... fi. Previously expansion errors were detected while parsing, so if we're not shell X, and don't implement something that it does (some extension to the standard) that would have generated a parser syntax error, and the script could not be executed (despite the line with the error never being executed). Note that this change does not handle all such possible extensions, just this one. Others are much harder. One side effect of this change is that sh will now continue reading a variable expansion until it locates the terminating '}' (in ${var} forms) regardless of how broken it obviously is (to our shell) whereas previously it would have bailed out as soon as an oddity was spotted.
# d47295cc	03-Oct-2024	rillig <rillig@NetBSD.org>	bin: fix lint warning "effectively discards 'const'" For example: src/bin/ed/io.c(339): warning: call to 'strchr' effectively discards 'const' from argument [346] No binary change.
# 6654ff1c	29-Dec-2023	kre <kre@NetBSD.org>	PR bin/57773 Fix another bug reported by Jarle Fredrik Greipsland and added to PR bin/57773, which relates to calculating the length of a positional parameter which contains CTL chars -- yes, this one really is that specific, though it would also affect the special param $0 if it were to contain CTL chars, and its length was requested - that is fixed with the same change. And note: $0 is not affected because it looks like a positional param (it isn't, ${00} would be, but is always unset, ${0} isn't); all special params would be affected the same way, but the only one that can ever contain a CTL char is $0, I believe. ($@ and $* were affected, but just because they're expanding the positional params ... ${#@} and ${#*} are both technically unspecified expansions - and different shells produce different results.) See the PR for the details of this one (and the previous). Thanks for the PR. XXX pullup to everything.
# 391d4540	25-Dec-2023	kre <kre@NetBSD.org>	Correct a bizarre piece of source formatting that crept in by accident several years ago (change a space into newline tab). NFC
# 726d188a	06-Mar-2023	kre <kre@NetBSD.org>	Adjust tilde expansion as will be documented in the forthcoming version of the POSIX standard (Issue 8). I believe we were already compliant with what is to be required, but POSIX is now encouraging (and will likely require in a later version) that if a tilde expansion produces a string which ends in a '/' and the '~' that was expanded is immediately followed by a '/' in the input word, that one of those two slashes be omitted. The worst (current) example of this is when HOME=/ and we expand ~/foo - previously producing //foo which is (in POSIX) a path with implementation defined semantics, and so not what we should be generating by accident. Change that, so now if the ~ prefix expansion ends in a '/' and there is a '/' following immediately after, the resulting word contains only one of those chars (in the example just given, we will now produce /foo instead). POSIX is also making it clear that the expansion that results from the tilde expansion is treated as quoted (not subject to pathname expansion, or field splitting, or any var/arith/command substitutions) and that if HOME="" the expansion of ~ must generate "" (not nothing). Our implementation did all of that already (though older versions used to treat an empty expansion of HOME the same as if HOME was unset - that was fixed some time ago). The actual modification made here is probably smaller than this log entry, and without added comments, certainly is!
# 16d85571	22-Nov-2021	kre <kre@NetBSD.org>	PR bin/53550 Here we go again... One more time to redo how here docs are processed (it has been a few years since the last time!) This is actually a relatively minor change, mostly to timing (to just when things happen). Now here docs are expanded at the same time the "filename" word in a redirect is expanded, rather than later when the heredoc was being sent to its process. This actually makes things more consistent - but does break one of the ATF tests which was testing that we were (effectively) internally inconsistent in this area. Not all shells agree on the context in which redirection expansions should happen, some make any side effects visible to the parent shell (the majority do) others do the redirection expansions in a subshell so any side effects are lost. We used to have a foot in each camp, with the majority for everything but here docs, and the minority for here docs. Now we're all the way with LBJ ... (or something like that).
# b8bee70d	10-Nov-2021	kre <kre@NetBSD.org>	DEBUG mode changes only. NFC (NC) for any normally compiled shell. Mostly adding DEBUG mode tracing (when appropriate verbose tracing is enabled generally) whenever a shell (including a subshell) process exits, so that the tracing should indicate why shells that vanish did that. Note for future investigators: if the relevant tracing is enabled, and a (sub-)shell still simply seems to have vanished without trace, the likely cause is that it was killed by a signal - and of those, the most common that occurs is SIGPIPE.
# 4cb87529	10-Sep-2021	rillig <rillig@NetBSD.org>	bin: remove unnecessary lint comment CONSTCOND Since 2021-01-31, lint no longer warns about 'do ... while (0)'. No functional change.
# b95d46c2	01-Aug-2020	kre <kre@NetBSD.org>	Remove a redundant set of parentheses that were added (along with an extra && or || or something ... forgotten now) as part of a failed attempt to fix an earlier bug (later fixed a better way) - when the extra test (never committed) was removed, the now-redundant parentheses got forgotten... NFC.
# c69ada4c	13-Feb-2020	kre <kre@NetBSD.org>	When expanding a here-doc (NXHERE - the type with an unquoted end delim) the output will not be further processed (at all) so there is no need to escape magic chars in the output, and doing so leaves stray CTLESC chars in the here doc text. Not good. So don't do that... To save a strlen() of the result, to determine the size of the here doc, make rmescapes() return the length of the resulting string (this isn't needed for other uses, so didn't happen previously). Reported on current-users@ (2020-02-06) by Jun Ebihara. XXX pullup -9
# d08d589d	14-Oct-2019	christos <christos@NetBSD.org>	remove masking and cast (requested by kre@)
# 7a3a738c	13-Oct-2019	christos <christos@NetBSD.org>	prevent sign extension from making expression always false.
# e291c05e	08-Oct-2019	kre <kre@NetBSD.org>	Remove a (completely harmless) duplicate assignment introduced in a code merge from FreeBSD in 2017. NFC. Pointed out by Roland Illig.
# 7dca2b7e	08-Oct-2019	kre <kre@NetBSD.org>	Open code the validity test & copy of the character class name in a bracket expression in a pattern (ie: [[:THISNAME:]]). Previously the code used strspn() to look for invalid chars in the name, and then memcpy(), now we do the test and copy a character at a time. This might, or might not, be faster, but it now correctly handles \ quoted characters in the name (' and " quoting were already dealt with, \ was too in an earlier version, but when the \ handling changes were made, this piece of code broke). Not exactly a vital bug fix (who writes [[:\alpha:]] or similar?) but it should work correctly regardless of how obscure the usage is. Problem noted by Harald van Dijk. XXX pullup -9
# 265b0617	10-Apr-2019	kre <kre@NetBSD.org>	PR bin/54112 Fix handling of "$@" (that is, double quoted dollar at), when it appears in a string which will be subject to field splitting. Eg: ${0+"$@" } More common usages, like the simple "$@" or ${0+"$@"} end up being entirely quoted, so no field splitting happens, and the problem was avoided. See the PR for more details. This ends up making a bunch of old hack code (and some that was relatively new) vanish - for now it is just #if 0'd or commented out. Cleanups of that stuff will happen later. That some of the worst $@ hacks are now gone does not mean that processing of "$@" does not retain a very special place in every hacker's heart. RIP extreme ugliness - long live the merely ordinary ugly. Added a new bin/sh ATF test case to verify that all this remains fixed.
# 256d645d	27-Feb-2019	kre <kre@NetBSD.org>	Finish the fixes from Feb 4 for handling of random data that matches the internal CTL* chars. The earlier fixes handled CTL* char values in var expansions, but not in various other places they can occur (positional parameters, $@ $* -- even potentially $0 and ~ expansions, as well as byte strings generated from a \u in a $'' string). These should all be correctly handled now. There is a new ISCTL() macro to make the test, rather than using the old BASESYNTAX[c]==CCTL form (which is still a viable alternative) as the new way allows compiler optimisations, and less mem references, so it should be smaller and faster. Also, be sure in all cases to remove any CTLESC (or other) CTL* chars from all strings before they are made available for any external use (there was one case missed - which didn't matter when we weren't bothering to escape the CTL* chars at all.) XXX pullup-8 (will need to be via a patch) along with the Feb 4 fixes.
# 58e34de6	04-Feb-2019	kre <kre@NetBSD.org>	Fix an old bug (very old) that was made worse in 1.128 (the "${1+$@}" fixes) where a variable containing a CTL char (the only possibility used to be CTLESC (0x81)) would lose that character if the variable was expanded when "set -f" (noglob) was in effect. 1.128 made this worse by adding more 0x8z values (a couple more) which would see the same behaviour, and one of those was noticed by Martijn Dekker. The reasoning was that when noglob is on, when a var is expanded, there are no magic chars, so (apparently) no need to escape anything. Hence nothing was escaped .. including any CTL chars that happened to be present. When we later rmescapes() the CTL chars that we expect might occur are summarily removed - even if they weren't really CTL chars, but just data masquerading. We must always escape any CTL char clones that are in the var value, no matter what other conditions apply, and what we expect to happen next. While here, fix rmescapes() (and its $(()) clone, rmescapes_nl()) to be more robust, less likely to forget to delete anything (which was not the issue here, just the reverse) and in a DEBUG shell, have the shell abort() if it encounters something in rmescapes() it is not anticipating, so the code can be made to handle it, or if it should not happen, we can find out why it did. XXX pullup -8 (but will need to be via patch, code is quite different).
# c9f333ad	03-Dec-2018	kre <kre@NetBSD.org>	Yet another foray into the mysterious world of $@ -- this time to fix the (unusual) idiom "${1+$@}" (the quotes are part of it). This seems to have broken about 5 or 6 years ago (somewhere between -6 and -7), I believe. Note this is not the same as "$@" and also not the same as ${1+"$@"} (much more common idioms) which both worked. Also attempt to deal with "" more correctly, especially when it appears adjacent to "$@" (or one of the similar constructs.) This stuff is still all as ugly and hackish (and fragile) as is possible to imagine, but in an effort to allow some of the weirdness to eventually go away, the parser output has been made more regular and all quoted (parts of) words always now start with CTLQUOTEMARK and end with CTLQUOTEEND regardless of where the quotes appear. This allows us to tell the difference between """$@" and "$@" which was impossible before - yet they are required to generate different output when there are no args (when "$@" simply vanishes). Needless to say that change had ramifications all over the place. To simplify any similar change in the future, there are some new macros that can generally be used to detect the "noise" data when processing words, rather than open coding that every time (which meant that there would always be one which missed getting updated...) Several other bugs (of my making, and older ones) are also fixed. The aim is that (aside from anything that is detecting the cases that were broken before - which were all unlikely uses of sh syntax) these changes should have no external visible impact. Sure...
# df073671	18-Nov-2018	kre <kre@NetBSD.org>	Rationalise (slightly) the way that expansions are processed to hide meta-characters in the result when the expansion was in (double) quotes, and so should not be further processed. Most of this has been OK for a long while, but \ needs hiding as well, which complicates things, as \ cannot simply be hidden in the syntax tables as one of the group of random special characters. This was fixed earlier for simple variable expansions, but every variety has its own code path ($var uses different code than $n which is different than $(...), which is different again from ~ expansions, and also from what $'...' produces). This could be fixed by moving them all to a common code path, but that's harder than it seems. The form in which the data is made available differs, so one common routine would need a whole bunch of different "get the next char or indicate end" methods - probably via passing in an accessor function. That's all a lot of churn, and would probably slow the shell. Instead, just make macros for doing the standard tests, and use those instead of open coding (differently) each time. This way some of the code paths don't end up forgetting to handle '\' (which is different than all the others). This removes one optimisation ... when no escaping is needed (like just $var (unquoted) where magic chars (think '*') in the value are intended to remain magic), the code avoided doing two tests for each char ("do we need escapes" and "is this char one that needs escaping") by choosing two different syntax tables (choice made outside the loop) - one of which never returns the magic "needs escaping" result, and the other does when appropriate, and then just avoiding the "do we need escapes" test for each character processed. Then when '\' was fixed, there needed to be another test for it, as it cannot (for other reasons) be the same as all the others for which "this char needs escaping" is true. So that added a 2nd test for each char... Not all the code paths were updated. Hence the bugs... nb: this is all rarely seen in the wild, so it is no big surprise that no-one ever noticed. Now the "use two different syntax tables" is gone (the two returned the same for '\' which is why '\' needed special processing) - and in order to avoid two tests for each char (plus the \ test) we duplicate the loops, one of which tests each char to see if it needs an escape, the 2nd just copies them. This should be faster in the "no escapes" code path (though that is not the point) and perhaps also in the "escapes needed" path (no indirect reference to the syntax table - though that would probably be in a register) but makes the code slightly bigger. For /bin/sh the text segment (on amd64) has grown by 48 bytes. But it still uses the same number of 512 byte pages (and hence also any bigger page size). The resulting file size (/bin/sh) is identical before and after. So is /rescue/sh (or /rescue/anything-else).
# 14482abc	22-Jul-2018	kre <kre@NetBSD.org>	Part 2 of pattern matching (glob etc) fixes. Attempt to correctly deal with \ (both when it is a literal, in appropriate cases, and when it appears as CTLESC when it was detected as a quoting character during parsing). In a pattern, in sh, no quoted character can ever be anything other than a literal character. This is quite different than regular expressions, and even different than other uses of glob matching, where shell quoting is not an issue. In something like ls ?\.c the ? is a meta-character, the . is a literal (it was quoted). This is nothing new, sh has handled that properly for ever. But the same happens with VAR='?\.c' and ls $VAR which has not always been handled correctly. Of course, in ls "$VAR" nothing in VAR is a meta-character (the entire expansion is quoted) so even the '\' must match literally (or more accurately, no matching happens - VAR simply contains an "unusual" filename). But if it had been ls *"$VAR" then we would be looking for filenames that end with the literal 5 characters that make up $VAR. The same kinds of things are required of matching patterns in case statements, and sub-strings with the % and # operators in variable expansions. While here, the final remnant of the ancient !! pattern matching hack has been removed (the code that actually implemented it was long gone, but one small piece remained, not doing any real harm, but potentially wasting time - if someone gave a pattern which would once have invoked that hack.)
# d211c89f	22-Jul-2018	kre <kre@NetBSD.org>	NFC: Whitespace cleanups
# bcacfd9a	22-Jul-2018	kre <kre@NetBSD.org>	DEBUG mode only change (ie: no effect to any normal shell). Add tracing of pattern matching (aid in debugging various issues.)
# c83568a7	20-Jul-2018	kre <kre@NetBSD.org>	First pass at fixing some of the more arcane pattern matching possibilities that we do not currently handle all that well. This mostly means (for now) making sure that quoted pattern magic characters (as well as quoted sh syntax magic chars) are properly marked, so they remain known as being quoted, and do not turn into pattern magic. Also, make sure that an unquoted \ in a pattern always quotes whatever comes next (which, unlike in regular expressions, includes inside [] matches).
# b81009ce	22-Jun-2018	kre <kre@NetBSD.org>	When processing character classes ([:xxx:] inside []), treat a class name that is longer than we can handle the same way we treat an unknown class name (as a valid char class which contains nothing, so never matches). Previously a "too long" class name invalidated the class, so [:very-long-name:] would match any of '[' ':' 'v' ... (note: "very-long-name" is not long enough to trigger this, but you get the idea!) However, the name itself has a restricted syntax ([[:**:]] is not a character class, it is a match for one of a '[' ':' or '*', followed by a ']') which we did not implement - check the syntax of the name before treating it as a character class (but we do add '_' to alphanumerics as legal class name characters).
# 829cc62a	22-Jun-2018	kre <kre@NetBSD.org>	When matching a char class ([[:name:]]) in a pattern (for filename expansion, case patterns, etc) do not force '[' to be a member of every class. Before this fix, try: case [ in [[:alpha:]]) echo Huh\?;; esac XXX pullup-8 (Perhaps -7 as well, though that shell version has much more relevant bugs than this one.) This bug is not in -6 as that has no charclass support.
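
The slash rule described in the tilde expansion entry (726d188a) can be illustrated with a small sketch. This is not the code from expand.c: the function name `join_tilde`, its signature, and the choice of which of the two slashes to drop are assumptions made only for illustration. It joins an already-computed tilde expansion with the rest of the word and keeps a single '/' when the expansion ends in '/' and the rest begins with '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from expand.c): join the result of a
 * tilde-prefix expansion with the remainder of the word.  When the
 * expansion ends in '/' and the remainder starts with '/', only one of
 * the two slashes is kept, so HOME=/ with ~/foo yields /foo, not //foo.
 *
 * Returns the length of the joined string, or (size_t)-1 if dst is too
 * small to hold it (dst is left untouched in that case).
 */
size_t
join_tilde(const char *expansion, const char *rest, char *dst, size_t dstlen)
{
    size_t elen, rlen;

    assert(expansion != NULL && rest != NULL && dst != NULL);

    elen = strlen(expansion);
    rlen = strlen(rest);

    /* Skip the leading '/' of the remainder if the expansion supplies one. */
    if (elen > 0 && expansion[elen - 1] == '/' && rest[0] == '/') {
        rest++;
        rlen--;
    }

    if (elen + rlen + 1 > dstlen)
        return (size_t)-1;

    memcpy(dst, expansion, elen);
    memcpy(dst + elen, rest, rlen + 1);   /* copies the terminating NUL too */
    return elen + rlen;
}
```

With this sketch, joining "/" and "/foo" gives "/foo", while joining "/home/u" and "/foo" leaves both parts intact. An empty expansion (HOME="") contributes nothing to the string, and the quoting of that empty result, which the log entry also mentions, is outside what this sketch models.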
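
The character class name handling described in entries 7dca2b7e and b81009ce (test and copy one character at a time, accept only alphanumerics and '_', treat an over-long name as a class that matches nothing) might be sketched roughly as follows. Everything here is an assumption for illustration: the names `parse_class_name` and `enum class_parse`, the treatment of an empty name as "not a class", and the omission of the quoting (\, ' and ") handling that the real code performs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result codes, not identifiers from expand.c. */
enum class_parse {
    NOT_A_CLASS,    /* bad syntax: treat the '[' ':' ... as ordinary chars */
    EMPTY_CLASS,    /* name too long: a valid class that never matches */
    NAMED_CLASS     /* name copied; the caller looks it up */
};

/*
 * p points just past the "[:" of a candidate class inside a bracket
 * expression.  The name is checked and copied a character at a time;
 * it must be followed by ":]" to count as a class at all.
 */
enum class_parse
parse_class_name(const char *p, char *name, size_t namelen)
{
    size_t n = 0;

    assert(p != NULL && name != NULL);

    for (; *p != '\0'; p++) {
        if (p[0] == ':' && p[1] == ']')
            break;                      /* end of the class name */
        if (!isalnum((unsigned char)*p) && *p != '_')
            return NOT_A_CLASS;         /* e.g. the '*' in [[:**:]] */
        if (n + 1 < namelen)
            name[n] = *p;               /* copy only while it still fits */
        n++;
    }

    if (*p == '\0' || n == 0)           /* no ":]" found, or empty name */
        return NOT_A_CLASS;
    if (n + 1 > namelen)
        return EMPTY_CLASS;             /* too long: matches nothing */

    name[n] = '\0';
    return NAMED_CLASS;
}
```

A caller would presumably map an unknown but well-formed name to the same "matches nothing" behaviour as `EMPTY_CLASS`, which is how the log entry describes unknown class names being treated.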