History log of /netbsd-src/lib/libedit/chartype.h (Results 1 – 25 of 37)
Revision Date Author Comments
# 3b1edbf2 11-Apr-2022 tnn <tnn@NetBSD.org>

libedit/chartype.h: portability fix for OSF/1


# a5a5a01f 15-Sep-2019 christos <christos@NetBSD.org>

Fix type and remove cast (Yuichiro NAITO/FreeBSD).


# c74ab69c 22-May-2017 christos <christos@NetBSD.org>

Add DragonFly.


# a2d6b270 09-May-2016 christos <christos@NetBSD.org>

s/protected/libedit_private/g


# 300e2ca4 02-May-2016 christos <christos@NetBSD.org>

eliminate static buffer with custom resizing code.


# 9ff2bfe4 02-May-2016 christos <christos@NetBSD.org>

fix typos from Pedro Giffuni @FreeBSD


# 469d44f8 11-Apr-2016 christos <christos@NetBSD.org>

Get rid of private/public; keep protected (Ingo Schwarze)


# a75ea7b9 11-Apr-2016 christos <christos@NetBSD.org>

chartype cleanups from Ingo Schwarze:

- The file tokenizer.c no longer uses chartype.h,
so don't include the header.

- The dummy definitions of ct_{de,en}code_string() for the
NARROWCHAR case are only used in history.c, so move them there.

- Now the whole content of chartype.h is for the wide character
case only. So remove the NARROWCHAR ifdef and include the
header only in the wide character case.

- In chartype.h, move ct_encode_char() below the comment explaining it.

- No more need for underscores before ct_{de,en}code_string().

- Make the conversion buffer resize functions private.
They are only called from the decoding and encoding functions
inside chartype.c, and no need can possibly arise to call them
from anywhere else.


# 0594af80 11-Apr-2016 christos <christos@NetBSD.org>

Char -> wchar_t from Ingo Schwarze.


# 0aefc7f9 11-Apr-2016 christos <christos@NetBSD.org>

more macro WIDECHAR undoing from Ingo Schwarze.


# fcf85103 09-Apr-2016 christos <christos@NetBSD.org>

More WIDECHAR elimination (Ingo Schwarze)


# 4e541d85 23-Mar-2016 christos <christos@NetBSD.org>

Start removing the WIDECHAR ifdefs; building without it has stopped working
anyway. (Ingo Schwarze)


# 72614d1e 07-Mar-2016 christos <christos@NetBSD.org>

Remove advertising clause.


# d784c575 02-Mar-2016 christos <christos@NetBSD.org>

PR/50880: David Binderman: Remove redundant code.
While here, fix all debugging formats.


# fe2cf455 24-Feb-2016 christos <christos@NetBSD.org>

Tuck in mbstate_t to the wide char version only to avoid exposing the zeroing
hack and doing it in the narrow case.


# 94623721 24-Feb-2016 christos <christos@NetBSD.org>

Make the read_char function always take a wchar_t * argument (Ingo Schwarze)


# 22383670 17-Feb-2016 christos <christos@NetBSD.org>

whitespace and header sorting changes (Ingo Schwarze). No functional changes.


# 2884af9f 14-Feb-2016 christos <christos@NetBSD.org>

From Ingo Schwarze:

el_getc() for the WIDECHAR case, that is, the version in eln.c.
For a UTF-8 locale, it is broken in four ways:

1. If the character read is outside the ASCII range, the function
does an undefined cast from wchar_t to char. Even if wchar_t
is internally represented as UCS-4, that is wrong and dangerous
because characters beyond codepoint U+00FF get their high bits
truncated, meaning that perfectly valid printable Unicode
characters get mapped to arbitrary bytes, even the ASCII escape
character for some Unicode characters. But wchar_t need not
be implemented in terms of UCS-4, so the outcome of this function
is undefined for any and all input.

2. If insufficient space is available for the result, the function
fails to detect failure and returns garbage rather than -1 as
specified in the documentation.

3. The documentation says that errno will be set on failure, but
that doesn't happen either in the above case.

4. Even for ASCII characters, the results may be wrong if wchar_t
is not using UCS-4.
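
The failure modes listed above (an unchecked cast, no detection of insufficient space, errno left unset) can be avoided by letting the C library do the conversion. The following is a minimal sketch, not the code that was committed: the helper name encode_wide_char() and the choice of ERANGE for a short buffer are assumptions made for illustration only.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not the libedit implementation: encode one wide
 * character into buf (capacity buflen) according to the current locale.
 * Returns the number of bytes written, or -1 with errno set.
 */
static int
encode_wide_char(char *buf, size_t buflen, wchar_t wc)
{
	char tmp[MB_LEN_MAX];	/* large enough for any single character */
	mbstate_t st;
	size_t n;

	memset(&st, 0, sizeof(st));	/* start from the initial state */
	n = wcrtomb(tmp, wc, &st);
	if (n == (size_t)-1)
		return -1;		/* wcrtomb(3) has set errno to EILSEQ */
	if (n > buflen) {
		errno = ERANGE;		/* assumption: error code chosen here */
		return -1;
	}
	memcpy(buf, tmp, n);
	return (int)n;
}
```

No cast from wchar_t to char occurs, so the result does not depend on how wchar_t is represented internally.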


# f54e4f97 14-Feb-2016 christos <christos@NetBSD.org>

From Ingo Schwarze:

As we have seen before, "histedit.h" can never get rid of including
the <wchar.h> header because using the data types defined there is
deeply ingrained in the public interfaces of libedit.

Now POSIX unconditionally requires that <wchar.h> defines the type
wint_t. Consequently, it can be used unconditionally, no matter
whether WIDECHAR is active or not. Consequently, the #define Int
is pointless.

Note that removing it is not gratuitous churn. Auditing for
integer signedness problems is already hard when only fundamental
types like "int" and "unsigned" are involved. It gets very hard
when types come into the picture that have platform-dependent
signedness, like "char" and "wint_t". Adding yet another layer
on top, changing both the signedness and the width in a platform-
dependent way, makes auditing yet harder, which IMHO is really
dangerous. Note that while removing the #define, i already found
one bug caused by this excessive complication - in the function
re_putc() in refresh.c. If WIDECHAR was defined, it printed an
Int = wint_t value with %c. Fortunately, that bug only affects
debugging, not production. The fix is contained in the patch.
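
To illustrate the class of bug described above, here is a small sketch of formatting a wint_t value with a matching conversion specifier. It is not the actual re_putc() fix; the helper name format_wint() is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical debug helper: format one wint_t value into out.
 * "%c" expects an int argument, while "%lc" expects a wint_t,
 * so the specifier has to follow the type actually passed.
 */
static int
format_wint(char *out, size_t outlen, wint_t c)
{
	return snprintf(out, outlen, "%lc", c);
}
```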

With WIDECHAR, this doesn't change anything. For the case without
WIDECHAR, i checked that none of the places wants to store values
that might not fit in wint_t.

This only changes internal interfaces; public ones remain unchanged.


# 61ee3048 14-Feb-2016 christos <christos@NetBSD.org>

From Ingo Schwarze:

Next step: Remove #ifdef'ing in read_char(), in the same style
as we did for setlocale(3) in el.c.

A few remarks are required to explain the choices made.

* On first sight, handling mbrtowc(3) seems a bit less trivial
than handling setlocale(3) because its prototype uses the data
type mbstate_t from <wchar.h>. However, it turns out that
"histedit.h" already includes <wchar.h> unconditionally (i don't
like headers including other headers, but that ship has sailed,
people are by now certainly used to the fact that including
"histedit.h" doesn't require including <wchar.h> before), and
"histedit.h" is of course included all over the place. So from
that perspective, there is no problem with using mbrtowc(3)
unconditionally even for !WIDECHAR.

* However, <wchar.h> also defines the mbrtowc(3) prototype,
so we cannot just #define mbrtowc away, or including the header
will break. It would also be a bad idea to provide a local
implementation of mbrtowc() and hope that it overrides the one
in libc. Besides, the required prototype is subtly different:
While mbrtowc(3) takes "wchar_t *" as its first argument, we
need a function that takes "Char *". So unfortunately, we have
to keep a ct_mbrtowc #define, at least until we can maybe get
rid of "Char *" in the more remote future.

* After getting rid of the #else clause in read_char(), we can
pull "return 1;" into the default: clause. After that, we can
get rid of the ugly "goto again_lastbyte;" and just "break;".
As a bonus, that also gets rid of the ugly CONSTCOND.

* While here, delete the unused ct_mbtowc() from chartype.h.


# 28c02909 11-Feb-2016 christos <christos@NetBSD.org>

remove unused wrapper (Ingo Schwarze)


# 6b42622b 08-Feb-2016 christos <christos@NetBSD.org>

UTF-8 fixes from Ingo Schwarze:

1. Assume that errno is non-zero when entering read_char()
and that read(2) returns 0 (indicating end of file).
Then, the code will clear errno before returning.
(Obviously, the statement "errno = 0" is almost always
a bug unless there is save_errno = errno right before it
and the previous value is properly restored later,
in all reachable code paths.)

2. When encountering an invalid byte sequence, the code discards
all following bytes until MB_LEN_MAX overflows; consider, for
example, 0xc2 immediately followed by a few valid ASCII bytes.
Three of those ASCII bytes will be discarded.

3. On a POSIX system, EILSEQ will always be set after reading a
valid (yes, valid, not invalid!) UTF-8 character. The reason
is that mbtowc(3) will first be called with a length limit
(third argument) of 1, which will fail, return -1, and - on
a POSIX system - set errno to EILSEQ.
This third bug is mitigated a bit because i couldn't find any
system that actually conforms to POSIX in this respect: None
of OpenBSD, NetBSD, FreeBSD, Solaris 11, and glibc set errno
when an incomplete character is passed to mbtowc(3), even though
that is required by POSIX.
Anyway, that mbtowc(3) bug will be fixed at least in OpenBSD
after release unlock, so it would be good to fix this bug in
libedit before fixing the bug in mbtowc(3).
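
The errno handling criticized in item 1 can be contrasted with the usual save-and-restore pattern. The sketch below is an assumption-laden illustration, not the read_char() code: the function name read_byte_keep_errno() is hypothetical, and the EINTR retry loop merely stands in for whatever recovery logic runs between read attempts.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: read one byte and, on end of file, leave
 * errno exactly as it was on entry instead of forcing it to 0.
 */
static ssize_t
read_byte_keep_errno(int fd, char *cp)
{
	int save_errno = errno;	/* saved before the first read(2) */
	ssize_t n;

	/* Stand-in for the retry logic; failed attempts may change errno. */
	while ((n = read(fd, cp, 1)) == -1 && errno == EINTR)
		continue;
	if (n == 0)
		errno = save_errno;	/* EOF is not a new error */
	return n;
}
```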

How can these three bugs be fixed?

1. As far as i understand it, the intention of the bogus errno = 0
is to undo the effects of failing system calls in el_wset(),
sig_set(), and read__fixio() if the subsequent read(2) indicates
end of file. So, restoring errno has to be moved right after
read__fixio(). Of course, neither 0 nor e is the right value
to restore: 0 is wrong if errno happened to be set on entry, e
would be wrong because if one read(2) fails but a second attempt
succeeds after read__fixio(), errno should not be touched. So,
the errno to be restored in this case has to be saved before
calling read(2) for the first time.

2. Solving the second issue requires distinguishing invalid and
incomplete characters, but that is impossible with the function
mbtowc(3) because it returns -1 in both cases and sets errno
to EILSEQ in both cases (once properly implemented).

It is vital that each input character is processed right away.
It is not acceptable to wait for the next input character before
processing the previous one because this is an interactive
library, not a batch system. Consequently, the only situation
where it is acceptable to wait for the next byte without first
processing the previous one(s) is when the previous one(s) form
an incomplete sequence that can be continued to form a valid
character.

Consequently, short of reimplementing a full UTF-8 state machine
by hand, the only correct way forward is to use mbrtowc(3).
Even then, care is needed to always have the state object
properly initialized before using it, and to not discard a valid
ASCII or UTF-8 lead byte if it happens to follow an invalid
sequence.

3. Fortunately, solution 2. also solves issue 3. as a side effect,
by no longer using mbtowc(3) in the first place.
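
The distinction between incomplete and invalid input that mbrtowc(3) makes possible can be sketched as follows. This is an illustration under stated assumptions, not the committed read_char() logic: the names feed_byte() and enum feed_result are hypothetical, and the policy of silently dropping the invalid prefix is a simplification.

```c
#include <string.h>
#include <wchar.h>

enum feed_result { FEED_CHAR, FEED_MORE, FEED_INVALID };

/*
 * Hypothetical helper: feed one input byte to the decoder.
 * FEED_CHAR:    *out holds a complete character.
 * FEED_MORE:    the bytes so far form an incomplete sequence.
 * FEED_INVALID: the byte cannot start or continue a character.
 * The caller must zero *st before the first call.
 */
static enum feed_result
feed_byte(mbstate_t *st, char byte, wchar_t *out)
{
	int pending = !mbsinit(st);	/* was a sequence in progress? */
	size_t n = mbrtowc(out, &byte, 1, st);

	if (n == (size_t)-1) {
		/* State is undefined after an error: reinitialize it. */
		memset(st, 0, sizeof(*st));
		/*
		 * If the byte broke a pending sequence, it may still be
		 * a valid ASCII or lead byte, so retry it from scratch.
		 */
		if (pending)
			n = mbrtowc(out, &byte, 1, st);
		if (n == (size_t)-1) {
			memset(st, 0, sizeof(*st));
			return FEED_INVALID;
		}
	}
	if (n == (size_t)-2)
		return FEED_MORE;	/* wait for the next byte */
	return FEED_CHAR;		/* n is 0 (NUL) or 1 */
}
```

In a UTF-8 locale, feeding 0xc2 and then an ASCII byte would be expected to yield FEED_MORE followed by FEED_CHAR for the ASCII byte, rather than discarding it.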


# 5113710e 17-May-2015 christos <christos@NetBSD.org>

add FreeBSD


# d2ed0c78 14-May-2015 christos <christos@NetBSD.org>

fix warnings on ubuntu 32 bit (Miki Rozloznik)


# 21ea10bd 22-Feb-2015 christos <christos@NetBSD.org>

split the allocation functions, their mixed usage was too confusing.

