This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.c).

The idea is to produce the necessary lookup table
(../../libcpp/generated_cpp_wcwidth.h) in a reproducible way, starting from the
following files that are distributed by the Unicode Consortium:

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt

These three files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.
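For orientation, the sketch below shows how lines in the style of
EastAsianWidth.txt (a code point or a FIRST..LAST range, a semicolon, a width
class, and an optional '#' comment) can be read into ranges.  It is only an
illustration: the helper name parse_east_asian_width and the sample excerpt
are assumptions made here, and this is not the code in from_glibc/.

```python
def parse_east_asian_width(lines):
    """Yield (first, last, width_class) for each data line.

    Hypothetical helper for illustration; not part of glibc or GCC.
    """
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        code_range, width_class = (f.strip() for f in data.split(";", 1))
        first, _, last = code_range.partition("..")
        # A single code point is treated as a one-element range.
        yield int(first, 16), int(last or first, 16), width_class


# Illustrative excerpt in the file's format, not the full data file.
SAMPLE = """\
# comment-only lines are skipped
0041..005A;Na    # Lu    [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
1100..115F;W     # Lo    [96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER
3000;F           # Zs         IDEOGRAPHIC SPACE
""".splitlines()

ranges = list(parse_east_asian_width(SAMPLE))
```

Classes W (wide) and F (fullwidth) are the ones that normally occupy two
columns; how they are combined with the other two data files is decided by
the glibc code described in this README, not by this sketch.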

In order to keep in sync with glibc's wcwidth as much as possible, it is
desirable for the logic that processes the Unicode data to be the same as
glibc's.  To that end, the from_glibc/ subdirectory of this directory holds
the glibc Python code that implements that logic.  This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
source code repository.  The files copied from that repository are:

localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py

And the most recent versions added to GCC are from glibc git commit:
2a764c6ee848dfe92cb2921ed3b14085f15d9e79

Finally, the script gen_wcwidth.py found here contains the GCC-specific code to
map glibc's output to the lookup tables we require.  This script should not need
to change, unless there are structural changes to the Unicode data files or to
the glibc code.
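As a rough illustration of what a width lookup table can look like, the sketch
below collapses width ranges into two parallel sorted lists and answers
queries with a binary search.  The layout of the real generated header is not
described in this README, so the names build_table and lookup_width, the
default width of 1, and the sample ranges are all assumptions made for this
example.

```python
import bisect


def build_table(ranges, default=1, max_cp=0x10FFFF):
    """Turn (first, last, width) ranges into (range_ends, range_widths).

    Gaps between ranges get the default width, and neighbouring ranges with
    equal width are merged.  Hypothetical helper, not GCC's generator.
    """
    ends, widths = [], []

    def push(last, width):
        if widths and widths[-1] == width:
            ends[-1] = last  # extend the previous run
        else:
            ends.append(last)
            widths.append(width)

    next_cp = 0
    for first, last, width in sorted(ranges):
        if first > next_cp:
            push(first - 1, default)  # fill the gap before this range
        push(last, width)
        next_cp = last + 1
    if next_cp <= max_cp:
        push(max_cp, default)  # fill up to the last code point
    return ends, widths


def lookup_width(table, cp):
    """Return the width of code point cp using a binary search."""
    ends, widths = table
    return widths[bisect.bisect_left(ends, cp)]


# Sample input only: combining marks (0), Hangul Jamo and U+3000 (2).
TABLE = build_table([(0x0300, 0x036F, 0), (0x1100, 0x115F, 2), (0x3000, 0x3000, 2)])
```

Storing only the last code point of each run keeps the table small, and the
lookup cost grows with the logarithm of the number of runs rather than with
the size of the code space.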

The procedure to update GCC's wcwidth tables is the following:

1.  Update the three Unicode data files from the above URLs.

2.  Update the two glibc files in from_glibc/ from glibc's git.  Update
    the commit hash above in this README.

3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
    (where X.Y is the version of the Unicode standard corresponding to the
    Unicode data files being used; most recently, 12.1).

After that, GCC's wcwidth will match that of the most recent glibc.
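Step 3 can also be driven from Python, as in the sketch below.  The function
name regenerate_header and its parameters are assumptions made for this
example, and the script is started through the current Python interpreter
rather than through its shebang line.  Steps 1 and 2 are still done by hand.

```python
import subprocess
import sys
from pathlib import Path


def regenerate_header(unicode_version, script_dir="."):
    """Run gen_wcwidth.py and redirect its output into the libcpp header.

    Hypothetical wrapper around step 3; script_dir is the directory that
    contains gen_wcwidth.py (this directory).
    """
    script_dir = Path(script_dir)
    header = script_dir / ".." / ".." / "libcpp" / "generated_cpp_wcwidth.h"
    cmd = [sys.executable, "gen_wcwidth.py", unicode_version]
    # As with the shell redirection, the header is truncated before the
    # script runs, so a failed run leaves an incomplete file behind.
    with open(header, "w") as out:
        subprocess.run(cmd, cwd=script_dir, stdout=out, check=True)
    return header
```

For example, regenerate_header("12.1") run from this directory corresponds to
the command in step 3.  Because check=True is passed, a non-zero exit status
from the script raises subprocess.CalledProcessError.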