# The goal is for perl to compile and reasonably run any version of Unicode.
# Working reasonably well doesn't mean that the test suite will run without
# showing errors. A few of the very Unicode-specific test files have been
# modified to account for different versions, but most have not. For example,
# some tests use characters that aren't encoded in all Unicode versions; others
# have hard-coded the General Categories for a code point that were correct at
# the time the test was written. Perl itself will not compile under Unicode
# releases prior to 3.0 without a simple change to Unicode::Normalize.
# mktables contains instructions for this.

# The *.txt files were copied from

#     ftp://www.unicode.org/Public/UNIDATA

# (which always points to the latest version) with subdirectories 'extracted' and
# 'auxiliary'. Older versions are located under Public with an appropriate name.
# They are also available via http at www.unicode.org/versions/
#

# The Unihan files were not included due to space considerations. Also NOT
# included were any *.html files. It is possible to add the Unihan files and
# have some properties from them automatically compiled. By editing mktables
# (see instructions near its beginning) you can add other Unihan properties.

# The file named 'version' should exist and be a single line with the Unicode
# version, like:
#
# 5.2.0
#
# (without the initial '# ')

# To be 8.3 filesystem friendly, the names of some of the input files have been
# changed from the values that are in the Unicode DB. Not all of the Test
# files are currently used, so may not be present, so some of the mv's can
# fail. The .html Test files are not touched.
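
# Because some of the renames can fail when a file is absent, a guarded
# rename may be more convenient than a bare mv. What follows is only a
# minimal sketch, assuming a POSIX shell; 'rename_if_present' is a
# hypothetical helper name and is not part of this file or of mktables.

```shell
# Hypothetical helper: rename SRC to DST only when SRC exists; otherwise
# print a note on stderr and carry on without an error status.
rename_if_present() {
    src=$1
    dst=$2
    if [ -e "$src" ]; then
        mv -- "$src" "$dst"
    else
        echo "skipping $src (not present in this Unicode version)" >&2
    fi
}

# Example call, left commented out so that the plain mv commands in this
# file still find their input files:
#   rename_if_present DerivedAge.txt DAge.txt
```

# With such a helper, any 'mv OLD NEW' in this file could instead be written
# as 'rename_if_present OLD NEW'.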

mv PropertyValueAliases.txt PropValueAliases.txt
mv NamedSequencesProv.txt NamedSqProv.txt
mv NormalizationTest.txt NormTest.txt
mv DerivedAge.txt DAge.txt
mv DerivedCoreProperties.txt DCoreProperties.txt
mv DerivedNormalizationProps.txt DNormalizationProps.txt
mv IdentifierStatus.txt IdStatus.txt
mv IdentifierType.txt IdType.txt

# Some early releases don't have the extracted directory, and hence these files
# should be moved to it.
mkdir extracted 2>/dev/null
mv DerivedBidiClass.txt DerivedBinaryProperties.txt extracted 2>/dev/null
mv DerivedCombiningClass.txt DerivedDecompositionType.txt extracted 2>/dev/null
mv DerivedEastAsianWidth.txt DerivedGeneralCategory.txt extracted 2>/dev/null
mv DerivedJoiningGroup.txt DerivedJoiningType.txt extracted 2>/dev/null
mv DerivedLineBreak.txt DerivedNumericType.txt DerivedNumericValues.txt extracted 2>/dev/null

mv extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
mv extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
mv extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
mv extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
mv extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
mv extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
mv extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
mv extracted/DerivedJoiningType.txt extracted/DJoinType.txt
mv extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
mv extracted/DerivedNumericType.txt extracted/DNumType.txt
mv extracted/DerivedNumericValues.txt extracted/DNumValues.txt
mv extracted/DerivedName.txt extracted/DName.txt
rmdir extracted 2>/dev/null # Will fail if non-empty, but if it is empty,
                            # this was an early release that didn't have it.

mv auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
mv auxiliary/LineBreakTest.txt auxiliary/LBTest.txt
mv auxiliary/SentenceBreakTest.txt auxiliary/SBTest.txt
mv auxiliary/WordBreakTest.txt auxiliary/WBTest.txt

# If you have the Unihan database (5.2 and above), you should also do the
# following:

mv Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
mv Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
mv Unihan_IRGSources.txt UnihanIRGSources.txt
mv Unihan_NumericValues.txt UnihanNumericValues.txt
mv Unihan_OtherMappings.txt UnihanOtherMappings.txt
mv Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
mv Unihan_Readings.txt UnihanReadings.txt
mv Unihan_Variants.txt UnihanVariants.txt

# If you download everything, the names of files that are not used by mktables
# are not changed by the above, and hence may not work correctly as-is on 8.3
# filesystems.

# mktables is used to generate the tables used by the rest of Perl. It will
# warn you about any *.txt and *.html files in the directory substructure that
# it doesn't know about. You should remove any so-identified, or edit mktables
# to add them to its lists to process. You can run
#
#     mktables -globlist
#
# to have it try to process these tables generically.

# COMPILING ON OLDER UNICODE VERSIONS
#
# To compile perl for use with an older Unicode release, delete everything in
# the lib/unicore directory except mktables and Makefile. Then download the
# Unicode-supplied files for the desired version to that directory (a URL for
# these is given earlier in this file). Then create the 'version' file with a
# single line, like '6.1.0'. Do a 'make test' from the project level. You
# will get some porting errors for needing to regen. Regenerate whatever it
# tells you is needed, and run 'make test' again.
# If you compile an old enough version,
# you will also have to download a few files from later Unicode versions,
# following the instructions that will be given if warranted. It should
# compile in any release without warnings, except for some casing conflicts
# in Unicode 2.1.8, and some extraneous files of the form qr/diff.*\.txt/
# will show up in very early releases.
# If you add Unihan.txt, one line is in error in
#
# Other glitches are noted in mktables under 'UNICODE VERSIONS NOTES'.

# FOR PUMPKINS
#
# The files are inter-related. If you take the latest UnicodeData.txt, for
# example, but leave the older versions of other files, there can be subtle
# problems. So get everything available from Unicode, and delete those which
# aren't needed.
#
# When moving to a new version of Unicode, you need to update 'version' by hand
#
#     p4 edit version
#     ...
#
# You should look in the Unicode release notes (which are probably towards the
# bottom of http://www.unicode.org/reports/tr44/) to see if any properties have
# newly been moved to be Obsolete, Deprecated, or Stabilized. The full names
# for these should be added to the respective lists near the beginning of
# mktables, using an 'if' to add them for just this Unicode version going
# forward, so that mktables can continue to be used for earlier Unicode
# versions.
#
# When putting out a new Perl release, consider whether any of the Deprecated
# properties should be moved to Suppressed.
#
# perlrecharclass.pod has a list of all the characters that are white space,
# which needs to be updated if there are changes. A quick way to check if
# there have been changes would be to see if the number of such characters
# listed in perluniprops.pod (generated by running mktables) for the property
# \p{White_Space} is no longer 25.
# Further investigation would then be
# necessary to classify the new characters as horizontal or vertical.
#
# The code in regexec.c for the \X match construct is intimately tied to the
# regular expression in UAX #29 (http://www.unicode.org/reports/tr29/). You
# should see if it has changed, and if so, regexec.c should be modified. The
# current one is
#     ( CRLF
#     | Prepend* ( RI-sequence | Hangul-Syllable | !Control )
#       ( Grapheme_Extend | SpacingMark )*
#     | . )
#
# mktables has many checks to warn you if there are unexpected or novel things
# that it doesn't know how to handle.
#
# Module::CoreList should be changed to include the new release.
#
# Also, you should regen l1_char_class_tab.h by
#
#     perl regen/mk_L_charclass.pl
#
# and regen charclass_invlists.h by
#
#     perl regen/mk_invlists.pl
#
# Finally:
#
#     p4 submit
#
# --
# jhi@iki.fi; updated by nick@ccl4.org, public@khwilliamson.com