Name |
Date |
Size |
#Lines |
LOC |
||
---|---|---|---|---|---|---|
.. | - | - | ||||
lib/ | H | - | - | 9,766 | 7,812 | |
LICENSE | H A D | 06-Aug-2015 | 1.4 KiB | 30 | 24 | |
README | H A D | 06-Aug-2015 | 9.6 KiB | 236 | 181 | |
README.DELETED | H A D | 06-Aug-2015 | 303 | 31 | 30 | |
README.DRAGONFLY | H A D | 06-Aug-2015 | 437 | 17 | 12 |
README
1Introduction 2 3 TRE is a lightweight, robust, and efficient POSIX compliant regexp 4 matching library with some exciting features such as approximate 5 (fuzzy) matching. 6 7 The matching algorithm used in TRE uses linear worst-case time in 8 the length of the text being searched, and quadratic worst-case 9 time in the length of the used regular expression. In other words, 10 the time complexity of the algorithm is O(M^2N), where M is the 11 length of the regular expression and N is the length of the 12 text. The used space is also quadratic on the length of the regex, 13 but does not depend on the searched string. This quadratic 14 behaviour occurs only on pathological cases which are probably very 15 rare in practice. 16 17Features 18 19 TRE is not just yet another regexp matcher. TRE has some features 20 which are not there in most free POSIX compatible 21 implementations. Most of these features are not present in non-free 22 implementations either, for that matter. 23 24Approximate matching 25 26 Approximate pattern matching allows matches to be approximate, that 27 is, allows the matches to be close to the searched pattern under 28 some measure of closeness. TRE uses the edit-distance measure (also 29 known as the Levenshtein distance) where characters can be 30 inserted, deleted, or substituted in the searched text in order to 31 get an exact match. Each insertion, deletion, or substitution adds 32 the distance, or cost, of the match. TRE can report the matches 33 which have a cost lower than some given threshold value. TRE can 34 also be used to search for matches with the lowest cost. 35 36 TRE includes a version of the agrep (approximate grep) command line 37 tool for approximate regexp matching in the style of grep. Unlike 38 other agrep implementations (like the one by Sun Wu and Udi Manber 39 from University of Arizona available here) TRE agrep allows full 40 regexps of any length, any number of errors, and non-uniform costs 41 for insertion, deletion and substitution. 42 43Strict standard conformance 44 45 POSIX defines the behaviour of regexp functions precisely. TRE 46 attempts to conform to these specifications as strictly as 47 possible. TRE always returns the correct matches for subpatterns, 48 for example. Very few other implementations do this correctly. In 49 fact, the only other implementations besides TRE that I am aware of 50 (free or not) that get it right are Rx by Tom Lord, Regex++ by John 51 Maddock, and the AT&T ast regex by Glenn Fowler and Doug McIlroy. 52 53 The standard TRE tries to conform to is the IEEE Std 1003.1-2001, 54 or Open Group Base Specifications Issue 6, commonly referred to as 55 "POSIX". It can be found online here. The relevant parts are the 56 base specifications on regular expressions (and the rationale) and 57 the description of the regcomp() API. 58 59 For an excellent survey on POSIX regexp matchers, see the testregex 60 pages by Glenn Fowler of AT&T Labs Research. 61 62Predictable matching speed 63 64 Because of the matching algorithm used in TRE, the maximum time 65 consumed by any regexec() call is always directly proportional to 66 the length of the searched string. There is one exception: if back 67 references are used, the matching may take time that grows 68 exponentially with the length of the string. This is because 69 matching back references is an NP complete problem, and almost 70 certainly requires exponential time to match in the worst case. 71 72Predictable and modest memory consumption 73 74 A regexec() call never allocates memory from the heap. TRE 75 allocates all the memory it needs during a regcomp() call, and some 76 temporary working space from the stack frame for the duration of 77 the regexec() call. The amount of temporary space needed is 78 constant during matching and does not depend on the searched 79 string. For regexps of reasonable size TRE needs less than 50K of 80 dynamically allocated memory during the regcomp() call, less than 81 20K for the compiled pattern buffer, and less than two kilobytes of 82 temporary working space from the stack frame during a regexec() 83 call. There is no time/memory tradeoff. TRE is also small in code 84 size; statically linking with TRE increases the executable size 85 less than 30K (gcc-3.2, x86, GNU/Linux). 86 87Wide character and multibyte character set support 88 89 TRE supports multibyte character sets. This makes it possible to 90 use regexps seamlessly with, for example, Japanese locales. TRE 91 also provides a wide character API. 92 93Binary pattern and data support 94 95 TRE provides APIs which allow binary zero characters both in 96 regexps and searched strings. The standard API cannot be easily 97 used to, for example, search for printable words from binary data 98 (although it is possible with some hacking). Searching for patterns 99 which contain binary zeroes embedded is not possible at all with 100 the standard API. 101 102Completely thread safe 103 104 TRE is completely thread safe. All the exported functions are 105 re-entrant, and a single compiled regexp object can be used 106 simultaneously in multiple contexts; e.g. in main() and a signal 107 handler, or in many threads of a multithreaded application. 108 109Portable 110 111 TRE is portable across multiple platforms. Here's a table of 112 platforms and compilers that have been successfully used to compile 113 and run TRE: 114 115 Platform(s) | Compiler(s) 116 ----------------------------------+------------ 117 AIX 4.3.2 - 5.3.0 | GCC, C for AIX compiler version 5 118 Compaq Tru64 UNIX V5.1A/B | Compaq C V6.4-014 - V6.5-011 119 Cygwin 1.3 - 1.5 | GCC 120 Digital UNIX V4.0 | DEC C V5.9-005 121 FreeBSD 4 and above | GCC 122 GNU/Linux systems on x86, x86_64, | GCC 123 ppc64, s390 | 124 HP-UX 10.20- 11.00 | GCC, HP C Compiler 125 IRIX 6.5 | GCC, MIPSpro Compilers 7.3.1.3m 126 Max OS X | 127 NetBSD 1.5 and above | GCC, egcs 128 OpenBSD 3.3 and above | GCC 129 Solaris 2.7-10 sparc/x86 | GCC, Sun Workshop 6 compilers 130 Windows 98 - XP | Microsoft Visual C++ 6.0 131 132 TRE 0.7.5 should compile without changes on all of the above 133 platforms. Tell me if you are using TRE on a platform that is not 134 listed above, and I'll add it to the list. Also let me know if TRE 135 does not work on a listed platform. 136 137 Depending on the platform, you may need to install libutf8 to get 138 wide character and multibyte character set support. 139 140 Free 141 142 TRE is released under a license which is essentially the same as 143 the "2 clause" BSD-style license used in NetBSD. See the file 144 LICENSE for details. 145 146Roadmap 147 148 There are currently two features, both related to collating 149 elements, missing from 100% POSIX compliance. These are: 150 151 * Support for collating elements (e.g. [[.<X>.]], where <X> is a 152 collating element). It is not possible to support 153 multi-character collating elements portably, since POSIX does 154 not define a way to determine whether a character sequence is a 155 multi-character collating element or not. 156 157 * Support for equivalence classes, for example [[=<X>=]], where 158 <X> is a collating element. An equivalence class matches any 159 character which has the same primary collation weight as 160 <X>. Again, POSIX provides no portable mechanism for 161 determining the primary collation weight of a collating 162 element. 163 164 Note that other portable regexp implementations don't support 165 collating elements either. The single exception is Regex++, which 166 comes with its own database for collating elements for different 167 locales. Support for collating elements and equivalence classes has 168 not been widely requested and is not very high on the TODO list at 169 the moment. 170 171 These are other features I'm planning to implement real soon now: 172 173 * All the missing GNU extensions enabled in GNU regex, such as 174 [[:<:]] and [[:>:]] 175 176 * A REG_SHORTEST regexec() flag for returning the shortest match 177 instead of the longest match. 178 179 * Perl-compatible syntax 180 181 [:^class:] 182 Matches anything but the characters in class. Note that 183 [^[:class:]] works already, this would be just a 184 convenience shorthand. 185 186 \A 187 Match only at beginning of string 188 189 \Z 190 Match only at end of string, or before newline at the end 191 192 \z 193 Match only at end of string 194 195 \l 196 Lowercase next char (think vi) 197 198 \u 199 Uppercase next char (think vi) 200 201 \L 202 Lowercase till \E (think vi) 203 204 \U 205 Uppercase till \E (think vi) 206 207 (?=pattern) 208 Zero-width positive look-ahead assertions. 209 210 (?!pattern) 211 Zero-width negative look-ahead assertions. 212 213 (?<=pattern) 214 Zero-width positive look-behind assertions. 215 216 (?<!pattern) 217 Zero-width negative look-behind assertions. 218 219 Documentation especially for the nonstandard features of TRE, such 220 as approximate matching, is a work in progress (with "progress" 221 loosely defined...) 222 223 Mailing lists 224 225 tre-general@lists.laurikari.net 226 This list is for any discussion on the TRE software, including 227 reporting bugs, feature requests, requests for help, and other 228 things. 229 230 tre-announce@lists.laurikari.net 231 Subscribe to this list to get announcements of new releases of 232 TRE. Alternatively, you can subscribe to the freshmeat.net 233 project and get similar announcements that way, if you prefer. 234 235Ville Laurikari <vl@iki.fi> 236
README.DELETED
1ABOUT-NLS 2AUTHORS 3ChangeLog 4INSTALL 5Makefile.am 6Makefile.in 7NEWS 8THANKS 9TODO 10aclocal.m4 11config.h.in 12configure 13configure.ac 14doc/ 15lib/Makefile.am 16lib/Makefile.in 17lib/regex.h 18lib/tre-config.h.in 19lib/tre.h 20lib/tre-match-approx.c 21lib/xmalloc.c 22m4/ 23po/ 24python/ 25src/ 26tests/ 27tre.pc.in 28tre.spec.in 29utils/ 30win32/ 31
README.DRAGONFLY
1TRE REGEX LIBRARY 2================= 3 4Original source can be downloaded from: 5http://laurikari.net/tre/ 6 7file = tre-0.8.0.tar.bz2 8date = Sep 20 2009 9size = 380714 10sha1 = a41692e64b40ebae3cffe83931ddbf8420a10ae3 11 12A list of files and directories removed is in README.DELETED 13 14Local modifications applied all files. The collation, equivalence, and 15legacy character-name table support came from modifications by Apple 16(opensource.apple.com) 17