It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool. But as far as I know no such
standard exists. Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution. Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format. However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision. The answer is that the head revision contains the
full contents of that revision, and every other revision contains a
set of edits ("a" and "d" lines) that produce that revision from the
revision pointing to it via "next" or "branches" (on the trunk, the
next newer revision; on a branch, the previous revision or, for the
first revision on the branch, the branchpoint). The GNU diff manual
(the version I looked at was for GNU diff 2.4) documents this format
somewhat (as the "RCS output format"), but the presentation is a bit
confusing as it is all tangled up with the documentation of several
other output formats. If you just want some source code to look at,
the part of CVS which applies these edits is RCS_deltas in src/rcs.c.

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions.
The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 -----------------> 1.2
      \---> 1.1.2.1          \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

    admin: head 1.2;
    deltas:
      1.2 branches 1.2.2.1; next 1.1;
      1.1 branches 1.1.2.1; next;
      1.1.2.1 branches; next;
      1.2.2.1 branches; next;
    deltatexts:
      1.2
      1.2.2.1
      1.1
      1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, and how much programs reading it should be
prepared to encounter the revisions in any order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example, 1.1 or the
head of a branch will typically have no next).

There is one case where CVS makes CVS-specific, non-compatible changes
to the RCS file format, namely magic branches. See cvs.texinfo for
more information on them. CVS also sets the RCS state to "dead" to
indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).
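
Both orderings in the join3 example earlier in this file can be
reproduced by two simple traversals of the delta tree. The minimal
Python sketch below assumes that "deltas" lists a node, then its whole
"next" chain, then the branches hanging off that chain (starting from
the node farthest down the chain), while "deltatexts" lists a node,
then its branches, then its next. These rules are a guess generalized
from this single example, not documented RCS behavior. The dictionary
layout and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of the "deltas" section: each revision maps to
# (its "branches" list, its "next" revision or None when "next" is empty).
Deltas = Dict[str, Tuple[List[str], Optional[str]]]


def _chain(deltas: Deltas, rev: Optional[str]) -> List[str]:
    """Follow "next" pointers from rev (iteratively, so long trunks are fine)."""
    out = []
    while rev is not None:
        out.append(rev)
        rev = deltas[rev][1]
    return out


def delta_order(deltas: Deltas, rev: Optional[str]) -> List[str]:
    """Guessed "deltas" order: the whole next-chain first, then the branches
    off that chain, starting from the node farthest down the chain."""
    chain = _chain(deltas, rev)
    out = list(chain)
    for node in reversed(chain):
        for branch in deltas[node][0]:
            out.extend(delta_order(deltas, branch))
    return out


def deltatext_order(deltas: Deltas, rev: Optional[str]) -> List[str]:
    """Guessed "deltatexts" order: each node, then its branches, then its next."""
    out = []
    for node in _chain(deltas, rev):
        out.append(node)
        for branch in deltas[node][0]:
            out.extend(deltatext_order(deltas, branch))
    return out


# The join3 example: head 1.2, with branches off 1.1 and 1.2.
example: Deltas = {
    "1.2": (["1.2.2.1"], "1.1"),
    "1.1": (["1.1.2.1"], None),
    "1.1.2.1": ([], None),
    "1.2.2.1": ([], None),
}
print(delta_order(example, "1.2"))      # ['1.2', '1.1', '1.1.2.1', '1.2.2.1']
print(deltatext_order(example, "1.2"))  # ['1.2', '1.2.2.1', '1.1', '1.1.2.1']
```

A reader that must accept files written by other tools should probably
not depend on either order and instead look revisions up by number.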

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5. We won't try to document extensions not used by CVS in any
detail, but we will briefly list them. Each occurrence of a newphrase
begins with an identifier, which is what we list here. Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers. Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations. However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

    Identifier    Used by
    ----------    -------
    namespace     RCS library done at Silicon Graphics Inc. (SGI) in 1996
                  (a modified RCS 5.7; not sure it has any other name).
    dead          A set of RCS patches developed by Rich Pixley at
                  Cygnus about 1992. These were for CVS, and predated
                  the current CVS death support, which uses a state "dead"
                  rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26.
The following new keywords are defined when PreservePermissions=yes:

    owner
    group
    permissions
    special
    symlink
    hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file. The `permissions' field contains an octal integer
representing the permissions that should be applied to the file. The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number. The
`symlink' field should be present only in files which are symbolic
links to other files, and absent from all regular files. The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order. Because filenames often contain
characters special to RCS, like `.', and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings. For example:

    hardlinks README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file. That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.

Newphrases are also used to implement the `commitid' feature.
The following new keyword is defined:

    commitid

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution. See also the "Keyword
substitution" chapter of cvs.texinfo. The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS. The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page.
    This distinction is important for multibyte characters. For
    example, if a multibyte character contains a `@' byte, the `@'
    must be doubled within strings in RCS files, since RCS uses `@'
    bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters. Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character. It is an issue only for multibyte encodings
    like JIS and Big5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file. Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding. This point affects only programs that examine
    the RCS files; it doesn't affect the external RCS interface, as
    the RCS commands always give you properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.
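
The byte-oriented `@' rule can be illustrated with a minimal Python
sketch. It assumes the caller already holds the raw bytes of a string
field (such as a log message or a filename in a hardlinks field) and
that RCS strings are delimited and escaped only by `@'. The names
rcs_quote and rcs_unquote are hypothetical and are not part of RCS or
CVS.

```python
def rcs_quote(data: bytes) -> bytes:
    """Encode raw bytes as an RCS string: wrap in @ and double every 0x40 byte.

    This works on bytes, not characters, so an @ byte inside a multibyte
    character is doubled just like a standalone @.
    """
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(string: bytes) -> bytes:
    """Decode an RCS string such as one produced by rcs_quote."""
    if len(string) < 2 or not (string.startswith(b"@") and string.endswith(b"@")):
        raise ValueError("not an RCS string")
    body = string[1:-1]
    # In a well-formed string every @ in the body belongs to an @@ pair.
    if b"@" in body.replace(b"@@", b""):
        raise ValueError("unpaired @ inside RCS string")
    return body.replace(b"@@", b"@")


# A Big5 double-byte character encoded as 0xA4 0x40: its trail byte is ASCII "@".
big5_char = b"\xa4\x40"
quoted = rcs_quote(big5_char)
print(quoted)                            # b'@\xa4@@@'
print(rcs_unquote(quoted) == big5_char)  # True

# UTF-8 multibyte sequences use only bytes >= 0x80, so nothing is doubled.
utf8_char = "\u00e9".encode("utf-8")     # b'\xc3\xa9'
print(rcs_quote(utf8_char))              # b'@\xc3\xa9@'
```

The quoted Big5 bytes (0xA4 0x40 0x40) are exactly the kind of
inconsistently encoded stream described above: a text tool reading the
,v file sees a valid character followed by a stray `@'.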

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
back to the branchpoint, and then from the branchpoint forward to the
head of the branch. While more detailed analyses might be worth doing,
we will note:

  * The performance bottleneck for CVS is generally figuring out which
    files to operate on and that sort of thing, not applying deltas.

  * Here is one quick test (probably not a very good one; a better test
    would use a normally sized file, say 50-200K, instead of a small one):

      I just did a quick test with a small file (on a Sun Ultra 1/170E
      running Solaris 2.5.1, i.e. SunOS 5.5.1), with 1000 revisions on
      the main branch and 1000 revisions on a branch that forked at the
      root (i.e., RCS revisions 1.1, 1.2, ..., 1.1000, and branch
      revisions 1.1.1.1, 1.1.1.2, ..., 1.1.1.1000). It took about 0.15
      seconds real time to check in the first revision, and about 0.6
      seconds to check in and 0.3 seconds to retrieve revision
      1.1.1.1000 (the worst case).

  * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).
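
Each step along that path applies one revision's "text" field, the
edit script of "a" and "d" lines described near the top of this file.
The minimal Python sketch below assumes the RCS (diff -n) conventions
as we understand them: "dN M" deletes M lines starting at line N,
"aN M" inserts the M lines following the command after line N, and all
line numbers refer to the revision being edited and appear in
increasing order. It also assumes the text has already been unquoted
from its `@' string and split into lines. The name apply_rcs_edits is
hypothetical; the real CVS code is RCS_deltas in src/rcs.c.

```python
from typing import List


def apply_rcs_edits(lines: List[str], script: List[str]) -> List[str]:
    """Apply an RCS-format edit script to the lines of a source revision.

    Line numbers in the commands are 1-based and refer to `lines`, the
    revision being edited, not to the partially edited result.
    """
    out: List[str] = []
    src = 0  # number of source lines already copied or deleted
    i = 0
    while i < len(script):
        command = script[i]
        op = command[:1]
        start, count = (int(field) for field in command[1:].split())
        i += 1
        if op == "d":
            # Copy the untouched lines before the deleted range, then skip it.
            out.extend(lines[src:start - 1])
            src = start - 1 + count
        elif op == "a":
            # Copy source lines through line `start`, then the added text.
            out.extend(lines[src:start])
            src = max(src, start)
            out.extend(script[i:i + count])
            i += count
        else:
            raise ValueError(f"unknown edit command: {command!r}")
    out.extend(lines[src:])  # copy whatever follows the last edit
    return out


# Turn "a b c d e" into "a c X Y d": delete line 2, add two lines after
# line 3, and delete line 5.
base = ["a", "b", "c", "d", "e"]
script = ["d2 1", "a3 2", "X", "Y", "d5 1"]
print(apply_rcs_edits(base, script))  # ['a', 'c', 'X', 'Y', 'd']
```

Retrieving 1.1.1.1000 in the test above would mean calling this once
per delta on the path, starting from the full text of the trunk head:
999 times down the trunk to 1.1, then 1000 times up the branch.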

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in 1995/06/05 08:28:35 5.6
+++ rcsfile.5in 1996/12/09 17:31:44 5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP ::= {\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP ::= any visible graphic character except \f2special\fP
+\f2idchar\fP ::= any visible graphic character,
+ except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP ::= \f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta