It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool.  But as far as I know no such
standard exists.  Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution.  Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format.  However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision.  The answer is that the head revision contains the
contents of that revision and every other revision contains a bunch of
edits to produce that revision ("a" and "d" lines).  The GNU diff
manual (the version I looked at was for GNU diff 2.4) documents this
format somewhat (as the "RCS output format"), but the presentation is
a bit confusing as it is all tangled up with the documentation of
several other output formats.  If you just want some source code to
look at, the part of CVS which applies these is RCS_deltas in
src/rcs.c.
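To make the "a" and "d" lines concrete, here is a minimal sketch in
Python of applying such an edit script to a list of lines.  It is not
the code from RCS_deltas; the name apply_rcs_edits is made up for
illustration.  It assumes the convention of the RCS output format in
the GNU diff manual: "dN M" deletes M lines starting at line N,
"aN M" adds the M lines that follow in the script after line N, and
all line numbers refer to the text before any edit was applied.

```python
def apply_rcs_edits(base_lines, script_lines):
    """Apply an RCS-style edit script ("a" and "d" commands) to base_lines.

    Hypothetical helper for illustration.  Line numbers in the script are
    1-based and refer to the original text, not the partially edited one.
    """
    result = []
    pos = 0  # number of original lines already copied or deleted
    i = 0
    while i < len(script_lines):
        command = script_lines[i]
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            # "dN M": copy untouched lines up to N-1, then skip M lines.
            result.extend(base_lines[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # "aN M": copy untouched lines up to N, then add M script lines.
            result.extend(base_lines[pos:start])
            pos = start
            result.extend(script_lines[i:i + count])
            i += count
        else:
            raise ValueError(f"unexpected edit command: {command!r}")
    # Whatever the script did not mention is carried over unchanged.
    result.extend(base_lines[pos:])
    return result


base = ["one", "two", "three", "four"]
script = ["d2 1", "a2 1", "TWO", "a4 1", "five"]
edited = apply_rcs_edits(base, script)
```

Here a "d2 1" followed by "a2 1" replaces line 2, which is how a
changed line shows up in this format.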

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions.  The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 ----------------->  1.2
     \---> 1.1.2.1           \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

  admin:  head 1.2;
  deltas:
    1.2 branches 1.2.2.1; next 1.1;
    1.1 branches 1.1.2.1; next;
    1.1.2.1 branches; next;
    1.2.2.1 branches; next;
  deltatexts:
    1.2
    1.2.2.1
    1.1
    1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, and to what extent programs reading it should be
prepared to encounter any order.
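For what it is worth, the deltatext order in this one example is what
you get by visiting a revision, then its branches, then its "next".
The Python sketch below only demonstrates that observation; I have not
confirmed that RCS or CVS generates the order this way, and the names
DELTAS and deltatext_order are made up for illustration.

```python
# Delta nodes from the example above: revision -> (branches, next).
# None stands for an empty "next".
DELTAS = {
    "1.2": (["1.2.2.1"], "1.1"),
    "1.1": (["1.1.2.1"], None),
    "1.1.2.1": ([], None),
    "1.2.2.1": ([], None),
}


def deltatext_order(deltas, head):
    """Visit a revision, then each of its branches, then its "next".

    Hypothetical traversal that happens to match the example; it is not
    taken from the RCS or CVS sources.
    """
    order = []
    stack = [head]
    while stack:
        rev = stack.pop()
        order.append(rev)
        branches, next_rev = deltas[rev]
        # Push "next" first so that the branches are popped before it.
        if next_rev is not None:
            stack.append(next_rev)
        stack.extend(reversed(branches))
    return order


observed = deltatext_order(DELTAS, "1.2")
```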

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example 1.1 or the
head of a branch will typically have no next).

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches.  See cvs.texinfo
for more information on them.  CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5.  We won't try to document extensions not used by CVS in any
detail, but we will briefly list them.  Each occurrence of a newphrase
begins with an identifier, which is what we list here.  Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers.  Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations.  However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

   Identifier   Used by
   ----------   -------
   namespace    RCS library done at Silicon Graphics Inc. (SGI) in 1996
                (a modified RCS 5.7--not sure it has any other name).
   dead         A set of RCS patches developed by Rich Pixley at
                Cygnus about 1992.  These were for CVS, and predated
                the current CVS death support, which uses a state "dead"
                rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26.  The following new keywords are
defined when PreservePermissions=yes:

   owner
   group
   permissions
   special
   symlink
   hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file.  The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file.  The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number.  The
`symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files.  The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order.  Because filenames often contain
characters special to RCS, like `.' and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings.  For example:

	hardlinks	README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file.  That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.
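As an illustration of how a reader might split such a value into
filenames, here is a rough Python sketch.  It is not how CVS parses
the field; the name parse_newphrase_words is made up, and the input is
assumed to be the bytes between the `hardlinks' identifier and the
terminating `;'.  A word is taken to be either an RCS string (`@'
delimited, with `@@' standing for one `@') or a bare run of bytes up
to the next whitespace or `@'.

```python
def parse_newphrase_words(value):
    """Split a newphrase value (bytes, without the trailing ';') into words.

    Hypothetical helper for illustration only.
    """
    words = []
    i = 0
    n = len(value)
    while i < n:
        ch = value[i:i + 1]
        if ch.isspace():
            i += 1
        elif ch == b"@":
            # RCS string: runs to the next single '@'; '@@' means one '@'.
            i += 1
            chunk = bytearray()
            while True:
                end = value.find(b"@", i)
                if end < 0:
                    raise ValueError("unterminated RCS string")
                chunk += value[i:end]
                if value[end + 1:end + 2] == b"@":
                    chunk += b"@"
                    i = end + 2
                else:
                    i = end + 1
                    break
            words.append(bytes(chunk))
        else:
            # Bare word: runs to the next whitespace or '@'.
            start = i
            while (i < n and not value[i:i + 1].isspace()
                   and value[i:i + 1] != b"@"):
                i += 1
            words.append(value[start:i])
    return words


names = parse_newphrase_words(b"README @install.txt@ @Installation Notes@")
# For README,v the list is expected to include the file's own name.
includes_self = b"README" in names
```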

Newphrases are also used to implement the 'commitid' feature. The
following new keyword is defined:

   commitid

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution.  See also the "Keyword
substitution" chapter of cvs.texinfo.  The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS.  The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page.  This distinction is important for
    multibyte characters.  For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters.  Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character.  It is an issue only for multibyte encodings
    like JIS and Big5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file.  Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding.  This point affects only programs that examine
    the RCS files -- it doesn't affect the external RCS interface, as
    the RCS commands always give you the properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.
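The byte-oriented rule can be shown with a small Python sketch that
works on bytes rather than on decoded text.  The helper names
rcs_quote and rcs_unquote are made up for illustration, and the sample
is the byte pair 0x83 0x40, which as far as I know is a single
katakana character in Shift_JIS and whose second byte is an ASCII `@'.

```python
def rcs_quote(data: bytes) -> bytes:
    """Wrap raw bytes as an RCS string, doubling every '@' byte.

    Hypothetical helper: it looks only at bytes and never at characters.
    """
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(quoted: bytes) -> bytes:
    """Undo rcs_quote on a complete, well-formed RCS string."""
    if len(quoted) < 2 or quoted[:1] != b"@" or quoted[-1:] != b"@":
        raise ValueError("not an RCS string")
    return quoted[1:-1].replace(b"@@", b"@")


# Assumed example: one two-byte character whose trail byte is 0x40 ('@').
multibyte = b"\x83\x40"
stored = rcs_quote(multibyte)      # what ends up in the ,v file
restored = rcs_unquote(stored)     # what a checkout hands back
```

The stored form carries an extra `@' in the middle of the character,
so it is no longer valid text in the original encoding, while the
round trip returns the original bytes unchanged.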

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch.  While more detailed analyses might be worth doing, we will
note:

    * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

    * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

	I just did a quick test with a small file (on a Sun Ultra 1/170E
	running Solaris 5.5.1), with 1000 revisions on the main branch and
	1000 revisions on a branch that forked at the root (i.e., RCS revisions
	1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
	1.1.1.1000).  It took about 0.15 seconds real time to check in the
	first revision, and about 0.6 seconds to check in and 0.3 seconds to
	retrieve revision 1.1.1.1000 (the worst case).

    * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).
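The walk described above (head of the trunk down to the branchpoint,
then out along the branch) can be sketched in Python as follows.  This
is an illustration, not code from RCS or CVS: the name delta_path and
the table layout are made up, and the sketch assumes that "next" on
the trunk leads to the older revision while "next" on a branch leads
to the newer one, as in the join3 example earlier in this file.

```python
# Delta nodes from the join3 example: revision -> (branches, next).
DELTAS = {
    "1.2": (["1.2.2.1"], "1.1"),
    "1.1": (["1.1.2.1"], None),
    "1.1.2.1": ([], None),
    "1.2.2.1": ([], None),
}


def delta_path(deltas, head, target):
    """Return the revisions whose texts are applied, in order, to get target.

    Hypothetical helper; assumes target exists in the delta table.
    """
    parts = target.split(".")
    rev = head
    path = [rev]
    # Stage 1: follow "next" along the trunk until the trunk ancestor.
    trunk_stop = ".".join(parts[:2])
    while rev != trunk_stop:
        rev = deltas[rev][1]
        if rev is None:
            raise ValueError(f"{trunk_stop} is not on the trunk")
        path.append(rev)
    # Stage 2: for each branch level, enter the branch and follow "next".
    for depth in range(4, len(parts) + 1, 2):
        prefix = ".".join(parts[:depth - 1]) + "."
        stop = ".".join(parts[:depth])
        starts = [b for b in deltas[rev][0] if b.startswith(prefix)]
        if not starts:
            raise ValueError(f"no branch {prefix[:-1]} at {rev}")
        rev = starts[0]
        path.append(rev)
        while rev != stop:
            rev = deltas[rev][1]
            if rev is None:
                raise ValueError(f"{stop} is not on its branch")
            path.append(rev)
    return path


to_branch_head = delta_path(DELTAS, "1.2", "1.1.2.1")
```

In the quick test above, the path to 1.1.1.1000 would run through all
1000 trunk revisions and then all 1000 branch revisions, which is why
it is called the worst case.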

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in	1995/06/05 08:28:35	5.6
+++ rcsfile.5in	1996/12/09 17:31:44	5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP	::=	{\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP	::=	any visible graphic character except \f2special\fP
+\f2idchar\fP	::=	any visible graphic character,
+		except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP	::=	\f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta