xref: /netbsd-src/external/gpl3/gcc.old/dist/gcc/doc/gty.texi (revision b7b7574d3bf8eeb51a1fa3977b59142ec6434a55)
1@c Copyright (C) 2002, 2003, 2004, 2007, 2008, 2009
2@c Free Software Foundation, Inc.
3@c This is part of the GCC manual.
4@c For copying conditions, see the file gcc.texi.
5
6@node Type Information
7@chapter Memory Management and Type Information
8@cindex GGC
9@findex GTY
10
11GCC uses some fairly sophisticated memory management techniques, which
12involve determining information about GCC's data structures from GCC's
13source code and using this information to perform garbage collection and
14implement precompiled headers.
15
16A full C parser would be too complicated for this task, so a limited
17subset of C is interpreted and special markers are used to determine
18what parts of the source to look at.  All @code{struct} and
19@code{union} declarations that define data structures that are
20allocated under control of the garbage collector must be marked.  All
21global variables that hold pointers to garbage-collected memory must
22also be marked.  Finally, all global variables that need to be saved
23and restored by a precompiled header must be marked.  (The precompiled
24header mechanism can only save static variables if they're scalar.
25Complex data structures must be allocated in garbage-collected memory
26to be saved in a precompiled header.)
27
28The full format of a marker is
29@smallexample
30GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{}))
31@end smallexample
32@noindent
33but in most cases no options are needed.  The outer double parentheses
34are still necessary, though: @code{GTY(())}.  Markers can appear:
35
36@itemize @bullet
37@item
38In a structure definition, before the open brace;
39@item
40In a global variable declaration, after the keyword @code{static} or
41@code{extern}; and
42@item
43In a structure field definition, before the name of the field.
44@end itemize
45
46Here are some examples of marking simple data structures and globals.
47
48@smallexample
49struct GTY(()) @var{tag}
50@{
51  @var{fields}@dots{}
52@};
53
54typedef struct GTY(()) @var{tag}
55@{
56  @var{fields}@dots{}
57@} *@var{typename};
58
59static GTY(()) struct @var{tag} *@var{list};   /* @r{points to GC memory} */
60static GTY(()) int @var{counter};        /* @r{save counter in a PCH} */
61@end smallexample
62
63The parser understands simple typedefs such as
64@code{typedef struct @var{tag} *@var{name};} and
65@code{typedef int @var{name};}.
66These don't need to be marked.
67
68@menu
69* GTY Options::         What goes inside a @code{GTY(())}.
70* GGC Roots::           Making global variables GGC roots.
71* Files::               How the generated files work.
72* Invoking the garbage collector::   How to invoke the garbage collector.
73@end menu
74
75@node GTY Options
76@section The Inside of a @code{GTY(())}
77
78Sometimes the C code is not enough to fully describe the type
79structure.  Extra information can be provided with @code{GTY} options
80and additional markers.  Some options take a parameter, which may be
81either a string or a type name, depending on the parameter.  If an
82option takes no parameter, it is acceptable either to omit the
83parameter entirely, or to provide an empty string as a parameter.  For
84example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are
85equivalent.
86
87When the parameter is a string, often it is a fragment of C code.  Four
88special escapes may be used in these strings, to refer to pieces of
89the data structure being marked:
90
91@cindex % in GTY option
92@table @code
93@item %h
94The current structure.
95@item %1
96The structure that immediately contains the current structure.
97@item %0
98The outermost structure that contains the current structure.
99@item %a
100A partial expression of the form @code{[i1][i2]@dots{}} that indexes
101the array item currently being marked.
102@end table
103
104For instance, suppose that you have a structure of the form
105@smallexample
106struct A @{
107  @dots{}
108@};
109struct B @{
110  struct A foo[12];
111@};
112@end smallexample
113@noindent
114and @code{b} is a variable of type @code{struct B}.  When marking
115@samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]},
116@code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a}
117would expand to @samp{[11]}.
118
119As in ordinary C, adjacent strings will be concatenated; this is
120helpful when you have a complicated expression.
121@smallexample
122@group
123GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE"
124                  " ? TYPE_NEXT_VARIANT (&%h.generic)"
125                  " : TREE_CHAIN (&%h.generic)")))
126@end group
127@end smallexample
128
129The available options are:
130
131@table @code
132@findex length
133@item length ("@var{expression}")
134
135There are two places the type machinery will need to be explicitly told
136the length of an array.  The first case is when a structure ends in a
137variable-length array, like this:
138@smallexample
139struct GTY(()) rtvec_def @{
140  int num_elem;         /* @r{number of elements} */
141  rtx GTY ((length ("%h.num_elem"))) elem[1];
142@};
143@end smallexample
144
145In this case, the @code{length} option is used to override the specified
146array length (which should usually be @code{1}).  The parameter of the
147option is a fragment of C code that calculates the length.
148
149The second case is when a structure or a global variable contains a
150pointer to an array, like this:
151@smallexample
152tree *
153  GTY ((length ("%h.regno_pointer_align_length"))) regno_decl;
154@end smallexample
155In this case, @code{regno_decl} has been allocated by writing something like
156@smallexample
157  x->regno_decl =
158    ggc_alloc (x->regno_pointer_align_length * sizeof (tree));
159@end smallexample
160and the @code{length} provides the length of the field.
161
162This second use of @code{length} also works on global variables, like:
163@verbatim
164  static GTY((length ("reg_base_value_size")))
165    rtx *reg_base_value;
166@end verbatim
167
168@findex skip
169@item skip
170
171If @code{skip} is applied to a field, the type machinery will ignore it.
172This is somewhat dangerous; the only safe use is in a union when one
173field really isn't ever used.
174
175@findex desc
176@findex tag
177@findex default
178@item desc ("@var{expression}")
179@itemx tag ("@var{constant}")
180@itemx default
181
182The type machinery needs to be told which field of a @code{union} is
183currently active.  This is done by giving each field a constant
184@code{tag} value, and then specifying a discriminator using @code{desc}.
185The value of the expression given by @code{desc} is compared against
186each @code{tag} value, each of which should be different.  If no
187@code{tag} is matched, the field marked with @code{default} is used if
188there is one, otherwise no field in the union will be marked.
189
190In the @code{desc} option, the ``current structure'' is the union that
191it discriminates.  Use @code{%1} to mean the structure containing it.
192There are no escapes available to the @code{tag} option, since it is a
193constant.
194
195For example,
196@smallexample
197struct GTY(()) tree_binding
198@{
199  struct tree_common common;
200  union tree_binding_u @{
201    tree GTY ((tag ("0"))) scope;
202    struct cp_binding_level * GTY ((tag ("1"))) level;
203  @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope;
204  tree value;
205@};
206@end smallexample
207
208In this example, the value of BINDING_HAS_LEVEL_P when applied to a
209@code{struct tree_binding *} is presumed to be 0 or 1.  If 1, the type
210mechanism will treat the field @code{level} as being present and if 0,
211will treat the field @code{scope} as being present.
212
213@findex param_is
214@findex use_param
215@item param_is (@var{type})
216@itemx use_param
217
218Sometimes it's convenient to define some data structure to work on
219generic pointers (that is, @code{PTR}) and then use it with a specific
220type.  @code{param_is} specifies the real type pointed to, and
221@code{use_param} says where in the generic data structure that type
222should be put.
223
224For instance, to have a @code{htab_t} that points to trees, one would
225write the definition of @code{htab_t} like this:
226@smallexample
227typedef struct GTY(()) @{
228  @dots{}
229  void ** GTY ((use_param, @dots{})) entries;
230  @dots{}
231@} htab_t;
232@end smallexample
233and then declare variables like this:
234@smallexample
235  static htab_t GTY ((param_is (union tree_node))) ict;
236@end smallexample
237
238@findex param@var{n}_is
239@findex use_param@var{n}
240@item param@var{n}_is (@var{type})
241@itemx use_param@var{n}
242
243In more complicated cases, the data structure might need to work on
244several different types, which might not necessarily all be pointers.
245For this, @code{param1_is} through @code{param9_is} may be used to
246specify the real type of a field identified by @code{use_param1} through
247@code{use_param9}.
248
249@findex use_params
250@item use_params
251
252When a structure contains another structure that is parameterized,
253there's no need to do anything special, the inner structure inherits the
254parameters of the outer one.  When a structure contains a pointer to a
255parameterized structure, the type machinery won't automatically detect
256this (it could, it just doesn't yet), so it's necessary to tell it that
257the pointed-to structure should use the same parameters as the outer
258structure.  This is done by marking the pointer with the
259@code{use_params} option.
260
261@findex deletable
262@item deletable
263
264@code{deletable}, when applied to a global variable, indicates that when
265garbage collection runs, there's no need to mark anything pointed to
266by this variable, it can just be set to @code{NULL} instead.  This is used
267to keep a list of free structures around for re-use.
268
269@findex if_marked
270@item if_marked ("@var{expression}")
271
272Suppose you want some kinds of object to be unique, and so you put them
273in a hash table.  If garbage collection marks the hash table, these
274objects will never be freed, even if the last other reference to them
275goes away.  GGC has special handling to deal with this: if you use the
276@code{if_marked} option on a global hash table, GGC will call the
277routine whose name is the parameter to the option on each hash table
278entry.  If the routine returns nonzero, the hash table entry will
279be marked as usual.  If the routine returns zero, the hash table entry
280will be deleted.
281
282The routine @code{ggc_marked_p} can be used to determine if an element
283has been marked already; in fact, the usual case is to use
284@code{if_marked ("ggc_marked_p")}.
285
286@findex mark_hook
287@item mark_hook ("@var{hook-routine-name}")
288
289If provided for a structure or union type, the given
290@var{hook-routine-name} (between double-quotes) is the name of a
291routine called when the garbage collector has just marked the data as
292reachable. This routine should not change the data, or call any ggc
293routine. Its only argument is a pointer to the just marked (const)
294structure or union.
295
296@findex maybe_undef
297@item maybe_undef
298
299When applied to a field, @code{maybe_undef} indicates that it's OK if
300the structure that this fields points to is never defined, so long as
301this field is always @code{NULL}.  This is used to avoid requiring
302backends to define certain optional structures.  It doesn't work with
303language frontends.
304
305@findex nested_ptr
306@item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}")
307
308The type machinery expects all pointers to point to the start of an
309object.  Sometimes for abstraction purposes it's convenient to have
310a pointer which points inside an object.  So long as it's possible to
311convert the original object to and from the pointer, such pointers
312can still be used.  @var{type} is the type of the original object,
313the @var{to expression} returns the pointer given the original object,
314and the @var{from expression} returns the original object given
315the pointer.  The pointer will be available using the @code{%h}
316escape.
317
318@findex chain_next
319@findex chain_prev
320@findex chain_circular
321@item chain_next ("@var{expression}")
322@itemx chain_prev ("@var{expression}")
323@itemx chain_circular ("@var{expression}")
324
325It's helpful for the type machinery to know if objects are often
326chained together in long lists; this lets it generate code that uses
327less stack space by iterating along the list instead of recursing down
328it.  @code{chain_next} is an expression for the next item in the list,
329@code{chain_prev} is an expression for the previous item.  For singly
330linked lists, use only @code{chain_next}; for doubly linked lists, use
331both.  The machinery requires that taking the next item of the
332previous item gives the original item.  @code{chain_circular} is similar
333to @code{chain_next}, but can be used for circular single linked lists.
334
335@findex reorder
336@item reorder ("@var{function name}")
337
338Some data structures depend on the relative ordering of pointers.  If
339the precompiled header machinery needs to change that ordering, it
340will call the function referenced by the @code{reorder} option, before
341changing the pointers in the object that's pointed to by the field the
342option applies to.  The function must take four arguments, with the
343signature @samp{@w{void *, void *, gt_pointer_operator, void *}}.
344The first parameter is a pointer to the structure that contains the
345object being updated, or the object itself if there is no containing
346structure.  The second parameter is a cookie that should be ignored.
347The third parameter is a routine that, given a pointer, will update it
348to its correct new value.  The fourth parameter is a cookie that must
349be passed to the second parameter.
350
351PCH cannot handle data structures that depend on the absolute values
352of pointers.  @code{reorder} functions can be expensive.  When
353possible, it is better to depend on properties of the data, like an ID
354number or the hash of a string instead.
355
356@findex special
357@item special ("@var{name}")
358
359The @code{special} option is used to mark types that have to be dealt
360with by special case machinery.  The parameter is the name of the
361special case.  See @file{gengtype.c} for further details.  Avoid
362adding new special cases unless there is no other alternative.
363@end table
364
365@node GGC Roots
366@section Marking Roots for the Garbage Collector
367@cindex roots, marking
368@cindex marking roots
369
370In addition to keeping track of types, the type machinery also locates
371the global variables (@dfn{roots}) that the garbage collector starts
372at.  Roots must be declared using one of the following syntaxes:
373
374@itemize @bullet
375@item
376@code{extern GTY(([@var{options}])) @var{type} @var{name};}
377@item
378@code{static GTY(([@var{options}])) @var{type} @var{name};}
379@end itemize
380@noindent
381The syntax
382@itemize @bullet
383@item
384@code{GTY(([@var{options}])) @var{type} @var{name};}
385@end itemize
386@noindent
387is @emph{not} accepted.  There should be an @code{extern} declaration
388of such a variable in a header somewhere---mark that, not the
389definition.  Or, if the variable is only used in one file, make it
390@code{static}.
391
392@node Files
393@section Source Files Containing Type Information
394@cindex generated files
395@cindex files, generated
396
397Whenever you add @code{GTY} markers to a source file that previously
398had none, or create a new source file containing @code{GTY} markers,
399there are three things you need to do:
400
401@enumerate
402@item
403You need to add the file to the list of source files the type
404machinery scans.  There are four cases:
405
406@enumerate a
407@item
408For a back-end file, this is usually done
409automatically; if not, you should add it to @code{target_gtfiles} in
410the appropriate port's entries in @file{config.gcc}.
411
412@item
413For files shared by all front ends, add the filename to the
414@code{GTFILES} variable in @file{Makefile.in}.
415
416@item
417For files that are part of one front end, add the filename to the
418@code{gtfiles} variable defined in the appropriate
419@file{config-lang.in}.  For C, the file is @file{c-config-lang.in}.
420Headers should appear before non-headers in this list.
421
422@item
423For files that are part of some but not all front ends, add the
424filename to the @code{gtfiles} variable of @emph{all} the front ends
425that use it.
426@end enumerate
427
428@item
429If the file was a header file, you'll need to check that it's included
430in the right place to be visible to the generated files.  For a back-end
431header file, this should be done automatically.  For a front-end header
432file, it needs to be included by the same file that includes
433@file{gtype-@var{lang}.h}.  For other header files, it needs to be
434included in @file{gtype-desc.c}, which is a generated file, so add it to
435@code{ifiles} in @code{open_base_file} in @file{gengtype.c}.
436
437For source files that aren't header files, the machinery will generate a
438header file that should be included in the source file you just changed.
439The file will be called @file{gt-@var{path}.h} where @var{path} is the
440pathname relative to the @file{gcc} directory with slashes replaced by
441@verb{|-|}, so for example the header file to be included in
442@file{cp/parser.c} is called @file{gt-cp-parser.c}.  The
443generated header file should be included after everything else in the
444source file.  Don't forget to mention this file as a dependency in the
445@file{Makefile}!
446
447@end enumerate
448
449For language frontends, there is another file that needs to be included
450somewhere.  It will be called @file{gtype-@var{lang}.h}, where
451@var{lang} is the name of the subdirectory the language is contained in.
452
453Plugins can add additional root tables.  Run the @code{gengtype}
454utility in plugin mode as @code{gengtype -P pluginout.h @var{source-dir}
455@var{file-list} @var{plugin*.c}} with your plugin files
456@var{plugin*.c} using @code{GTY} to generate the @var{pluginout.h} file.
457The GCC build tree is needed to be present in that mode.
458
459
460@node Invoking the garbage collector
461@section How to invoke the garbage collector
462@cindex garbage collector, invocation
463@findex ggc_collect
464
465The GCC garbage collector GGC is only invoked explicitly. In contrast
466with many other garbage collectors, it is not implicitly invoked by
467allocation routines when a lot of memory has been consumed. So the
468only way to have GGC reclaim storage it to call the @code{ggc_collect}
469function explicitly. This call is an expensive operation, as it may
470have to scan the entire heap. Beware that local variables (on the GCC
471call stack) are not followed by such an invocation (as many other
472garbage collectors do): you should reference all your data from static
473or external @code{GTY}-ed variables, and it is advised to call
474@code{ggc_collect} with a shallow call stack. The GGC is an exact mark
475and sweep garbage collector (so it does not scan the call stack for
476pointers). In practice GCC passes don't often call @code{ggc_collect}
477themselves, because it is called by the pass manager between passes.
478