\input texinfo  @c -*-texinfo-*-

@c NOTE THIS IS NOT A GOOD EXAMPLE OF HOW TO DO A MANUAL. FIXME!!!
@c NOTE THIS IS NOT A GOOD EXAMPLE OF HOW TO DO A MANUAL. FIXME!!!
@c NOTE THIS IS NOT A GOOD EXAMPLE OF HOW TO DO A MANUAL. FIXME!!!


@c %**start of header
@setfilename treelang.info

@include gcc-common.texi

@set version-treelang 1.0

@set last-update 2001-07-30
@set copyrights-treelang 1995,1996,1997,1998,1999,2000,2001,2002

@set email-general gcc@@gcc.gnu.org
@set email-bugs gcc-bugs@@gcc.gnu.org or bug-gcc@@gnu.org
@set email-patches gcc-patches@@gcc.gnu.org
@set path-treelang gcc/gcc/treelang

@set which-treelang GCC-@value{version-GCC}
@set which-GCC GCC

@set email-josling tej@@melbpc.org.au
@set www-josling http://www.geocities.com/timjosling

@c This tells @include'd files that they're part of the overall TREELANG doc
@c set.  (They might be part of a higher-level doc set too.)
@set DOC-TREELANG

@c @setfilename usetreelang.info
@c @setfilename maintaintreelang.info
@c To produce the full manual, use the "treelang.info" setfilename, and
@c make sure the following do NOT begin with '@c' (and the @clear lines DO)
@set INTERNALS
@set USING
@c To produce a user-only manual, use the "usetreelang.info" setfilename, and
@c make sure the following does NOT begin with '@c':
@c @clear INTERNALS
@c To produce a maintainer-only manual, use the "maintaintreelang.info" setfilename,
@c and make sure the following does NOT begin with '@c':
@c @clear USING

@ifset INTERNALS
@ifset USING
@settitle Using and Maintaining GNU Treelang
@end ifset
@end ifset
@c seems reasonable to assume at least one of INTERNALS or USING is set...
@ifclear INTERNALS
@settitle Using GNU Treelang
@end ifclear
@ifclear USING
@settitle Maintaining GNU Treelang
@end ifclear
@c then again, have some fun
@ifclear INTERNALS
@ifclear USING
@settitle Doing Very Little at all with GNU Treelang
@end ifclear
@end ifclear

@syncodeindex fn cp
@syncodeindex vr cp
@c %**end of header

@c Cause even numbered pages to be printed on the left hand side of
@c the page and odd numbered pages to be printed on the right hand
@c side of the page.  Using this, you can print on both sides of a
@c sheet of paper and have the text on the same part of the sheet.

@c The text on right hand pages is pushed towards the right hand
@c margin and the text on left hand pages is pushed toward the left
@c hand margin.
@c (To provide the reverse effect, set bindingoffset to -0.75in.)

@c @tex
@c \global\bindingoffset=0.75in
@c \global\normaloffset =0.75in
@c @end tex

@copying
Copyright @copyright{} @value{copyrights-treelang} Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifnottex
@dircategory Programming
@direntry
* treelang: (treelang).                  The GNU Treelang compiler.
@end direntry
@ifset INTERNALS
@ifset USING
This file documents the use and the internals of the GNU Treelang
(@code{treelang}) compiler.  At the moment this manual is not
incorporated into the main GCC manual as it is too incomplete.  It
corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifset
@end ifset
@ifclear USING
This file documents the internals of the GNU Treelang (@code{treelang}) compiler.
It corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifclear
@ifclear INTERNALS
This file documents the use of the GNU Treelang (@code{treelang}) compiler.
It corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifclear

Published by the Free Software Foundation
59 Temple Place - Suite 330
Boston, MA 02111-1307 USA

@insertcopying
@end ifnottex

Treelang was contributed by Tim Josling (@email{@value{email-josling}}).
It was inspired by and based on the 'toy' language written by Richard Kenner.

This document was written by Tim Josling, based on the GNU C++
documentation.

@setchapternewpage odd
@c @finalout
@titlepage
@ifset INTERNALS
@ifset USING
@center @titlefont{Using and Maintaining GNU Treelang}

@end ifset
@end ifset
@ifclear INTERNALS
@title Using GNU Treelang
@end ifclear
@ifclear USING
@title Maintaining GNU Treelang
@end ifclear
@sp 2
@center Tim Josling
@sp 3
@center Last updated @value{last-update}
@sp 1
@center for version @value{version-treelang}
@page
@vskip 0pt plus 1filll
For the @value{which-treelang} Version*
@sp 1
Published by the Free Software Foundation @*
59 Temple Place - Suite 330@*
Boston, MA 02111-1307, USA@*
@c Last printed ??ber, 19??.@*
@c Printed copies are available for $? each.@*
@c ISBN ???
@sp 1
@insertcopying
@end titlepage
@page

@ifnottex

@node Top, Copying,, (dir)
@top Introduction
@cindex Introduction

@ifset INTERNALS
@ifset USING
This manual documents how to run, install and maintain @code{treelang},
as well as its new features and incompatibilities,
and how to report bugs.
It corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifset
@end ifset

@ifclear INTERNALS
This manual documents how to run and install @code{treelang},
as well as its new features and incompatibilities, and how to report
bugs.
It corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifclear
@ifclear USING
This manual documents how to maintain @code{treelang}, as well as its
new features and incompatibilities, and how to report bugs.  It
corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifclear

@end ifnottex

@ifset DEVELOPMENT
@emph{Warning:} This document is still under development, and might not
accurately reflect the @code{treelang} code base of which it is a part.
@end ifset

@menu
* Copying::
* Contributors::
* GNU Free Documentation License::
* Funding::
* Getting Started::
* What is GNU Treelang?::
* Lexical Syntax::
* Parsing Syntax::
* Compiler Overview::
* TREELANG and GCC::
* Compiler::
* Other Languages::
* treelang internals::
* Open Questions::
* Bugs::
* Service::
* Projects::
* Index::

@detailmenu
 --- The Detailed Node Listing ---

Other Languages

* Interoperating with C and C++::

treelang internals

* treelang files::
* treelang compiler interfaces::
* Hints and tips::

treelang compiler interfaces

* treelang driver::
* treelang main compiler::

treelang main compiler

* Interfacing to toplev.c::
* Interfacing to the garbage collection::
* Interfacing to the code generation code. ::

Reporting Bugs

* Sending Patches::

@end detailmenu
@end menu

@include gpl.texi

@include fdl.texi

@node Contributors

@unnumbered Contributors to GNU Treelang
@cindex contributors
@cindex credits

Treelang was based on 'toy' by Richard Kenner, and also uses code from
the GCC core code tree.  Tim Josling first created the language and
documentation, based on the GCC Fortran compiler's documentation
framework.

@itemize @bullet
@item
The packaging and compiler portions of GNU Treelang are based largely
on the GCC compiler.
@xref{Contributors,,Contributors to GCC,GCC,Using and Maintaining GCC},
for more information.

@item
There is no specific run-time library for treelang, other than the
standard C runtime.

@item
It would have been difficult to build treelang without access to Joachim
Nadler's guide to writing a front end to GCC (written in German).
A translation of this document into English is available via the
CobolForGCC project or via the documentation links from the GCC home
page @uref{http://GCC.gnu.org}.
@end itemize

@include funding.texi

@node Getting Started
@chapter Getting Started
@cindex getting started
@cindex new users
@cindex newbies
@cindex beginners

Treelang is a sample language, useful only to help people understand how
to implement a new language front end to GCC.  It is not a useful
language in itself other than as an example or basis for building a new
language.  Therefore only language developers are likely to have an
interest in it.

This manual assumes familiarity with GCC, which you can obtain by using
it and by reading the manual @samp{Using and Porting GCC}.

To install treelang, follow the GCC installation instructions,
taking care to ensure you specify treelang in the configure step.

If you're generally curious about the future of
@code{treelang}, see @ref{Projects}.
If you're curious about its past,
see @ref{Contributors}.

To see a few of the questions maintainers of @code{treelang} have,
and that you might be able to answer,
see @ref{Open Questions}.

@ifset USING
@node What is GNU Treelang?, Lexical Syntax, Getting Started, Top
@chapter What is GNU Treelang?
@cindex concepts, basic
@cindex basic concepts

GNU Treelang, or @code{treelang}, was designed initially as a free
replacement for, or alternative to, the 'toy' language, in a form that is
amenable to inclusion within the GCC source tree.

@code{treelang} is largely a cut down version of C, designed to showcase
the features of the GCC code generation back end.  Only those features
that are directly supported by the GCC code generation back end are
implemented.  Features are implemented in whatever manner is easiest and
clearest.
Not all or even most code generation back end
features are implemented.  The intention is to add features incrementally
until most features of the GCC back end are implemented in treelang.

The main features missing are structures, arrays and pointers.

A sample program follows:

@example
// function prototypes
// function 'add' taking two ints and returning an int
external_definition int add(int arg1, int arg2);
external_definition int subtract(int arg3, int arg4);
external_definition int first_nonzero(int arg5, int arg6);
external_definition int double_plus_one(int arg7);

// function definition
add
@{
// return the sum of arg1 and arg2
  return arg1 + arg2;
@}


subtract
@{
  return arg3 - arg4;
@}

double_plus_one
@{
// aaa is a variable, of type integer and allocated at the start of the function
  automatic int aaa;
// set aaa to the value returned from add, when passed arg7 and arg7 as the two parameters
  aaa=add(arg7, arg7);
  aaa=add(aaa, aaa);
  aaa=subtract(subtract(aaa, arg7), arg7) + 1;
  return aaa;
@}

first_nonzero
@{
// C-like if statement
  if (arg5)
    @{
      return arg5;
    @}
  else
    @{
    @}
  return arg6;
@}
@end example

@node Lexical Syntax, Parsing Syntax, What is GNU Treelang?, Top
@chapter Lexical Syntax
@cindex Lexical Syntax

Treelang programs consist of whitespace, comments, keywords and names.
@itemize @bullet

@item
Whitespace consists of the space character and the end of line
character.  Tabs are not allowed.  Line terminations are as defined by the
standard C library.  Whitespace is ignored except within comments,
and where it separates parts of the program.  In the example below, A and
B are two separate names separated by whitespace.

@smallexample
A B
@end smallexample

@item
Comments consist of @samp{//} followed by any characters up to the end
of the line.  C style comments (/* */) are not supported.  For example,
the assignment below is followed by a not very helpful comment.

@smallexample
x=1; // Set X to 1
@end smallexample

@item
Keywords consist of any reserved words or symbols as described
later.  The list of keywords follows:

@smallexample
@{ - used to start the statements in a function
@} - used to end the statements in a function
( - start list of function arguments, or to change the precedence of operators in an expression
) - end list or prioritized operators in expression
, - used to separate parameters in a function prototype or in a function call
; - used to end a statement
+ - addition
- - subtraction
= - assignment
== - equality test
if - begin IF statement
else - begin 'else' portion of IF statement
static - indicate variable is permanent, or function has file scope only
automatic - indicate that variable is allocated for the life of the function
external_reference - indicate that variable or function is defined in another file
external_definition - indicate that variable or function is to be accessible from other files
int - variable is an integer (same as C int)
char - variable is a character (same as C char)
unsigned - variable is unsigned.  If this is not present, the variable is signed
return - start function return statement
void - used as function type to indicate function returns nothing
@end smallexample


@item
Names consist of any letter or "_" followed by any number of letters or
numbers or "_".  "$" is not allowed in a name.  All names must be globally
unique - the same name may not be used twice in any context - and must
not be a keyword.  Names and keywords are case sensitive.
For example:

@smallexample
a A _a a_ IF_X
@end smallexample

are all different names.

@end itemize

@node Parsing Syntax, Compiler Overview, Lexical Syntax, Top
@chapter Parsing Syntax
@cindex Parsing Syntax

Declarations are built up from the lexical elements described above.  A
file may contain one or more declarations.

@itemize @bullet

@item
declaration: variable declaration OR function prototype OR function declaration

@item
Function Prototype: storage type NAME ( parameter_list )

@smallexample
static int add (int a, int b)
@end smallexample

@item
variable_declaration: storage type NAME initial;

Example:

@smallexample
int temp1=1;
@end smallexample

A variable declaration can be outside a function, or at the start of a function.

@item
storage: automatic OR static OR external_reference OR external_definition

This defines the scope, duration and visibility of a function or variable.

@enumerate 1

@item
automatic: This means a variable is allocated at start of function and
released when the function returns.  This can only be used for variables
within functions.  It cannot be used for functions.

@item
static: This means a variable is allocated at start of program and
remains allocated until the program as a whole ends.  For a function, it
means that the function is only visible within the current file.

@item
external_definition: For a variable, which must be defined outside a
function, it means that the variable is visible from other files.  For a
function, it means that the function is visible from another file.

@item
external_reference: For a variable, which must be defined outside a
function, it means that the variable is defined in another file.  For a
function, it means that the function is defined in another file.

@end enumerate

@item
type: int OR unsigned int OR char OR unsigned char OR void

This defines the data type of a variable or the return type of a function.

@enumerate a

@item
int: The variable is a signed integer.  The function returns a signed integer.

@item
unsigned int: The variable is an unsigned integer.  The function returns an unsigned integer.

@item
char: The variable is a signed character.  The function returns a signed character.

@item
unsigned char: The variable is an unsigned character.  The function returns an unsigned character.

@end enumerate

@item
parameter_list: parameter [, parameter]...

@item
parameter: variable_declaration

The variable declarations must not have initialisations.

@item
initial: = value

@item
value: integer_constant

@smallexample
eg 1 +2 -3
@end smallexample

@item
function_declaration: name @{variable_declarations statements @}

A function consists of the function name then the declarations (if any)
and statements (if any) within one pair of braces.

The details of the function arguments come from the function
prototype.  The function prototype must precede the function declaration
in the file.

@item
statement: if_statement OR expression_statement OR return_statement

@item
if_statement: if (expression) @{ statements @} else @{ statements @}

The first lot of statements is executed if the expression is
nonzero.  Otherwise the second lot of statements is executed.  Either
list of statements may be empty, but both sets of braces and the else must be present.

@smallexample
if (a==b)
@{
// nothing
@}
else
@{
a=b;
@}
@end smallexample

@item
expression_statement: expression;

The expression is executed and any side effects, such as assignments,
take effect.

@item
return_statement: return expression_opt;

Returns from the function.  If the function is void, the expression must
be absent, and if the function is not void the expression must be
present.

@item
expression: variable OR integer_constant OR expression+expression OR expression-expression
 OR expression==expression OR (expression) OR variable=expression OR function_call

An expression can be a constant or a variable reference or a
function_call.  Expressions can be combined as a sum of two expressions
or the difference of two expressions, or an equality test of two
expressions.  An assignment is also an expression.  Expressions and operator
precedence work as in C.

@item
function_call: function_name (comma_separated_expressions)

This invokes the function, passing to it the values of the expressions
as actual parameters.

@end itemize

@cindex compilers
@node Compiler Overview, TREELANG and GCC, Parsing Syntax, Top
@chapter Compiler Overview
treelang is run as part of the GCC compiler.

@itemize @bullet
@cindex source code
@cindex file, source
@cindex code, source
@cindex source file
@item
It reads a user's program, stored in a file and containing instructions
written in the appropriate language (Treelang, C, and so on).  This file
contains @dfn{source code}.

@cindex translation of user programs
@cindex machine code
@cindex code, machine
@cindex mistakes
@item
It translates the user's program into instructions a computer can carry
out more quickly than it takes to translate the instructions in the
first place.
These instructions are called @dfn{machine code}---code
designed to be efficiently translated and processed by a machine such as
a computer.  Humans usually aren't as good at writing machine code as they
are at writing Treelang or C, because it is easy to make tiny mistakes
writing machine code.  When writing Treelang or C, it is easy to make
big mistakes.  But you can only make one mistake, because the compiler
stops after it finds any problem.

@cindex debugger
@cindex bugs, finding
@cindex @code{gdb}, command
@cindex commands, @code{gdb}
@item
It provides information in the generated machine code
that can make it easier to find bugs in the program
(using a debugging tool, called a @dfn{debugger},
such as @code{gdb}).

@cindex libraries
@cindex linking
@cindex @code{ld} command
@cindex commands, @code{ld}
@item
It locates and gathers machine code already generated to perform actions
requested by statements in the user's program.  This machine code is
organized into @dfn{libraries} and is located and gathered during the
@dfn{link} phase of the compilation process.  (Linking often is thought
of as a separate step, because it can be directly invoked via the
@code{ld} command.  However, the @code{gcc} command, as with most
compiler commands, automatically performs the linking step by calling on
@code{ld} directly, unless asked to not do so by the user.)

@cindex language, incorrect use of
@cindex incorrect use of language
@item
It attempts to diagnose cases where the user's program contains
incorrect usages of the language.  The @dfn{diagnostics} produced by the
compiler indicate the problem and the location in the user's source file
where the problem was first noticed.  The user can use this information
to locate and fix the problem.

The compiler stops after the first error.
There are no plans to fix
this, ever, as it would vastly complicate the implementation of treelang
to little or no benefit.

@cindex diagnostics, incorrect
@cindex incorrect diagnostics
@cindex error messages, incorrect
@cindex incorrect error messages
(Sometimes an incorrect usage of the language leads to a situation where
the compiler cannot make any sense of what it reads---while a human
might be able to---and thus ends up complaining about an incorrect
``problem'' it encounters that, in fact, reflects a misunderstanding of
the programmer's intention.)

@cindex warnings
@cindex questionable instructions
@item
There are no warnings in treelang.  A program is either correct or in
error.
@end itemize

@cindex components of treelang
@cindex @code{treelang}, components of
@code{treelang} consists of several components:

@cindex @code{gcc}, command
@cindex commands, @code{gcc}
@itemize @bullet
@item
A modified version of the @code{gcc} command, which also might be
installed as the system's @code{cc} command.
(In many cases, @code{cc} refers to the
system's ``native'' C compiler, which
might be a non-GNU compiler, or an older version
of @code{GCC} considered more stable or that is
used to build the operating system kernel.)

@cindex @code{treelang}, command
@cindex commands, @code{treelang}
@item
The @code{treelang} command itself.

@item
The @code{libc} run-time library.  This library contains the machine
code needed to support capabilities of the Treelang language that are
not directly provided by the machine code generated by the
@code{treelang} compilation phase.  This is the same library that the
main C compiler uses (libc).

@cindex @code{tree1}, program
@cindex programs, @code{tree1}
@cindex assembler
@cindex @code{as} command
@cindex commands, @code{as}
@cindex assembly code
@cindex code, assembly
@item
The compiler itself, which is internally named @code{tree1}.

Note that @code{tree1} does not generate machine code directly---it
generates @dfn{assembly code} that is a more readable form
of machine code, leaving the conversion to actual machine code
to an @dfn{assembler}, usually named @code{as}.
@end itemize

@code{GCC} is often thought of as ``the C compiler'' only,
but it does more than that.
Based on command-line options and the names given for files
on the command line, @code{gcc} determines which actions to perform, including
preprocessing, compiling (in a variety of possible languages), assembling,
and linking.

@cindex driver, gcc command as
@cindex @code{gcc}, command as driver
@cindex executable file
@cindex files, executable
@cindex cc1 program
@cindex programs, cc1
@cindex preprocessor
@cindex cpp program
@cindex programs, cpp
For example, the command @samp{gcc foo.c} @dfn{drives} the file
@file{foo.c} through the preprocessor @code{cpp}, then
the C compiler (internally named
@code{cc1}), then the assembler (usually @code{as}), then the linker
(@code{ld}), producing an executable program named @file{a.out} (on
UNIX systems).

@cindex treelang program
@cindex programs, treelang
As another example, the command @samp{gcc foo.tree} would do much the
same as @samp{gcc foo.c}, but instead of using the C compiler named
@code{cc1}, @code{gcc} would use the treelang compiler (named
@code{tree1}).  However, there is no preprocessor for treelang.

@cindex @code{tree1}, program
@cindex programs, @code{tree1}
In a GNU Treelang installation, @code{gcc} recognizes Treelang source
files by name just like it does C and C++ source files.
It knows to use
the Treelang compiler named @code{tree1}, instead of @code{cc1} or
@code{cc1plus}, to compile Treelang files.  If a file's name ends in
@code{.tree} then GCC knows that the program is written in treelang.  You
can also manually override the language.

@cindex @code{gcc}, not recognizing Treelang source
@cindex unrecognized file format
@cindex file format not recognized
Non-Treelang-related operation of @code{gcc} is generally
unaffected by installing the GNU Treelang version of @code{gcc}.
However, unless the installed version of @code{gcc} is the
GNU Treelang version, @code{gcc} will not be able to compile
and link Treelang programs.

@cindex printing version information
@cindex version information, printing
The command @samp{gcc -v x.tree}, where @samp{x.tree} is a file which
must exist but whose contents are ignored, is a quick way to display
version information for the various programs used to compile a typical
Treelang source file.

The @code{tree1} program represents most of what is unique to GNU
Treelang; @code{tree1} is a combination of two rather large chunks of
code.

@cindex GCC Back End (GBE)
@cindex GBE
@cindex @code{GCC}, back end
@cindex back end, GCC
@cindex code generator
One chunk is the so-called @dfn{GNU Back End}, or GBE,
which knows how to generate fast code for a wide variety of processors.
The same GBE is used by the C, C++, and Treelang compiler programs @code{cc1},
@code{cc1plus}, and @code{tree1}, plus others.
Often the GBE is referred to as the ``GCC back end'' or
even just ``GCC''---in this manual, the term GBE is used
whenever the distinction is important.

@cindex GNU Treelang Front End (TFE)
@cindex tree1
@cindex @code{treelang}, front end
@cindex front end, @code{treelang}
The other chunk of @code{tree1} is the majority of what is unique about
GNU Treelang---the code that knows how to interpret Treelang programs to
determine what they are intending to do, and then communicate that
knowledge to the GBE for actual compilation of those programs.  This
chunk is called the @dfn{Treelang Front End} (TFE).  The @code{cc1} and
@code{cc1plus} programs have their own front ends, for the C and C++
languages, respectively.  These front ends are responsible for
diagnosing incorrect usage of their respective languages by the programs
they process, and are responsible for most of the warnings about
questionable constructs as well.  (The GBE in principle handles
producing some warnings, like those concerning possible references to
undefined variables, but these warnings should not occur in treelang
programs as the front end is meant to pick them up first.)

Because so much is shared among the compilers for various languages,
much of the behavior and many of the user-selectable options for these
compilers are similar.
For example, diagnostics (error messages and
warnings) are similar in appearance; command-line
options like @samp{-Wall} have generally similar effects; and the quality
of generated code (in terms of speed and size) is roughly similar
(since that work is done by the shared GBE).

@node TREELANG and GCC, Compiler, Compiler Overview, Top
@chapter Compile Treelang, C, or Other Programs
@cindex compiling programs
@cindex programs, compiling

@cindex @code{gcc}, command
@cindex commands, @code{gcc}
A GNU Treelang installation includes a modified version of the @code{gcc}
command.

In a non-Treelang installation, @code{gcc} recognizes C, C++,
and Objective-C source files.
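
As a rough illustration of recognizing source files by name, the sketch
below maps a file suffix to the internal compiler program that would
handle it.  It is not how the real @code{gcc} driver is written: the
table, the type @code{suffix_entry} and the function
@code{compiler_for_file} are hypothetical names invented for this
example, and only the program names @code{cc1}, @code{cc1plus} and
@code{tree1} come from this manual.  Objective-C is left out for
brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table only (an assumption of this sketch); the real
   driver selects compilers through its specs, not through this code. */
struct suffix_entry {
    const char *suffix;    /* file name suffix, including the dot */
    const char *compiler;  /* internal compiler program name */
};

static const struct suffix_entry suffix_table[] = {
    { ".c",    "cc1"     },
    { ".cc",   "cc1plus" },
    { ".tree", "tree1"   },
};

/* Return the compiler program name for file_name, or NULL if the
   suffix is missing or not in the table. */
const char *compiler_for_file(const char *file_name)
{
    assert(file_name != NULL);

    /* The suffix is everything from the last dot onwards. */
    const char *dot = strrchr(file_name, '.');
    if (dot == NULL)
        return NULL;

    for (size_t i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++) {
        if (strcmp(dot, suffix_table[i].suffix) == 0)
            return suffix_table[i].compiler;
    }
    return NULL;
}
```

With this table, @code{compiler_for_file("foo.tree")} selects
@code{tree1}, while a file with an unknown suffix yields @code{NULL},
which loosely corresponds to the driver not recognizing the file format.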

In a GNU Treelang installation, @code{gcc} also recognizes Treelang source
files and accepts Treelang-specific command-line options, plus some
command-line options that are designed to cater to Treelang users
but apply to other languages as well.

@xref{G++ and GCC,,Compile C; C++; or Objective-C,GCC,Using and Porting GCC},
for information on the way different languages are handled
by the GCC compiler (@code{gcc}).

You can use this, combined with the output of the @samp{gcc -v x.tree}
command, to get the options applicable to treelang.  Treelang source file
names must end with the suffix @samp{.tree}.

@cindex preprocessor

Treelang programs are not by default run through the C
preprocessor by @code{gcc}.  There is no reason why they cannot be run through the
preprocessor manually, but you would need to prevent the preprocessor
from generating #line directives, using the @samp{-P} option, otherwise
@code{tree1} will not accept the input.

@node Compiler, Other Languages, TREELANG and GCC, Top
@chapter The GNU Treelang Compiler

The GNU Treelang compiler, @code{treelang}, supports programs written
in the GNU Treelang language.

@node Other Languages, treelang internals, Compiler, Top
@chapter Other Languages

@menu
* Interoperating with C and C++::
@end menu

@node Interoperating with C and C++,  , Other Languages, Other Languages
@section Tools and advice for interoperating with C and C++

The output of treelang programs looks like C program code to the linker
and everybody else, so you should be able to freely mix treelang and C
(and C++) code, with one proviso.

C promotes small integer types to 'int' when used as function parameters and
return values.  The treelang compiler does not do this, so if you want to interface
to C, you need to specify the promoted type ('int'), not the nominal type.
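
A minimal sketch of the C side of such an interface follows.  The
function @code{next_byte} is hypothetical and is not part of treelang or
GCC.  It assumes the treelang side declares it with the promoted type,
along the lines of @code{external_reference int next_byte(int arg);},
so the C function also takes and returns @code{int} and narrows to the
nominal type (@code{unsigned char}) explicitly inside its body.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical C function meant to be called from treelang code.
   The parameter and return type are written as the promoted type
   (int) rather than the nominal type (unsigned char). */
int next_byte(int promoted_arg)
{
    /* Assumption of this sketch: the caller passes a value that fits
       in an unsigned char. */
    assert(promoted_arg >= 0 && promoted_arg <= UCHAR_MAX);

    /* Narrow to the nominal type explicitly. */
    unsigned char nominal = (unsigned char) promoted_arg;

    /* The addition is carried out in int; the cast wraps the result
       back into the unsigned char range before it is returned as int. */
    return (unsigned char) (nominal + 1);
}
```

Keeping the narrowing inside the C function means both sides agree on
@code{int} at the call boundary, which is the arrangement the paragraph
above recommends.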

@ifset INTERNALS
@node treelang internals, Open Questions, Other Languages, Top
@chapter treelang internals

@menu
* treelang files::
* treelang compiler interfaces::
* Hints and tips::
@end menu

@node treelang files, treelang compiler interfaces, treelang internals, treelang internals
@section treelang files

To create a compiler that integrates into GCC, you need to create many
files.  Some of the files are integrated into the main GCC makefile, to
build the various parts of the compiler and to run the test
suite.  Others are incorporated into various GCC programs such as
gcc.c.  Finally you must provide the actual programs comprising your
compiler.

@cindex files

The files are:

@enumerate 1

@item
COPYING.  This is the copyright file, assuming you are going to use the
GNU General Public Licence.  You probably need to use the GPL because if
you use the GCC back end your program and the back end are one program,
and the back end is GPLed.

This need not be present if the language is incorporated into the main
GCC tree, as the main GCC directory has this file.

@item
COPYING.LIB.  This is the copyright file for those parts of your program
that are not to be covered by the GPL, but are instead to be covered by
the LGPL (Library or Lesser GPL).  This licence may be appropriate for
the library routines associated with your compiler.  These are the
routines that are linked with the @emph{output} of the compiler.  Using
the LGPL for these programs allows programs written using your compiler
to be closed source.  For example LIBC is under the LGPL.

This need not be present if the language is incorporated into the main
GCC tree, as the main GCC directory has this file.

@item
ChangeLog.  Record all the changes to your compiler.
Use the same format
as used in treelang, as it is supported by an emacs editing mode and is
part of the FSF coding standard. Normally each directory has its own
changelog. The FSF standard allows but does not require a meaningful
comment on @emph{why} the changes were made, above and beyond @emph{what}
was changed. In the author's opinion it is useful to provide this
information.

@item
treelang.texi. The manual, written in texinfo. Your manual would have a
different file name. You need not write it in texinfo if you don't want
to, but a lot of GNU software does use texinfo.

@cindex Make-lang.in
@item
Make-lang.in. This file is part of the make file which is incorporated
with the GCC make file skeleton (Makefile.in in the GCC directory) to
make Makefile, as part of the configuration process.

Makefile in turn is the main instruction to actually build
everything. The build instructions are held in the main GCC manual and
web site so they are not repeated here.

There are some comments at the top which will help you understand what
you need to do.

There are make commands to build things, remove generated files with
various degrees of thoroughness, count the lines of code (so you know
how much progress you are making), build info and html files from the
texinfo source, run the tests, etc.

@item
README. Just a brief informative text file saying what is in this
directory.

@cindex config-lang.in
@item
config-lang.in. This file is read by the configuration process and must
be present. You specify the name of your language, the name(s) of the
compiler(s), including preprocessors, that you are going to build, and
whether any files (usually generated ones) should be excluded from diffs
(i.e., when making diff files to send in patches). Whether the equate
'stagestuff' is used is unknown (???).

@cindex lang-options
@item
lang-options.
This file is included into gcc.c, the main GCC driver, and
tells it what options your language supports. This is only used to
display help (is this true ???).

@cindex lang-specs
@item
lang-specs. This file is also included in gcc.c. It tells gcc.c when to
call your programs and what options to send them. The mini-language
'specs' is documented in the source of gcc.c. Do not attempt to write a
specs file from scratch; use an existing one as the base and enhance
it.

@item
Your texi files. Texinfo can be used to build documentation in HTML,
info, dvi and postscript formats. It is a tagged language, is documented
in its own manual, and has its own emacs mode.

@item
Your programs. The relationships between all the programs are explained
in the next section. You need to write or use the following programs:

@itemize @bullet

@item
lexer. This breaks the input into words and passes these to the
parser. This is lex.l in treelang, which is passed through flex, a lex
variant, to produce C code lex.c. Note that there is a school of thought
that says real men hand code their own lexers; however, you may prefer to
write far less code and use flex, as was done with treelang.

@item
parser. This breaks the program into recognizable constructs such as
expressions, statements, etc. This is parse.y in treelang, which is
passed through bison, which is a yacc variant, to produce C code parse.c.

@item
back end interface. This interfaces to the code generation back end. In
treelang, this is tree1.c, which mainly interfaces to toplev.c, and
treetree.c, which mainly interfaces to everything else. Many languages
mix up the back end interface with the parser, as in the C compiler for
example.
It is a matter of taste which way to do it, but with treelang
it is separated out to make the back end interface cleaner and easier to
understand.

@item
header files. For function prototypes and common data items. One point
to note here is that bison can generate a header file with all the
numbers it has assigned to the keywords and symbols, and you can include
the same header in your lexer. This technique is demonstrated in
treelang.

@item
compiler main file. GCC comes with a program toplev.c which is a
perfectly serviceable main program for your compiler. treelang uses
toplev.c but other languages have been known to replace it with their
own main program. Again this is a matter of taste and how much code you
want to write.

@end itemize

@end enumerate

@node treelang compiler interfaces, Hints and tips, treelang files, treelang internals
@section treelang compiler interfaces

@cindex driver
@cindex toplev.c

@menu
* treelang driver::
* treelang main compiler::
@end menu

@node treelang driver, treelang main compiler, treelang compiler interfaces, treelang compiler interfaces
@subsection treelang driver

The GCC compiler consists of a driver, which then executes the various
compiler phases based on the instructions in the specs files.

Typically a program's language will be identified from its suffix (e.g.,
@samp{.tree} for treelang programs).

The driver (gcc.c) will then drive (exec) in turn a preprocessor, the main
compiler, the assembler and the link editor. Options to GCC allow you to
override all of this. In the case of treelang programs there is no
preprocessor, and these days the C preprocessor is mostly run within the
main C compiler rather than as a separate process, apparently for reasons of speed.
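
The suffix-based language selection described above can be pictured with a
small C sketch. This is only an illustration of the idea and is not how
gcc.c is actually written (the real driver works from the specs); the names
@code{suffix_table} and @code{language_for_file} are hypothetical.

@verbatim
```c
#include <string.h>

/* Hypothetical table mapping a file suffix to a language name.  */
struct suffix_entry {
    const char *suffix;
    const char *language;
};

static const struct suffix_entry suffix_table[] = {
    { ".c",    "c" },
    { ".tree", "treelang" },
};

/* Return the language for FILENAME based on its last suffix, or
   NULL if the suffix is missing or not in the table.  */
const char *language_for_file(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    size_t i;

    if (dot == NULL)
        return NULL;
    for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
        if (strcmp(dot, suffix_table[i].suffix) == 0)
            return suffix_table[i].language;
    return NULL;
}
```
@end verbatim

Once the language is known, the driver can choose which programs to run for
that language and in what order.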

You will be using the standard assembler and linkage editor so these are
ignored from now on.

You have to write your own preprocessor if you want one. This is usually
totally language specific. The main point to be aware of is to ensure
that you find some way to pass file name and line number information
through to the main compiler so that it can tell the back end this
information and so the debugger can find the right source line for each
piece of code. That is all there is to say about the preprocessor, except
that the preprocessor will probably not be the slowest part of the
compiler and will probably not use the most memory, so don't waste too
much time tuning it until you know you need to do so.

@node treelang main compiler, , treelang driver, treelang compiler interfaces
@subsection treelang main compiler

The main compiler for treelang consists of toplev.c from the main GCC
compiler, the parser, lexer and back end interface routines, and the
back end routines themselves, of which there are many.

toplev.c does a lot of work for you and you should almost certainly use it.

Writing the interface code is the hard part of creating a compiler using
GCC. The back end interface documentation is incomplete and the interface is
complex.

There are three main aspects to interfacing to the other GCC code.

@menu
* Interfacing to toplev.c::
* Interfacing to the garbage collection::
* Interfacing to the code generation code. ::
@end menu

@node Interfacing to toplev.c, Interfacing to the garbage collection, treelang main compiler, treelang main compiler
@subsubsection Interfacing to toplev.c

In treelang this is handled mainly in tree1.c
and partly in treetree.c. Peruse toplev.c for details of what you need
to do.

@node Interfacing to the garbage collection, Interfacing to the code generation code.
, Interfacing to toplev.c, treelang main compiler
@subsubsection Interfacing to the garbage collection

In treelang, interfacing to the garbage collection is done mainly in
tree1.c.

Memory allocation in the compiler should be done using the ggc_alloc and
kindred routines in ggc*.*. At the end of every 'function' in your language, toplev.c calls
the garbage collection several times. The garbage collection calls mark
routines which go through the memory which is still used, telling the
garbage collection not to free it. Then all the memory not used is
freed.

What this means is that you need a way to hook into this marking
process. This is done by calling ggc_add_root. This provides the address
of a callback routine which will be called during garbage collection and
which can call ggc_mark to save the storage. If storage is only
used within the parsing of a function, you do not need to provide a way
to mark it.

Note that you can also call ggc_mark_tree to mark any of the back end
internal 'tree' nodes. This routine will follow the branches of the
trees and mark all the subordinate structures. This is useful, for
example, when you have created a variable declaration that will be used
across multiple functions, or for a function declaration (from a
prototype) that may be used later on. See the next section for more on the
tree nodes.

@node Interfacing to the code generation code. , , Interfacing to the garbage collection, treelang main compiler
@subsubsection Interfacing to the code generation code.

In treelang this is done in treetree.c. A typedef called 'tree', which is
defined in tree.h and tree.def in the GCC directory and largely
implemented in tree.c and stmt.c, forms the basic interface to the
compiler back end.

In general you call various tree routines to generate code, either
directly or through toplev.c.
You build up data structures and
expressions in similar ways.

You can read some documentation on this which can be found via the GCC
main web page. In particular, the documentation produced by Joachim
Nadler and translated by Tim Josling can be quite useful. The C compiler
also has documentation in the main GCC manual (particularly the current
CVS version) which is useful on a lot of the details.

In time it is hoped to enhance this document to provide a more
comprehensive overview of this topic. The main gap is in explaining how
it all works together.

@node Hints and tips, , treelang compiler interfaces, treelang internals
@section Hints and tips

@itemize @bullet

@item
TAGS: Use the make ETAGS commands to create TAGS files which can be used in
emacs to jump to any symbol quickly.

@item
GREP: grep is also a useful way to find all uses of a symbol.

@item
TREE: The main files to look at are tree.h and tree.def. You will
probably want a hardcopy of these.

@item
SAMPLE: look at the sample interfacing code in treetree.c. You can use
gdb to trace through the code and learn about how it all works.

@item
GDB: the GCC back end works well with gdb. It traps abort() and allows
you to trace back what went wrong.

@item
Error Checking: The compiler back end does some error and consistency
checking. Often the result of an error is just no code being
generated. You will then need to trace through and find out what is
going wrong. The rtl dump files can help here also.

@item
rtl dump files: The main GCC manual documents these files, which are dumps
of the rtl (intermediate code) that is manipulated during the code
generation process. This can provide useful clues about what is going
wrong. The rtl 'language' is documented in the main GCC manual.

@end itemize

@end ifset

@node Open Questions, Bugs, treelang internals, Top
@chapter Open Questions

If you know GCC well, please consider looking at the file treetree.c and
resolving any questions marked "???".

@node Bugs, Service, Open Questions, Top
@chapter Reporting Bugs
@cindex bugs
@cindex reporting bugs

You can report bugs to @email{@value{email-bugs}}. Please make
sure bugs are real before reporting them. Follow the guidelines in the
main GCC manual for submitting bug reports.

@menu
* Sending Patches::
@end menu

@node Sending Patches, , Bugs, Bugs
@section Sending Patches for GNU Treelang

If you would like to write bug fixes or improvements for the GNU
Treelang compiler, that is very helpful. Send suggested fixes to
@email{@value{email-patches}}.

@node Service, Projects, Bugs, Top
@chapter How To Get Help with GNU Treelang

If you need help installing, using or changing GNU Treelang, there are two
ways to find it:

@itemize @bullet

@item
Look in the service directory for someone who might help you for a fee.
The service directory is found in the file named @file{SERVICE} in the
GCC distribution.

@item
Send a message to @email{@value{email-general}}.

@end itemize

@end ifset
@ifset INTERNALS

@node Projects, Index, Service, Top
@chapter Projects
@cindex projects

If you want to contribute to @code{treelang} by doing research,
design, specification, documentation, coding, or testing,
the following information should give you some ideas.

Send a message to @email{@value{email-general}} if you plan to add a
feature.

The main requirement for treelang is to add features and to add
documentation. Features are things that the GCC back end can do but
which are not reflected in treelang.
Examples include structures,
unions, pointers, arrays.

@end ifset

@node Index, , Projects, Top
@unnumbered Index

@printindex cp
@summarycontents
@contents
@bye