1====================
2Standard C++ Modules
3====================
4
5.. contents::
6   :local:
7
8Introduction
9============
10
11The term ``modules`` has a lot of meanings. For the users of Clang, modules may
12refer to ``Objective-C Modules``, ``Clang C++ Modules`` (or ``Clang Header Modules``,
13etc.) or ``Standard C++ Modules``. The implementation of all these kinds of modules in Clang
14has a lot of shared code, but from the perspective of users, their semantics and
15command line interfaces are very different. This document focuses on
16an introduction of how to use standard C++ modules in Clang.
17
There is already a detailed document about `Clang modules <Modules.html>`_;
reading it is helpful if you want to know more about the general idea of
modules. Since standard C++ modules have different semantics (and workflows)
from ``Clang modules``, this page describes the background and use of
Clang with standard C++ modules.
23
24Modules exist in two forms in the C++ Language Specification. They can refer to
25either "Named Modules" or to "Header Units". This document covers both forms.
26
27Standard C++ Named modules
28==========================
29
This document is intended to be a manual first and foremost. However, we consider it helpful to
introduce some language background here for readers who are not familiar with
the new language feature. This document is not intended to be a language
tutorial; it only introduces the concepts necessary to understand the
structure and building of a project that uses modules.
35
36Background and terminology
37--------------------------
38
39Modules
40~~~~~~~
41
In this document, the term ``Modules``/``modules`` refers to the standard C++ modules
feature unless it is qualified by ``Clang``.
44
45Clang Modules
46~~~~~~~~~~~~~
47
In this document, the term ``Clang Modules``/``Clang modules`` refers to the Clang
C++ modules extension. These are also known as ``Clang header modules``,
``Clang module map modules`` or ``Clang c++ modules``.
51
52Module and module unit
53~~~~~~~~~~~~~~~~~~~~~~
54
55A module consists of one or more module units. A module unit is a special
56translation unit. Every module unit must have a module declaration. The syntax
57of the module declaration is:
58
59.. code-block:: c++
60
61  [export] module module_name[:partition_name];
62
63Terms enclosed in ``[]`` are optional. The syntax of ``module_name`` and ``partition_name``
64in regex form corresponds to ``[a-zA-Z_][a-zA-Z_0-9\.]*``. In particular, a literal dot ``.``
65in the name has no semantic meaning (e.g. implying a hierarchy).
66
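As a rough illustration of the naming pattern quoted above, the following sketch
checks a candidate name with ``std::regex``. The helper name
``matches_module_name_pattern`` is hypothetical, and the pattern is taken from this
document rather than from Clang's parser.

.. code-block:: c++

  #include <regex>
  #include <string>

  // Hypothetical helper: checks a name against the pattern quoted in this
  // document. Inside a character class '.' is already literal, so the
  // backslash from the quoted pattern is dropped here.
  bool matches_module_name_pattern(const std::string &name) {
    static const std::regex pattern("[a-zA-Z_][a-zA-Z_0-9.]*");
    return std::regex_match(name, pattern);
  }

With this sketch, ``Hello`` and ``a.b.c`` match, while ``1abc`` and the empty string
do not. The pattern is only an approximation: it also accepts names such as ``a..b``,
which are not valid module names in the C++ grammar, so the compiler remains the
authority.
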
67In this document, module units are classified into:
68
69* Primary module interface unit.
70
71* Module implementation unit.
72
73* Module interface partition unit.
74
75* Internal module partition unit.
76
77A primary module interface unit is a module unit whose module declaration is
78``export module module_name;``. The ``module_name`` here denotes the name of the
79module. A module should have one and only one primary module interface unit.
80
81A module implementation unit is a module unit whose module declaration is
82``module module_name;``. A module could have multiple module implementation
83units with the same declaration.
84
85A module interface partition unit is a module unit whose module declaration is
86``export module module_name:partition_name;``. The ``partition_name`` should be
87unique within any given module.
88
89An internal module partition unit is a module unit whose module declaration
90is ``module module_name:partition_name;``. The ``partition_name`` should be
91unique within any given module.
92
93In this document, we use the following umbrella terms:
94
* A ``module interface unit`` refers to either a ``primary module interface unit``
  or a ``module interface partition unit``.

* An ``importable module unit`` refers to either a ``module interface unit``
  or an ``internal module partition unit``.

* A ``module partition unit`` refers to either a ``module interface partition unit``
  or an ``internal module partition unit``.
103
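The classification and the umbrella terms above can be summarized in a small sketch.
The enum and function names are hypothetical and exist only for illustration; the
sketch assumes that the two relevant properties of a module declaration (whether it
has ``export`` and whether it names a partition) are already known.

.. code-block:: c++

  // The four kinds of module units described in this document.
  enum class ModuleUnitKind {
    PrimaryInterface,   // export module M;
    Implementation,     // module M;
    InterfacePartition, // export module M:P;
    InternalPartition   // module M:P;
  };

  // Hypothetical helper: classify a module declaration by whether it has
  // the `export` keyword and whether it names a partition.
  constexpr ModuleUnitKind classify_module_unit(bool has_export,
                                                bool has_partition) {
    if (has_export)
      return has_partition ? ModuleUnitKind::InterfacePartition
                           : ModuleUnitKind::PrimaryInterface;
    return has_partition ? ModuleUnitKind::InternalPartition
                         : ModuleUnitKind::Implementation;
  }

  // Umbrella terms used in this document.
  constexpr bool is_module_interface_unit(ModuleUnitKind k) {
    return k == ModuleUnitKind::PrimaryInterface ||
           k == ModuleUnitKind::InterfacePartition;
  }
  constexpr bool is_module_partition_unit(ModuleUnitKind k) {
    return k == ModuleUnitKind::InterfacePartition ||
           k == ModuleUnitKind::InternalPartition;
  }
  constexpr bool is_importable_module_unit(ModuleUnitKind k) {
    return is_module_interface_unit(k) ||
           k == ModuleUnitKind::InternalPartition;
  }

  static_assert(!is_importable_module_unit(ModuleUnitKind::Implementation),
                "module implementation units are not importable");

Note that a module implementation unit is the only kind that is not importable.
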
104Built Module Interface file
105~~~~~~~~~~~~~~~~~~~~~~~~~~~
106
A ``Built Module Interface file`` is the precompiled result of an importable module unit.
It is commonly abbreviated as ``BMI``.
109
110Global module fragment
111~~~~~~~~~~~~~~~~~~~~~~
112
113In a module unit, the section from ``module;`` to the module declaration is called the global module fragment.
114
115
116How to build projects using modules
117-----------------------------------
118
119Quick Start
120~~~~~~~~~~~
121
122Let's see a "hello world" example that uses modules.
123
124.. code-block:: c++
125
126  // Hello.cppm
127  module;
128  #include <iostream>
129  export module Hello;
130  export void hello() {
131    std::cout << "Hello World!\n";
132  }
133
134  // use.cpp
135  import Hello;
136  int main() {
137    hello();
138    return 0;
139  }
140
141Then we type:
142
143.. code-block:: console
144
145  $ clang++ -std=c++20 Hello.cppm --precompile -o Hello.pcm
146  $ clang++ -std=c++20 use.cpp -fprebuilt-module-path=. Hello.pcm -o Hello.out
147  $ ./Hello.out
148  Hello World!
149
150In this example, we make and use a simple module ``Hello`` which contains only a
151primary module interface unit ``Hello.cppm``.
152
Next, let's look at a slightly more complex "hello world" example that uses all four kinds of module units.
154
155.. code-block:: c++
156
157  // M.cppm
158  export module M;
159  export import :interface_part;
160  import :impl_part;
161  export void Hello();
162
163  // interface_part.cppm
164  export module M:interface_part;
165  export void World();
166
167  // impl_part.cppm
168  module;
169  #include <iostream>
170  #include <string>
171  module M:impl_part;
172  import :interface_part;
173
174  std::string W = "World.";
175  void World() {
176    std::cout << W << std::endl;
177  }
178
179  // Impl.cpp
180  module;
181  #include <iostream>
182  module M;
183  void Hello() {
184    std::cout << "Hello ";
185  }
186
187  // User.cpp
188  import M;
189  int main() {
190    Hello();
191    World();
192    return 0;
193  }
194
Then we can compile the example with the following commands:
196
197.. code-block:: console
198
199  # Precompiling the module
200  $ clang++ -std=c++20 interface_part.cppm --precompile -o M-interface_part.pcm
201  $ clang++ -std=c++20 impl_part.cppm --precompile -fprebuilt-module-path=. -o M-impl_part.pcm
202  $ clang++ -std=c++20 M.cppm --precompile -fprebuilt-module-path=. -o M.pcm
203  $ clang++ -std=c++20 Impl.cpp -fmodule-file=M.pcm -c -o Impl.o
204
205  # Compiling the user
206  $ clang++ -std=c++20 User.cpp -fprebuilt-module-path=. -c -o User.o
207
208  # Compiling the module and linking it together
209  $ clang++ -std=c++20 M-interface_part.pcm -c -o M-interface_part.o
210  $ clang++ -std=c++20 M-impl_part.pcm -c -o M-impl_part.o
211  $ clang++ -std=c++20 M.pcm -c -o M.o
212  $ clang++ User.o M-interface_part.o  M-impl_part.o M.o Impl.o -o a.out
213
214We explain the options in the following sections.
215
216How to enable standard C++ modules
217~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
218
219Currently, standard C++ modules are enabled automatically
220if the language standard is ``-std=c++20`` or newer.
221The ``-fmodules-ts`` option is deprecated and is planned to be removed.
222
223How to produce a BMI
224~~~~~~~~~~~~~~~~~~~~
225
226We can generate a BMI for an importable module unit by either ``--precompile``
227or ``-fmodule-output`` flags.
228
229The ``--precompile`` option generates the BMI as the output of the compilation and the output path
230can be specified using the ``-o`` option.
231
The ``-fmodule-output`` option generates the BMI as a by-product of the compilation.
If ``-fmodule-output=`` is specified, the BMI is emitted to the specified location. If
``-fmodule-output`` and ``-c`` are specified, the BMI is emitted in the directory of the
output file, named after the input file with the extension replaced by ``.pcm``. Otherwise, the BMI
is emitted in the working directory, named after the input file with the extension replaced by
``.pcm``.
238
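The placement rule for ``-fmodule-output`` can be modelled with a short sketch. The
function ``expected_bmi_path`` and its parameters are hypothetical; it only restates
the rule described in this section using ``std::filesystem::path`` and is not taken
from Clang's driver.

.. code-block:: c++

  #include <filesystem>
  #include <optional>
  #include <string>

  namespace fs = std::filesystem;

  // Hypothetical model of where the BMI ends up, per the rule above.
  //   explicit_path: value of -fmodule-output=<path>, if given
  //   object_output: path of the object file when -c is used, if any
  //   input_file:    the module unit being compiled
  std::string expected_bmi_path(const std::optional<std::string> &explicit_path,
                                const std::optional<std::string> &object_output,
                                const std::string &input_file) {
    if (explicit_path)
      return *explicit_path;
    // Input file name with its extension replaced by ".pcm".
    fs::path bmi = fs::path(input_file).filename().replace_extension(".pcm");
    if (object_output)
      return (fs::path(*object_output).parent_path() / bmi).generic_string();
    // Otherwise the BMI is placed in the working directory.
    return bmi.generic_string();
  }

Under these assumptions, compiling ``src/Hello.cppm`` with ``-c -o build/Hello.o``
and ``-fmodule-output`` would yield ``build/Hello.pcm``.
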
Generating BMIs with ``--precompile`` is called two-phase compilation, since it takes
two steps to compile a source file to an object file. Generating BMIs with ``-fmodule-output``
is called one-phase compilation. The one-phase compilation model is simpler
for build systems to implement, while the two-phase compilation model has the potential to compile faster due
to higher parallelism. For example, if there are two module units A and B, and B depends on A, the
one-phase compilation model would need to compile them serially, whereas in the two-phase compilation
model B can start compiling as soon as A.pcm exists, in parallel with the compilation from A.pcm to A.o.
This pays off if that step takes a long time.
247
248File name requirement
249~~~~~~~~~~~~~~~~~~~~~
250
251The file name of an ``importable module unit`` should end with ``.cppm``
252(or ``.ccm``, ``.cxxm``, ``.c++m``). The file name of a ``module implementation unit``
253should end with ``.cpp`` (or ``.cc``, ``.cxx``, ``.c++``).
254
255The file name of BMIs should end with ``.pcm``.
256The file name of the BMI of a ``primary module interface unit`` should be ``module_name.pcm``.
257The file name of BMIs of ``module partition unit`` should be ``module_name-partition_name.pcm``.
258
If the file names use different extensions, Clang may fail to build the module.
For example, if the file name of an ``importable module unit`` ends with ``.cpp`` instead of ``.cppm``,
then we can't generate a BMI for the ``importable module unit`` with the ``--precompile`` option,
since ``--precompile`` would then only run the preprocessor, which is equivalent to ``-E``.
If we want the file name of an ``importable module unit`` to end with a suffix other than ``.cppm``,
we can put ``-x c++-module`` in front of the file. For example,
265
266.. code-block:: c++
267
268  // Hello.cpp
269  module;
270  #include <iostream>
271  export module Hello;
272  export void hello() {
273    std::cout << "Hello World!\n";
274  }
275
276  // use.cpp
277  import Hello;
278  int main() {
279    hello();
280    return 0;
281  }
282
Now the file name of the ``module interface`` ends with ``.cpp`` instead of ``.cppm``,
so we can't compile it with the original command lines. But we can still compile it with:
285
286.. code-block:: console
287
288  $ clang++ -std=c++20 -x c++-module Hello.cpp --precompile -o Hello.pcm
289  $ clang++ -std=c++20 use.cpp -fprebuilt-module-path=. Hello.pcm -o Hello.out
290  $ ./Hello.out
291  Hello World!
292
293Module name requirement
294~~~~~~~~~~~~~~~~~~~~~~~
295
296[module.unit]p1 says:
297
298.. code-block:: text
299
300  All module-names either beginning with an identifier consisting of std followed by zero
301  or more digits or containing a reserved identifier ([lex.name]) are reserved and shall not
302  be specified in a module-declaration; no diagnostic is required. If any identifier in a reserved
303  module-name is a reserved identifier, the module name is reserved for use by C++ implementations;
304  otherwise it is reserved for future standardization.
305
So none of the following names are valid by default:
307
308.. code-block:: text
309
310    std
311    std1
312    std.foo
313    __test
314    // and so on ...
315
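The quoted rule can be approximated with the sketch below. The function names are
hypothetical, and the notion of a "reserved identifier" is simplified to two cases
from [lex.name] (an identifier containing ``__``, or one starting with an underscore
followed by an uppercase letter); this is an assumption made for illustration, not
a description of how Clang diagnoses reserved names.

.. code-block:: c++

  #include <cctype>
  #include <cstddef>
  #include <string_view>
  #include <vector>

  // Split a dotted module name such as "a.b.c" into its identifiers.
  std::vector<std::string_view> split_components(std::string_view name) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
      std::size_t dot = name.find('.', start);
      if (dot == std::string_view::npos) {
        parts.push_back(name.substr(start));
        return parts;
      }
      parts.push_back(name.substr(start, dot - start));
      start = dot + 1;
    }
  }

  // "std" followed by zero or more digits.
  bool is_std_followed_by_digits(std::string_view id) {
    if (id.substr(0, 3) != "std")
      return false;
    for (char c : id.substr(3))
      if (!std::isdigit(static_cast<unsigned char>(c)))
        return false;
    return true;
  }

  // Simplified check for a reserved identifier (see the assumptions above).
  bool is_reserved_identifier(std::string_view id) {
    if (id.find("__") != std::string_view::npos)
      return true;
    return id.size() >= 2 && id[0] == '_' &&
           std::isupper(static_cast<unsigned char>(id[1]));
  }

  // Hypothetical helper modelling the quoted [module.unit]p1 rule.
  bool is_reserved_module_name(std::string_view name) {
    std::vector<std::string_view> parts = split_components(name);
    if (is_std_followed_by_digits(parts.front()))
      return true;
    for (std::string_view part : parts)
      if (is_reserved_identifier(part))
        return true;
    return false;
  }

Under these assumptions, ``std``, ``std1``, ``std.foo`` and ``__test`` are reported
as reserved, while ``Hello`` and ``stdx`` are not.
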
If you still want to use a reserved module name for any reason, you can currently add a special line marker
in front of the module declaration, like:
318
319.. code-block:: c++
320
321  # __LINE_NUMBER__ __FILE__ 1 3
322  export module std;
323
Here ``__LINE_NUMBER__`` is the actual line number of the corresponding line, and ``__FILE__`` is the file name
of the translation unit. The ``1`` means that the following is a new file, and the ``3`` means that this is a system header/file, so
certain warnings should be suppressed. You can find more details at:
https://gcc.gnu.org/onlinedocs/gcc-3.0.2/cpp_9.html.
328
329How to specify the dependent BMIs
330~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
331
332There are 3 methods to specify the dependent BMIs:
333
* (1) ``-fprebuilt-module-path=<path/to/directory>``.
* (2) ``-fmodule-file=<path/to/BMI>``.
* (3) ``-fmodule-file=<module-name>=<path/to/BMI>``.
337
The option ``-fprebuilt-module-path`` tells the compiler where to search for dependent BMIs.
It may be used multiple times, just like ``-I`` for specifying paths for header files. The lookup rules are:

* (1) When we import module M, the compiler looks up M.pcm in the directories specified
  by ``-fprebuilt-module-path``.
* (2) When we import partition module unit M:P, the compiler looks up M-P.pcm in the
  directories specified by ``-fprebuilt-module-path``.
345
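The lookup rules above amount to a simple name mapping, sketched below. The function
``candidate_bmi_paths`` is hypothetical, and visiting the directories in the order
they were given is an assumption made for illustration.

.. code-block:: c++

  #include <cstddef>
  #include <filesystem>
  #include <string>
  #include <vector>

  // Hypothetical helper: list the BMI files that would be looked up for an
  // import of "M" or "M:P" in the given -fprebuilt-module-path directories.
  std::vector<std::string>
  candidate_bmi_paths(const std::vector<std::string> &search_dirs,
                      std::string import_name) {
    // "M:P" maps to the file stem "M-P"; a plain "M" stays as it is.
    std::size_t colon = import_name.find(':');
    if (colon != std::string::npos)
      import_name[colon] = '-';

    std::vector<std::string> candidates;
    for (const std::string &dir : search_dirs) {
      std::filesystem::path file =
          std::filesystem::path(dir) / (import_name + ".pcm");
      candidates.push_back(file.generic_string());
    }
    return candidates;
  }

For example, with the directories ``.`` and ``build``, an import of ``M:P`` yields
the candidates ``./M-P.pcm`` and ``build/M-P.pcm``.
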
The option ``-fmodule-file=<path/to/BMI>`` tells the compiler to load the specified BMI directly.
The option ``-fmodule-file=<module-name>=<path/to/BMI>`` tells the compiler to load the specified BMI
for the module specified by ``<module-name>`` when necessary. The main difference is that
``-fmodule-file=<path/to/BMI>`` loads the BMI eagerly, whereas
``-fmodule-file=<module-name>=<path/to/BMI>`` only loads the BMI lazily, which is similar
to ``-fprebuilt-module-path``.
352
If ``-fprebuilt-module-path=<path/to/directory>``, ``-fmodule-file=<path/to/BMI>`` and
``-fmodule-file=<module-name>=<path/to/BMI>`` are all present, the ``-fmodule-file=<path/to/BMI>`` option
takes the highest precedence and ``-fmodule-file=<module-name>=<path/to/BMI>`` takes the second
highest precedence.
357
When we compile a ``module implementation unit``, we must specify the BMI of the corresponding
``primary module interface unit``,
since the language specification says that a module implementation unit implicitly imports
the primary module interface unit.
362
363  [module.unit]p8
364
365  A module-declaration that contains neither an export-keyword nor a module-partition implicitly
366  imports the primary module interface unit of the module as if by a module-import-declaration.
367
All three options, ``-fprebuilt-module-path=<path/to/directory>``, ``-fmodule-file=<path/to/BMI>``
and ``-fmodule-file=<module-name>=<path/to/BMI>``, may occur multiple times.
For example, the command line to compile ``M.cppm`` in
the above example could be rewritten as either of:
372
373.. code-block:: console
374
375  $ clang++ -std=c++20 M.cppm --precompile -fmodule-file=M-interface_part.pcm -fmodule-file=M-impl_part.pcm -o M.pcm
376  $ clang++ -std=c++20 M.cppm --precompile -fmodule-file=M:interface_part=M-interface_part.pcm -fmodule-file=M:impl_part=M-impl_part.pcm -o M.pcm
377
``-fprebuilt-module-path`` is more convenient, while ``-fmodule-file`` is faster since
it saves the time spent on file lookup.
380
381Remember that module units still have an object counterpart to the BMI
382~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
383
It is easy to forget to compile BMIs to object files at first, since we may envision module interfaces as being like headers.
However, this is not the case.
Module units are translation units. We need to compile them to object files
and link the object files, as the example shows.
388
For example, the traditional compilation process for headers looks like:
390
391.. code-block:: text
392
393  src1.cpp -+> clang++ src1.cpp --> src1.o ---,
394  hdr1.h  --'                                 +-> clang++ src1.o src2.o ->  executable
395  hdr2.h  --,                                 |
396  src2.cpp -+> clang++ src2.cpp --> src2.o ---'
397
And the compilation process for module units looks like:
399
400.. code-block:: text
401
402                src1.cpp ----------------------------------------+> clang++ src1.cpp -------> src1.o -,
403  (header unit) hdr1.h    -> clang++ hdr1.h ...    -> hdr1.pcm --'                                    +-> clang++ src1.o mod1.o src2.o ->  executable
404                mod1.cppm -> clang++ mod1.cppm ... -> mod1.pcm --,--> clang++ mod1.pcm ... -> mod1.o -+
405                src2.cpp ----------------------------------------+> clang++ src2.cpp -------> src2.o -'
406
As the diagrams show, we need to compile the BMIs of module units to object files and link the object files.
(But we can't do this for the BMIs of header units. See the later section for the definition of header units.)

If we want to create a module library, we can't just ship the BMIs in an archive.
We must compile these BMIs (``*.pcm``) into object files (``*.o``) and add those object files to the archive instead.
412
413Consistency Requirement
414~~~~~~~~~~~~~~~~~~~~~~~
415
If we envision modules as a cache to speed up compilation, then, as with other caching techniques,
it is important to keep the cache consistent.
So **currently** Clang performs very strict consistency checks.
419
420Options consistency
421^^^^^^^^^^^^^^^^^^^
422
The language options of module units and their non-module-unit users must be consistent.
The following example is not allowed:
425
426.. code-block:: c++
427
428  // M.cppm
429  export module M;
430
431  // Use.cpp
432  import M;
433
434.. code-block:: console
435
436  $ clang++ -std=c++20 M.cppm --precompile -o M.pcm
437  $ clang++ -std=c++2b Use.cpp -fprebuilt-module-path=.
438
439The compiler would reject the example due to the inconsistent language options.
440Not all options are language options.
441For example, the following example is allowed:
442
443.. code-block:: console
444
445  $ clang++ -std=c++20 M.cppm --precompile -o M.pcm
446  # Inconsistent optimization level.
447  $ clang++ -std=c++20 -O3 Use.cpp -fprebuilt-module-path=.
448  # Inconsistent debugging level.
449  $ clang++ -std=c++20 -g Use.cpp -fprebuilt-module-path=.
450
Although the two examples have inconsistent optimization and debugging levels, both of them are accepted.

Note that **currently** the compiler doesn't consider inconsistent macro definitions a problem. For example:
454
.. code-block:: console

  $ clang++ -std=c++20 M.cppm --precompile -o M.pcm
  # Inconsistent optimization level and macro definition.
  $ clang++ -std=c++20 -O3 -DNDEBUG Use.cpp -fprebuilt-module-path=.
460
Currently, Clang accepts the above example. But it may produce surprising results if the
debugging code depends on ``NDEBUG`` being used consistently across translation units.
463
464Source content consistency
465^^^^^^^^^^^^^^^^^^^^^^^^^^
466
467When the compiler reads a BMI, the compiler will check the consistency of the corresponding
468source files. For example:
469
470.. code-block:: c++
471
472  // M.cppm
473  export module M;
474  export template <class T>
475  T foo(T t) {
476    return t;
477  }
478
479  // Use.cpp
480  import M;
481  void bar() {
482    foo(5);
483  }
484
485.. code-block:: console
486
487  $ clang++ -std=c++20 M.cppm --precompile -o M.pcm
488  $ rm M.cppm
489  $ clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm
490
The compiler would reject the example since it fails to find the source file to check the consistency.
The following example would be rejected too:
493
494.. code-block:: console
495
496  $ clang++ -std=c++20 M.cppm --precompile -o M.pcm
497  $ echo "int i=0;" >> M.cppm
498  $ clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm
499
The compiler rejects it since it detects that the file was changed.
501
502But it is OK to move the BMI as long as the source files remain:
503
504.. code-block:: console
505
506  $ clang++ -std=c++20 M.cppm --precompile -o M.pcm
507  $ mkdir -p tmp
508  $ mv M.pcm tmp/M.pcm
509  $ clang++ -std=c++20 Use.cpp -fmodule-file=tmp/M.pcm
510
511The above example would be accepted.
512
If the user doesn't want to follow the consistency requirement for some reason (e.g., when distributing BMIs),
the user can try ``-Xclang -fmodules-embed-all-files`` when producing the BMI. For example:
515
516.. code-block:: console
517
518  $ clang++ -std=c++20 M.cppm --precompile -Xclang -fmodules-embed-all-files -o M.pcm
519  $ rm M.cppm
520  $ clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm
521
Now the compiler accepts the above example.
Important note: ``-Xclang`` options are intended to be used by the compiler internally, and their semantics
are not guaranteed to be preserved in future versions.
525
The compiler also records the paths of the header files included in the global module fragment and compares the
headers when the module is imported. For example,
528
529.. code-block:: c++
530
531  // foo.h
532  #include <iostream>
533  void Hello() {
534    std::cout << "Hello World.\n";
535  }
536
537  // foo.cppm
538  module;
539  #include "foo.h"
540  export module foo;
541  export using ::Hello;
542
543  // Use.cpp
544  import foo;
545  int main() {
546    Hello();
547  }
548
Then it is problematic if we remove ``foo.h`` before importing the ``foo`` module.
550
551.. code-block:: console
552
553  $ clang++ -std=c++20 foo.cppm --precompile  -o foo.pcm
554  $ mv foo.h foo.orig.h
555  # The following one is rejected
556  $ clang++ -std=c++20 Use.cpp -fmodule-file=foo.pcm -c
557
The above case will be rejected. We can still work around it with the ``-Xclang -fmodules-embed-all-files`` option:
559
560.. code-block:: console
561
562  $ clang++ -std=c++20 foo.cppm --precompile  -Xclang -fmodules-embed-all-files -o foo.pcm
563  $ mv foo.h foo.orig.h
564  $ clang++ -std=c++20 Use.cpp -fmodule-file=foo.pcm -c -o Use.o
565  $ clang++ Use.o foo.pcm
566
567ABI Impacts
568-----------
569
570The declarations in a module unit which are not in the global module fragment have new linkage names.
571
572For example,
573
574.. code-block:: c++
575
576  export module M;
577  namespace NS {
578    export int foo();
579  }
580
581The linkage name of ``NS::foo()`` would be ``_ZN2NSW1M3fooEv``.
582This couldn't be demangled by previous versions of the debugger or demangler.
583As of LLVM 15.x, users can utilize ``llvm-cxxfilt`` to demangle this:
584
585.. code-block:: console
586
587  $ llvm-cxxfilt _ZN2NSW1M3fooEv
588
589The result would be ``NS::foo@M()``, which reads as ``NS::foo()`` in module ``M``.
590
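To make the structure of such a linkage name easier to see, here is a toy sketch that
reassembles the example above. The function ``mangle_module_function`` is hypothetical
and deliberately narrow: it assumes a function with no parameters, declared in a
single namespace and attached to a module with a single-component name. It is not a
general mangler.

.. code-block:: c++

  #include <string>

  // <length><identifier>, e.g. "foo" -> "3foo".
  std::string source_name(const std::string &id) {
    return std::to_string(id.size()) + id;
  }

  // Toy helper for the one shape used in this section's example:
  //   _Z   prefix of a mangled name
  //   N..E nested name (namespace, then the function)
  //   W    introduces the module the function is attached to
  //   v    empty parameter list
  std::string mangle_module_function(const std::string &ns,
                                     const std::string &module_name,
                                     const std::string &function) {
    return "_ZN" + source_name(ns) + "W" + source_name(module_name) +
           source_name(function) + "Ev";
  }

Calling ``mangle_module_function("NS", "M", "foo")`` reproduces the
``_ZN2NSW1M3fooEv`` name shown above; the ``W1M`` part is what distinguishes it from
the non-module name of ``NS::foo()``.
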
591The ABI implies that we can't declare something in a module unit and define it in a non-module unit (or vice-versa),
592as this would result in linking errors.
593
594Known Problems
595--------------
596
597The following describes issues in the current implementation of modules.
598Please see https://github.com/llvm/llvm-project/labels/clang%3Amodules for more issues
599or file a new issue if you don't find an existing one.
600If you're going to create a new issue for standard C++ modules,
601please start the title with ``[C++20] [Modules]`` (or ``[C++2b] [Modules]``, etc)
602and add the label ``clang:modules`` (if you have permissions for that).
603
For higher-level information about support for proposals, see https://clang.llvm.org/cxx_status.html.
605
606Support for clang-scan-deps
607~~~~~~~~~~~~~~~~~~~~~~~~~~~
608
Support for clang-scan-deps may be the most urgent problem for modules now.
Without it, it is hard for build systems to support modules.
This means that users can only experiment with modules through makefiles or by writing a parser by hand.
This blocks wider use of modules, which in turn limits the defect reports and feature requests we receive.
613
614This is tracked in: https://github.com/llvm/llvm-project/issues/51792.
615
616Ambiguous deduction guide
617~~~~~~~~~~~~~~~~~~~~~~~~~
618
Currently, when we call deduction guides in the global module fragment,
we may get an incorrect diagnostic message like ``ambiguous deduction``.

So if we're using a deduction guide from the global module fragment, we probably need to write:
623
624.. code-block:: c++
625
626  std::lock_guard<std::mutex> lk(mutex);
627
628instead of
629
630.. code-block:: c++
631
632  std::lock_guard lk(mutex);
633
634This is tracked in: https://github.com/llvm/llvm-project/issues/56916
635
636Ignored PreferredName Attribute
637~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
638
Due to a tricky problem, Clang ignores the ``preferred_name`` attribute, if any, when writing BMIs.
This implies that the ``preferred_name`` won't show up in the debugger or in dumps.
641
642This is tracked in: https://github.com/llvm/llvm-project/issues/56490
643
644Don't emit macros about module declaration
645~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
646
This is covered by P1857R3. We mention it again here since users may abuse it before we implement it.

Someone may want to write code that can be compiled both with and without modules.
A direct idea would be to use macros like:
651
652.. code-block:: c++
653
654  MODULE
655  IMPORT header_name
656  EXPORT_MODULE MODULE_NAME;
657  IMPORT header_name
658  EXPORT ...
659
This file could then be treated as either a module unit or a non-module unit, depending on the definition
of some macros.
However, this kind of usage is forbidden by P1857R3, which we haven't implemented yet.
This means that it is possible to write invalid modules code now, and this code will stop working
once P1857R3 is implemented.
A simple suggestion would be "Don't play macro tricks with module declarations".
666
667This is tracked in: https://github.com/llvm/llvm-project/issues/56917
668
Inconsistent filename suffix requirement for importable module units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
671
Currently, Clang requires the file name of an ``importable module unit`` to end with ``.cppm``
(or ``.ccm``, ``.cxxm``, ``.c++m``). However, this behavior is inconsistent with other compilers.
674
675This is tracked in: https://github.com/llvm/llvm-project/issues/57416
676
677Header Units
678============
679
How to build projects using header units
----------------------------------------
682
683Quick Start
684~~~~~~~~~~~
685
686For the following example,
687
688.. code-block:: c++
689
690  import <iostream>;
691  int main() {
692    std::cout << "Hello World.\n";
693  }
694
695we could compile it as
696
697.. code-block:: console
698
699  $ clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm
700  $ clang++ -std=c++20 -fmodule-file=iostream.pcm main.cpp
701
702How to produce BMIs
703~~~~~~~~~~~~~~~~~~~
704
Similar to named modules, we can use ``--precompile`` to produce the BMI.
But we need to specify that the input file is a header with ``-xc++-system-header`` or ``-xc++-user-header``.

We can also use the ``-fmodule-header={user,system}`` option to produce the BMI for header units
that have a suffix like ``.h`` or ``.hh``.
The value of ``-fmodule-header`` selects the user search path or the system search path.
The default value for ``-fmodule-header`` is ``user``.
For example,
713
714.. code-block:: c++
715
716  // foo.h
717  #include <iostream>
718  void Hello() {
719    std::cout << "Hello World.\n";
720  }
721
722  // use.cpp
723  import "foo.h";
724  int main() {
725    Hello();
726  }
727
728We could compile it as:
729
730.. code-block:: console
731
732  $ clang++ -std=c++20 -fmodule-header foo.h -o foo.pcm
733  $ clang++ -std=c++20 -fmodule-file=foo.pcm use.cpp
734
735For headers which don't have a suffix, we need to pass ``-xc++-header``
736(or ``-xc++-system-header`` or ``-xc++-user-header``) to mark it as a header.
737For example,
738
.. code-block:: c++

  // use.cpp
  import <iostream>;
  int main() {
    std::cout << "Hello World.\n";
  }
746
747.. code-block:: console
748
749  $ clang++ -std=c++20 -fmodule-header=system -xc++-header iostream -o iostream.pcm
750  $ clang++ -std=c++20 -fmodule-file=iostream.pcm use.cpp
751
752How to specify the dependent BMIs
753~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
754
755We could use ``-fmodule-file`` to specify the BMIs, and this option may occur multiple times as well.
756
With the existing implementation, ``-fprebuilt-module-path`` cannot be used for header units
(since they are nominally anonymous).
For header units, use ``-fmodule-file`` to include the relevant PCM file for each header unit.

This is expected to be solved in future versions of the compiler, either by the tooling finding and specifying
the ``-fmodule-file`` or by the use of a module-mapper that understands how to map header names to their PCMs.
763
764Don't compile the BMI
765~~~~~~~~~~~~~~~~~~~~~
766
Another difference from named modules is that we can't compile the BMI of a header unit to an object file.
768For example:
769
770.. code-block:: console
771
772  $ clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm
773  # This is not allowed!
774  $ clang++ iostream.pcm -c -o iostream.o
775
776It makes sense due to the semantics of header units, which are just like headers.
777
778Include translation
779~~~~~~~~~~~~~~~~~~~
780
The C++ spec allows vendors to convert ``#include header-name`` to ``import header-name;`` when possible.
Currently, Clang does this translation for ``#include`` directives in the global module fragment.
783
For example, the following example:

.. code-block:: c++

  module;
  import <iostream>;
  export module M;
  export void Hello() {
    std::cout << "Hello.\n";
  }

is equivalent to the following one:
796
797.. code-block:: c++
798
799  module;
800  #include <iostream>
801  export module M;
802  export void Hello() {
803      std::cout << "Hello.\n";
804  }
805
.. code-block:: console

  $ clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm
  $ clang++ -std=c++20 -fmodule-file=iostream.pcm --precompile M.cppm -o M.pcm
810
In the latter example, Clang can find the BMI for ``<iostream>``,
so it tries to replace ``#include <iostream>`` with ``import <iostream>;`` automatically.
813
814
815Relationships between Clang modules
816-----------------------------------
817
Header units have semantics that are very similar to those of Clang modules.
The semantics of both are like those of headers.

In fact, we can even "mimic" the style of header units with Clang modules:
822
823.. code-block:: c++
824
825  module "iostream" {
826    export *
827    header "/path/to/libstdcxx/iostream"
828  }
829
830.. code-block:: console
831
832  $ clang++ -std=c++20 -fimplicit-modules -fmodule-map-file=.modulemap main.cpp
833
834It would be simpler if we are using libcxx:
835
836.. code-block:: console
837
838  $ clang++ -std=c++20 main.cpp -fimplicit-modules -fimplicit-module-maps
839
This works because there is already a
`module map <https://github.com/llvm/llvm-project/blob/main/libcxx/include/module.modulemap.in>`_
in the source of libcxx.
843
This immediately leads to the question: why don't we implement header units through Clang header modules?
845
The main reason for this is that Clang modules have additional semantics, such as hierarchy or
wrapping multiple headers together as one big module.
However, these things are not part of standard C++ header units,
and we want to avoid the impression that these additional semantics are standard C++ behavior.
850
851Another reason is that there are proposals to introduce module mappers to the C++ standard
852(for example, https://wg21.link/p1184r2).
853If we decide to reuse Clang's modulemap, we may get in trouble once we need to introduce another module mapper.
854
855So the final answer for why we don't reuse the interface of Clang modules for header units is that
856there are some differences between header units and Clang modules and that ignoring those
857differences now would likely become a problem in the future.
858
Possible Questions
==================

How modules speed up compilation
--------------------------------

A classic explanation of why modules speed up compilation is:
if there are ``n`` headers and ``m`` source files and each header is included by each source file,
then the complexity of the compilation is ``O(n*m)``.
But if there are ``n`` module interfaces and ``m`` source files, the complexity of the compilation is
``O(n+m)``. So, using modules would be a big win when scaling.
Put simply, we could get rid of many redundant compilations by using modules.

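As a rough illustration of this counting argument, the sketch below treats one unit of work as
processing a single header or module interface once.
The function names ``textual_inclusion_cost`` and ``modules_cost`` are hypothetical and exist only for this example,
and the model assumes that importing an already built interface is free, which real builds only approximate.

.. code-block:: c++

  #include <cstdint>

  // Illustrative cost model only; these helpers are not part of Clang.
  // One "unit" is the work of processing one header or interface once.

  // Textual inclusion: each of the m source files re-processes all n headers.
  constexpr std::uint64_t textual_inclusion_cost(std::uint64_t n, std::uint64_t m) {
    return n * m;
  }

  // Modules: each of the n interfaces is built once and each of the m source
  // files is compiled once (imports are assumed to be free in this model).
  constexpr std::uint64_t modules_cost(std::uint64_t n, std::uint64_t m) {
    return n + m;
  }

  // Compile-time checks of the model for 100 headers and 50 source files.
  static_assert(textual_inclusion_cost(100, 50) == 5000, "n*m units");
  static_assert(modules_cost(100, 50) == 150, "n+m units");

Under these assumptions, 100 headers and 50 source files cost 5000 units with textual inclusion
but only 150 units with modules.
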
Roughly, this theory is correct, but it is too coarse.
The actual behavior depends on the optimization level, as illustrated below.

First, consider ``O0``. The compilation process is shown in the following diagram.

.. code-block:: none

  ├-------------frontend----------┼-------------middle end----------------┼----backend----┤
  │                               │                                       │               │
  └---parsing----sema----codegen--┴----- transformations ---- codegen ----┴---- codegen --┘

  ┌---------------------------------------------------------------------------------------┐
  │                                                                                       │
  │                                     source file                                       │
  │                                                                                       │
  └---------------------------------------------------------------------------------------┘

              ┌--------┐
              │        │
              │imported│
              │        │
              │  code  │
              │        │
              └--------┘

Here we can see that the source file (which could be a non-module unit or a module unit) is processed by the
whole pipeline.
The imported code, however, is only involved in semantic analysis, which is mainly about name lookup,
overload resolution and template instantiation.
All of these processes are fast relative to the whole compilation process.
More importantly, the imported code only needs to be processed once by frontend code generation,
as well as by the whole middle end and backend.
So we could get a big win in compilation time at ``O0``.

But with optimizations, things are different
(the ``code generation`` part of each end is omitted due to limited space):

.. code-block:: none

  ├-------- frontend ---------┼--------------- middle end --------------------┼------ backend ----┤
  │                           │                                               │                   │
  └--- parsing ---- sema -----┴--- optimizations --- IPO ---- optimizations---┴--- optimizations -┘

  ┌-----------------------------------------------------------------------------------------------┐
  │                                                                                               │
  │                                         source file                                           │
  │                                                                                               │
  └-----------------------------------------------------------------------------------------------┘
                ┌---------------------------------------┐
                │                                       │
                │                                       │
                │            imported code              │
                │                                       │
                │                                       │
                └---------------------------------------┘

It would be very unfortunate if we ended up with worse performance after using modules.
The main concern is that when we compile a source file, the compiler needs to see the function bodies
of imported module units so that it can perform IPO (interprocedural optimization, primarily inlining
in practice) to optimize functions in the current source file with the help of the information provided by
the imported module units.
In other words, the imported code would be processed again and again in the importing units
by optimizations (including IPO itself).
The optimizations before IPO and the IPO itself are the most time-consuming parts of the whole compilation process.
So from this perspective, we might not be able to get the improvements described in the theory.
But we could still save the time for optimizations after IPO and the whole backend.

Overall, at ``O0`` the implementations of functions defined in a module will not impact module users,
but at higher optimization levels the definitions of such functions are provided to user compilations for the
purposes of optimization (the definitions of these functions are still not included in the user's object file).
This means the build speedup at higher optimization levels may be lower than expected given the ``O0`` experience,
but it does provide more optimization opportunities.

Interoperability with Clang Modules
-----------------------------------

We **wish** to support Clang modules and standard C++ modules at the same time,
but mixing the two is not yet well used or tested.

Please file new GitHub issues as you find interoperability problems.
