Large data sections
===================

When linking very large binaries, lld may report relocation overflows such as::

  relocation R_X86_64_PC32 out of range: 2158227201 is not in [-2147483648, 2147483647]

This happens when running into architectural limitations. For example, in x86-64
PIC code, a reference to a static global variable is typically done with a
``R_X86_64_PC32`` relocation, which is a 32-bit signed offset from the PC. That
means if the global variable is laid out further than 2GB (2^31 bytes) from the
instruction referencing it, we run into a relocation overflow.

lld normally lays out sections as follows:

.. image:: section_layout.png

The largest relocation pressure is usually from ``.text`` to the beginning of
``.rodata`` or ``.text`` to the end of ``.bss``.

Some code models offer a tradeoff between relocation pressure and performance.
For example, x86-64's medium code model splits global variables into small and
large globals depending on whether their size exceeds a certain threshold. Large
globals are placed further away from text and we use 64-bit references to refer
to them.

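As a rough illustration of the overflow described earlier, the range check
behind the ``R_X86_64_PC32`` error can be sketched as below. The helper name
``fitsInSigned32`` is made up for this example and is not lld's actual
implementation; it only checks whether a displacement fits in the 32-bit signed
field.

.. code-block:: c++

   #include <cstdint>
   #include <limits>

   // Hypothetical helper (not an lld API): returns true if a PC-relative
   // displacement can be encoded in a 32-bit signed relocation field.
   constexpr bool fitsInSigned32(int64_t displacement) {
     return displacement >= std::numeric_limits<int32_t>::min() &&
            displacement <= std::numeric_limits<int32_t>::max();
   }

   // The value from the error message is just past the 2^31 - 1 limit.
   static_assert(!fitsInSigned32(2158227201LL), "out of range, as lld reports");
   // The two endpoints of the reported interval are still encodable.
   static_assert(fitsInSigned32(2147483647LL), "upper bound fits");
   static_assert(fitsInSigned32(-2147483648LL), "lower bound fits");

A 64-bit reference, as used for large globals, has no such limit, which is why
moving large globals further away from text does not add relocation pressure.
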

Large globals are placed in separate sections from small globals, and those
sections have a "large" section flag, e.g. ``SHF_X86_64_LARGE`` for x86-64. The
linker places large sections on the outer edges of the binary, making sure they
do not affect the distance of small globals to text. The large versions
of ``.rodata``, ``.bss``, and ``.data`` are ``.lrodata``, ``.lbss``, and
``.ldata``, and they are laid out as follows:

.. image:: large_section_layout_pic.png

We try to keep the number of ``PT_LOAD`` segments to a minimum, so we place
large sections next to the small sections with the same RWX permissions when
possible.

``.lbss`` is right after ``.bss`` so that they are merged together and we
minimize the number of segments with ``p_memsz > p_filesz``.

Note that the above applies to PIC code. For less common non-PIC code with
absolute relocations instead of relative relocations, 32-bit relocations
typically assume that symbols are in the lower 2GB of the address space. So for
non-PIC code, large sections should be placed after all small sections to avoid
``.lrodata`` pushing small symbols out of the lower 2GB of the address space.
``-z lrodata-after-bss`` changes the layout to be:

.. image:: large_section_layout_nopic.png

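
The non-PIC concern can be sketched with some simple address arithmetic. All
sizes and the base address below are hypothetical values chosen for
illustration, alignment is ignored, and the helper names are not part of lld;
the sketch only shows why placing ``.lrodata`` ahead of the small sections can
push them past the lower 2GB.

.. code-block:: c++

   #include <cstdint>

   // Upper limit assumed by 32-bit absolute references in non-PIC code.
   constexpr uint64_t kLow2GB = uint64_t{1} << 31;

   // Hypothetical values, for illustration only.
   constexpr uint64_t kBase = 0x200000;                  // image base
   constexpr uint64_t kLrodataSize = uint64_t{3} << 30;  // 3 GiB of large rodata
   constexpr uint64_t kSmallSize = uint64_t{64} << 20;   // 64 MiB of small sections

   // End address of the small sections when `sizeBefore` bytes of other
   // sections are laid out in front of them.
   constexpr uint64_t smallEnd(uint64_t sizeBefore) {
     return kBase + sizeBefore + kSmallSize;
   }

   // True if every small symbol stays within the lower 2GB.
   constexpr bool smallSectionsFitLow2GB(uint64_t sizeBefore) {
     return smallEnd(sizeBefore) <= kLow2GB;
   }

   // .lrodata first: the small sections end up above 2GB.
   static_assert(!smallSectionsFitLow2GB(kLrodataSize), "small symbols pushed out");
   // .lrodata after all small sections: nothing large precedes them.
   static_assert(smallSectionsFitLow2GB(0), "small symbols stay in the lower 2GB");
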