Thread-local storage.

Each thread has a thread control block, or TCB.  The TCB is a
variable-size structure headed by `struct tls_tcb' from <sys/tls.h>,
with:

(a) static thread-local storage for the TLS data of initial objects,
    i.e., those loaded at startup rather than those dynamically loaded
    by dlopen

(b) a pointer to a dynamic thread vector (DTV) for the TLS data
    pointers of objects that use global-dynamic or local-dynamic models
    (typically shared libraries or dlopenable modules)

(c) the pthread_t pointer

The per-thread lwp private pointer, also sometimes called TP (thread
pointer), managed by the _lwp_getprivate and _lwp_setprivate syscalls,
either points at the TCB directly, or, on some architectures, points at

	tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET.

This bias is used on architectures that address TLS with signed
displacements from TP: biasing TP lets such a displacement reach twice
the range of static TLS offsets.
Architectures with such a tp/tcb offset must provide

	void *__lwp_gettcb_fast(void);

in machine/mcontext.h and must define __HAVE___LWP_GETTCB_FAST in
machine/types.h to reflect this; otherwise they must provide
__lwp_getprivate_fast to return the TCB pointer.

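The conversion between tcb and tp can be sketched in C as follows.
This is an illustration only: struct tcb_model stands in for
struct tls_tcb, the model_* names are made up for this sketch, and the
bias 0x7000 is an assumed value (the real TLS_TP_OFFSET is defined per
architecture).  The last function shows why the bias helps when
instructions take a signed 16-bit displacement from TP.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tls_tcb from <sys/tls.h>; field names are assumed. */
struct tcb_model {
	void **dtv;		/* dynamic thread vector */
	void *pthread;		/* pthread_t pointer */
};

/* Assumed bias; the real TLS_TP_OFFSET is architecture-specific. */
#define MODEL_TP_OFFSET 0x7000

/* tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET */
static inline uintptr_t
model_tcb_to_tp(uintptr_t tcb)
{
	return tcb + sizeof(struct tcb_model) + MODEL_TP_OFFSET;
}

/* Inverse of model_tcb_to_tp(). */
static inline uintptr_t
model_tp_to_tcb(uintptr_t tp)
{
	return tp - MODEL_TP_OFFSET - sizeof(struct tcb_model);
}

/*
 * Largest static TLS offset, counted from the end of the fixed TCB,
 * that a signed 16-bit displacement from tp can reach for a given bias.
 */
static inline long
model_max_tls_offset(long bias)
{
	return (long)INT16_MAX + bias;
}
```

Under these assumptions an unbiased TP reaches static TLS offsets up to
32767, while the biased TP reaches offsets up to 61439, close to twice
as far.
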
Each architecture has one of two TLS variants, variant I or variant II.
Variant I places the static thread-local storage _after_ the fixed
content of the TCB, at increasing addresses (addresses increase down
the diagram):

	+---------------+
	| dtv pointer   |       tcb points here (struct tls_tcb)
	+---------------+
	| pthread_t     |
	+---------------+
	| obj0 tls      |       obj0->tlsoffset = 0
	|               |
	|               |
	+---------------+
	| obj1 tls      |       obj1->tlsoffset = 3
	+---------------+
	| obj2 tls      |       obj2->tlsoffset = 4
	|               |
	.		.
	.		.
	.		.
	|               |
	+---------------+
	| objN tls      |       objN->tlsoffset = k
	+---------------+

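A minimal sketch of how a Variant I address is formed from the layout
above, assuming tlsoffset is counted in bytes from the end of the fixed
TCB.  struct tcb_model and model_variant1_addr are hypothetical names
used only for illustration; they are not part of ld.elf_so.

```c
#include <stddef.h>

/* Stand-in for struct tls_tcb (Variant I); field names are assumed. */
struct tcb_model {
	void **dtv;		/* dynamic thread vector */
	void *pthread;		/* pthread_t pointer */
};

/*
 * Address of a variable in the static TLS block of an initial object:
 * skip the fixed TCB, then the object's tlsoffset, then the variable's
 * offset inside the object's TLS block.
 */
static inline void *
model_variant1_addr(struct tcb_model *tcb, size_t tlsoffset,
    size_t sym_offset)
{
	return (char *)tcb + sizeof(struct tcb_model) + tlsoffset +
	    sym_offset;
}
```
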
Variant II places the static thread-local storage _before_ the fixed
content of the TCB, at decreasing addresses:

	+---------------+
	| objN tls      |       objN->tlsoffset = k
	+---------------+
	| obj(N-1) tls  |       obj(N-1)->tlsoffset = k - 1
	.               .
	.               .
	.               .
	|               |
	+---------------+
	| obj2 tls      |       obj2->tlsoffset = 4
	+---------------+
	| obj1 tls      |       obj1->tlsoffset = 3
	+---------------+
	| obj0 tls      |       obj0->tlsoffset = 0
	|               |
	|               |
	+---------------+
	| tcb pointer   |       tcb points here (struct tls_tcb)
	+---------------+
	| dtv pointer   |
	+---------------+
	| pthread_t     |
	+---------------+

See [ELFTLS] Sec. 3 `Run-Time Handling of TLS', Figs 1 and 2, for
bigger pictures including the DTV and dynamically allocated TLS blocks.

Each architecture also has its own ELF ABI processor supplement with
the architecture-specific relocations and TLS details.

References:

	[ELFTLS] Ulrich Drepper, `ELF Handling For Thread-Local
	Storage', Version 0.21, 2023-08-22.
	https://akkadia.org/drepper/tls.pdf
	https://web.archive.org/web/20240718081934/https://akkadia.org/drepper/tls.pdf

Steps for adding TLS support for a new platform:

(1) Declare TLS variant in machine/types.h by defining either
__HAVE_TLS_VARIANT_I or __HAVE_TLS_VARIANT_II.

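As a minimal sketch, the declaration and a compile-time sanity check
might look like the fragment below.  Choosing Variant I is purely for
illustration, and model_tls_variant() is a hypothetical helper that
does not exist in NetBSD.

```c
/*
 * Sketch of the machine/types.h declaration; Variant I is picked
 * purely for illustration.
 */
#define __HAVE_TLS_VARIANT_I

/* Reject a configuration that declares neither variant or both. */
#if defined(__HAVE_TLS_VARIANT_I) == defined(__HAVE_TLS_VARIANT_II)
#error "define exactly one of __HAVE_TLS_VARIANT_I, __HAVE_TLS_VARIANT_II"
#endif

/* Hypothetical helper: returns 1 for Variant I, 2 for Variant II. */
static inline int
model_tls_variant(void)
{
#ifdef __HAVE_TLS_VARIANT_I
	return 1;
#else
	return 2;
#endif
}
```
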
(2) _lwp_makecontext has to set the reserved register or kernel
transfer variable in uc_mcontext according to the provided value of
`private'.  Note that _lwp_makecontext takes tcb, not tp, as an
argument, so make sure to adjust it if needed for the tp/tcb offset.
See src/lib/libc/arch/$PLATFORM/gen/_lwp.c.

This is not possible on the VAX as there is no free space in ucontext_t.
This requires either a special version of _lwp_create or versioning
everything using ucontext_t.  Debug support depends on getting the data
from ucontext_t, so the second option is possibly required.

(3) _lwp_setprivate(2) has to update the same register as
_lwp_makecontext uses for the private area pointer.  Normally
cpu_lwp_setprivate is provided by MD to reflect the kernel view and
enabled by defining __HAVE_CPU_LWP_SETPRIVATE in machine/types.h.
cpu_setmcontext is responsible for keeping the MI l_private field
synchronised by calling lwp_setprivate as needed.

cpu_switchto has to update the mapping.

_lwp_setprivate is used for the initial thread; all other threads
created by libpthread use _lwp_makecontext for this purpose.

(4) Provide __tls_get_addr and possibly other MD functions for dynamic
TLS offset computation.  If such alternative entry points exist
(currently only i386), also add a weak reference to 0 in
src/lib/libc/tls/tls.c.

The generic implementation can be found in tls.c and is used with
__HAVE_COMMON___TLS_GET_ADDR.  It depends on __lwp_getprivate_fast
(see below).

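The lookup performed by a __tls_get_addr-style function can be sketched
as below.  This is a simplified model in the spirit of [ELFTLS], not the
code in tls.c: all model_* and *_model names, the fixed module count and
the size table are assumptions made for the example, and the TCB is
passed in explicitly instead of being fetched from the thread pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct tls_tcb; field names are assumed. */
struct tcb_model {
	void **dtv;		/* dynamic thread vector */
	void *pthread;		/* pthread_t pointer */
};

/* The { module, offset } pair handed to __tls_get_addr in [ELFTLS]. */
struct tls_index_model {
	size_t module;		/* set by a DTPMOD relocation */
	size_t offset;		/* set by a DTPOFF relocation */
};

#define MODEL_NMODULES 4

/* Hypothetical table of per-module TLS block sizes. */
static size_t model_tls_size[MODEL_NMODULES];

/*
 * Return the address of a variable for the thread owning tcb.  DTV
 * slots are filled lazily on first use.  A real implementation would
 * also copy the module's TLS initialization image into the new block.
 */
static inline void *
model_tls_get_addr(struct tcb_model *tcb, const struct tls_index_model *ti)
{
	void **slot;

	if (ti->module >= MODEL_NMODULES || model_tls_size[ti->module] == 0)
		return NULL;
	slot = &tcb->dtv[ti->module];
	if (*slot == NULL) {
		/* First access from this thread: allocate a zeroed block. */
		*slot = calloc(1, model_tls_size[ti->module]);
		if (*slot == NULL)
			return NULL;
	}
	/* The DTV slot points at offset 0 of the module's TLS block. */
	return (char *)*slot + ti->offset;
}
```
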
(5) Implement the necessary relocation records in mdreloc.c.  There are
typically three relocation types found in dynamic binaries:

(a) R_TYPE(TLS_DTPOFF): Offset inside the module.  The common TLS code
ensures that the DTV entry points to offset 0 inside the module TLS
block.  This is normally def->st_value + rela->r_addend.

(b) R_TYPE(TLS_DTPMOD): Module index.

(c) R_TYPE(TLS_TPOFF): Static TLS offset.  The code has to check whether
the static TLS offset for this module has been allocated
(defobj->tls_static) and otherwise call _rtld_tls_offset_allocate().  This
may fail if no static space is available and the object has been pulled
in via dlopen(3).  It can also fail if the TLS area has already been used
via a global-dynamic allocation.

For TLS Variant I, this is typically:

	def->st_value + rela->r_addend + defobj->tlsoffset + sizeof(struct tls_tcb)

i.e. the relocation doesn't include the fixed TCB.

For TLS Variant II, this is typically:

	def->st_value - defobj->tlsoffset + rela->r_addend

i.e. the starting offset counts down from the TCB.

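The three computations above can be sketched in C as follows.  This is
not the mdreloc.c code: struct obj_model, struct tcb_model and the
model_* functions are hypothetical stand-ins, any tp/tcb bias is
ignored, and the failure path only models the case where no static
offset is available (real code would first try
_rtld_tls_offset_allocate()).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tls_tcb; field names are assumed. */
struct tcb_model {
	void **dtv;
	void *pthread;
};

/* Hypothetical subset of the linker's per-object state. */
struct obj_model {
	size_t tlsindex;	/* module index */
	size_t tlsoffset;	/* static TLS offset, valid if tls_static */
	bool tls_static;	/* static TLS offset has been allocated */
};

/* (a) DTPOFF: offset inside the module's TLS block. */
static inline intptr_t
model_dtpoff(intptr_t st_value, intptr_t addend)
{
	return st_value + addend;
}

/* (b) DTPMOD: module index of the defining object. */
static inline size_t
model_dtpmod(const struct obj_model *defobj)
{
	return defobj->tlsindex;
}

/*
 * (c) TPOFF: static TLS offset for variant 1 or 2.  Returns 0 and
 * stores the value in *out, or -1 if the object has no static offset.
 */
static inline int
model_tpoff(const struct obj_model *defobj, int variant,
    intptr_t st_value, intptr_t addend, intptr_t *out)
{
	if (!defobj->tls_static)
		return -1;
	if (variant == 1)
		*out = st_value + addend + (intptr_t)defobj->tlsoffset +
		    (intptr_t)sizeof(struct tcb_model);
	else
		*out = st_value - (intptr_t)defobj->tlsoffset + addend;
	return 0;
}
```

For example, with tlsoffset = 16, st_value = 4 and a zero addend, the
Variant II result is -12, a negative displacement counted down from the
TCB.
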
(6) If there is a tp/tcb offset, implement

	__lwp_gettcb_fast()
	__lwp_settcb()

in machine/mcontext.h and set

	__HAVE___LWP_GETTCB_FAST
	__HAVE___LWP_SETTCB

in machine/types.h.

Otherwise, implement __lwp_getprivate_fast() in machine/mcontext.h and
set __HAVE___LWP_GETPRIVATE_FAST in machine/types.h.

(7) Test using src/tests/lib/libc/tls and src/tests/libexec/ld.elf_so.
Make sure with "objdump -R" that t_tls_dynamic has two TPOFF
relocations and that h_tls_dlopen.so.1 and libh_tls_dynamic.so.1 each
have two DTPMOD and DTPOFF relocations.