2a509215 | 01-Feb-2024 | Amy Kwan <amy.kwan1@ibm.com>
[AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (#71485)
This patch utilizes the -maix-small-local-exec-tls option to produce a
faster, non-TOC-based access sequence for the local-exec TLS model,
specifically when the offsets from the TLS variable are non-zero.
In particular, this patch produces either a single:
- addi/la with a displacement off of R13 plus a non-zero offset, for when
an address is calculated, or
- load or store off of R13 plus a non-zero offset, for when an address is
calculated and used for further access,
where R13 is the thread pointer.
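The two cases above can be illustrated with a small source-level sketch. It is not taken from the patch or its tests: the array `counters` and the functions `addressOfFourth`, `loadFourth`, and `storeFourth` are hypothetical names, and the comments only restate what the description says should happen on AIX when -maix-small-local-exec-tls is in effect. On other targets the code simply compiles to that platform's usual local-exec sequence.

```cpp
// Hypothetical thread-local array, placed in the local-exec TLS model.
// Element 3 sits at a non-zero byte offset (3 * sizeof(int)) from the
// start of the variable, which is the case this patch targets.
__attribute__((tls_model("local-exec")))
thread_local int counters[8] = {10, 11, 12, 13, 14, 15, 16, 17};

// Address calculation: per the description, this is the case that
// should become a single addi/la off of the thread pointer.
int *addressOfFourth() { return &counters[3]; }

// Address calculated and used for further access: per the description,
// this should become a single load off of the thread pointer.
int loadFourth() { return counters[3]; }

// Same as above, but for a store.
void storeFourth(int value) { counters[3] = value; }
```

Accessing `counters[0]` instead would be the zero-offset case handled by the earlier commit that this patch follows up on.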
In order to produce a single addi or load/store off of the thread
pointer with a non-zero offset,
this patch also adds the necessary support in the assembly printer when
printing these instructions.
Specifically:
- The non-zero offset is added to the TLS variable address when the
address of the TLS variable + its offset is less than 32KB.
- Otherwise, when the address of the TLS variable + its offset is
at least 32KB, the non-zero offset (and a multiple of 64KB) is
subtracted from the TLS address.
This handling in the assembly printer is necessary to ensure that the
TLS address + the non-zero offset lies within [-32768, 32768), so that
the total displacement fits within the addi/load/store instructions.
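The range requirement can be made concrete with a minimal sketch of the arithmetic. This is one way to land in [-32768, 32768) by subtracting a multiple of 64KB, not the code from the patch: the helper `wrapDisplacement` and the constants `kHalfRange` and `kFullRange` are hypothetical names, and the real assembly printer works on symbolic expressions rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kHalfRange = 32768;  // 32KB
constexpr std::int64_t kFullRange = 65536;  // 64KB

// Hypothetical helper: given the TLS variable address and a non-zero
// offset, return a displacement in [-32768, 32768) that differs from
// their sum by a multiple of 64KB.
std::int64_t wrapDisplacement(std::int64_t tlsAddress, std::int64_t offset) {
  const std::int64_t finalAddress = tlsAddress + offset;

  // Floor-divide (finalAddress + 32KB) by 64KB to find how many 64KB
  // blocks must be subtracted. Sums below 32KB yield zero blocks.
  const std::int64_t shifted = finalAddress + kHalfRange;
  std::int64_t multiple = shifted / kFullRange;
  if (shifted % kFullRange < 0) {
    --multiple;  // C++ division truncates toward zero; adjust to floor.
  }

  const std::int64_t displacement = finalAddress - multiple * kFullRange;
  assert(displacement >= -kHalfRange && displacement < kHalfRange);
  return displacement;
}
```

For example, under these assumptions a sum of 100 is left unchanged, while a sum of 40000 becomes 40000 - 65536 = -25536, which fits the signed 16-bit range.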
This patch is meant to be a follow-up to
3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the
optimization occurs for when the offset is zero).