Revision tags: llvmorg-21-init

# 3bb969f3 | 15-Jan-2025 | Slava Zakharin <szakharin@nvidia.com>
[flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not get rid
of a temporary array in many cases, but it may still be
much better, because it allows us to:
* Get rid of any overhead related to calling the runtime MATMUL
(such as descriptor creation).
* Use a CPU-specific vectorization cost model for the matmul loops,
which the Fortran runtime cannot currently do.
* Optimize matmul of known-size arrays by complete unrolling.
One drawback of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source comment explaining the CSE issue in more detail.
Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it compared to the Fortran runtime implementation. I put it
under an engineering option for experiments.
At the same time, inlining `hlfir.matmul_transpose` as an `hlfir.elemental`
seems to be a good approach; for example, it allows getting rid of a temporary
array in cases like `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.
This patch improves the performance of galgel and tonto a little bit.
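
A hypothetical Fortran sketch (not taken from the patch) of a case where the inlined MATMUL can benefit from complete unrolling and vectorization of known-size operands:

```fortran
subroutine small_matmul(a, b, c)
  real, intent(in)  :: a(4, 4), b(4, 4)
  real, intent(out) :: c(4, 4)
  ! With fixed-size operands, the inlined hlfir.eval_in_mem loop nest can be
  ! fully unrolled and vectorized instead of calling the MATMUL runtime
  ! (no descriptor creation, no call overhead).
  c = matmul(a, b)
end subroutine small_matmul
```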

Revision tags: llvmorg-19.1.7

# 611c96af | 07-Jan-2025 | Slava Zakharin <szakharin@nvidia.com>
[flang] Schedule InlineHLFIRAssign after BufferizeHLFIR. (#121863)
This helps to get rid of *some* calls to the AssignTemporary runtime
that appear due to the temporary_lhs hlfir.assign operations produced
in BufferizeHLFIR. I only tested it on `tonto` and did not see
any performance changes. I will run more performance testing
before merging this.

# 3c700d13 | 03-Jan-2025 | Slava Zakharin <szakharin@nvidia.com>
[flang] Extract hlfir.assign inlining from opt-bufferization. (#121544)
Optimized bufferization can transform hlfir.assign into a loop
nest doing element-by-element assignment, but it avoids
doing so when the RHS is an hlfir.expr. This is done to let the
ElementalAssignBufferization pattern try to do a better job.
This patch moves the hlfir.assign inlining after opt-bufferization
and enables it for hlfir.expr RHS.
The hlfir.expr RHS cases are present in tonto, and this patch
results in some nice improvements. Note that those cases
are handled by other compilers also using array temporaries,
so this patch seems to just get rid of the Assign runtime
overhead/inefficiency.
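
A minimal Fortran sketch (illustrative only) of an array assignment whose right-hand side is buffered as an hlfir.expr; whether a particular case reaches the new inlining depends on what opt-bufferization has already handled:

```fortran
subroutine axpy_like(a, b, c, n)
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(in)    :: b(n), c(n)
  ! If the RHS survives opt-bufferization as an hlfir.expr, the hlfir.assign
  ! can now be inlined as an element-wise loop instead of going through a
  ! temporary and the Assign runtime.
  a = b + 2.0 * c
end subroutine axpy_like
```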

Revision tags: llvmorg-19.1.6

# 1d4b5c16 | 09-Dec-2024 | Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
[flang][cuda] Change how abstract result pass is scheduled on func.func and gpu.func (#119034)
Use `pm.nest` to schedule the pass on nested `func.func` and `gpu.func`
operations in the `gpu.module`.
The AbstractResult pass is not meant to run on the whole gpu.module at once.

# 5522d246 | 03-Dec-2024 | Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
[flang][cuda] Allow AbstractResult to run in gpu.module (#118529)
In CUDA Fortran, device functions are converted to `gpu.func` inside the
`gpu.module` operation. Update the AbstractResult pass to be able to run
on `func.func` and `gpu.func` operations inside the `gpu.module`.
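
A hypothetical CUDA Fortran sketch (names and shapes invented for illustration) of a device procedure that becomes a `gpu.func` inside the `gpu.module`; an array-valued result is the kind of thing AbstractResult rewrites into a hidden result argument:

```fortran
module device_math
contains
  attributes(device) function scaled(a) result(r)
    real, intent(in) :: a(4)
    real :: r(4)
    ! The array result is returned in memory; AbstractResult rewrites such
    ! functions to take a hidden result argument, and it must now also visit
    ! this procedure once it has been converted to a gpu.func.
    r = 2.0 * a
  end function scaled
end module device_math
```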

Revision tags: llvmorg-19.1.5

# f3cf24fc | 28-Nov-2024 | s-watanabe314 <watanabe.shu-06@fujitsu.com>
[flang] Apply nocapture attribute to dummy arguments (#116182)
Apply the llvm.nocapture attribute to dummy arguments that do not have the
target, asynchronous, volatile, or pointer attributes, in procedures
that are not bind(c). This was discussed in
https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401
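
A minimal Fortran sketch (illustrative only) of a dummy argument that would be eligible for nocapture under the rule described above:

```fortran
subroutine scale(a, n, factor)
  integer, intent(in) :: n
  real, intent(inout) :: a(n)   ! no target/asynchronous/volatile/pointer attribute
  real, intent(in)    :: factor ! and the procedure is not bind(c)
  ! Nothing here can legally capture the address of 'a', so the reference
  ! passed for it can carry llvm.nocapture.
  a = a * factor
end subroutine scale
```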

Revision tags: llvmorg-19.1.4

# e7e55416 | 19-Nov-2024 | Ivan R. Ivanov <ivanov.i.aa@m.titech.ac.jp>
[flang] Lower omp.workshare to other omp constructs (#101446)
Add a new pass that lowers an `omp.workshare`, together with its associated `omp.workshare.loop_wrapper` loop nests, into other OpenMP constructs that can be lowered to LLVM.
More specifically, in order to preserve the sequential execution semantics of the contained code, it wraps portions that need to be executed on a single thread in `omp.single` blocks, converts code that must be parallelized into `omp.wsloop` nests, and inserts the appropriate synchronization.
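
An illustrative Fortran sketch (not from the patch) of a WORKSHARE construct targeted by this lowering; the array assignment is parallelized as an `omp.wsloop`, while any serial parts would be wrapped in `omp.single`:

```fortran
subroutine axpy_ws(a, b, c, n)
  integer, intent(in) :: n
  real, intent(in)    :: a(n), b(n)
  real, intent(out)   :: c(n)
  !$omp parallel
  !$omp workshare
  ! Array assignment: the units of work are divided among the team's threads.
  c = a + 2.0 * b
  !$omp end workshare
  !$omp end parallel
end subroutine axpy_ws
```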

Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4

# cfd4c180 | 21-Aug-2024 | Slava Zakharin <szakharin@nvidia.com>
[RFC][flang] Replace special symbols in uniqued global names. (#104859)
This change addresses more "issues" like the one resolved in #71338.
Some targets (e.g. NVPTX) do not accept global names containing
`.`. In particular, the global variables created to represent
the runtime information of derived types use `.` in their names.
A derived type's descriptor object may be used in the device code,
e.g. to initialize a descriptor of a variable of this type.
Thus, the runtime type info objects may need to be compiled
for the device.
Moreover, at least the derived types' descriptor objects
may need to be registered (think of `omp declare target`)
for the host-device association so that the addendum pointer
can be properly mapped to the device for descriptors using
a derived type's descriptor as their addendum pointer.
The registration implies knowing the name of the global variable
in the device image so that proper host code can be created.
So it is better to name the globals the same way for the host
and the device.
The CompilerGeneratedNamesConversion pass renames all uniqued globals
such that the special symbols (currently `.`) are replaced
with `X`. The pass is supposed to be run for both the host and the device.
An option is added to the FIR-to-LLVM conversion pass to indicate
whether the new pass has been run before or not. This setting
affects how the codegen computes the names of the derived types'
descriptors for FIR derived types.
fir::NameUniquer now allows `X` to be part of a name, because
the name deconstruction may be applied to the mangled names
after the CompilerGeneratedNamesConversion pass.
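
An illustrative Fortran sketch (invented for this note) of code that makes a derived type's runtime type info needed on the device; the uniqued globals describing `point` previously contained `.` in their names, which the pass replaces with `X`:

```fortran
module shapes
  type :: point
    real :: x = 0.0, y = 0.0
  end type point
end module shapes

subroutine init_on_device(p)
  use shapes
  !$omp declare target
  type(point), allocatable, intent(out) :: p(:)
  ! Allocating and default-initializing this descriptor on the device
  ! references the compiler-generated type info globals of type(point).
  allocate(p(4))
end subroutine init_on_device
```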

Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init

# e73cf2f0 | 15-Jul-2024 | Matthias Springer <me@m-sp.org>
[flang] Remove materialization workaround in type converter (#98743)
This change is in preparation for #97903, which adds extra checks for
materializations: it is now enforced that they produce an SSA value of
the correct type, so the current workaround no longer works.
The original workaround avoided target materializations by directly
returning the to-be-converted SSA value from the materialization
callback. This can be avoided by initializing the lowering patterns that
insert the materializations without a type converter. For
`cg::XEmboxOp`, the existing workaround that skips
`unrealized_conversion_cast` ops is still in place.
Also remove the lowering pattern for `unrealized_conversion_cast`. This
pattern has no effect because `unrealized_conversion_cast` ops that are
inserted by the dialect conversion framework are never matched by the
pattern driver.

Revision tags: llvmorg-18.1.8

# 29d857f1 | 14-Jun-2024 | Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
[flang] Add stack reclaim pass to reclaim allocas in loop (#95309)
Some passes in the flang pipeline create `fir.alloca` operations, e.g. when
lowering `hlfir.concat`. When these allocas are located in a loop, the stack
can quickly be exhausted, leading to segfaults.
This behavior can be seen in
https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90
This patch inserts calls to the LLVM stacksave/stackrestore intrinsics in the
body of the loop to reclaim the allocas in its scope.
This PR is an alternative implementation to #95173.
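
A minimal Fortran sketch (illustrative, not the linked test) of the pattern that motivates the pass: an operation inside the loop body that needs a fresh stack temporary on every iteration:

```fortran
subroutine build_lines(names, n)
  integer, intent(in)           :: n
  character(len=*), intent(in)  :: names(n)
  character(len=:), allocatable :: line
  integer :: i
  do i = 1, n
    ! The concatenation needs a stack temporary each iteration; bracketing the
    ! loop body with stacksave/stackrestore reclaims these allocas instead of
    ! letting them accumulate until the stack overflows.
    line = 'name: ' // trim(names(i))
  end do
end subroutine build_lines
```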

Revision tags: llvmorg-18.1.7

# f1d13bbd | 27-May-2024 | jeanPerier <jperier@nvidia.com>
[flang] add FIR to FIR pass to lower assumed-rank operations (#93344)
Add a pass to lower assumed-rank operations. The current patch adds
codegen for fir.rebox_assumed_rank. The same pass will later lower
fir.select_rank.
fir.rebox_assumed_rank is lowered to a call to the CopyAndUpdateDescriptor
runtime API.
Note that the lowering ends up allocating two new descriptors at the
LLVM level (one alloca created by the pass for the CopyAndUpdateDescriptor
result descriptor argument; the second one is created by the fir.load
of the result descriptor in codegen).
LLVM is currently unable to properly optimize and merge those allocas.
The "nocapture" attribute added to the CopyAndUpdateDescriptor arguments
gives part of the information to LLVM, but the fir.load codegen of
descriptors must be updated to use llvm.memcpy instead of
llvm.load+store to allow LLVM to optimize it. This will be done in a later patch.
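
One plausible Fortran sketch (invented here, not from the patch) of a call that requires re-describing an assumed-rank descriptor for the callee, the kind of adjustment modeled by fir.rebox_assumed_rank and now lowered to the CopyAndUpdateDescriptor runtime call:

```fortran
module m
contains
  subroutine inner(x)
    class(*), intent(in) :: x(..)   ! assumed-rank, unlimited polymorphic dummy
  end subroutine inner
  subroutine outer(y)
    real, intent(in) :: y(..)       ! assumed-rank dummy
    ! Passing y to inner requires producing a descriptor with the callee's
    ! declared type information from y's descriptor.
    call inner(y)
  end subroutine outer
end module m
```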

# 9807f25b | 22-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang][HLFIR] Adapt OptimizedBufferization to run on all top level ops (#92898)
This means that this pass will also run on hlfir elemental operations
which are not inside of functions.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving the declaration and definition of
the constructor into tablegen (as requested during code review of
another pass).

# 6ff82363 | 21-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang][HLFIR] Adapt InlineElementals to run on all top level ops (#92734)
This means that this pass will also run on hlfir elemental operations
which are not inside of functions.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving the declaration and definition of
the constructor into tablegen (as requested during code review of
another pass).
While I was updating the tests I noticed that the optimized
bufferization pass and some CSE runs were missing from the optimized pipeline
in flang/test/Driver/mlir-pass-pipeline.f90. I fixed this in this
commit.

# 605ae4e9 | 20-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang][HLFIR] Adapt SimplifyHLFIRIntrinsics to run on all top level ops (#92573)
This means that this pass will also run on hlfir intrinsics which are
not inside of functions.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving the declaration and definition of
the constructor into tablegen (as requested during code review of
another pass).

Revision tags: llvmorg-18.1.6, llvmorg-18.1.5

# d1b3648e | 01-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] always run PolymorphicOpConversion sequentially (#90721)
It was pointed out in post commit review of
https://github.com/llvm/llvm-project/pull/90597 that the pass should
never have been run in parallel over all functions (and now other top
level operations) in the first place. The mutex used in the pass was
ineffective at preventing races since each instance of the pass would
have a different mutex.

# df513f86 | 30-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] Adapt PolymorphicOpConversion to run on all top level ops (#90597)
We might use polymorphic ops in top-level operations other than
functions some time in the future. We need to ensure that these
operations can be lowered.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving declaration and definition of the
constructor function into tablegen (as requested in code review when
altering another pass).

# 3785d742 | 29-Apr-2024 | Kareem Ergawy <kareem.ergawy@amd.com>
[flang][OpenMP][LLVMIR] Support CFG and LLVM IR conversion for `omp.private` (#90164)
Adds support for CFG conversion and conversion to LLVM IR for
`omp.private` ops. This bridges a gap between FIR and LLVM to provide
more support for lowering `omp.private` ops for things like
allocatables.
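
A minimal Fortran sketch (illustrative only) of a privatized allocatable; with delayed privatization its allocation and cleanup logic is carried by an `omp.private` op, which now has to survive CFG conversion and translation to LLVM IR:

```fortran
subroutine work(n)
  integer, intent(in) :: n
  real, allocatable :: tmp(:)
  integer :: i
  !$omp parallel do private(tmp)
  do i = 1, n
    ! Each thread gets its own, initially unallocated copy of tmp.
    allocate(tmp(n))
    tmp = real(i)
    deallocate(tmp)
  end do
end subroutine work
```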

# 7bc0177f | 25-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] run character conversion pass on all top level ops (#89910)
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving declaration and definition of the
constructor function into tablegen (as requested in code review when
altering another pass).

# ceca5235 | 24-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] de-duplicate CFGConversion pass (#89783)
See RFC at
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
I previously did the same for the AbstractResult pass
https://github.com/llvm/llvm-project/pull/88867

# bfd19445 | 22-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] de-duplicate AbstractResult pass (#88867)
This is the first proof of concept of the modification of FIR codegen to
fully support a variety of top level operations (beyond just func.func)
proposed in
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations

Revision tags: llvmorg-18.1.4, llvmorg-18.1.3

# d84252e0 | 20-Mar-2024 | Sergio Afonso <safonsof@amd.com>
[MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393)
This patch proposes the renaming of certain OpenMP dialect operations with the
goal of improving readability and following a uniform naming convention for
MLIR operations and associated classes. In particular, the following operations
are renamed:
- `omp.map_info` -> `omp.map.info`
- `omp.target_update_data` -> `omp.target_update`
- `omp.ordered_region` -> `omp.ordered.region`
- `omp.cancellationpoint` -> `omp.cancellation_point`
- `omp.bounds` -> `omp.map.bounds`
- `omp.reduction.declare` -> `omp.declare_reduction`
Also, the following MLIR operation classes have been renamed:
- `omp::TaskLoopOp` -> `omp::TaskloopOp`
- `omp::TaskGroupOp` -> `omp::TaskgroupOp`
- `omp::DataBoundsOp` -> `omp::MapBoundsOp`
- `omp::DataOp` -> `omp::TargetDataOp`
- `omp::EnterDataOp` -> `omp::TargetEnterDataOp`
- `omp::ExitDataOp` -> `omp::TargetExitDataOp`
- `omp::UpdateDataOp` -> `omp::TargetUpdateOp`
- `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp`
- `omp::WsLoopOp` -> `omp::WsloopOp`

# 1f1e0948 | 20-Mar-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] run CFG conversion on omp reduction declare ops (#84953)
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
fir.global and some OpenMP and OpenACC container operations.
This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.
In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.
In the meantime, this patch duplicates the CFG conversion pass to also
run on omp reduction declare operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.
OpenMP array reductions 2/6
Previous PR: https://github.com/llvm/llvm-project/pull/84952
Next PR: https://github.com/llvm/llvm-project/pull/84954
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
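
An illustrative Fortran sketch (not from the patch series) of an OpenMP array reduction; the compiler-generated `omp.declare_reduction` region for the array holds FIR control flow that now also goes through CFG conversion:

```fortran
subroutine column_sums(a, s, n)
  integer, intent(in) :: n
  real, intent(in)    :: a(n, n)
  real, intent(out)   :: s(n)
  integer :: i
  s = 0.0
  !$omp parallel do reduction(+:s)
  do i = 1, n
    ! Reducing into a whole array: the per-thread copy, its initialization,
    ! and the combiner are described by an omp.declare_reduction op.
    s = s + a(:, i)
  end do
end subroutine column_sums
```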

Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init

# 374e8288 | 04-Dec-2023 | Tom Eccles <tom.eccles@arm.com>
[flang] (Re-)Enable alias tags pass by default (#74250)
Enable by default for optimization levels higher than 0 (same behavior
as clang).
For simplicity, only forward the flag to the frontend driver when it
contradicts what is implied by the optimization level.
This was first landed in
https://github.com/llvm/llvm-project/pull/73111 but was later reverted
due to a performance regression. That regression was fixed by
https://github.com/llvm/llvm-project/pull/74065.

# 5ce5ea37 | 29-Nov-2023 | Tom Eccles <tom.eccles@arm.com>
Revert "[flang] Enable alias tags pass by default (#73111)" (#73821)
This reverts commit caba0314cf631a3ba3e982cbcdc455224046c7a8.
Serious performance regressions were reported by @vzakhari
http
Revert "[flang] Enable alias tags pass by default (#73111)" (#73821)
This reverts commit caba0314cf631a3ba3e982cbcdc455224046c7a8.
Serious performance regressions were reported by @vzakhari
https://github.com/llvm/llvm-project/issues/58303#issuecomment-1830754173
Fixing this doesn't look quick so I will revert for now.

Revision tags: llvmorg-17.0.6

# caba0314 | 27-Nov-2023 | Tom Eccles <tom.eccles@arm.com>
[flang] Enable alias tags pass by default (#73111)
Enable by default for optimization levels higher than 0 (same behavior
as clang).
For simplicity, only forward the flag to the frontend driver when it
contradicts what is implied by the optimization level.
Since https://github.com/llvm/llvm-project/pull/72903 there are now no
known performance regressions.