Revision tags: llvmorg-21-init

# 3bb969f3 | 15-Jan-2025 | Slava Zakharin <szakharin@nvidia.com>
[flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not get rid
of a temporary array in many cases, but it may still be
much better, because it allows us to:
* Get rid of any overhead related to calling the runtime MATMUL
(such as descriptor creation).
* Use a CPU-specific vectorization cost model for the matmul loops,
which the Fortran runtime cannot currently do.
* Optimize matmul of known-size arrays by complete unrolling.
One drawback of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source comment explaining the CSE issue in more detail.
Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it compared to the Fortran runtime implementation. I put it
under an engineering option for experiments.
At the same time, inlining `hlfir.matmul_transpose` as an `hlfir.elemental`
seems to be a good approach; for example, it allows getting rid of a temporary
array in cases like `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.
This patch improves the performance of galgel and tonto a little bit.
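
A hypothetical Fortran sketch (not taken from the patch) of a case where the inlined MATMUL can benefit from complete unrolling and vectorization of known-size operands:

```fortran
subroutine small_matmul(a, b, c)
  real, intent(in)  :: a(4, 4), b(4, 4)
  real, intent(out) :: c(4, 4)
  ! With fixed-size operands, the inlined hlfir.eval_in_mem loop nest can be
  ! fully unrolled and vectorized instead of calling the MATMUL runtime
  ! (no descriptor creation, no call overhead).
  c = matmul(a, b)
end subroutine small_matmul
```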

Revision tags: llvmorg-19.1.7

# 611c96af | 07-Jan-2025 | Slava Zakharin <szakharin@nvidia.com>
[flang] Schedule InlineHLFIRAssign after BufferizeHLFIR. (#121863)
This helps to get rid of *some* calls to the AssignTemporary runtime
that appear due to the temporary_lhs hlfir.assign operations produced
in BufferizeHLFIR. I only tested it on `tonto` and did not see
any performance changes. I will run more performance testing
before merging this.

# 3c700d13 | 03-Jan-2025 | Slava Zakharin <szakharin@nvidia.com>
[flang] Extract hlfir.assign inlining from opt-bufferization. (#121544)
Optimized bufferization can transform hlfir.assign into a loop
nest doing element-by-element assignment, but it avoids
doing so when the RHS is an hlfir.expr. This is done to let the
ElementalAssignBufferization pattern try to do a better job.
This patch moves the hlfir.assign inlining after opt-bufferization
and enables it for hlfir.expr RHS.
The hlfir.expr RHS cases are present in tonto, and this patch
results in some nice improvements. Note that those cases
are handled by other compilers also using array temporaries,
so this patch seems to just get rid of the Assign runtime
overhead/inefficiency.
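
A minimal Fortran sketch (illustrative only) of an array assignment whose right-hand side is buffered as an hlfir.expr; whether a particular case reaches the new inlining depends on what opt-bufferization has already handled:

```fortran
subroutine axpy_like(a, b, c, n)
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(in)    :: b(n), c(n)
  ! If the RHS survives opt-bufferization as an hlfir.expr, the hlfir.assign
  ! can now be inlined as an element-wise loop instead of going through a
  ! temporary and the Assign runtime.
  a = b + 2.0 * c
end subroutine axpy_like
```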

Revision tags: llvmorg-19.1.6

# 1d4b5c16 | 09-Dec-2024 | Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
[flang][cuda] Change how abstract result pass is scheduled on func.func and gpu.func (#119034)
Use `pm.nest` to schedule the pass on nested `func.func` and `gpu.func`
operations in the `gpu.module`.
The AbstractResult pass is not meant to run on the whole gpu.module at once.

# 5522d246 | 03-Dec-2024 | Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
[flang][cuda] Allow AbstractResult to run in gpu.module (#118529)
In CUDA Fortran, device functions are converted to `gpu.func` inside the
`gpu.module` operation. Update the AbstractResult pass to be able to run
on `func.func` and `gpu.func` operations inside the `gpu.module`.
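
A hypothetical CUDA Fortran sketch (names and shapes invented for illustration) of a device procedure that becomes a `gpu.func` inside the `gpu.module`; an array-valued result is the kind of thing AbstractResult rewrites into a hidden result argument:

```fortran
module device_math
contains
  attributes(device) function scaled(a) result(r)
    real, intent(in) :: a(4)
    real :: r(4)
    ! The array result is returned in memory; AbstractResult rewrites such
    ! functions to take a hidden result argument, and it must now also visit
    ! this procedure once it has been converted to a gpu.func.
    r = 2.0 * a
  end function scaled
end module device_math
```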

Revision tags: llvmorg-19.1.5

# f3cf24fc | 28-Nov-2024 | s-watanabe314 <watanabe.shu-06@fujitsu.com>
[flang] Apply nocapture attribute to dummy arguments (#116182)
Apply the llvm.nocapture attribute to dummy arguments that do not have the
target, asynchronous, volatile, or pointer attributes, in procedures
that are not bind(c). This was discussed in
https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401
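
A minimal Fortran sketch (illustrative only) of a dummy argument that would be eligible for nocapture under the rule described above:

```fortran
subroutine scale(a, n, factor)
  integer, intent(in) :: n
  real, intent(inout) :: a(n)   ! no target/asynchronous/volatile/pointer attribute
  real, intent(in)    :: factor ! and the procedure is not bind(c)
  ! Nothing here can legally capture the address of 'a', so the reference
  ! passed for it can carry llvm.nocapture.
  a = a * factor
end subroutine scale
```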

Revision tags: llvmorg-19.1.4

# e7e55416 | 19-Nov-2024 | Ivan R. Ivanov <ivanov.i.aa@m.titech.ac.jp>
[flang] Lower omp.workshare to other omp constructs (#101446)
Add a new pass that lowers an `omp.workshare`, together with its associated `omp.workshare.loop_wrapper` loop nests, into other OpenMP constructs that can be lowered to LLVM.
More specifically, in order to preserve the sequential execution semantics of the contained code, it wraps portions that need to be executed on a single thread in `omp.single` blocks, converts code that must be parallelized into `omp.wsloop` nests, and inserts the appropriate synchronization.
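
An illustrative Fortran sketch (not from the patch) of a WORKSHARE construct targeted by this lowering; the array assignment is parallelized as an `omp.wsloop`, while any serial parts would be wrapped in `omp.single`:

```fortran
subroutine axpy_ws(a, b, c, n)
  integer, intent(in) :: n
  real, intent(in)    :: a(n), b(n)
  real, intent(out)   :: c(n)
  !$omp parallel
  !$omp workshare
  ! Array assignment: the units of work are divided among the team's threads.
  c = a + 2.0 * b
  !$omp end workshare
  !$omp end parallel
end subroutine axpy_ws
```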

Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4

# cfd4c180 | 21-Aug-2024 | Slava Zakharin <szakharin@nvidia.com>
[RFC][flang] Replace special symbols in uniqued global names. (#104859)
This change addresses more "issues" like the one resolved in #71338.
Some targets (e.g. NVPTX) do not accept global names containing
`.`. In particular, the global variables created to represent
the runtime information of derived types use `.` in their names.
A derived type's descriptor object may be used in the device code,
e.g. to initialize a descriptor of a variable of this type.
Thus, the runtime type info objects may need to be compiled
for the device.
Moreover, at least the derived types' descriptor objects
may need to be registered (think of `omp declare target`)
for the host-device association so that the addendum pointer
can be properly mapped to the device for descriptors using
a derived type's descriptor as their addendum pointer.
The registration implies knowing the name of the global variable
in the device image so that proper host code can be created.
So it is better to name the globals the same way for the host
and the device.
The CompilerGeneratedNamesConversion pass renames all uniqued globals
such that the special symbols (currently `.`) are replaced
with `X`. The pass is supposed to be run for both the host and the device.
An option is added to the FIR-to-LLVM conversion pass to indicate
whether the new pass has been run before or not. This setting
affects how the codegen computes the names of the derived types'
descriptors for FIR derived types.
fir::NameUniquer now allows `X` to be part of a name, because
the name deconstruction may be applied to the mangled names
after the CompilerGeneratedNamesConversion pass.
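
An illustrative Fortran sketch (invented for this note) of code that makes a derived type's runtime type info needed on the device; the uniqued globals describing `point` previously contained `.` in their names, which the pass replaces with `X`:

```fortran
module shapes
  type :: point
    real :: x = 0.0, y = 0.0
  end type point
end module shapes

subroutine init_on_device(p)
  use shapes
  !$omp declare target
  type(point), allocatable, intent(out) :: p(:)
  ! Allocating and default-initializing this descriptor on the device
  ! references the compiler-generated type info globals of type(point).
  allocate(p(4))
end subroutine init_on_device
```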

Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init

# e73cf2f0 | 15-Jul-2024 | Matthias Springer <me@m-sp.org>
[flang] Remove materialization workaround in type converter (#98743)
This change is in preparation for #97903, which adds extra checks for
materializations: it is now enforced that they produce an SSA value of
the correct type, so the current workaround no longer works.
The original workaround avoided target materializations by directly
returning the to-be-converted SSA value from the materialization
callback. This can be avoided by initializing the lowering patterns that
insert the materializations without a type converter. For
`cg::XEmboxOp`, the existing workaround that skips
`unrealized_conversion_cast` ops is still in place.
Also remove the lowering pattern for `unrealized_conversion_cast`. This
pattern has no effect because `unrealized_conversion_cast` ops that are
inserted by the dialect conversion framework are never matched by the
pattern driver.

Revision tags: llvmorg-18.1.8

# 29d857f1 | 14-Jun-2024 | Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
[flang] Add stack reclaim pass to reclaim allocas in loop (#95309)
Some passes in the flang pipeline create `fir.alloca` operations, e.g. when
lowering `hlfir.concat`. When these allocas are located in a loop, the stack
can quickly be exhausted, leading to segfaults.
This behavior can be seen in
https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90
This patch inserts calls to the LLVM stacksave/stackrestore intrinsics in the
body of the loop to reclaim the allocas in its scope.
This PR is an alternative implementation to #95173.
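
A minimal Fortran sketch (illustrative, not the linked test) of the pattern that motivates the pass: an operation inside the loop body that needs a fresh stack temporary on every iteration:

```fortran
subroutine build_lines(names, n)
  integer, intent(in)           :: n
  character(len=*), intent(in)  :: names(n)
  character(len=:), allocatable :: line
  integer :: i
  do i = 1, n
    ! The concatenation needs a stack temporary each iteration; bracketing the
    ! loop body with stacksave/stackrestore reclaims these allocas instead of
    ! letting them accumulate until the stack overflows.
    line = 'name: ' // trim(names(i))
  end do
end subroutine build_lines
```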

Revision tags: llvmorg-18.1.7

# f1d13bbd | 27-May-2024 | jeanPerier <jperier@nvidia.com>
[flang] add FIR to FIR pass to lower assumed-rank operations (#93344)
Add a pass to lower assumed-rank operations. The current patch adds
codegen for fir.rebox_assumed_rank. The same pass will later lower
fir.select_rank.
fir.rebox_assumed_rank is lowered to a call to the CopyAndUpdateDescriptor
runtime API.
Note that the lowering ends up allocating two new descriptors at the
LLVM level (one alloca created by the pass for the CopyAndUpdateDescriptor
result descriptor argument; the second one is created by the fir.load
of the result descriptor in codegen).
LLVM is currently unable to properly optimize and merge those allocas.
The "nocapture" attribute added to the CopyAndUpdateDescriptor arguments
gives part of the information to LLVM, but the fir.load codegen of
descriptors must be updated to use llvm.memcpy instead of
llvm.load+store to allow LLVM to optimize it. This will be done in a later patch.
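
One plausible Fortran sketch (invented here, not from the patch) of a call that requires re-describing an assumed-rank descriptor for the callee, the kind of adjustment modeled by fir.rebox_assumed_rank and now lowered to the CopyAndUpdateDescriptor runtime call:

```fortran
module m
contains
  subroutine inner(x)
    class(*), intent(in) :: x(..)   ! assumed-rank, unlimited polymorphic dummy
  end subroutine inner
  subroutine outer(y)
    real, intent(in) :: y(..)       ! assumed-rank dummy
    ! Passing y to inner requires producing a descriptor with the callee's
    ! declared type information from y's descriptor.
    call inner(y)
  end subroutine outer
end module m
```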

# 9807f25b | 22-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang][HLFIR] Adapt OptimizedBufferization to run on all top level ops (#92898)
This means that this pass will also run on hlfir elemental operations
which are not inside of functions.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving the declaration and definition of
the constructor into tablegen (as requested during code review of
another pass).

# 6ff82363 | 21-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang][HLFIR] Adapt InlineElementals to run on all top level ops (#92734)
This means that this pass will also run on hlfir elemental operations
which are not inside of functions.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving the declaration and definition of
the constructor into tablegen (as requested during code review of
another pass).
While I was updating the tests I noticed that the optimized
bufferization pass and some CSE runs were missing from the optimized pipeline
in flang/test/Driver/mlir-pass-pipeline.f90. I fixed this in this
commit.

# 605ae4e9 | 20-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang][HLFIR] Adapt SimplifyHLFIRIntrinsics to run on all top level ops (#92573)
This means that this pass will also run on hlfir intrinsics which are
not inside of functions.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving the declaration and definition of
the constructor into tablegen (as requested during code review of
another pass).

Revision tags: llvmorg-18.1.6, llvmorg-18.1.5

# d1b3648e | 01-May-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] always run PolymorphicOpConversion sequentially (#90721)
It was pointed out in post commit review of
https://github.com/llvm/llvm-project/pull/90597 that the pass should
never have been run in parallel over all functions (and now other top
level operations) in the first place. The mutex used in the pass was
ineffective at preventing races since each instance of the pass would
have a different mutex.

# df513f86 | 30-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] Adapt PolymorphicOpConversion to run on all top level ops (#90597)
We might use polymorphic ops in top-level operations other than
functions some time in the future. We need to ensure that these
operations can be lowered.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving declaration and definition of the
constructor function into tablegen (as requested in code review when
altering another pass).

# 3785d742 | 29-Apr-2024 | Kareem Ergawy <kareem.ergawy@amd.com>
[flang][OpenMP][LLVMIR] Support CFG and LLVM IR conversion for `omp.private` (#90164)
Adds support for CFG conversion and conversion to LLVM IR for
`omp.private` ops. This bridges a gap between FIR and LLVM to provide
more support for lowering `omp.private` ops for things like
allocatables.
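
A minimal Fortran sketch (illustrative only) of a privatized allocatable; with delayed privatization its allocation and cleanup logic is carried by an `omp.private` op, which now has to survive CFG conversion and translation to LLVM IR:

```fortran
subroutine work(n)
  integer, intent(in) :: n
  real, allocatable :: tmp(:)
  integer :: i
  !$omp parallel do private(tmp)
  do i = 1, n
    ! Each thread gets its own, initially unallocated copy of tmp.
    allocate(tmp(n))
    tmp = real(i)
    deallocate(tmp)
  end do
end subroutine work
```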

# 7bc0177f | 25-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] run character conversion pass on all top level ops (#89910)
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving declaration and definition of the
constructor function into tablegen (as requested in code review when
altering another pass).

# ceca5235 | 24-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] de-duplicate CFGConversion pass (#89783)
See RFC at
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
I previously did the same for the AbstractResult pass
https://github.com/llvm/llvm-project/pull/88867

# bfd19445 | 22-Apr-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] de-duplicate AbstractResult pass (#88867)
This is the first proof of concept of the modification of FIR codegen to
fully support a variety of top level operations (beyond just func.func)
proposed in
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations

Revision tags: llvmorg-18.1.4, llvmorg-18.1.3

# d84252e0 | 20-Mar-2024 | Sergio Afonso <safonsof@amd.com>
[MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393)
This patch proposes the renaming of certain OpenMP dialect operations with the
goal of improving readability and following a uniform naming convention for
MLIR operations and associated classes. In particular, the following operations
are renamed:
- `omp.map_info` -> `omp.map.info`
- `omp.target_update_data` -> `omp.target_update`
- `omp.ordered_region` -> `omp.ordered.region`
- `omp.cancellationpoint` -> `omp.cancellation_point`
- `omp.bounds` -> `omp.map.bounds`
- `omp.reduction.declare` -> `omp.declare_reduction`
Also, the following MLIR operation classes have been renamed:
- `omp::TaskLoopOp` -> `omp::TaskloopOp`
- `omp::TaskGroupOp` -> `omp::TaskgroupOp`
- `omp::DataBoundsOp` -> `omp::MapBoundsOp`
- `omp::DataOp` -> `omp::TargetDataOp`
- `omp::EnterDataOp` -> `omp::TargetEnterDataOp`
- `omp::ExitDataOp` -> `omp::TargetExitDataOp`
- `omp::UpdateDataOp` -> `omp::TargetUpdateOp`
- `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp`
- `omp::WsLoopOp` -> `omp::WsloopOp`

# 1f1e0948 | 20-Mar-2024 | Tom Eccles <tom.eccles@arm.com>
[flang] run CFG conversion on omp reduction declare ops (#84953)
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
fir.global and some OpenMP and OpenACC container operations.
This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.
In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.
In the meantime, this patch duplicates the CFG conversion pass to also
run on omp reduction declare operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.
OpenMP array reductions 2/6
Previous PR: https://github.com/llvm/llvm-project/pull/84952
Next PR: https://github.com/llvm/llvm-project/pull/84954
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
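
An illustrative Fortran sketch (not from the patch series) of an OpenMP array reduction; the compiler-generated `omp.declare_reduction` region for the array holds FIR control flow that now also goes through CFG conversion:

```fortran
subroutine column_sums(a, s, n)
  integer, intent(in) :: n
  real, intent(in)    :: a(n, n)
  real, intent(out)   :: s(n)
  integer :: i
  s = 0.0
  !$omp parallel do reduction(+:s)
  do i = 1, n
    ! Reducing into a whole array: the per-thread copy, its initialization,
    ! and the combiner are described by an omp.declare_reduction op.
    s = s + a(:, i)
  end do
end subroutine column_sums
```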

Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init

# 374e8288 | 04-Dec-2023 | Tom Eccles <tom.eccles@arm.com>
[flang] (Re-)Enable alias tags pass by default (#74250)
Enable by default for optimization levels higher than 0 (same behavior
as clang).
For simplicity, only forward the flag to the frontend driver when it
contradicts what is implied by the optimization level.
This was first landed in
https://github.com/llvm/llvm-project/pull/73111 but was later reverted
due to a performance regression. That regression was fixed by
https://github.com/llvm/llvm-project/pull/74065.

# 5ce5ea37 | 29-Nov-2023 | Tom Eccles <tom.eccles@arm.com>
Revert "[flang] Enable alias tags pass by default (#73111)" (#73821)
This reverts commit caba0314cf631a3ba3e982cbcdc455224046c7a8.
Serious performance regressions were reported by @vzakhari
http
Revert "[flang] Enable alias tags pass by default (#73111)" (#73821)
This reverts commit caba0314cf631a3ba3e982cbcdc455224046c7a8.
Serious performance regressions were reported by @vzakhari
https://github.com/llvm/llvm-project/issues/58303#issuecomment-1830754173
Fixing this doesn't look quick so I will revert for now.

Revision tags: llvmorg-17.0.6

# caba0314 | 27-Nov-2023 | Tom Eccles <tom.eccles@arm.com>
[flang] Enable alias tags pass by default (#73111)
Enable by default for optimization levels higher than 0 (same behavior
as clang).
For simplicity, only forward the flag to the frontend driver when it
contradicts what is implied by the optimization level.
Since https://github.com/llvm/llvm-project/pull/72903 there are now no
known performance regressions.