Lines Matching defs:compute
53 // // Parallel compute function that executes the parallel body region for
55 // // compute block index.
74 // %block_size = ... compute parallel compute block size
75 // %block_count = ... compute the number of compute blocks
85 // // Call parallel compute function for a single block.
94 // %block_size = ... compute parallel compute block size
95 // %block_count = ... compute the number of compute blocks
138 // Helper struct to parse parallel compute function argument list.
208 // Returns a function type and implicit captures for a parallel compute
232 // call @compute(%lb0, %lb1, ..., %ub0, %ub1, ..., %step0, %step1, ...)
248 // Create a parallel compute function from the parallel operation.
332 // Block first and last coordinates can be the same along the outer compute
333 // dimension when inner compute dimension contains multiple blocks.
353 // loop bounds should we use for the nested loops: bounds defined by compute
440 // Creates recursive async dispatch function for the given parallel compute
455 // // Call parallel compute function for a single block.
471 // Compared to the parallel compute function async dispatch function takes
537 // Call parallel compute function inside the async.execute region.
559 // call the parallel compute function for the first block of the range.
575 // Launch async dispatch of the parallel compute function.
583 // Add one more level of indirection to dispatch parallel compute functions
591 // Appends operands shared by async dispatch and parallel compute functions to
610 // Call parallel compute function for the single block.
624 // execution of multiple parallel compute functions. First block will be
636 // Wait for the completion of all parallel compute operations.
642 // Dispatch either single block compute function, or launch async dispatch.
646 // Dispatch parallel compute functions by submitting all async compute tasks
655 func::FuncOp compute = parallelComputeFunction.func;
661 // execution of multiple parallel compute functions. First block will be
666 // Call parallel compute function for all blocks.
670 // Returns parallel compute function operands to process the given block.
688 // Call parallel compute function inside the async.execute region.
691 executeBuilder.create<func::CallOp>(executeLoc, compute.getSymName(),
692 compute.getResultTypes(),
704 // Iterate over all compute blocks and launch parallel compute operations.
707 // Call parallel compute function for the first block in the caller thread.
708 b.create<func::CallOp>(compute.getSymName(), compute.getResultTypes(),
711 // Wait for the completion of all async compute operations.
730 // reduce the number of parallel compute function arguments.
767 // compute function. LLVM can't always push constants across the non-trivial
779 // product is smaller than `512`. We align the parallel compute block
812 // With a large number of threads the value of creating many compute blocks
862 // Dispatch parallel compute function using async recursive work splitting,
863 // or by submitting compute tasks sequentially from a caller thread.
866 // Create a parallel compute function that takes a block id and computes
869 // Compute the number of parallel compute blocks.
872 // Dispatch parallel compute function without hints to unroll inner loops.
874 ParallelComputeFunction compute =
878 doDispatch(b, rewriter, compute, op, blockSize, blockCount, tripCounts);
882 // Dispatch parallel compute function with hints for unrolling inner loops.
884 ParallelComputeFunction compute = createParallelComputeFunction(
894 doDispatch(b, rewriter, compute, op, alignedBlockSize, blockCount,
899 // Dispatch to block aligned compute function only if the computed block
916 // Replace the `scf.parallel` operation with the parallel compute function.