xref: /llvm-project/polly/docs/HowToManuallyUseTheIndividualPiecesOfPolly.rst (revision 27f7546f336263b0b9b569e0c8ec99a164c89d8f)
1==================================================
2How to manually use the Individual pieces of Polly
3==================================================
4
5Execute the individual Polly passes manually
6============================================
7
8.. sectionauthor:: Singapuram Sanjay Srivallabh
9
10This example presents the individual passes that are involved when optimizing
11code with Polly. We show how to execute them individually and explain for
12each which analysis is performed or what transformation is applied. In this
13example the polyhedral transformation is user-provided to show how much
14performance improvement can be expected by an optimal automatic optimizer.
15
161. **Create LLVM-IR from the C code**
17-------------------------------------
18        Polly works on LLVM-IR. Hence it is necessary to translate the source
19        files into LLVM-IR. If more than one file should be optimized the
20        files can be combined into a single file with llvm-link.
21
22        .. code-block:: console
23
24                clang -S -emit-llvm matmul.c -Xclang -disable-O0-optnone -o matmul.ll
25
26
272. **Prepare the LLVM-IR for Polly**
28------------------------------------
29
30        Polly is only able to work with code that matches a canonical form.
31        To translate the LLVM-IR into this form we use a set of
32        canonicalization passes. They are scheduled by using
33        '-polly-canonicalize'.
34
35        .. code-block:: console
36
37                opt -S -polly-canonicalize matmul.ll -o matmul.preopt.ll
38
393. **Show the SCoPs detected by Polly (optional)**
40--------------------------------------------------
41
42        To understand if Polly was able to detect SCoPs, we print the structure
43        of the detected SCoPs. In our example two SCoPs are detected. One in
44        'init_array' the other in 'main'.
45
46        .. code-block:: console
47
48                $ opt -basic-aa -polly-ast -analyze matmul.preopt.ll -polly-process-unprofitable -polly-use-llvm-names
49
50        .. code-block:: text
51
52                :: isl ast :: init_array :: %for.cond1.preheader---%for.end19
53
54                if (1)
55
56                    for (int c0 = 0; c0 <= 1535; c0 += 1)
57                      for (int c1 = 0; c1 <= 1535; c1 += 1)
58                        Stmt_for_body3(c0, c1);
59
60                else
61                    {  /* original code */ }
62
63                :: isl ast :: main :: %for.cond1.preheader---%for.end30
64
65                if (1)
66
67                    for (int c0 = 0; c0 <= 1535; c0 += 1)
68                      for (int c1 = 0; c1 <= 1535; c1 += 1) {
69                        Stmt_for_body3(c0, c1);
70                        for (int c2 = 0; c2 <= 1535; c2 += 1)
71                          Stmt_for_body8(c0, c1, c2);
72                      }
73
74                else
75                    {  /* original code */ }
76
774. **Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)**
78----------------------------------------------------------------------------------------
79
80        Polly can use graphviz to graphically show a CFG in which the detected
81        SCoPs are highlighted. It can also create '.dot' files that can be
82        translated by the 'dot' utility into various graphic formats.
83
84
85        .. code-block:: console
86
87                $ opt -polly-use-llvm-names -basic-aa -view-scops -disable-output matmul.preopt.ll
88                $ opt -polly-use-llvm-names -basic-aa -view-scops-only -disable-output matmul.preopt.ll
89
90        The output for the different functions:
91
92        - view-scops : main_, init_array_, print_array_
93        - view-scops-only : main-scopsonly_, init_array-scopsonly_, print_array-scopsonly_
94
95.. _main:  http://polly.llvm.org/experiments/matmul/scops.main.dot.png
96.. _init_array: http://polly.llvm.org/experiments/matmul/scops.init_array.dot.png
97.. _print_array: http://polly.llvm.org/experiments/matmul/scops.print_array.dot.png
98.. _main-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.main.dot.png
99.. _init_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.init_array.dot.png
100.. _print_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.print_array.dot.png
101
1025. **View the polyhedral representation of the SCoPs**
103------------------------------------------------------
104
105        .. code-block:: console
106
107                $ opt -polly-use-llvm-names -basic-aa -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable
108
109        .. code-block:: text
110
111                [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
112                    Function: init_array
113                    Region: %for.cond1.preheader---%for.end19
114                    Max Loop Depth:  2
115                        Invariant Accesses: {
116                        }
117                        Context:
118                        {  :  }
119                        Assumed Context:
120                        {  :  }
121                        Invalid Context:
122                        {  : 1 = 0 }
123                        Arrays {
124                            float MemRef_A[*][1536]; // Element size 4
125                            float MemRef_B[*][1536]; // Element size 4
126                        }
127                        Arrays (Bounds as pw_affs) {
128                            float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
129                            float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
130                        }
131                        Alias Groups (0):
132                            n/a
133                        Statements {
134    	                    Stmt_for_body3
135                                Domain :=
136                                    { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
137                                Schedule :=
138                                    { Stmt_for_body3[i0, i1] -> [i0, i1] };
139                                MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
140                                    { Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] };
141                                MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
142                                    { Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] };
143                        }
144                [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end30' in function 'main':
145                    Function: main
146                    Region: %for.cond1.preheader---%for.end30
147                    Max Loop Depth:  3
148                    Invariant Accesses: {
149                    }
150                    Context:
151                    {  :  }
152                    Assumed Context:
153                    {  :  }
154                    Invalid Context:
155                    {  : 1 = 0 }
156                    Arrays {
157                        float MemRef_C[*][1536]; // Element size 4
158                        float MemRef_A[*][1536]; // Element size 4
159                        float MemRef_B[*][1536]; // Element size 4
160                    }
161                    Arrays (Bounds as pw_affs) {
162                        float MemRef_C[*][ { [] -> [(1536)] } ]; // Element size 4
163                        float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
164                        float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
165                    }
166                    Alias Groups (0):
167                        n/a
168                    Statements {
169                    	Stmt_for_body3
170                            Domain :=
171                                { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
172                            Schedule :=
173                                { Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] };
174                            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
175                                { Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] };
176                    	Stmt_for_body8
177                            Domain :=
178                                { Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 };
179                            Schedule :=
180                                { Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] };
181                            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
182                                { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
183                            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
184                                { Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] };
185                            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
186                                { Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] };
187                            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
188                                { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
189                    }
190
191
1926. **Show the dependences for the SCoPs**
193-----------------------------------------
194
195        .. code-block:: console
196
197	        $ opt -basic-aa -polly-use-llvm-names -polly-dependences -analyze matmul.preopt.ll -polly-process-unprofitable
198
199        .. code-block:: text
200
201        	[...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
202        		RAW dependences:
203        			{  }
204        		WAR dependences:
205        			{  }
206        		WAW dependences:
207        			{  }
208        		Reduction dependences:
209        			n/a
210        		Transitive closure of reduction dependences:
211        			{  }
212        	[...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end30' in function 'main':
213        		RAW dependences:
214        			{ Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
215        		WAR dependences:
216        			{  }
217        		WAW dependences:
218        			{ Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
219        		Reduction dependences:
220        			n/a
221        		Transitive closure of reduction dependences:
222        			{  }
223
2247. **Export jscop files**
225-------------------------
226
227        .. code-block:: console
228
229        	$ opt -basic-aa -polly-use-llvm-names -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable
230
231        .. code-block:: text
232
233	        [...]Writing JScop '%for.cond1.preheader---%for.end19' in function 'init_array' to './init_array___%for.cond1.preheader---%for.end19.jscop'.
234
235	        Writing JScop '%for.cond1.preheader---%for.end30' in function 'main' to './main___%for.cond1.preheader---%for.end30.jscop'.
236
237
238
2398. **Import the changed jscop files and print the updated SCoP structure (optional)**
240-------------------------------------------------------------------------------------
241
242	Polly can reimport jscop files, in which the schedules of the statements
243        are changed. These changed schedules are used to describe
244        transformations. It is possible to import different jscop files by
245        providing the postfix of the jscop file that is imported.
246
247	We apply three different transformations on the SCoP in the main
248        function. The jscop files describing these transformations are
249        hand written (and available in docs/experiments/matmul).
250
251	**No Polly**
252
253	As a baseline we do not call any Polly code generation, but only apply the normal -O3 optimizations.
254
255	.. code-block:: console
256
257		$ opt -basic-aa -polly-use-llvm-names matmul.preopt.ll -polly-import-jscop -polly-ast -analyze -polly-process-unprofitable
258
259	.. code-block:: c
260
261		[...]
262		:: isl ast :: main :: %for.cond1.preheader---%for.end30
263
264		if (1)
265
266		    for (int c0 = 0; c0 <= 1535; c0 += 1)
267		      for (int c1 = 0; c1 <= 1535; c1 += 1) {
268		        Stmt_for_body3(c0, c1);
269		        for (int c3 = 0; c3 <= 1535; c3 += 1)
270		          Stmt_for_body8(c0, c1, c3);
271		      }
272
273		else
274		    {  /* original code */ }
275		[...]
276
277	**Loop Interchange (and Fission to allow the interchange)**
278
279	We split the loops and can now apply an interchange of the loop dimensions that enumerate Stmt_for_body8.
280
281	.. Although I feel (and have created a .jscop) we can avoid splitting the loops.
282
283	.. code-block:: console
284
285		$ opt -basic-aa -polly-use-llvm-names matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-ast -analyze -polly-process-unprofitable
286
287	.. code-block:: c
288
289		[...]
290		:: isl ast :: main :: %for.cond1.preheader---%for.end30
291
292		if (1)
293
294		    {
295		      for (int c1 = 0; c1 <= 1535; c1 += 1)
296		        for (int c2 = 0; c2 <= 1535; c2 += 1)
297		          Stmt_for_body3(c1, c2);
298		      for (int c1 = 0; c1 <= 1535; c1 += 1)
299		        for (int c2 = 0; c2 <= 1535; c2 += 1)
300		          for (int c3 = 0; c3 <= 1535; c3 += 1)
301		            Stmt_for_body8(c1, c3, c2);
302		    }
303
304		else
305		    {  /* original code */ }
306		[...]
307
308	**Interchange + Tiling**
309
310	In addition to the interchange we now tile the second loop nest.
311
312	.. code-block:: console
313
314		$ opt -basic-aa -polly-use-llvm-names matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-ast -analyze -polly-process-unprofitable
315
316	.. code-block:: c
317
318		[...]
319		:: isl ast :: main :: %for.cond1.preheader---%for.end30
320
321		if (1)
322
323		    {
324		      for (int c1 = 0; c1 <= 1535; c1 += 1)
325		        for (int c2 = 0; c2 <= 1535; c2 += 1)
326		          Stmt_for_body3(c1, c2);
327		      for (int c1 = 0; c1 <= 1535; c1 += 64)
328		        for (int c2 = 0; c2 <= 1535; c2 += 64)
329		          for (int c3 = 0; c3 <= 1535; c3 += 64)
330		            for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
331		              for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
332		                for (int c6 = c2; c6 <= c2 + 63; c6 += 1)
333		                  Stmt_for_body8(c4, c6, c5);
334		    }
335
336		else
337		    {  /* original code */ }
338		[...]
339
340
341	**Interchange + Tiling + Strip-mining to prepare vectorization**
342
343	To later allow vectorization we create a so called trivially
344        parallelizable loop. It is innermost, parallel and has only four
345        iterations. It can be replaced by 4-element SIMD instructions.
346
347	.. code-block:: console
348
349		$ opt -basic-aa -polly-use-llvm-names matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-ast -analyze -polly-process-unprofitable
350
351	.. code-block:: c
352
353		[...]
354		:: isl ast :: main :: %for.cond1.preheader---%for.end30
355
356		if (1)
357
358		    {
359		      for (int c1 = 0; c1 <= 1535; c1 += 1)
360		        for (int c2 = 0; c2 <= 1535; c2 += 1)
361		          Stmt_for_body3(c1, c2);
362		      for (int c1 = 0; c1 <= 1535; c1 += 64)
363		        for (int c2 = 0; c2 <= 1535; c2 += 64)
364		          for (int c3 = 0; c3 <= 1535; c3 += 64)
365		            for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
366		              for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
367		                for (int c6 = c2; c6 <= c2 + 63; c6 += 4)
368		                  for (int c7 = c6; c7 <= c6 + 3; c7 += 1)
369		                    Stmt_for_body8(c4, c7, c5);
370		    }
371
372		else
373		    {  /* original code */ }
374		[...]
375
3769. **Codegenerate the SCoPs**
377-----------------------------
378
379	This generates new code for the SCoPs detected by polly. If
380        -polly-import-jscop is present, transformations specified in the
381        imported jscop files will be applied.
382
383
384	.. code-block:: console
385
386		$ opt -S matmul.preopt.ll | opt -S -O3 -o matmul.normalopt.ll
387
388	.. code-block:: console
389
390		$ opt -S matmul.preopt.ll -basic-aa -polly-use-llvm-names -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-codegen -polly-process-unprofitable | opt -S -O3 -o matmul.polly.interchanged.ll
391
392	.. code-block:: text
393
394		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged'.
395		File could not be read: No such file or directory
396		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged'.
397
398	.. code-block:: console
399
400		$ opt -S matmul.preopt.ll -basic-aa -polly-use-llvm-names -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-codegen -polly-process-unprofitable | opt -S -O3 -o matmul.polly.interchanged+tiled.ll
401
402	.. code-block:: text
403
404		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled'.
405		File could not be read: No such file or directory
406		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled'.
407
408	.. code-block:: console
409
410		$ opt -S matmul.preopt.ll -basic-aa -polly-use-llvm-names -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-process-unprofitable | opt -S -O3 -o matmul.polly.interchanged+tiled+vector.ll
411
412	.. code-block:: text
413
414		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
415		File could not be read: No such file or directory
416		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.
417
418	.. code-block:: console
419
420		$ opt -S matmul.preopt.ll -basic-aa -polly-use-llvm-names -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-parallel -polly-process-unprofitable | opt -S -O3 -o matmul.polly.interchanged+tiled+openmp.ll
421
422	.. code-block:: text
423
424		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
425		File could not be read: No such file or directory
426		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.
427
428
42910. **Create the executables**
430------------------------------
431
432        .. code-block:: console
433
434	        $ llc matmul.normalopt.ll -o matmul.normalopt.s -relocation-model=pic
435	        $ gcc matmul.normalopt.s -o matmul.normalopt.exe
436	        $ llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s -relocation-model=pic
437	        $ gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
438	        $ llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s -relocation-model=pic
439	        $ gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
440	        $ llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s -relocation-model=pic
441	        $ gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
442        	$ llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s -relocation-model=pic
443        	$ gcc matmul.polly.interchanged+tiled+vector+openmp.s -lgomp -o matmul.polly.interchanged+tiled+vector+openmp.exe
444
44511. **Compare the runtime of the executables**
446----------------------------------------------
447
448	By comparing the runtimes of the different code snippets we see that a
449        simple loop interchange gives here the largest performance boost.
450        However in this case, adding vectorization and using OpenMP degrades
451        the performance.
452
453        .. code-block:: console
454
455	        $ time ./matmul.normalopt.exe
456
457	        real	0m11.295s
458        	user	0m11.288s
459        	sys	0m0.004s
460        	$ time ./matmul.polly.interchanged.exe
461
462        	real	0m0.988s
463	        user	0m0.980s
464	        sys	0m0.008s
465	        $ time ./matmul.polly.interchanged+tiled.exe
466
467	        real	0m0.830s
468	        user	0m0.816s
469	        sys	0m0.012s
470	        $ time ./matmul.polly.interchanged+tiled+vector.exe
471
472        	real	0m5.430s
473        	user	0m5.424s
474        	sys	0m0.004s
475        	$ time ./matmul.polly.interchanged+tiled+vector+openmp.exe
476
477        	real	0m3.184s
478        	user	0m11.972s
479        	sys	0m0.036s
480
481