xref: /llvm-project/llvm/utils/TableGen/jupyter/sql_query_backend.md (revision 232a27b1bf17afeb89297059f52b4feb1545e5c6)
1# Writing a TableGen Backend in Python
2
3This tutorial is going to walk through creating a TableGen backend using Python.
4
5We are using Python to better fit into a notebook, but backends in LLVM are written in C++. The principles you learn here will still apply and you could port this tutorial to any language that has a JSON parser.
6
7This is the process in LLVM, using a C++ backend:
8```
9TableGen source -> llvm-tblgen -> backend (within llvm-tblgen) -> results
10```
11This is what we will be doing:
12```
13TableGen source -> llvm-tblgen -> JSON -> Python -> results
14```
15
16The backend here is ported from one of several in "SQLGen" which was written by Min-Yih Hsu.
17* SQLGen C++ sources - https://github.com/mshockwave/SQLGen
18* LLVM dev presentation - https://www.youtube.com/watch?v=UP-LBRbvI_U
19
20I encourage you to use those resources to supplement this notebook.
21
22## Compiling TableGen
23
24Unlike the other tutorial notebooks we are not using the TableGen kernel. This is an iPython notebook and we're going to run `llvm-tblgen` as a subprocess.
25
26First let's find it, in the same way the TableGen kernel does.
27
28
29```python
30import os
31import shutil
32
33def find_tblgen():
34    path = os.environ.get("LLVM_TBLGEN_EXECUTABLE")
35    if path is not None and os.path.isfile(path) and os.access(path, os.X_OK):
36        return path
37    else:
38        path = shutil.which("llvm-tblgen")
39        if path is None:
40            raise OSError("llvm-tblgen not found")
41        return path
42
43_ = find_tblgen()
44```
45
46If the above cell raises an exception, either put `llvm-tblgen` on your `PATH` or point to it using the `LLVM_TBLGEN_EXECUTABLE` environment variable. Alternatively, edit the code to use whatever path you want.
47
48Then we need to compile some TableGen by passing it to `llvm-tblgen`'s stdin. We will be using the option `--dump-json` and returning the JSON as a Python dictionary if the compilation succeeds. If it fails, we raise an exception.
49
50
51```python
52import subprocess
53import tempfile
54import json
55
56def run_tblgen(src):
57    # Passing to stdin requires a file like object.
58    with tempfile.TemporaryFile("w+") as f:
59        f.write(src)
60        f.seek(0)
61        got = subprocess.run(
62            [find_tblgen(), "--dump-json"],
63            stdin=f,
64            stderr=subprocess.PIPE,
65            stdout=subprocess.PIPE,
66            universal_newlines=True,
67        )
68
69    if got.stderr:
70        raise RuntimeError("llvm-tblgen failed with stderr: " + got.stderr)
71
72    return json.loads(got.stdout)
73
74print(json.dumps(run_tblgen("class Foo {}"), indent=4))
75```
76
77    {
78        "!instanceof": {
79            "Foo": []
80        },
81        "!tablegen_json_version": 1
82    }
83
84
85## Structure of a SQL Query
86
87This backend is going to generate SQL queries. The general form of a SQL query is:
88```
89SELECT <some field names> FROM <table name>
90 WHERE <conditions>
91 ORDER BY <field tags>;
92```
93
94## SQL Query TableGen
95
96
97```python
98query_tblgen = """\
99def all;
100def fields;
101def none;
102
103def eq;
104def ne;
105def gt;
106def ge;
107def and;
108def or;
109"""
110```
111
112Normally you'd write this to a `.td` file but here we have it in a Python string to fit into this notebook. We will add to this string to produce the final source.
113
114This section defines some constants. First are the fields we want to get back from the query:
115* `all` - Return all fields.
116* `fields` - Means that we will provide a list of fields we are interested in.
117
118The second set are the logical operators for what will become the `WHERE` clause (called `condition` in the TableGen). These are string versions of various symbols. For example `ne` means `!=`, which in SQL is `<>`.
119
120Finally `none` is used to mean there is no condition to the query (no `WHERE`).
121
122
123```python
124query_tblgen += """\
125class Query <string table, dag query_fields = (all), dag condition = (none)> {
126  string TableName = table;
127  dag Fields = query_fields;
128  dag WhereClause = condition;
129  list<string> OrderedBy = [];
130}
131"""
132```
133
134Then the Query class. Its arguments are:
135* `table` - The name of the table to query (`FROM <table>`).
136* `query_fields` - The fields you want returned (`SELECT <fields>`).
137    * Defaults to `all` meaning return all fields.
138* `condition` - Logic to select entries (`WHERE <conditions>`).
139    * Defaults to `none` meaning there is no condition, or in other words select all entries in the table.
140
141## Using The Query Class
142
143
144```python
145full_tblgen = query_tblgen + """\
146def : Query<"Customer">;
147
148def : Query<"Orders", (fields "Person", "Amount")>;
149
150def : Query<"Customer", (fields "Affiliation"),
151            (eq "Name", "Mary Blackburn":$str)>;
152
153def : Query<"Orders", (fields "ProductName"),
154            (gt "Amount", 8)>;
155
156def : Query<"Orders", (fields "ProductName":$name, "Person"),
157            (and (gt "Amount", 8), (ne "Person", 1))> {
158  let OrderedBy = ["$name"];
159}
160"""
161```
162
163Now we can define some queries. Let's go go over the last one in detail.
164
165```
166def : Query<"Orders", (fields "ProductName":$name, "Person"),
167            (and (gt "Amount", 8), (ne "Person", 1))> {
168  let OrderedBy = ["$name"];
169}
170```
171
172* It will run on a table called `Orders`.
173* We want to see the fields `ProductName` and `Person`.
174* We have tagged `ProductName` with `$name`.
175* The condition is that `Amount` must be greater than `8` and
176  `Person` must not be equal to `1`.
177* The results of this query should be ordered by the field
178  tagged `$name`, which is `ProductName`.
179
180The condition being of DAG type (Directed Acyclic Graph) allows us to describe nested conditions. You might write this condition in Python as:
181```
182if (Amount > 8) and (Person != 1):
183```
184Putting that into a graph form:
185```
186        |------|and|------|
187        |                 |
188| Amount > 8 |       | Person != 1 |
189```
190Which is what we're describing with the DAG in TableGen.
191
192## The JSON format
193
194
195```python
196full_json = run_tblgen(full_tblgen)
197print(json.dumps(full_json, indent=4))
198```
199
200    {
201        "!instanceof": {
202            "Query": [
203                "anonymous_0",
204                "anonymous_1",
205                "anonymous_2",
206                "anonymous_3",
207                "anonymous_4"
208            ]
209        },
210        "!tablegen_json_version": 1,
211        "all": {
212            "!anonymous": false,
213            "!fields": [],
214            "!name": "all",
215            "!superclasses": []
216        },
217        "and": {
218            "!anonymous": false,
219            "!fields": [],
220            "!name": "and",
221            "!superclasses": []
222        },
223        "anonymous_0": {
224            "!anonymous": true,
225            "!fields": [],
226            "!name": "anonymous_0",
227            "!superclasses": [
228                "Query"
229            ],
230            "Fields": {
231                "args": [],
232                "kind": "dag",
233                "operator": {
234                    "def": "all",
235                    "kind": "def",
236                    "printable": "all"
237                },
238                "printable": "(all)"
239            },
240            "OrderedBy": [],
241            "TableName": "Customer",
242            "WhereClause": {
243                "args": [],
244                "kind": "dag",
245                "operator": {
246                    "def": "none",
247                    "kind": "def",
248                    "printable": "none"
249                },
250                "printable": "(none)"
251            }
252        },
253        "anonymous_1": {
254            "!anonymous": true,
255            "!fields": [],
256            "!name": "anonymous_1",
257            "!superclasses": [
258                "Query"
259            ],
260            "Fields": {
261                "args": [
262                    [
263                        "Person",
264                        null
265                    ],
266                    [
267                        "Amount",
268                        null
269                    ]
270                ],
271                "kind": "dag",
272                "operator": {
273                    "def": "fields",
274                    "kind": "def",
275                    "printable": "fields"
276                },
277                "printable": "(fields \"Person\", \"Amount\")"
278            },
279            "OrderedBy": [],
280            "TableName": "Orders",
281            "WhereClause": {
282                "args": [],
283                "kind": "dag",
284                "operator": {
285                    "def": "none",
286                    "kind": "def",
287                    "printable": "none"
288                },
289                "printable": "(none)"
290            }
291        },
292        "anonymous_2": {
293            "!anonymous": true,
294            "!fields": [],
295            "!name": "anonymous_2",
296            "!superclasses": [
297                "Query"
298            ],
299            "Fields": {
300                "args": [
301                    [
302                        "Affiliation",
303                        null
304                    ]
305                ],
306                "kind": "dag",
307                "operator": {
308                    "def": "fields",
309                    "kind": "def",
310                    "printable": "fields"
311                },
312                "printable": "(fields \"Affiliation\")"
313            },
314            "OrderedBy": [],
315            "TableName": "Customer",
316            "WhereClause": {
317                "args": [
318                    [
319                        "Name",
320                        null
321                    ],
322                    [
323                        "Mary Blackburn",
324                        "str"
325                    ]
326                ],
327                "kind": "dag",
328                "operator": {
329                    "def": "eq",
330                    "kind": "def",
331                    "printable": "eq"
332                },
333                "printable": "(eq \"Name\", \"Mary Blackburn\":$str)"
334            }
335        },
336        "anonymous_3": {
337            "!anonymous": true,
338            "!fields": [],
339            "!name": "anonymous_3",
340            "!superclasses": [
341                "Query"
342            ],
343            "Fields": {
344                "args": [
345                    [
346                        "ProductName",
347                        null
348                    ]
349                ],
350                "kind": "dag",
351                "operator": {
352                    "def": "fields",
353                    "kind": "def",
354                    "printable": "fields"
355                },
356                "printable": "(fields \"ProductName\")"
357            },
358            "OrderedBy": [],
359            "TableName": "Orders",
360            "WhereClause": {
361                "args": [
362                    [
363                        "Amount",
364                        null
365                    ],
366                    [
367                        8,
368                        null
369                    ]
370                ],
371                "kind": "dag",
372                "operator": {
373                    "def": "gt",
374                    "kind": "def",
375                    "printable": "gt"
376                },
377                "printable": "(gt \"Amount\", 8)"
378            }
379        },
380        "anonymous_4": {
381            "!anonymous": true,
382            "!fields": [],
383            "!name": "anonymous_4",
384            "!superclasses": [
385                "Query"
386            ],
387            "Fields": {
388                "args": [
389                    [
390                        "ProductName",
391                        "name"
392                    ],
393                    [
394                        "Person",
395                        null
396                    ]
397                ],
398                "kind": "dag",
399                "operator": {
400                    "def": "fields",
401                    "kind": "def",
402                    "printable": "fields"
403                },
404                "printable": "(fields \"ProductName\":$name, \"Person\")"
405            },
406            "OrderedBy": [
407                "$name"
408            ],
409            "TableName": "Orders",
410            "WhereClause": {
411                "args": [
412                    [
413                        {
414                            "args": [
415                                [
416                                    "Amount",
417                                    null
418                                ],
419                                [
420                                    8,
421                                    null
422                                ]
423                            ],
424                            "kind": "dag",
425                            "operator": {
426                                "def": "gt",
427                                "kind": "def",
428                                "printable": "gt"
429                            },
430                            "printable": "(gt \"Amount\", 8)"
431                        },
432                        null
433                    ],
434                    [
435                        {
436                            "args": [
437                                [
438                                    "Person",
439                                    null
440                                ],
441                                [
442                                    1,
443                                    null
444                                ]
445                            ],
446                            "kind": "dag",
447                            "operator": {
448                                "def": "ne",
449                                "kind": "def",
450                                "printable": "ne"
451                            },
452                            "printable": "(ne \"Person\", 1)"
453                        },
454                        null
455                    ]
456                ],
457                "kind": "dag",
458                "operator": {
459                    "def": "and",
460                    "kind": "def",
461                    "printable": "and"
462                },
463                "printable": "(and (gt \"Amount\", 8), (ne \"Person\", 1))"
464            }
465        },
466        "eq": {
467            "!anonymous": false,
468            "!fields": [],
469            "!name": "eq",
470            "!superclasses": []
471        },
472        "fields": {
473            "!anonymous": false,
474            "!fields": [],
475            "!name": "fields",
476            "!superclasses": []
477        },
478        "ge": {
479            "!anonymous": false,
480            "!fields": [],
481            "!name": "ge",
482            "!superclasses": []
483        },
484        "gt": {
485            "!anonymous": false,
486            "!fields": [],
487            "!name": "gt",
488            "!superclasses": []
489        },
490        "ne": {
491            "!anonymous": false,
492            "!fields": [],
493            "!name": "ne",
494            "!superclasses": []
495        },
496        "none": {
497            "!anonymous": false,
498            "!fields": [],
499            "!name": "none",
500            "!superclasses": []
501        },
502        "or": {
503            "!anonymous": false,
504            "!fields": [],
505            "!name": "or",
506            "!superclasses": []
507        }
508    }
509
510
511The backend is going to walk the JSON we decoded. You can see the full output above in case you want to browse but for now don't read the whole thing. We will highlight the key aspects of it as we go along.
512
513
514```python
515print(full_json["!instanceof"])
516```
517
518    {'Query': ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']}
519
520
521Any key beginning with `!` is some sort of metadata about the other keys. Here this is a list of all instances of certain classes. We just have `Query` which lists all the queries we defined.
522
523
524```python
525print(full_json["anonymous_0"]["!superclasses"])
526```
527
528    ['Query']
529
530
531On each def there is also a `!superclasses` that gives you the same information. Meaning you could use `!instanceof` to get a list of keys to lookup, or you could walk all keys and check `!superclasses`.
532
533
534```python
535print(full_json["anonymous_0"]["Fields"])
536```
537
538    {'args': [], 'kind': 'dag', 'operator': {'def': 'all', 'kind': 'def', 'printable': 'all'}, 'printable': '(all)'}
539
540
541From a def object you can find its attributes. Here we have the fields we want the query to show, which is all of them.
542
543# The Backend
544
545The core of a backend is looping over all defs of a certain class and outputting some text based on their properties.
546
547Here we're going to loop over all defs of type `Query` and emit SQL queries for them.
548
549
550```python
551def find_all_queries(j):
552    queries = []
553    for key in j:
554        # ! means it is some metadata, not a def.
555        if not key.startswith("!"):
556            value = full_json[key]
557            # If we inherit from Query.
558            if "Query" in value["!superclasses"]:
559                queries.append(value)
560    return queries
561
562queries = find_all_queries(full_json)
563
564print([q["!name"] for q in queries])
565```
566
567    ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']
568
569
570Why are the names `anonymous_...`? When we defined them we did `def :` and missed out the name. This is allowed and `llvm-tblgen` just came up with a name for us. For this purpose the names are irrelevant.
571
572Now we have the relevant classes we need to "emit" them. Meaning produce something from them, in this case a SQL query.
573
574
575```python
576def emit_operator(operator):
577    return {
578            'gt': ' > ',
579            'ge': ' >= ',
580            'lt': ' < ',
581            'le': ' <= ',
582            'ne': ' <> ',
583            'eq': ' = ',
584            'or': ' OR ',
585            'and': ' AND '
586            }[operator]
587
588print(emit_operator('and'))
589```
590
591     AND
592
593
594The maps our TableGen constants to the equivalent SQL logical operation.
595
596
597```python
598def emit_fields(args):
599    # Return a comma separated list of arg names.
600    return ", ".join([arg[0] for arg in args])
601
602print(emit_fields([["Abc", None], ["Def", None]]))
603```
604
605    Abc, Def
606
607
608This emits the the fields we are selecting. Each field has a name (`arg[0]`) and an optional tag that we will use later.
609
610
611```python
612from collections.abc import Mapping
613
614def emit_where_clause(where_clause):
615    output = ""
616    num_args = len(where_clause["args"])
617
618    for idx, arg in enumerate(where_clause["args"]):
619        arg_name, arg_type = arg
620
621        if isinstance(arg_name, Mapping):
622            # This is a nested where clause.
623            output += emit_where_clause(arg_name)
624        else:
625            # This is some condition.
626            if arg_type == "str":
627                # String types must be emitted with "" around them.
628                output += '"' + arg_name + '"'
629            else:
630                output += str(arg_name)
631
632        # If this is not the last arg, emit the condition.
633        if idx != (num_args-1):
634            output += emit_operator(where_clause["operator"]["def"])
635
636    return output
637
638print(emit_where_clause({
639"args": [["Name",None],
640        ["Mary Blackburn", "str"]],
641"kind": "dag",
642"operator": {
643    "def": "eq",
644    "kind": "def",
645    "printable": "eq"
646}}))
647```
648
649    Name = "Mary Blackburn"
650
651
652This emits the condition that goes with the `WHERE`. The condition is a DAG, which means that we will find a possible mix of conditions and other DAGS. We recurse to handle the latter case.
653
654For each part of the condition we print the name of the thing you're checking, then the condition (`=`, `<>`, etc.). The value to check against is last and that goes on the end.
655
656
657```python
658def emit_ordered_by(ordered_by, field_tag_map):
659    # No ORDER BY statement to emit.
660    if not ordered_by:
661        return ""
662
663    output = "\n ORDER BY "
664    num_ordered_by = len(ordered_by)
665
666    for idx, field_name in enumerate(ordered_by):
667        # If it is a tag
668        if field_name.startswith('$'):
669            # Find the corresponding field name
670            tag_name = field_name[1:]
671            field_name = field_tag_map.get(tag_name)
672            if field_name is None:
673                raise RuntimeError('Unrecognized tag "{}"'.format(
674                    tag_name))
675
676        # Separate each tag after the first with ", ".
677        if idx != 0:
678            output += ", "
679        output += field_name
680
681    return output
682
683print(emit_ordered_by(["$abc", "$def"], {'abc':"ABC", 'def':"DEF"}))
684```
685
686
687     ORDER BY ABC, DEF
688
689
690`emit_ordered_by` produces the `ORDER BY` text. If there is no ordering return nothing, otherwise loop over all the fields we want to order by and emit their names.
691
692If the name is a tag, we look that up in a map to get the real field name. Here's how we build that map:
693
694
695```python
696def build_tag_map(arguments):
697    # Args are [Name, Tag]. Reverse this so we have [Tag, Name].
698    # Add each one to a dictionary where Tag is the key and Name is the value.
699    return dict([reversed(a) for a in arguments])
700
701print(build_tag_map([["ABC", "abc"], ["DEF", "def"]]))
702```
703
704    {'abc': 'ABC', 'def': 'DEF'}
705
706
707
708```python
709def emit_query(q):
710    fields_init = q["Fields"]
711    field_op_name = fields_init["operator"]["def"]
712    if not field_op_name in ["all", "fields"]:
713        raise RuntimeError("Invalid dag operator " + field_op_name)
714
715    field_tag_map = build_tag_map(fields_init["args"])
716
717    where_clause = q["WhereClause"]
718    has_where = where_clause["operator"]["def"] != "none"
719
720    ret = "SELECT "
721    if field_op_name == "all":
722        ret += "*"
723    ret += emit_fields(fields_init["args"])
724    ret += " FROM " + q["TableName"]
725    if has_where:
726        ret += "\n WHERE " + emit_where_clause(where_clause)
727    ret += emit_ordered_by(q["OrderedBy"], field_tag_map)
728    ret += ";"
729
730    return ret
731```
732
733Finally the main function. It emits the skeleton of the query and calls the helpers we defined earlier to fill in the gaps.
734
735## The Result
736
737
738```python
739for q in queries:
740    print(emit_query(q) + "\n")
741```
742
743    SELECT * FROM Customer;
744
745    SELECT Person, Amount FROM Orders;
746
747    SELECT Affiliation FROM Customer
748     WHERE Name = "Mary Blackburn";
749
750    SELECT ProductName FROM Orders
751     WHERE Amount > 8;
752
753    SELECT ProductName, Person FROM Orders
754     WHERE Amount > 8 AND Person <> 1
755     ORDER BY ProductName;
756
757
758
759Now we run `emit_query` and print out the results. There you have it, that's a TableGen backend!
760
761You've seen the core concepts. Loop over all the defs of a certain class and then emit some other structure based on the fields of each one. In this case it was SQL queries. In LLVM it's most often C++ code but it can be anything you want.
762
763If you want to see the same thing done with a C++ backend (one written in C++ that is, not producing it), check out the links at the start of this notebook.
764