1# Writing a TableGen Backend in Python 2 3This tutorial is going to walk through creating a TableGen backend using Python. 4 5We are using Python to better fit into a notebook, but backends in LLVM are written in C++. The principles you learn here will still apply and you could port this tutorial to any language that has a JSON parser. 6 7This is the process in LLVM, using a C++ backend: 8``` 9TableGen source -> llvm-tblgen -> backend (within llvm-tblgen) -> results 10``` 11This is what we will be doing: 12``` 13TableGen source -> llvm-tblgen -> JSON -> Python -> results 14``` 15 16The backend here is ported from one of several in "SQLGen" which was written by Min-Yih Hsu. 17* SQLGen C++ sources - https://github.com/mshockwave/SQLGen 18* LLVM dev presentation - https://www.youtube.com/watch?v=UP-LBRbvI_U 19 20I encourage you to use those resources to supplement this notebook. 21 22## Compiling TableGen 23 24Unlike the other tutorial notebooks we are not using the TableGen kernel. This is an iPython notebook and we're going to run `llvm-tblgen` as a subprocess. 25 26First let's find it, in the same way the TableGen kernel does. 27 28 29```python 30import os 31import shutil 32 33def find_tblgen(): 34 path = os.environ.get("LLVM_TBLGEN_EXECUTABLE") 35 if path is not None and os.path.isfile(path) and os.access(path, os.X_OK): 36 return path 37 else: 38 path = shutil.which("llvm-tblgen") 39 if path is None: 40 raise OSError("llvm-tblgen not found") 41 return path 42 43_ = find_tblgen() 44``` 45 46If the above cell raises an exception, either put `llvm-tblgen` on your `PATH` or point to it using the `LLVM_TBLGEN_EXECUTABLE` environment variable. Alternatively, edit the code to use whatever path you want. 47 48Then we need to compile some TableGen by passing it to `llvm-tblgen`'s stdin. We will be using the option `--dump-json` and returning the JSON as a Python dictionary if the compilation succeeds. If it fails, we raise an exception. 49 50 51```python 52import subprocess 53import tempfile 54import json 55 56def run_tblgen(src): 57 # Passing to stdin requires a file like object. 58 with tempfile.TemporaryFile("w+") as f: 59 f.write(src) 60 f.seek(0) 61 got = subprocess.run( 62 [find_tblgen(), "--dump-json"], 63 stdin=f, 64 stderr=subprocess.PIPE, 65 stdout=subprocess.PIPE, 66 universal_newlines=True, 67 ) 68 69 if got.stderr: 70 raise RuntimeError("llvm-tblgen failed with stderr: " + got.stderr) 71 72 return json.loads(got.stdout) 73 74print(json.dumps(run_tblgen("class Foo {}"), indent=4)) 75``` 76 77 { 78 "!instanceof": { 79 "Foo": [] 80 }, 81 "!tablegen_json_version": 1 82 } 83 84 85## Structure of a SQL Query 86 87This backend is going to generate SQL queries. The general form of a SQL query is: 88``` 89SELECT <some field names> FROM <table name> 90 WHERE <conditions> 91 ORDER BY <field tags>; 92``` 93 94## SQL Query TableGen 95 96 97```python 98query_tblgen = """\ 99def all; 100def fields; 101def none; 102 103def eq; 104def ne; 105def gt; 106def ge; 107def and; 108def or; 109""" 110``` 111 112Normally you'd write this to a `.td` file but here we have it in a Python string to fit into this notebook. We will add to this string to produce the final source. 113 114This section defines some constants. First are the fields we want to get back from the query: 115* `all` - Return all fields. 116* `fields` - Means that we will provide a list of fields we are interested in. 117 118The second set are the logical operators for what will become the `WHERE` clause (called `condition` in the TableGen). These are string versions of various symbols. For example `ne` means `!=`, which in SQL is `<>`. 119 120Finally `none` is used to mean there is no condition to the query (no `WHERE`). 121 122 123```python 124query_tblgen += """\ 125class Query <string table, dag query_fields = (all), dag condition = (none)> { 126 string TableName = table; 127 dag Fields = query_fields; 128 dag WhereClause = condition; 129 list<string> OrderedBy = []; 130} 131""" 132``` 133 134Then the Query class. Its arguments are: 135* `table` - The name of the table to query (`FROM <table>`). 136* `query_fields` - The fields you want returned (`SELECT <fields>`). 137 * Defaults to `all` meaning return all fields. 138* `condition` - Logic to select entries (`WHERE <conditions>`). 139 * Defaults to `none` meaning there is no condition, or in other words select all entries in the table. 140 141## Using The Query Class 142 143 144```python 145full_tblgen = query_tblgen + """\ 146def : Query<"Customer">; 147 148def : Query<"Orders", (fields "Person", "Amount")>; 149 150def : Query<"Customer", (fields "Affiliation"), 151 (eq "Name", "Mary Blackburn":$str)>; 152 153def : Query<"Orders", (fields "ProductName"), 154 (gt "Amount", 8)>; 155 156def : Query<"Orders", (fields "ProductName":$name, "Person"), 157 (and (gt "Amount", 8), (ne "Person", 1))> { 158 let OrderedBy = ["$name"]; 159} 160""" 161``` 162 163Now we can define some queries. Let's go go over the last one in detail. 164 165``` 166def : Query<"Orders", (fields "ProductName":$name, "Person"), 167 (and (gt "Amount", 8), (ne "Person", 1))> { 168 let OrderedBy = ["$name"]; 169} 170``` 171 172* It will run on a table called `Orders`. 173* We want to see the fields `ProductName` and `Person`. 174* We have tagged `ProductName` with `$name`. 175* The condition is that `Amount` must be greater than `8` and 176 `Person` must not be equal to `1`. 177* The results of this query should be ordered by the field 178 tagged `$name`, which is `ProductName`. 179 180The condition being of DAG type (Directed Acyclic Graph) allows us to describe nested conditions. You might write this condition in Python as: 181``` 182if (Amount > 8) and (Person != 1): 183``` 184Putting that into a graph form: 185``` 186 |------|and|------| 187 | | 188| Amount > 8 | | Person != 1 | 189``` 190Which is what we're describing with the DAG in TableGen. 191 192## The JSON format 193 194 195```python 196full_json = run_tblgen(full_tblgen) 197print(json.dumps(full_json, indent=4)) 198``` 199 200 { 201 "!instanceof": { 202 "Query": [ 203 "anonymous_0", 204 "anonymous_1", 205 "anonymous_2", 206 "anonymous_3", 207 "anonymous_4" 208 ] 209 }, 210 "!tablegen_json_version": 1, 211 "all": { 212 "!anonymous": false, 213 "!fields": [], 214 "!name": "all", 215 "!superclasses": [] 216 }, 217 "and": { 218 "!anonymous": false, 219 "!fields": [], 220 "!name": "and", 221 "!superclasses": [] 222 }, 223 "anonymous_0": { 224 "!anonymous": true, 225 "!fields": [], 226 "!name": "anonymous_0", 227 "!superclasses": [ 228 "Query" 229 ], 230 "Fields": { 231 "args": [], 232 "kind": "dag", 233 "operator": { 234 "def": "all", 235 "kind": "def", 236 "printable": "all" 237 }, 238 "printable": "(all)" 239 }, 240 "OrderedBy": [], 241 "TableName": "Customer", 242 "WhereClause": { 243 "args": [], 244 "kind": "dag", 245 "operator": { 246 "def": "none", 247 "kind": "def", 248 "printable": "none" 249 }, 250 "printable": "(none)" 251 } 252 }, 253 "anonymous_1": { 254 "!anonymous": true, 255 "!fields": [], 256 "!name": "anonymous_1", 257 "!superclasses": [ 258 "Query" 259 ], 260 "Fields": { 261 "args": [ 262 [ 263 "Person", 264 null 265 ], 266 [ 267 "Amount", 268 null 269 ] 270 ], 271 "kind": "dag", 272 "operator": { 273 "def": "fields", 274 "kind": "def", 275 "printable": "fields" 276 }, 277 "printable": "(fields \"Person\", \"Amount\")" 278 }, 279 "OrderedBy": [], 280 "TableName": "Orders", 281 "WhereClause": { 282 "args": [], 283 "kind": "dag", 284 "operator": { 285 "def": "none", 286 "kind": "def", 287 "printable": "none" 288 }, 289 "printable": "(none)" 290 } 291 }, 292 "anonymous_2": { 293 "!anonymous": true, 294 "!fields": [], 295 "!name": "anonymous_2", 296 "!superclasses": [ 297 "Query" 298 ], 299 "Fields": { 300 "args": [ 301 [ 302 "Affiliation", 303 null 304 ] 305 ], 306 "kind": "dag", 307 "operator": { 308 "def": "fields", 309 "kind": "def", 310 "printable": "fields" 311 }, 312 "printable": "(fields \"Affiliation\")" 313 }, 314 "OrderedBy": [], 315 "TableName": "Customer", 316 "WhereClause": { 317 "args": [ 318 [ 319 "Name", 320 null 321 ], 322 [ 323 "Mary Blackburn", 324 "str" 325 ] 326 ], 327 "kind": "dag", 328 "operator": { 329 "def": "eq", 330 "kind": "def", 331 "printable": "eq" 332 }, 333 "printable": "(eq \"Name\", \"Mary Blackburn\":$str)" 334 } 335 }, 336 "anonymous_3": { 337 "!anonymous": true, 338 "!fields": [], 339 "!name": "anonymous_3", 340 "!superclasses": [ 341 "Query" 342 ], 343 "Fields": { 344 "args": [ 345 [ 346 "ProductName", 347 null 348 ] 349 ], 350 "kind": "dag", 351 "operator": { 352 "def": "fields", 353 "kind": "def", 354 "printable": "fields" 355 }, 356 "printable": "(fields \"ProductName\")" 357 }, 358 "OrderedBy": [], 359 "TableName": "Orders", 360 "WhereClause": { 361 "args": [ 362 [ 363 "Amount", 364 null 365 ], 366 [ 367 8, 368 null 369 ] 370 ], 371 "kind": "dag", 372 "operator": { 373 "def": "gt", 374 "kind": "def", 375 "printable": "gt" 376 }, 377 "printable": "(gt \"Amount\", 8)" 378 } 379 }, 380 "anonymous_4": { 381 "!anonymous": true, 382 "!fields": [], 383 "!name": "anonymous_4", 384 "!superclasses": [ 385 "Query" 386 ], 387 "Fields": { 388 "args": [ 389 [ 390 "ProductName", 391 "name" 392 ], 393 [ 394 "Person", 395 null 396 ] 397 ], 398 "kind": "dag", 399 "operator": { 400 "def": "fields", 401 "kind": "def", 402 "printable": "fields" 403 }, 404 "printable": "(fields \"ProductName\":$name, \"Person\")" 405 }, 406 "OrderedBy": [ 407 "$name" 408 ], 409 "TableName": "Orders", 410 "WhereClause": { 411 "args": [ 412 [ 413 { 414 "args": [ 415 [ 416 "Amount", 417 null 418 ], 419 [ 420 8, 421 null 422 ] 423 ], 424 "kind": "dag", 425 "operator": { 426 "def": "gt", 427 "kind": "def", 428 "printable": "gt" 429 }, 430 "printable": "(gt \"Amount\", 8)" 431 }, 432 null 433 ], 434 [ 435 { 436 "args": [ 437 [ 438 "Person", 439 null 440 ], 441 [ 442 1, 443 null 444 ] 445 ], 446 "kind": "dag", 447 "operator": { 448 "def": "ne", 449 "kind": "def", 450 "printable": "ne" 451 }, 452 "printable": "(ne \"Person\", 1)" 453 }, 454 null 455 ] 456 ], 457 "kind": "dag", 458 "operator": { 459 "def": "and", 460 "kind": "def", 461 "printable": "and" 462 }, 463 "printable": "(and (gt \"Amount\", 8), (ne \"Person\", 1))" 464 } 465 }, 466 "eq": { 467 "!anonymous": false, 468 "!fields": [], 469 "!name": "eq", 470 "!superclasses": [] 471 }, 472 "fields": { 473 "!anonymous": false, 474 "!fields": [], 475 "!name": "fields", 476 "!superclasses": [] 477 }, 478 "ge": { 479 "!anonymous": false, 480 "!fields": [], 481 "!name": "ge", 482 "!superclasses": [] 483 }, 484 "gt": { 485 "!anonymous": false, 486 "!fields": [], 487 "!name": "gt", 488 "!superclasses": [] 489 }, 490 "ne": { 491 "!anonymous": false, 492 "!fields": [], 493 "!name": "ne", 494 "!superclasses": [] 495 }, 496 "none": { 497 "!anonymous": false, 498 "!fields": [], 499 "!name": "none", 500 "!superclasses": [] 501 }, 502 "or": { 503 "!anonymous": false, 504 "!fields": [], 505 "!name": "or", 506 "!superclasses": [] 507 } 508 } 509 510 511The backend is going to walk the JSON we decoded. You can see the full output above in case you want to browse but for now don't read the whole thing. We will highlight the key aspects of it as we go along. 512 513 514```python 515print(full_json["!instanceof"]) 516``` 517 518 {'Query': ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']} 519 520 521Any key beginning with `!` is some sort of metadata about the other keys. Here this is a list of all instances of certain classes. We just have `Query` which lists all the queries we defined. 522 523 524```python 525print(full_json["anonymous_0"]["!superclasses"]) 526``` 527 528 ['Query'] 529 530 531On each def there is also a `!superclasses` that gives you the same information. Meaning you could use `!instanceof` to get a list of keys to lookup, or you could walk all keys and check `!superclasses`. 532 533 534```python 535print(full_json["anonymous_0"]["Fields"]) 536``` 537 538 {'args': [], 'kind': 'dag', 'operator': {'def': 'all', 'kind': 'def', 'printable': 'all'}, 'printable': '(all)'} 539 540 541From a def object you can find its attributes. Here we have the fields we want the query to show, which is all of them. 542 543# The Backend 544 545The core of a backend is looping over all defs of a certain class and outputting some text based on their properties. 546 547Here we're going to loop over all defs of type `Query` and emit SQL queries for them. 548 549 550```python 551def find_all_queries(j): 552 queries = [] 553 for key in j: 554 # ! means it is some metadata, not a def. 555 if not key.startswith("!"): 556 value = full_json[key] 557 # If we inherit from Query. 558 if "Query" in value["!superclasses"]: 559 queries.append(value) 560 return queries 561 562queries = find_all_queries(full_json) 563 564print([q["!name"] for q in queries]) 565``` 566 567 ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4'] 568 569 570Why are the names `anonymous_...`? When we defined them we did `def :` and missed out the name. This is allowed and `llvm-tblgen` just came up with a name for us. For this purpose the names are irrelevant. 571 572Now we have the relevant classes we need to "emit" them. Meaning produce something from them, in this case a SQL query. 573 574 575```python 576def emit_operator(operator): 577 return { 578 'gt': ' > ', 579 'ge': ' >= ', 580 'lt': ' < ', 581 'le': ' <= ', 582 'ne': ' <> ', 583 'eq': ' = ', 584 'or': ' OR ', 585 'and': ' AND ' 586 }[operator] 587 588print(emit_operator('and')) 589``` 590 591 AND 592 593 594The maps our TableGen constants to the equivalent SQL logical operation. 595 596 597```python 598def emit_fields(args): 599 # Return a comma separated list of arg names. 600 return ", ".join([arg[0] for arg in args]) 601 602print(emit_fields([["Abc", None], ["Def", None]])) 603``` 604 605 Abc, Def 606 607 608This emits the the fields we are selecting. Each field has a name (`arg[0]`) and an optional tag that we will use later. 609 610 611```python 612from collections.abc import Mapping 613 614def emit_where_clause(where_clause): 615 output = "" 616 num_args = len(where_clause["args"]) 617 618 for idx, arg in enumerate(where_clause["args"]): 619 arg_name, arg_type = arg 620 621 if isinstance(arg_name, Mapping): 622 # This is a nested where clause. 623 output += emit_where_clause(arg_name) 624 else: 625 # This is some condition. 626 if arg_type == "str": 627 # String types must be emitted with "" around them. 628 output += '"' + arg_name + '"' 629 else: 630 output += str(arg_name) 631 632 # If this is not the last arg, emit the condition. 633 if idx != (num_args-1): 634 output += emit_operator(where_clause["operator"]["def"]) 635 636 return output 637 638print(emit_where_clause({ 639"args": [["Name",None], 640 ["Mary Blackburn", "str"]], 641"kind": "dag", 642"operator": { 643 "def": "eq", 644 "kind": "def", 645 "printable": "eq" 646}})) 647``` 648 649 Name = "Mary Blackburn" 650 651 652This emits the condition that goes with the `WHERE`. The condition is a DAG, which means that we will find a possible mix of conditions and other DAGS. We recurse to handle the latter case. 653 654For each part of the condition we print the name of the thing you're checking, then the condition (`=`, `<>`, etc.). The value to check against is last and that goes on the end. 655 656 657```python 658def emit_ordered_by(ordered_by, field_tag_map): 659 # No ORDER BY statement to emit. 660 if not ordered_by: 661 return "" 662 663 output = "\n ORDER BY " 664 num_ordered_by = len(ordered_by) 665 666 for idx, field_name in enumerate(ordered_by): 667 # If it is a tag 668 if field_name.startswith('$'): 669 # Find the corresponding field name 670 tag_name = field_name[1:] 671 field_name = field_tag_map.get(tag_name) 672 if field_name is None: 673 raise RuntimeError('Unrecognized tag "{}"'.format( 674 tag_name)) 675 676 # Separate each tag after the first with ", ". 677 if idx != 0: 678 output += ", " 679 output += field_name 680 681 return output 682 683print(emit_ordered_by(["$abc", "$def"], {'abc':"ABC", 'def':"DEF"})) 684``` 685 686 687 ORDER BY ABC, DEF 688 689 690`emit_ordered_by` produces the `ORDER BY` text. If there is no ordering return nothing, otherwise loop over all the fields we want to order by and emit their names. 691 692If the name is a tag, we look that up in a map to get the real field name. Here's how we build that map: 693 694 695```python 696def build_tag_map(arguments): 697 # Args are [Name, Tag]. Reverse this so we have [Tag, Name]. 698 # Add each one to a dictionary where Tag is the key and Name is the value. 699 return dict([reversed(a) for a in arguments]) 700 701print(build_tag_map([["ABC", "abc"], ["DEF", "def"]])) 702``` 703 704 {'abc': 'ABC', 'def': 'DEF'} 705 706 707 708```python 709def emit_query(q): 710 fields_init = q["Fields"] 711 field_op_name = fields_init["operator"]["def"] 712 if not field_op_name in ["all", "fields"]: 713 raise RuntimeError("Invalid dag operator " + field_op_name) 714 715 field_tag_map = build_tag_map(fields_init["args"]) 716 717 where_clause = q["WhereClause"] 718 has_where = where_clause["operator"]["def"] != "none" 719 720 ret = "SELECT " 721 if field_op_name == "all": 722 ret += "*" 723 ret += emit_fields(fields_init["args"]) 724 ret += " FROM " + q["TableName"] 725 if has_where: 726 ret += "\n WHERE " + emit_where_clause(where_clause) 727 ret += emit_ordered_by(q["OrderedBy"], field_tag_map) 728 ret += ";" 729 730 return ret 731``` 732 733Finally the main function. It emits the skeleton of the query and calls the helpers we defined earlier to fill in the gaps. 734 735## The Result 736 737 738```python 739for q in queries: 740 print(emit_query(q) + "\n") 741``` 742 743 SELECT * FROM Customer; 744 745 SELECT Person, Amount FROM Orders; 746 747 SELECT Affiliation FROM Customer 748 WHERE Name = "Mary Blackburn"; 749 750 SELECT ProductName FROM Orders 751 WHERE Amount > 8; 752 753 SELECT ProductName, Person FROM Orders 754 WHERE Amount > 8 AND Person <> 1 755 ORDER BY ProductName; 756 757 758 759Now we run `emit_query` and print out the results. There you have it, that's a TableGen backend! 760 761You've seen the core concepts. Loop over all the defs of a certain class and then emit some other structure based on the fields of each one. In this case it was SQL queries. In LLVM it's most often C++ code but it can be anything you want. 762 763If you want to see the same thing done with a C++ backend (one written in C++ that is, not producing it), check out the links at the start of this notebook. 764