xref: /llvm-project/lldb/docs/use/python.rst (revision 586114510c5fa71d1377c7f53e68a3b12c472aa2)
1Python Scripting
2================
3
4LLDB has been structured from the beginning to be scriptable in two
5ways -- a Unix Python session can initiate/run a debug session
6non-interactively using LLDB; and within the LLDB debugger tool, Python
7scripts can be used to help with many tasks, including inspecting
8program data, iterating over containers and determining if a breakpoint
9should stop execution or continue. This document will show how to do
10some of these things by going through an example, explaining how to use
11Python scripting to find a bug in a program that searches for text in a
12large binary tree.
13
14The Test Program and Input
15--------------------------
16
17We have a simple C program (dictionary.c) that reads in a text file,
18and stores all the words from the file in a Binary Search Tree, sorted
19alphabetically. It then enters a loop prompting the user for a word,
20searching for the word in the tree (using Binary Search), and reporting
21to the user whether or not it found the word in the tree.
22
23The input text file we are using to test our program contains the text
24for William Shakespeare's famous tragedy "Romeo and Juliet".
25
26The Bug
27-------
28
29When we try running our program, we find there is a problem. While it
30successfully finds some of the words we would expect to find, such as
31"love" or "sun", it fails to find the word "Romeo", which MUST be in
32the input text file:
33
34::
35
36   $ ./dictionary Romeo-and-Juliet.txt
37   Dictionary loaded.
38   Enter search word: love
39   Yes!
40   Enter search word: sun
41   Yes!
42   Enter search word: Romeo
43   No!
44   Enter search word: ^D
45   $
46
47Using Depth First Search
48------------------------
49
50Our first job is to determine if the word "Romeo" actually got inserted
51into the tree or not. Since "Romeo and Juliet" has thousands of words,
52trying to examine our binary search tree by hand is completely
53impractical. Therefore we will write a Python script to search the tree
54for us. We will write a recursive Depth First Search function that
55traverses the entire tree searching for a word, and maintaining
56information about the path from the root of the tree to the current
57node. If it finds the word in the tree, it returns the path from the
58root to the node containing the word. This is what our DFS function in
59Python would look like, with line numbers added for easy reference in
60later explanations:
61
62::
63
64   1: def DFS (root, word, cur_path):
65   2:     root_word_ptr = root.GetChildMemberWithName ("word")
66   3:     left_child_ptr = root.GetChildMemberWithName ("left")
67   4:     right_child_ptr = root.GetChildMemberWithName ("right")
68   5:     root_word = root_word_ptr.GetSummary()
69   6:     end = len (root_word) - 1
70   7:     if root_word[0] == '"' and root_word[end] == '"':
71   8:         root_word = root_word[1:end]
72   9:     end = len (root_word) - 1
73   10:     if root_word[0] == '\'' and root_word[end] == '\'':
74   11:        root_word = root_word[1:end]
75   12:     if root_word == word:
76   13:         return cur_path
77   14:     elif word < root_word:
78   15:         if left_child_ptr.GetValue() is None:
79   16:             return ""
80   17:         else:
81   18:             cur_path = cur_path + "L"
82   19:             return DFS (left_child_ptr, word, cur_path)
83   20:     else:
84   21:         if right_child_ptr.GetValue() is None:
85   22:             return ""
86   23:         else:
87   24:             cur_path = cur_path + "R"
88   25:             return DFS (right_child_ptr, word, cur_path)
89
90
91Accessing & Manipulating Program Variables
92------------------------------------------
93
94Before we can call any Python function on any of our program's
95variables, we need to get the variable into a form that Python can
96access. To show you how to do this we will look at the parameters for
97the DFS function. The first parameter is going to be a node in our
98binary search tree, put into a Python variable. The second parameter is
99the word we are searching for (a string), and the third parameter is a
100string representing the path from the root of the tree to our current
101node.
102
103The most interesting parameter is the first one, the Python variable
104that needs to contain a node in our search tree. How can we take a
105variable out of our program and put it into a Python variable? What
106kind of Python variable will it be? The answers are to use the LLDB API
107functions, provided as part of the LLDB Python module. Running Python
108from inside LLDB, LLDB will automatically give us our current frame
109object as a Python variable, "lldb.frame". This variable has the type
110`SBFrame` (see the LLDB API for more information about `SBFrame`
111objects). One of the things we can do with a frame object, is to ask it
112to find and return its local variable. We will call the API function
113`SBFrame.FindVariable` on the lldb.frame object to give us our dictionary
114variable as a Python variable:
115
116::
117
118   root = lldb.frame.FindVariable ("dictionary")
119
120The line above, executed in the Python script interpreter in LLDB, asks the
121current frame to find the variable named "dictionary" and return it. We then
122store the returned value in the Python variable named "root". This answers the
123question of HOW to get the variable, but it still doesn't explain WHAT actually
124gets put into "root". If you examine the LLDB API, you will find that the
125`SBFrame` method "FindVariable" returns an object of type `SBValue`. `SBValue`
126objects are used, among other things, to wrap up program variables and values.
127There are many useful methods defined in the `SBValue` class to allow you to get
128information or children values out of SBValues. For complete information, see
129the header file SBValue.h. The `SBValue` methods that we use in our DFS function
130are ``GetChildMemberWithName()``, ``GetSummary()``, and ``GetValue()``.
131
132
133Explaining DFS Script in Detail
134-------------------------------
135
136Before diving into the details of this code, it would be best to give a
137high-level overview of what it does. The nodes in our binary search tree were
138defined to have type ``tree_node *``, which is defined as:
139
140::
141
142   typedef struct tree_node
143   {
144      const char *word;
145      struct tree_node *left;
146      struct tree_node *right;
147   } tree_node;
148
149Lines 2-11 of DFS are getting data out of the current tree node and getting
150ready to do the actual search; lines 12-25 are the actual depth-first search.
151Lines 2-4 of our DFS function get the word, left and right fields out of the
152current node and store them in Python variables. Since root_word_ptr is a
153pointer to our word, and we want the actual word, line 5 calls GetSummary() to
154get a string containing the value out of the pointer. Since GetSummary() adds
155quotes around its result, lines 6-11 strip surrounding quotes off the word.
156
157Line 12 checks to see if the word in the current node is the one we are
158searching for. If so, we are done, and line 13 returns the current path.
159Otherwise, line 14 checks to see if we should go left (search word comes before
160the current word). If we decide to go left, line 15 checks to see if the left
161pointer child is NULL ("None" is the Python equivalent of NULL). If the left
162pointer is NULL, then the word is not in this tree and we return an empty path
163(line 16). Otherwise, we add an "L" to the end of our current path string, to
164indicate we are going left (line 18), and then recurse on the left child (line
16519). Lines 20-25 are the same as lines 14-19, except for going right rather
166than going left.
167
168One other note: Typing something as long as our DFS function directly into the
169interpreter can be difficult, as making a single typing mistake means having to
170start all over. Therefore we recommend doing as we have done: Writing your
171longer, more complicated script functions in a separate file (in this case
172tree_utils.py) and then importing it into your LLDB Python interpreter.
173
174
175The DFS Script in Action
176------------------------
177
178At this point we are ready to use the DFS function to see if the word "Romeo"
179is in our tree or not. To actually use it in LLDB on our dictionary program,
180you would do something like this:
181
182::
183
184   $ lldb
185   (lldb) process attach -n "dictionary"
186   Architecture set to: x86_64.
187   Process 521 stopped
188   * thread #1: tid = 0x2c03, 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8, stop reason = signal SIGSTOP
189   frame #0: 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8
190   (lldb) breakpoint set -n find_word
191   Breakpoint created: 1: name = 'find_word', locations = 1, resolved = 1
192   (lldb) continue
193   Process 521 resuming
194   Process 521 stopped
195   * thread #1: tid = 0x2c03, 0x0000000100001830 dictionary`find_word + 16
196   at dictionary.c:105, stop reason = breakpoint 1.1
197   frame #0: 0x0000000100001830 dictionary`find_word + 16 at dictionary.c:105
198   102 int
199   103 find_word (tree_node *dictionary, char *word)
200   104 {
201   -> 105 if (!word || !dictionary)
202   106 return 0;
203   107
204   108 int compare_value = strcmp (word, dictionary->word);
205   (lldb) script
206   Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
207   >>> import tree_utils
208   >>> root = lldb.frame.FindVariable ("dictionary")
209   >>> current_path = ""
210   >>> path = tree_utils.DFS (root, "Romeo", current_path)
211   >>> print path
212   LLRRL
213   >>> ^D
214   (lldb)
215
216The first bit of code above shows starting lldb, attaching to the dictionary
217program, and getting to the find_word function in LLDB. The interesting part
218(as far as this example is concerned) begins when we enter the script command
219and drop into the embedded interactive Python interpreter. We will go over this
220Python code line by line. The first line
221
222::
223
224   import tree_utils
225
226
227imports the file where we wrote our DFS function, tree_utils.py, into Python.
228Notice that to import the file we leave off the ".py" extension. We can now
229call any function in that file, giving it the prefix "tree_utils.", so that
230Python knows where to look for the function. The line
231
232::
233
234   root = lldb.frame.FindVariable ("dictionary")
235
236
237gets our program variable "dictionary" (which contains the binary search tree)
238and puts it into the Python variable "root". See Accessing & Manipulating
239Program Variables in Python above for more details about how this works. The
240next line is
241
242::
243
244   current_path = ""
245
246This line initializes the current_path from the root of the tree to our current
247node. Since we are starting at the root of the tree, our current path starts as
248an empty string. As we go right and left through the tree, the DFS function
249will append an 'R' or an 'L' to the current path, as appropriate. The line
250
251::
252
253   path = tree_utils.DFS (root, "Romeo", current_path)
254
255calls our DFS function (prefixing it with the module name so that Python can
256find it). We pass in our binary tree stored in the variable root, the word we
257are searching for, and our current path. We assign whatever path the DFS
258function returns to the Python variable path.
259
260Finally, we want to see if the word was found or not, and if so we want to see
261the path through the tree to the word. So we do
262
263::
264
265   print path
266
267From this we can see that the word "Romeo" was indeed found in the tree, and
268the path from the root of the tree to the node containing "Romeo" is
269left-left-right-right-left.
270
271Using Breakpoint Command Scripts
272--------------------------------
273
274We are halfway to figuring out what the problem is. We know the word we are
275looking for is in the binary tree, and we know exactly where it is in the
276binary tree. Now we need to figure out why our binary search algorithm is not
277finding the word. We will do this using breakpoint command scripts.
278
279The idea is as follows. The binary search algorithm has two main decision
280points: the decision to follow the right branch; and, the decision to follow
281the left branch. We will set a breakpoint at each of these decision points, and
282attach a Python breakpoint command script to each breakpoint. The breakpoint
283commands will use the global path Python variable that we got from our DFS
284function. Each time one of these decision breakpoints is hit, the script will
285compare the actual decision with the decision the front of the path variable
286says should be made (the first character of the path). If the actual decision
287and the path agree, then the front character is stripped off the path, and
288execution is resumed. In this case the user never even sees the breakpoint
289being hit. But if the decision differs from what the path says it should be,
290then the script prints out a message and does NOT resume execution, leaving the
291user sitting at the first point where a wrong decision is being made.
292
293Python Breakpoint Command Scripts Are Not What They Seem
294--------------------------------------------------------
295
296What do we mean by that? When you enter a Python breakpoint command in LLDB, it
297appears that you are entering one or more plain lines of Python. BUT LLDB then
298takes what you entered and wraps it into a Python FUNCTION (just like using the
299"def" Python command). It automatically gives the function an obscure, unique,
300hard-to-stumble-across function name, and gives it two parameters: frame and
301bp_loc. When the breakpoint gets hit, LLDB wraps up the frame object where the
302breakpoint was hit, and the breakpoint location object for the breakpoint that
303was hit, and puts them into Python variables for you. It then calls the Python
304function that was created for the breakpoint command, and passes in the frame
305and breakpoint location objects.
306
307So, being practical, what does this mean for you when you write your Python
308breakpoint commands? It means that there are two things you need to keep in
309mind: 1. If you want to access any Python variables created outside your
310script, you must declare such variables to be global. If you do not declare
311them as global, then the Python function will treat them as local variables,
312and you will get unexpected behavior. 2. All Python breakpoint command scripts
313automatically have a frame and a bp_loc variable. The variables are pre-loaded
314by LLDB with the correct context for the breakpoint. You do not have to use
315these variables, but they are there if you want them.
316
317The Decision Point Breakpoint Commands
318--------------------------------------
319
320This is what the Python breakpoint command script would look like for the
321decision to go right:
322
323::
324
325   global path
326   if path[0] == 'R':
327      path = path[1:]
328      thread = frame.GetThread()
329      process = thread.GetProcess()
330      process.Continue()
331   else:
332      print "Here is the problem; going right, should go left!"
333   Just as a reminder, LLDB is going to take this script and wrap it up in a function, like this:
334
335
336   def some_unique_and_obscure_function_name (frame, bp_loc):
337      global path
338      if path[0] == 'R':
339         path = path[1:]
340         thread = frame.GetThread()
341         process = thread.GetProcess()
342         process.Continue()
343      else:
344         print "Here is the problem; going right, should go left!"
345
346LLDB will call the function, passing in the correct frame and breakpoint
347location whenever the breakpoint gets hit. There are several things to notice
348about this function. The first one is that we are accessing and updating a
349piece of state (the path variable), and actually conditioning our behavior
350based upon this variable. Since the variable was defined outside of our script
351(and therefore outside of the corresponding function) we need to tell Python
352that we are accessing a global variable. That is what the first line of the
353script does. Next we check where the path says we should go and compare it to
354our decision (recall that we are at the breakpoint for the decision to go
355right). If the path agrees with our decision, then we strip the first character
356off of the path.
357
358Since the decision matched the path, we want to resume execution. To do this we
359make use of the frame parameter that LLDB guarantees will be there for us. We
360use LLDB API functions to get the current thread from the current frame, and
361then to get the process from the thread. Once we have the process, we tell it
362to resume execution (using the Continue() API function).
363
364If the decision to go right does not agree with the path, then we do not resume
365execution. We allow the breakpoint to remain stopped (by doing nothing), and we
366print an informational message telling the user we have found the problem, and
367what the problem is.
368
369Actually Using The Breakpoint Commands
370--------------------------------------
371
372Now we will look at what happens when we actually use these breakpoint commands
373on our program. Doing a source list -n find_word shows us the function
374containing our two decision points. Looking at the code below, we see that we
375want to set our breakpoints on lines 113 and 115:
376
377::
378
379   (lldb) source list -n find_word
380   File: /Volumes/Data/HD2/carolinetice/Desktop/LLDB-Web-Examples/dictionary.c.
381   101
382   102 int
383   103 find_word (tree_node *dictionary, char *word)
384   104 {
385   105   if (!word || !dictionary)
386   106     return 0;
387   107
388   108   int compare_value = strcmp (word, dictionary->word);
389   109
390   110   if (compare_value == 0)
391   111     return 1;
392   112   else if (compare_value < 0)
393   113     return find_word (dictionary->left, word);
394   114   else
395   115     return find_word (dictionary->right, word);
396   116 }
397   117
398
399
400So, we set our breakpoints, enter our breakpoint command scripts, and see what happens:
401
402::
403
404   (lldb) breakpoint set -l 113
405   Breakpoint created: 2: file ='dictionary.c', line = 113, locations = 1, resolved = 1
406   (lldb) breakpoint set -l 115
407   Breakpoint created: 3: file ='dictionary.c', line = 115, locations = 1, resolved = 1
408   (lldb) breakpoint command add -s python 2
409   Enter your Python command(s). Type 'DONE' to end.
410   > global path
411   > if (path[0] == 'L'):
412   >     path = path[1:]
413   >     thread = frame.GetThread()
414   >     process = thread.GetProcess()
415   >     process.Continue()
416   > else:
417   >     print "Here is the problem. Going left, should go right!"
418   > DONE
419   (lldb) breakpoint command add -s python 3
420   Enter your Python command(s). Type 'DONE' to end.
421   > global path
422   > if (path[0] == 'R'):
423   >     path = path[1:]
424   >     thread = frame.GetThread()
425   >     process = thread.GetProcess()
426   >     process.Continue()
427   > else:
428   >     print "Here is the problem. Going right, should go left!"
429   > DONE
430   (lldb) continue
431   Process 696 resuming
432   Here is the problem. Going right, should go left!
433   Process 696 stopped
434   * thread #1: tid = 0x2d03, 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115, stop reason = breakpoint 3.1
435   frame #0: 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115
436      112   else if (compare_value < 0)
437      113     return find_word (dictionary->left, word);
438      114   else
439   -> 115     return find_word (dictionary->right, word);
440      116 }
441      117
442      118 void
443   (lldb)
444
445
446After setting our breakpoints, adding our breakpoint commands and continuing,
447we run for a little bit and then hit one of our breakpoints, printing out the
448error message from the breakpoint command. Apparently at this point in the
449tree, our search algorithm decided to go right, but our path says the node we
450want is to the left. Examining the word at the node where we stopped, and our
451search word, we see:
452
453::
454
455   (lldb) expr dictionary->word
456   (const char *) $1 = 0x0000000100100080 "dramatis"
457   (lldb) expr word
458   (char *) $2 = 0x00007fff5fbff108 "romeo"
459
460So the word at our current node is "dramatis", and the word we are searching
461for is "romeo". "romeo" comes after "dramatis" alphabetically, so it seems like
462going right would be the correct decision. Let's ask Python what it thinks the
463path from the current node to our word is:
464
465::
466
467   (lldb) script print path
468   LLRRL
469
470According to Python we need to go left-left-right-right-left from our current
471node to find the word we are looking for. Let's double check our tree, and see
472what word it has at that node:
473
474::
475
476   (lldb) expr dictionary->left->left->right->right->left->word
477   (const char *) $4 = 0x0000000100100880 "Romeo"
478
479So the word we are searching for is "romeo" and the word at our DFS location is
480"Romeo". Aha! One is uppercase and the other is lowercase: We seem to have a
481case conversion problem somewhere in our program (we do).
482
483This is the end of our example on how you might use Python scripting in LLDB to
484help you find bugs in your program.
485
486Source Files for The Example
487----------------------------
488
489The complete code for the Dictionary program (with case-conversion bug), the
490DFS function and other Python script examples (tree_utils.py) used for this
491example are available below.
492
493tree_utils.py - Example Python functions using LLDB's API, including DFS
494
495::
496
497   """
498   # ===-- tree_utils.py ---------------------------------------*- Python -*-===//
499   #
500   #  Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
501   #  See https://llvm.org/LICENSE.txt for license information.
502   #  SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
503   #
504   # ===----------------------------------------------------------------------===//
505
506   tree_utils.py  - A set of functions for examining binary
507   search trees, based on the example search tree defined in
508   dictionary.c.  These functions contain calls to LLDB API
509   functions, and assume that the LLDB Python module has been
510   imported.
511
512   For a thorough explanation of how the DFS function works, and
513   for more information about dictionary.c go to
514   http://lldb.llvm.org/scripting.html
515   """
516
517
518   def DFS(root, word, cur_path):
519      """
520      Recursively traverse a binary search tree containing
521      words sorted alphabetically, searching for a particular
522      word in the tree.  Also maintains a string representing
523      the path from the root of the tree to the current node.
524      If the word is found in the tree, return the path string.
525      Otherwise return an empty string.
526
527      This function assumes the binary search tree is
528      the one defined in dictionary.c  It uses LLDB API
529      functions to examine and traverse the tree nodes.
530      """
531
532      # Get pointer field values out of node 'root'
533
534      root_word_ptr = root.GetChildMemberWithName("word")
535      left_child_ptr = root.GetChildMemberWithName("left")
536      right_child_ptr = root.GetChildMemberWithName("right")
537
538      # Get the word out of the word pointer and strip off
539      # surrounding quotes (added by call to GetSummary).
540
541      root_word = root_word_ptr.GetSummary()
542      end = len(root_word) - 1
543      if root_word[0] == '"' and root_word[end] == '"':
544         root_word = root_word[1:end]
545      end = len(root_word) - 1
546      if root_word[0] == '\'' and root_word[end] == '\'':
547         root_word = root_word[1:end]
548
549      # Main depth first search
550
551      if root_word == word:
552         return cur_path
553      elif word < root_word:
554
555         # Check to see if left child is NULL
556
557         if left_child_ptr.GetValue() is None:
558               return ""
559         else:
560               cur_path = cur_path + "L"
561               return DFS(left_child_ptr, word, cur_path)
562      else:
563
564         # Check to see if right child is NULL
565
566         if right_child_ptr.GetValue() is None:
567               return ""
568         else:
569               cur_path = cur_path + "R"
570               return DFS(right_child_ptr, word, cur_path)
571
572
573   def tree_size(root):
574      """
575      Recursively traverse a binary search tree, counting
576      the nodes in the tree.  Returns the final count.
577
578      This function assumes the binary search tree is
579      the one defined in dictionary.c  It uses LLDB API
580      functions to examine and traverse the tree nodes.
581      """
582      if (root.GetValue is None):
583         return 0
584
585      if (int(root.GetValue(), 16) == 0):
586         return 0
587
588      left_size = tree_size(root.GetChildAtIndex(1))
589      right_size = tree_size(root.GetChildAtIndex(2))
590
591      total_size = left_size + right_size + 1
592      return total_size
593
594
595   def print_tree(root):
596      """
597      Recursively traverse a binary search tree, printing out
598      the words at the nodes in alphabetical order (the
599      search order for the binary tree).
600
601      This function assumes the binary search tree is
602      the one defined in dictionary.c  It uses LLDB API
603      functions to examine and traverse the tree nodes.
604      """
605      if (root.GetChildAtIndex(1).GetValue() is not None) and (
606               int(root.GetChildAtIndex(1).GetValue(), 16) != 0):
607         print_tree(root.GetChildAtIndex(1))
608
609      print root.GetChildAtIndex(0).GetSummary()
610
611      if (root.GetChildAtIndex(2).GetValue() is not None) and (
612               int(root.GetChildAtIndex(2).GetValue(), 16) != 0):
613         print_tree(root.GetChildAtIndex(2))
614
615
616dictionary.c - Sample dictionary program, with bug
617
618::
619
620   //===-- dictionary.c ---------------------------------------------*- C -*-===//
621   //
622   // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
623   // See https://llvm.org/LICENSE.txt for license information.
624   // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
625   //
626   //===----------------------------------------------------------------------===//
627   #include <ctype.h>
628   #include <stdio.h>
629   #include <stdlib.h>
630   #include <string.h>
631
632   typedef struct tree_node {
633   const char *word;
634   struct tree_node *left;
635   struct tree_node *right;
636   } tree_node;
637
638   /* Given a char*, returns a substring that starts at the first
639      alphabet character and ends at the last alphabet character, i.e. it
640      strips off beginning or ending quotes, punctuation, etc. */
641
642   char *strip(char **word) {
643   char *start = *word;
644   int len = strlen(start);
645   char *end = start + len - 1;
646
647   while ((start < end) && (!isalpha(start[0])))
648      start++;
649
650   while ((end > start) && (!isalpha(end[0])))
651      end--;
652
653   if (start > end)
654      return NULL;
655
656   end[1] = '\0';
657   *word = start;
658
659   return start;
660   }
661
662   /* Given a binary search tree (sorted alphabetically by the word at
663      each node), and a new word, inserts the word at the appropriate
664      place in the tree.  */
665
666   void insert(tree_node *root, char *word) {
667   if (root == NULL)
668      return;
669
670   int compare_value = strcmp(word, root->word);
671
672   if (compare_value == 0)
673      return;
674
675   if (compare_value < 0) {
676      if (root->left != NULL)
677         insert(root->left, word);
678      else {
679         tree_node *new_node = (tree_node *)malloc(sizeof(tree_node));
680         new_node->word = strdup(word);
681         new_node->left = NULL;
682         new_node->right = NULL;
683         root->left = new_node;
684      }
685   } else {
686      if (root->right != NULL)
687         insert(root->right, word);
688      else {
689         tree_node *new_node = (tree_node *)malloc(sizeof(tree_node));
690         new_node->word = strdup(word);
691         new_node->left = NULL;
692         new_node->right = NULL;
693         root->right = new_node;
694      }
695   }
696   }
697
698   /* Read in a text file and storea all the words from the file in a
699      binary search tree.  */
700
701   void populate_dictionary(tree_node **dictionary, char *filename) {
702   FILE *in_file;
703   char word[1024];
704
705   in_file = fopen(filename, "r");
706   if (in_file) {
707      while (fscanf(in_file, "%s", word) == 1) {
708         char *new_word = (strdup(word));
709         new_word = strip(&new_word);
710         if (*dictionary == NULL) {
711         tree_node *new_node = (tree_node *)malloc(sizeof(tree_node));
712         new_node->word = new_word;
713         new_node->left = NULL;
714         new_node->right = NULL;
715         *dictionary = new_node;
716         } else
717         insert(*dictionary, new_word);
718      }
719   }
720   }
721
722   /* Given a binary search tree and a word, search for the word
723      in the binary search tree.  */
724
725   int find_word(tree_node *dictionary, char *word) {
726   if (!word || !dictionary)
727      return 0;
728
729   int compare_value = strcmp(word, dictionary->word);
730
731   if (compare_value == 0)
732      return 1;
733   else if (compare_value < 0)
734      return find_word(dictionary->left, word);
735   else
736      return find_word(dictionary->right, word);
737   }
738
739   /* Print out the words in the binary search tree, in sorted order.  */
740
741   void print_tree(tree_node *dictionary) {
742   if (!dictionary)
743      return;
744
745   if (dictionary->left)
746      print_tree(dictionary->left);
747
748   printf("%s\n", dictionary->word);
749
750   if (dictionary->right)
751      print_tree(dictionary->right);
752   }
753
754   int main(int argc, char **argv) {
755   tree_node *dictionary = NULL;
756   char buffer[1024];
757   char *filename;
758   int done = 0;
759
760   if (argc == 2)
761      filename = argv[1];
762
763   if (!filename)
764      return -1;
765
766   populate_dictionary(&dictionary, filename);
767   fprintf(stdout, "Dictionary loaded.\nEnter search word: ");
768   while (!done && fgets(buffer, sizeof(buffer), stdin)) {
769      char *word = buffer;
770      int len = strlen(word);
771      int i;
772
773      for (i = 0; i < len; ++i)
774         word[i] = tolower(word[i]);
775
776      if ((len > 0) && (word[len - 1] == '\n')) {
777         word[len - 1] = '\0';
778         len = len - 1;
779      }
780
781      if (find_word(dictionary, word))
782         fprintf(stdout, "Yes!\n");
783      else
784         fprintf(stdout, "No!\n");
785
786      fprintf(stdout, "Enter search word: ");
787   }
788
789   fprintf(stdout, "\n");
790   return 0;
791   }
792
793
794The text for "Romeo and Juliet" can be obtained from the Gutenberg Project
795(http://www.gutenberg.org).
796
797