Homework4

Lex and Yacc

out:10/12, due:10/20

The goal of this exercise is to acquaint yourself with two very powerful tools for the generation of compilers: lex and yacc. When a program is presented to a compiler, the compiler has to divide the program into meaningful units, called tokens, and then discover the relationship among these units. The algorithm that divides the program into units is called a lexical analyzer, scanner, or lexer.  Once the tokens are identified, the compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is the function of a parser. The parser requires a list of rules that define the relationships among the tokens. This list of rules is called a grammar

Lex (or the gnu version which is called flex) is a program that takes a set of descriptions of possible tokens and produces a C routine that implements a scanner. Yacc takes a concise description of a grammar and produces a C routine that implements a parser for the grammar. 

Lex and yacc are tools which can be used to solve a variety of interesting problems, even outside the realm of compilers. Once you understand how they work, you will find that the use of lex and yacc will save you many programming hours. Consult the Lex&Yacc Page for lots of information on these useful Unix tools. We also have some comments on running lex and yacc at UMBC.

Calculator

Start by creating a directory called calc and copying the files from this directory into it. The key files are Makefile, calc.h, calc.l, and calc.y . We've also included input as a sample input for calc and output is the result of executing calc <input >output .

Modifying the grammar file

For this exercise you will only need to modify the file, calc.y, which is a yacc file that describes the rules of the grammar that implements the calculator. First you should experiment playing with the calculator. Make sure to run the make clean to remove older versions of the files and then type make:

 make clean
 make
You now have an infix calculator. If you type ``3+2'' the calculator answers with ``5''; if you type ``x=4'' followed by ``3*x'' the calculator responds with ``12''. You invoke the calculator by running the program calc. See the sample session for an example. Now you should play with the calculator and check what works and what does not. You should be able to do addition, subtraction, and multiplication, division, equality and assignment as well as using variable names. We suggest that you look at the files calc.l and calc.y as you play. This way you will understand how the calculator was specified.  Your task is to modify the calculator to convert the grammar to use polish prefix notation and also add some additional operators.

In prefix notation, operators come before their their operands. An advantage, shared with polish postfix notion, is that the notation does not require the concept of operator precedence to make expressions unambiguous as long as all of the operators take a fixed number of arguments. Postfix notation was used in some calculators (e.g., the HP-10C) and is supported in the Haskell programming language. As we will see, the LISP family of languages can be viewed as using a prefix notation with explicit parenthesis -- a notation that some still refer to as "Cambridge Prefix".

Modify the calc grammar to support prefix notation. To solve the requirement that all operators have fixed arity (i.e., number of arguments), use the tilda symbol (~) for unary minus instead of overloading the - operator. Here is an example of what your new prefix calculator should be able to do.

The additional operators include the following.

  • New binary comparison operators: < and >
  • Logical binary operators and and or and the logical unary operator not
  • A conditional operator if that takes three arguments: a test, value to return if the test is true and value to return if the test is false

These changes will be relatively simple, and will require modifying both the lex file calc.l yacc file calc.y. See test_session, test_input and test_output for examples that your prefix calculator should handle. Here are some comments to help you get started.

  • If we want to use prefix notation with just one token look ahead, an operator must take a fixed number of arguments. So we can not use the same operator (-) for both binary and unary minus. Instead, we will use the ~ character to represent unary minus. This is one simple change you will have to make.
  • We will use words for the logical operators and, or and not and the conditional if. Unless we change the calc.l file to treat 'and', 'or', 'not' and 'if' as tokens, they will be seen as NAMEs. So you will have to add for new rules, one for each of these strings that will return the appropriate token for which you might use the symbol AND, OR, NOT and IF. Here is an example lex rule you will have to add.
    and     { return AND; }
    But be careful where you place the rule. The lex rule that recognizes a NAME can also match the sequence 'and'. Since both rules match the same number of characters, the one that comes first will be the one used. We always want to match a keyword like 'and', 'or' and 'not' as a token, so these rules must come before the one that matches a NAME. We do not have the same problem with a token like '<' because that will not be matched by any other rule.
  • Since we have added new tokens, you have to put a %token declaration for each in the first section of the calc.y file. These four new tokens will not have a type associated with them, so add a declaration like "%token AND". Because we are using a polish prefix grammar, we do not have to worry about precedence or associativity. You can leave these declarations in the first section of the calc.y file -- they will not cause a problem. You can also delete them.
  • I recommend doing the modifications in two steps. First try converting the initial calculator from infix to prefix without adding the new operators. Once that is working, save a copy and then work on adding the new operators.

Don't worry if yacc complains about shift/reduce conflicts so long as the parser works properly.

Submit the modified calc.y and calc.l files.