UMBC CMSC 331 Spring 2011
Principles of Programming Languages
Home · About · Schedule · HW · Exams · Notes · Code · Examples · Resources ·

Homework 3
Semantics, regular expressions, lex

out: March 2, due: March 14

This homework is due on Monday March 14. Part should be submitted in class and part via the gl submit system. The last problem must work on gl. We will use the lex command and Makefile we show and check your program with test inputs that include the example given as well as other cases.

Make sure that the work you turn in identifies you!

(1) Axiomatic semantics I (5 points)

Compute the weakest precondition for the following sequence of statements and the given post condition.

{ ?? }
a = b + 1;
c = a + 3;
{c > 5}

Hint: Start by deciding what variable or variables should be mentioned in the precondition. Recall the rule for sequences: the precondition for a statement in a sequence is equal to the postcondition for the statement that preceeds it.

(2) Axiomatic semantics II (5 points)

Compute the weakest precondition for the following statements and the given post condition. .

{ ?? }
if (x < 0) then y = x + 1 else y = x - 1;
{ y > 0 }

Hint: Analyze the two cases that the if-then-else creates: one where x is less than zero and one where it is not. Figure out what are the weakest preconditions for each. Find a single precondition that logically implies both of these.

On Regular expressions

The lexical level of a programming language is usually specified with regular expressions or finite state automata. Consider the definition of strings for simple integers. We specify that an unsigned integer must be either a single zero, or be a string that begins with a non-zero digit and continues with any number of additional digits (e.g., 1, 451, and 0 are all ok, but 001 foo and 3.14 are not ok). This can be described by the deterministic finite automaton (DFA) to the right. Note that we identify the start state and mark each accepting state with an asterisk. The transitions (arrows) are labeled with the input that will cause the transition and we use the common notation [c1-c2] to mean any single character in the ascii range between c1 and c2. We can also write this as a regular expression:

0 | [1-9] [0-9]*

(3) DFA for simple real number strings (30 points)

(a) Draw a DFA for a real number that satisfies the following description, using the conventions above. (10 points)

A real number can start with an optional sign which can be "-" or "+" and consists of an integer part followed by a decimal point followed by a fractional part. The integer part can be a single zero or a non-empty sequence of digits that does not start with a zero. The fractional part is a non-empty sequence of digits. Positive examples include 0.0, +0.0, 0.12, 12.3 and -9.87 . Negative examples are: 0, 01.2, -01.2, 3. and 42 .

Identify the start state and all accepting states and label every arc. Since this is a deterministic and not a non-deterministic finite automaton, every arc must have a label which cannot be epsilon.

(b) Write a regular expression that corresponds to the DFA, making it as simple as possible. Use parentheses to ensure proper scope or for clarity. (10 points)

(c) Create a state transition diagram corresponding to the DFA you created for part (a).Each row corresponds to a state, and each column corresponds to a set of (one or more) input characters. The table should have no blank entries. (10 points)

(4) Parsing HTML with lex (50 points)

As you all know, HTML is the basic language of the WWW. We will create a lexer for a simplified version of HTML. Tokens are defined as ordinary words or numbers, which consist of one or more consecutive alphanumeric characters. In addition, HTML defines a number of tags, of the form "<foo>", without quotes, where foo can be, for example, any string in the set {'p','bold','pre','quote''} among many others. URLs are of the form "http://x.y.z", without quotes. Let a URL be a token in its own right. Tags such as <pre> and <p> are also tokens. Whitespace can be ignored, except for separating one token from another.

Develop a lexical scanner for HTML using the Unix utility Lex or Flex. Start by reading the compact guide to lex and yacc. Then look at and experiment with the the simplescanner example which uses lex to create a simple lexical scanner for a Pascal-like language. The file scanner.l is is the input to lex. Running make with the Makefile will use lex to create scanner.c and compile this to produce an executable named scanner. Invoking this with a simple file input will produce this session.

Your program should take a simplified HTML file such as simple.html, and produce a list of tokens that appear in that file.

For your convenience, the directoriy simplescanner is available as the tar file, hw3.tar. You can easily download and expand this from your gl account, as this session shows.

What to submit

Turn in your answers to problems 1-3 in class on Monday, March 14. Submit your file scanner.l for simplified HTML via the submit system before11:59pm on Monday, March 14.