Phase 5

Semantic Routines


Phase 5 of the compiler project consists of writing routines to handle variable scoping, type checking, type coercions, and detection of any other errors which are not caught in earlier phases.

Description

Each routine operates on the source tree, checking things and modifying as needed. We define five value types to be used in these routines:

enum type_t {
    TYPE_ILLEGAL,       /* initial value */
    TYPE_INT,
    TYPE_FLOAT,
    TYPE_ERROR,
    TYPE_NONE           /* for typeless nodes */
    };

Your variable scoping routine will assign a unique number, as well as an entry in a new DEFINITION TABLE, to each variable entity. The definition table has, for each entity, a type (TYPE_INT or TYPE_FLOAT, or TYPE_ERROR) and an indication of whether the entity is a parameter or a variable. The TYPE_ERROR type will be given to identifiers which were not declared. The data structure for the definition table is as follows:

enum v_kind {
    PARAMETER,
    VARIABLE
    };

typedef struct def
    {
    enum type_t type;
    enum v_kind kind;
    } DEFINITION;

extern DEFINITION def_table[256];
extern int def_cnt;     /* initialize it to zero */

Two fields are added to the existing NODE structure: an index into the definition table (applicable only to IDENT nodes), and the value type of the node as described above. Also, new node types are needed to represent the non-overloaded operators and the type converters. The new types and the new NODE structure are given here .

A new show_tree.c is provided to reflect the new node structure and node types.

You are provided with six test programs:

  1. tst1.c--
  2. tst2.c--
  3. tst3.c--
  4. tst4.c--
  5. tst5.c--
  6. tst6.c--

The first four are valid programs, but the other two have various errors. Turn in your project in the usual way.

The outputs of tst1.c-- and tst2.c-- are

Implementation hints

Each of these routines consists of doing a recursive traversal of the source tree generated by the parser. The tree gets 'decorated' with additional information (the two new fields in the NODE structure) as these routines do their work.

One way to implement the scoping routine is to traverse the tree while maintaining a stack of in-scope variables. The program parameters make up the outermost range. The variables in the compound statement makes up the first inner range. It is valid (although not necessarily useful) to redefine a program parameter as a variable in a compound. Each time a variable declaration is encountered it is assigned a unique number (really an index into the definition table), the entry in the definition table is made, and a pair of numbers is pushed onto the front of the list (the pair is the symbol number and the definition number). After the subtree under a set of variables' scope has been traversed, the in scope identifiers are popped from the stack. Thus the stack always has the variables currently in scope at a given point of the traversal. When use of an identifier is found during the traversal, the stack is scanned from latest to oldest entries; the first symbol match determines the entity represented by the symbol; the 'defval' field of the node is given its value at this time. Two errors can occur: an identifier may not be in scope (ie, undeclared), or there may be a multiple definition of an identifier within a range. You should give appropriate error reports for both errors.

The type checking routine does a post-order traversal, tagging each node with its type and returning that type. For the overloaded operators, the types returned from its children will determine whether it is TYPE_INT or TYPE_FLOAT. Except for assignments, coercions will go only from integer to float, so if any child is a float, then a float operation is selected and any integer children must be coerced to float. The overloaded operator node type should be replaced by the non-overloaded type (eg, change an ADD_OP to ADD_OP_INT). Note that assignment statements return values (ie, they are expressions), so they have a return type as well. Statements have no return value, so they can get TYPE_NONE. The special TYPE_ERROR is returned when a type error has occurred. This is used to prevent avalanching of error messages; it will always be accepted as a valid type.

(Note that the type field on operators whose type is resolved is not necessarily redundant. For example, a GT_OP_FLOAT node will have a type of TYPE_INT.)

Hint

Two functions have been provided that you might find useful in debugging: printNode.cpp and printDefinitionTable.

What You Will Turn In

All project phases will be submitted via the online homework submission system. For this phase you will submit a file called phase5.zip. This WinZip compatable archive file should contain the following items:

  1. In a directory called phase5, the source code for you implementation. Include all .cpp as well as .h files and project files. Do not include the Debug directory. Make sure that it is possible for your instructor to compile your program from the source and project files. Make sure that your .exe will function from the command line.
  2. A one page descrption (called test.txt, and not placed in the phase5 directory) of the testing performed to insure that your program is operating properly. List any limitations that your program has. Admitting program limitations, rather than having your instructor find them will result in fewer points subtracted.


* Based on a project from here.