This is a continuation of Part II. In this part, we will add support for numerical as well as string expressions, support multiple expression in a PRINT statement, and add functionality for deleting program lines.
Let’s start with the Basic class header file, basic.h. All we need to do here is add a member function for deleting lines from the stored program:
void remove(int index); // remove program line
For the class file, basic.cpp, since we already have code in in the add function to remove a line if it already exists, we will move that into the new remove function, and then call remove from add. Here are the changes:
// add a program line at index, overwriting if it exists void Basic::add(int index, const Program *program){ // see if index already exists, if so delete it remove(index); lines.insert(std::pair<int, const Program *>(index, program)); } // remove an existing line from the program void Basic::remove(int index){ map<int, const Program*>::iterator it = lines.find(index); if( it != lines.end() ){ const Program *old = it->second; delete old; lines.erase(index); } }
The Program base class doesn’t change at all. The derived Print class has a minor change: we will be sub-classing Expression to support numerical and string expressions, so we need to make its functions virtual and use pointers in the Print class so the derived class member functions will be called. In the header file, print.h, change the constructor signature to use Expression *:
Print(const std::vector<Expression*> *exprList); // create with a vector of expressions to print
Also change the member variable declaration exprList to match:
const std::vector<Expression*> *exprList; // store the expressions here
In the class file, print.cpp, we have to update the method signatures to use pointers, and make a small change to how we access the elements in the vector. Because the elements are now pointers instead of actual objects, instead of using the array subscript method, we have to use the at() function. Here is the updated class file:
#include #include "print.h" using std::endl; using std::cout; // constructor for Print class Print::Print(const std::vector<Expression*> *exprList){ this->exprList = exprList; } // prints out each expression to std::cout void Print::execute() const{ for( int i = 0; i < exprList->size()-1; i++ ){ cout << exprList->at(i)->value() << ' '; } cout << exprList->at(exprList->size()-1)->value() << endl; } // lists the expressions, as they were originally given void Print::list(ostream& os) const{ os << "PRINT "; for( int i = 0; i < exprList->size()-1; i++ ){ os << exprList->at(i)->print() << ", "; } os << exprList->at(exprList->size()-1)->print(); }
The Expression
class has a few changes. Since we will be creating subclasses of it, its member functions need to be virtual. Also, since we will only be using subclasses, we no longer need the member variable text
. Here is the header file, expression.h:
#ifndef _EXPRESSION_H_ #define _EXPRESSION_H_ #include <string> /* Base class used for storing and evaluating data items */ class Expression { public: virtual const std::string value() const; // return the stored value virtual const std::string print() const; // printable version }; #endif
The class file, expression.cpp, likewise becomes simplified:
#include "expression.h" // return the text value const std::string Expression::value() const{ return std::string("BASE EXPRESSION"); } // return a string for printing const std::string Expression::print() const{ return std::string("BASE EXPRESSION"); }
Our first subclass of Expression
is StringExpression
, which is basically the same as our original implementation of Expression
. Here is the header file, stringexpression.h:
#ifndef _STRINGEXPRESSION_H_ #define _STRINGEXPRESSION_H_ #include "expression.h" /* Class used for storing a text value */ class StringExpression : public Expression { public: StringExpression(const char *text); // take a string as input const std::string value() const; // return the stored value const std::string print() const; // printable version private: std::string text; // data storage }; #endif
Likewise, the implementation in stringexpression.cpp duplicates the functionality of the old Expression
class:
#include "stringexpression.h" // create a new StringExpression, storing its text StringExpression::StringExpression(const char *text){ this->text = std::string(text); } // return the text value const std::string StringExpression::value() const{ return text; } // return a string for printing const std::string StringExpression::print() const{ return '"' + text + '"'; }
The class DoubleExpression
is very similar to StringExpression
, with the only difference being it stores a numerical value instead of a string value. Here is the header file, doubleexpression.h:
#ifndef _DOUBLEEXPRESSION_H_ #define _DOUBLEEXPRESSION_H_ #include "expression.h" /* Class used for storing a numerical value */ class DoubleExpression : public Expression { public: DoubleExpression(double d); // take a double as input const std::string value() const; // return the stored value const std::string print() const; // printable version private: double d; // data storage }; #endif
The implementation file, doublexpression.cpp, is naturally also very similar. Note that since printable version for calling LIST
on a numerical expression is the same as its text value, the print
function simply makes a call to value
:
#include "doubleexpression.h" // create a new DoubleExpression, storing its value DoubleExpression::DoubleExpression(double d){ this->d = d; } // return the text value const std::string DoubleExpression::value() const{ return std::to_string(d); } // return a string for printing const std::string DoubleExpression::print() const{ return value(); }
OK, now that we’ve got our classes set up, let’s update our Bison input file. The first change is to add our two new header files in the C declarations section:
#include "stringexpression.h" #include "doubleexpression.h"
Next, in the Bison declarations, we need to add some new token types to the %union. We need to support reading double
s, Expression *
s, and std::vector<Expression*> *
s:
%union { int iVal; double dVal; char *sVal; Program *progVal; Expression *eVal; std::vector<Expression*> *eList; }
Since we will now be able to enter a comma-separated list of expressions for the PRINT
statement, we need to add a new constant token:
%token COMMA
We also have a new terminal symbol token type for reading doubles:
%token <dVal> DOUBLE
And we will round out the Bison declarations with two new non-terminal symbols to use our single expression and expression list types:
%type <eList> exprList %type <eVal> expr
Now that we’ve got the symbol types defined, we need to update the grammar rules. The input
and line
rules remain the same, but we will change the program
rule and add two new rules for reading expressions: exprList
and expr
. Here are the new rules:
stmt: LINE { Basic::instance()->remove($1); } | LINE program { Basic::instance()->add($1, $2); } | RUN { Basic::instance()->execute(); } | LIST { Basic::instance()->list(); } ; program: PRINT exprList { $$ = new Print($2); } ; exprList: expr { $$ = new std::vector<Expression*>(1, $1); } | exprList COMMA expr { $1->push_back($3); $$ = $1; } ; expr: STRING { $$ = new StringExpression($1); free($1); // malloced in basic.l } | DOUBLE { $$ = new DoubleExpression($1); } ;
We have added one new option to the stmt
rule: a single LINE
terminal symbol can be input to delete that line from the stored program. The program
rule is simpler now: it reads a PRINT
symbol followed by a new non-terminal symbol exprList
, which we defined in our Bison declarations section to be of type std::vector<Expression*> *
. We can pass this symbol to our updated Print
constructor.
Now we have our new rules. The exprList
rule is satisfied by either a single expr
, or by recursion a sequence of expr
s separated by commas (the terminal symbol COMMA
we added). When Bison encounters the first matching expr
, it creates a new std::vector<Expression *>
, which matches the eList
symbol type we defined, and populates it with the single Expression
it read. As additional expression separated by commas are matched, they are pushed onto the end of the vector (the exprList
symbol in the first matching position), and then that vector is set as the result of the rule.
The expr
rule matches either a STRING
or a DOUBLE
terminal symbol, and builds the appropriate Expression
derived class, as appropriate. For the case of the string, we again need to free the terminal symbol value, since we explicitly allocated it in the flex scanner.
Something that would be nice is if we had a prompt character at the beginning of each input line when running our BASIC interpreter, to make the input and output lines a little more distinguishable. So, in the main
function, before the while
loop, print out a welcome statement (we will change the flex scanner so that it will print out another prompt character every time it reads a new line):
std::cout << "Welcome to BASIC!\n>";
OK, on to the flex scanner! We need to add support for the new terminal symbols we added in Bison, as well as a couple other minor changes. First, it gets annoying having to enter the BASIC commands using capital letters. Fortunately for us, flex provides an option to make the input matching case-insensitive, so we don’t need to redefine our rules; simply add the case-insensitive
option to the first line:
%option noyywrap case-insensitive
We’re going to be doing some printing directly from our flex scanner now, so be sure to include iostream
. We also have defined a token type that uses std::vector
, so we need to include that header:
#include <iostream> #include <vector>
The next change is to add a declaration of the Expression
class, since it is used in the symbol type definitions that Bison exports. The flex code never uses this class, so instead of including its header file, we just declare it:
class Program; class Expression;
Since we now need to be able to read in a double-type numerical value, we will be using the pattern [0-9]
many times, so we will create a flex declaration to use DIGIT
in its place. While this isn’t necessary, it can make the patterns a little more clear. Put this line after the end of the C declarations (%}
) and before the start of the pattern matching (%%
):
%} DIGIT [0-9] %%
Now we can use the pattern {DIGIT}
wherever we would use [0-9]
. We will make use of this for both the LINE
symbol and the DOUBLE
symbol. Here are the updated rules:
[ \t]+ ; // skip white space ^{DIGIT}+ { yylval.iVal = atoi(yytext); return LINE; } \"[^\"\n]*\" { // remove the quote marks from either end yylval.sVal = (char *)malloc(sizeof(char)*strlen(yytext)-1); strncpy(yylval.sVal, yytext+1, strlen(yytext)-2); yylval.sVal[strlen(yytext)-2] = '\0'; // terminate the string! return STRING; } (({DIGIT}+"."?)|("."{DIGIT}+)){DIGIT}*(E-?{DIGIT}+)? { yylval.dVal = atof(yytext); return DOUBLE; } PRINT { return PRINT; } RUN { return RUN; } LIST { return LIST; } , { return COMMA; } \n { std::cout << ">"; return ENDL; } . ; // skip any characters that aren't part of a recognized pattern
The updated rule that matches a LINE
symbol now requires the sequence of digits to be at the beginning of a line. This ensure that flex won’t match a numerical expression and return it as a LINE
symbol.
The new rule for matching a DOUBLE
is a little bit tricky. There are two ways to start this pattern: either one or more digits followed by an optional dot, or a starting dot followed by one or more digits. After the start, there may be zero or more digits, optionally followed by a literal E with an optional minus sign followed by one or more digits. Essentially, this rule requires that there be one or more digits, on either side of an optional
dot, and support for exponential notation followed. Here are some examples of inputs that this will match:
123 123. .123 123.123 123E123 123E-123 123.123E-123 .123E-123 etc.
Since we have a new COMMA
terminal symbol defined in Bison to support comma-separated expression lists, we need to match that here (line 32). Since we want to have a prompt character show up at the beginning of each new input line, we updated the ENDL
match to print one out every time we read a new line character. Lastly, we have a new rule, .+ ;
, which silently reads and discards any characters that it can’t match to previous rules; this prevents some unwanted input for reaching Bison, where it could cause a parsing error.
And of course, we need to update our makefile to include our new header and source files. I’ve been compiling on macOS, but on Linux we also need to tell g++ to use the C++11 standard, by adding the -std=c++11
argument. Here is the updated all
rule:
all: basic.tab.c lex.yy.c \ basic.h basic.cpp \ program.h program.cpp \ print.h print.cpp \ expression.h expression.cpp \ stringexpression.h stringexpression.cpp \ doubleexpression.h doubleexpression.cpp g++ -std=c++11 basic.tab.c lex.yy.c program.cpp basic.cpp print.cpp expression.cpp \ stringexpression.cpp doubleexpression.cpp \ -o basic
Here is an example session made from this Part III code:
Welcome to BASIC! >10 print "Hello" >20 print 123, .123, 123.123E123 >30 print "Big Number!" >list 10 PRINT "Hello" 20 PRINT 123.000000, 0.123000, 123122999999999995047514223697997329420208162078904722947470938484432926767618625996126391524302018196895898346062018145943552.000000 30 PRINT "Big Number!" >run Hello 123.000000 0.123000 123122999999999995047514223697997329420208162078904722947470938484432926767618625996126391524302018196895898346062018145943552.000000 Big Number! >20 >list 10 PRINT "Hello" 30 PRINT "Big Number!" >run Hello Big Number! >
As you can see, we’ll want to clean up the numerical printing a bit, but otherwise, things are looking good! The complete source files from this part are available here: https://github.com/VisceralLogic/basic/tree/part-3
Continue to Part IV.
Pingback: BASIC Interpreter, Part II | Visceral Logic Programming
Pingback: BASIC Interpreter, Part IV | Visceral Logic Programming