CSE 305: Language Interpreter Design
Due: July 8 2024, 11:59pm
1 Overview
The goal of this project is to understand and build an interpreter for a small, OCaml-like, stack- based bytecode language. You will be implementing this interpreter in OCaml, like the previous assignments. The project is broken down into three parts. Part 1 is defined in Section 4 and is worth 100 points, Part 2 is defined in Section 5 and is worth 150 points, and Part 3 is defined in Section 6 and is worth 150 points. All parts are due by July 8 2024, 11:59pm, but we recommend you keep to the following time schedule for each part:
1. Part 1 by June 17 2024, 11:59pm(100 points)
2. Part 2 by June 26 2024, 11:59pm(150 points)
3. Part 3 by July 8 2024, 11:59pm(150 points) **this is the last day to submit**
This is an individual project. You will submit a file named interpreter.ml which contains a function, interpreter, with the following type signature:
val interpreter : string * string -> unit
If your program does not match the type signature, it will not compile on Autolab and you will receive 0 points. You may, however, have helper functions defined outside of interpreter—the grader is only explicitly concerned with the type of interpreter.
You must submit a solution for each part. Each part is graded individually, so you may wish, for example, to submit your Part 3 solution for parts 1 and 2. Late submissions will not be accepted and will be given a score of 0. Test cases will also be provided on Piazza for you to test your code locally. These will not be exhaustive, so you are highly encouraged to write your own tests to check your interpreter against all the functionality described in this document.
2 Functionality
Given the following function header:
let interpreter ((input : string), (output : string )) : unit = . . .
input and output will be passed in as strings that represent paths to files just like in the Pangram assignment. Your function should write any output your interpreter produces to the file specified by output. In the examples below, the input file is read from top to bottom and then each com- mand is executed by your interpreter in the order it was read. It is incredibly useful to read in all of the commands into a list prior to executing them, separating input from the actual interpreta- tion of the commands. The input file can be arbitrarily long. You may find the library function String.split__ on__char to be useful for separating a string into a string list.
3 Grammar
The following is a context free grammar for the bytecode language you will be implementing. Terminal symbols are identified by monospace font, and nonterminal symbols are identified by italic font. Anything enclosed in [brackets] denotes an optional character (zero or one occurrences). The form '( set1 | set2 | setn )' means a choice of one character from any one of the n sets. A set enclosed in {braces means zero or more occurrences}.
The set digit is the set of digits {0,1,2,3,4,5,6,7,8,9}, letter is the set of all characters in the English alphabet (lowercase and uppercase), and ASCII is the ASCII character set. The set sim- pleASCII is ASCII without quotation marks and the backslash character. Do note that this neces- sarily implies that escape sequences will not need to be handled in your code.
3.1 Constants
const ::= int | bool | error | string | name | unit
int ::= [−] digit { digit }
bool ::= :true: | :false:
error ::= :error: unit ::= :unit:
string ::= "simpleASCII { simpleASCII } "
simpleASCII ::= ASCII \ {' \ ' , '"'}
name ::= {_} letter {letter | digit | _}
3.2 Programs
prog ::= com {com} quit
com ::= push const | add | sub | mul | div | rem | neg | swap | pop | cat | and | or | not | lessThan | equal | if | bind | let com {com} end | funBind com {com} [ return ] funEnd | call
funBind ::= (fun | inOutFun) name1 name2
4 Part 1: Basic Computation
Suggested Completion Date: June 17 2024, 11:59pm
Your interpreter should be able to handle the following commands:
4.1 push
4.1.1 Pushing Integers to the Stack
push num
where num is an integer, possibly with a ’-’suggesting a negative value. Here ’-0’should be regarded as ’0’. Entering this expression will simply push num onto the stack. For example,
input
|
stack
|
push 5
push -0
|
0
5
|
4.1.2 Pushing Strings to the Stack
push string
where string is a string literal consisting of a sequence of characters enclosed in double quotation marks, as in "this is a string". Executing this command would push the string onto the stack:
input
|
stack
|
push "deadpool"
push "batman"
push "this is a string"
|
this a string
batman
deadpool
|
Spaces are preserved in the string, i.e. any preceding or trailing whitespace must be kept inside
the string that is pushed to the stack:
input
|
stack
|
push " deadpool "
push "this is a string "
|
this␣is␣a␣string␣␣
␣deadp␣ool␣
|
You can assume that the string value would always be legal and not contain quotations or escape sequences within the string itself, i.e. neither double quotes nor backslashes will appear inside a string.
4.2 Pushing Names to the Stack
push name
where name consists of a sequence of letters, digits, and underscores, starting with a letter or underscore.
1. example
2. example
If name does not conform to previously mentioned specifications, only push the error literal (:error:) onto the stack instead of pushing name.
To bind ‘a’ to the value 13 and name1 to the value 3, we will use ‘bind’ operation which we will see later (Section 5.7) You can assume that a name will not contain any illegal tokens—no commas, quotation marks, etc. It will always be a sequence of letters, digits, and underscores,
starting with a letter (uppercase or lowercase) or an underscore.
4.3 boolean
push bool
There are two kinds of boolean literals: :true: and :false:. Your interpreter should push the corresponding value onto the stack. For example,
input
|
stack
|
push 5
push :true:
|
:true: 5
|
4.4 error and unit
push :error:
push :unit:
Similar with boolean literals, pushing an error literal or unit literal will push :error: or :unit: onto the stack, respectively.
4.5 pop
The command pop removes the top value from the stack. If the stack is empty, an error literal
(:error:) will be pushed onto the stack. For example,
4.6 add
The command add refers to integer addition. Since this is a binary operator, it consumes the top two values in the stack, calculates the sum and pushes the result back to the stack. If one of the following cases occurs, which means there is an error, any values popped out from the stack should be pushed back in the same order, then a value :error: should also be pushed onto the stack:
• not all top two values are integer numbers
• only one value in the stack
• stack is empty
for example, the following non-error case:
Alternately, if there is only one number in the stack and we use add, an error will occur. Then
5 should be pushed back as well as :error:
4.7 sub
The command sub refers to integer subtraction. It is a binary operator and works in the following way:
• if top two elements in the stack are integer numbers, pop the top element(y) and the next element(x), subtract y from x, and push the result x-y back onto the stack
• if the top two elements in the stack are not all integer numbers, push them back in the same order and push :error: onto the stack
• if there is only one element in the stack, push it back and push :error: onto the stack
• if the stack is empty, push :error: onto the stack
for example, the following non-error case:
Alternately, if one of the top two values in the stack is not a numeric number when sub is used, an error will occur. Then 5 and :false: should be pushed back as well as :error:
4.8 mul
The command mul refers to integer multiplication. It is a binary operator and works in the following way:
• if top two elements in the stack are integer numbers, pop the top element(y) and the next element(x), multiply x by y, and push the result x*y back onto the stack
• if the top two elements in the stack are not all integer numbers, push them back in the same order and push :error: onto the stack
• if there is only one element in the stack, push it back and push :error: onto the stack
• if the stack is empty, push :error: onto the stack
For example, the following non-error case:
Alternately, if the stack empty when mul is executed, an error will occur and :error: should be pushed onto the stack:
4.9 div
The command div refers to integer division. It is a binary operator and works in the following way:
• if top two elements in the stack are integer numbers, pop the top element(y) and the next element(x), divide x by y, and push the result y/x back onto the stack
• if top two elements in the stack are integer numbers but y equals to 0, push them back in the same order and push :error: onto the stack
• if the top two elements in the stack are not both integer numbers, push them back in the same order and push :error: onto the stack
• if there is only one element in the stack, push it back and push :error: onto the stack
• if the stack is empty, push :error: onto the stack
For example, the following non-error case:
Alternately, if the top element in the stack equals to 0, there will be an error if div is executed. In such situations 5 and 0 should be pushed back onto the stack as well as :error:
4.10 rem
The command rem refers to the remainder of integer division. It is a binary operator and works in the following way:
• if top two elements in the stack are integer numbers, pop the top element(y) and the next element(x), calculate the remainder of y/x, and push the result back onto the stack
• if top two elements in the stack are integer numbers but y equals to 0, push them back in the same order and push :error: onto the stack
• if the top two elements in the stack are not all integer numbers, push them back and push :error: onto the stack
• if there is only one element in the stack, push it back and push :error: onto the stack
• if the stack is empty, push :error: onto the stack For example, the following non-error case:
Alternately, if one of the top two elements in the stack is not an integer, an error will occur if rem is executed. If this occurs the top two elements should be pushed back onto the stack as well as :error:. For example:
4.11 neg
The command neg is to calculate the negation of an integer (negation of 0 should still be 0). It is unary therefore consumes only the top element from the stack, calculate its negation and push the result back. A value :error: will be pushed onto the stack if:
• the top element is not an integer, push the top element back and push :error:
• the stack is empty, push :error: onto the stack
For example, the following non-error case:
Alternately, if the value on top of the stack is not an integer, when neg is used, that value should be pushed back onto the stack as well as :error:. For example:
4.12 swap
The command swap interchanges the top two elements in the stack, meaning that the first element becomes the second and the second becomes the first. A value :error: will be pushed onto the
stack if:
• there is only one element in the stack, push the element back and push :error:
• the stack is empty, push :error: onto the stack
For example, the following non-error case:
Alternately, if there is only one element in the stack when swap is used, an error will occur and :error: should be pushed onto the stack. Now we have two elements in the stack (5 and :error:), therefore the second swap will interchange the two elements:
4.13 toString
The command toString removes the top element of the stack and converts it to a string. If the
stack is empty, the error value is pushed instead. Conversions of values to strings should be done as follows:
• Integers shall be rendered as decimal values with no leading zeros. Negative integers should be prefixed with a ‘- ’.
• Booleans shall be rendered as the values “:true” and “:false:” .
• Error values shall be rendered as “:error:”
• Unit values shall be rendered as “:unit:”
• String values should be left unchanged (i.e. surrounding punctuation may not be added)
• Names shall be converted to the corresponding string, contents unchanged.
• Closures (once introduced in Part 3) shall be rendered as “:fun:”
4.14 println
The println command pops a string off the top of the stack and writes it, followed by a newline, to the output file that is specified as the second argument to the interpreter function. In the case that the top element on the stack is not a string, it should be returned to the stack and an :error: pushed. If the stack is empty, an :error: shall be pushed.
4.15 quit
The command quit causes the interpreter to stop.