Bad VM
Bad VM is a simple constrained virtual machine I made as a learning exercise. It is based entirely around a 2-byte word. Each operation is encoded in one word, each register holds one word, and the stack is a stack of words. Also it uses both stacks and registers, which I’ve read is not a very practical approach.
Motivation
I wanted to learn more about how virtual machines and high level languages work in a way that would let me play around with how it functions. I also wanted to provide myself with constraints to really think about how to provide high level functionality with limited resources. Some of the features were motivated by curiosity, such as “how can I write an application that can completely rewrite itself?”
The result is a very bad, very insecure, virtual machine that uses stack and registers and can’t multiply. It is fun, though. I plan on changing it in the future when I have more ideas to play with.
Design
The entire design was centered around uint16_t, or 2 byte words of data. The first nibble is the operation, as described in the table below. The second, third, and fourth nibbles are used to denote registers, immediate values, addresses or arguments for an operation. This limitation means we can have 16 operations (0x0 to 0xF), 16 registers (0x0 to 0xF) and 256 addresses (0x00 to 0xFF). Thus, the maximum stack depth is 256 because it indexes from 0x00 to 0xFF. The maximum program size is 256 instructions so that every operation is accessible by the CHNG operation and reachable by JMP operations.
operation | nibble 0 | nibble 1 | nibble 2 | nibble 3 | notes |
---|---|---|---|---|---|
NOOP | 0x0 | do nothing | |||
LOAD | 0x1 | reg1 | imm | imm | load imm into reg1 |
ADD | 0x2 | reg1 | reg2 | reg3 | reg1 = reg2 + reg3 |
SUB | 0x3 | reg1 | reg2 | reg3 | reg1 = reg2 - reg3 |
CMP | 0x4 | reg1 | reg2 | reg3 | same as SUB |
CHNG | 0x5 | reg1 | address | address | change instruction @address to reg1 |
PUSH | 0x6 | reg1 | push reg1 onto the stack | ||
POP | 0x7 | reg1 | pop the stack into reg1 | ||
JMPL | 0x8 | reg1 | address | address | jump pc to @address if value at reg1 < 0 |
JMPG | 0x9 | reg1 | address | address | jump pc to @address if value at reg1 > 0 |
JMPE | 0xA | reg1 | address | address | jump pc to @address if value at reg1 == 0 |
JMPU | 0xB | address | address | jump pc to @address unconditionally | |
READ | 0xC | reg1 | read 16 bits from FIFO into reg1 | ||
PRNS | 0xD | 0 to print char, 1 to print int | print off the stack until a 0x0000 is reached | ||
PRNT | 0xE | reg1 | print chars from nibble 2 and 3 of program @reg1 until a 0x0000 is reached | ||
HALT | 0xF | exit the program |
Assembler
The “assembler” is just a simple python script that takes hex characters from a text file and writes them out in binary to a new output file. The proper usage is:
$ python3 assemble.py inputFile outputFile
The script is pretty simple. Let’s take a look.
with open(sys.argv[1], "r") as f:
data = "".join(line.rstrip()[0:4] for line in f)
This opens the input file, strips out all the whitespace, and concatenates all the operations together. It only takes the first 4 characters from each line, the rest of the characters on each line can be used as comments.
if len(data) % 4 != 0:
print("Error: input file contains incomplete or malformed operations")
sys.exit(1)
If the number of characters isn’t a multiple of 4 then one of the commands is not properly formed and the script should exit.
print("----- PROGRAM DUMP -----")
print("INS REG1 REG2 REG3 IMM")
print("---- ---- ---- ---- ----")
for i in range(0, len(data), 4):
print(f"{op_pairs[data[i]]} r{data[i+1]} r{data[i+2]}"
f" r{data[i+3]} #{data[i+2:i+4]}")
The commands are interpreted and dumped to the terminal for verification by the
user. The op_pairs
dictionary maps the operation nibble to its name.
data = bytearray.fromhex(data)
data[0::2], data[1::2] = data[1::2], data[0::2] # swap word endianness
Finally, the result is actually assembled. The data is turned into a bytearray from the hex string first. Then, at least on my system, the bytes in each byte pair need to be swapped to be properly read by the VM. The resulting file can be given to the vm and run.
Examples
Assembler
You can assemble the example program by using the existing pipenv environment and compiling the demo program as follows.
$ cd vm/asm
$ pipenv shell
$ ./assemble.py demo.vm output.vm
----- PROGRAM DUMP -----
INS REG1 REG2 REG3 IMM
---- ---- ---- ---- ----
LOAD r1 r0 rA #0A
LOAD r2 r0 r1 #01
LOAD rF r0 r0 #00
LOAD rE r0 r0 #00
SUB r0 r1 r2 #12
JMPE r0 r0 rC #0C
SUB r0 r0 r2 #02
ADD rD rE r0 #E0
PUSH rF r0 r0 #00
PUSH rD r0 r0 #00
PRNS r0 r0 r1 #01
JMPG r0 r0 r6 #06
LOAD rA r1 r2 #12
PRNT rA r0 r0 #00
READ r3 r0 r0 #00
CHNG r3 r0 r0 #00
JMPU r0 r0 r0 #00
HALT r0 r0 r0 #00
NOOP r0 r6 r8 #68
NOOP r0 r6 r5 #65
NOOP r0 r6 rc #6c
NOOP r0 r6 rc #6c
NOOP r0 r6 rf #6f
NOOP r0 r2 r0 #20
NOOP r0 r7 r7 #77
NOOP r0 r6 rf #6f
NOOP r0 r7 r2 #72
NOOP r0 r6 rc #6c
NOOP r0 r6 r4 #64
NOOP r0 r2 r1 #21
NOOP r0 r0 r0 #00
Example Program
With your program assembled, open up one terminal window and run the following command. The VM uses stderr for debug information so we will pipe it to /dev/null and get just the program output.
$ ./vm /tmp/fifo asm/output.vm 2>/dev/null
8
7
6
5
4
3
2
1
hello world!
waiting to read 16 bits from FIFO /tmp/fifo...
Now the program is waiting to receive two hex characters into the FIFO at /tmp/fifo. You can echo four characters into the pipe and they will be interpreted as binary data by the VM and read into the program. The code is converting ascii characters to their equivalent bytes, do not actually pipe binary data into the VM. Open up a second terminal window and enter the following command.
$ echo FFFF > /tmp/fifo
In this case we’re piping FFFF
into the program
which writes 0xFFFF
as an instruction into index 0 of the program and then
unconditionally jumps to index 0. Since we told it to write 0xFFFF
the program
will halt.
We could also write 110F
and the program will execute again instead
loading 15 into reg1, looping 15 times and printing 13 times instead of 10 and 8.
Or we could write anything we’d like and the program will be re-run with it as
the first instruction. Writing B00D
will cause the program to skip the loop
and immediately jump to printing hello world!
.