Bad VM is a simple constrained virtual machine I made as a learning exercise. It is based entirely around a 2-byte word. Each operation is encoded in one word, each register holds one word, and the stack is a stack of words. Also it uses both stacks and registers, which I’ve read is not a very practical approach.

Motivation

I wanted to learn more about how virtual machines and high level languages work in a way that would let me play around with how it functions. I also wanted to provide myself with constraints to really think about how to provide high level functionality with limited resources. Some of the features were motivated by curiosity, such as “how can I write an application that can completely rewrite itself?”

The result is a very bad, very insecure, virtual machine that uses stack and registers and can’t multiply. It is fun, though. I plan on changing it in the future when I have more ideas to play with.

Design

The entire design was centered around uint16_t, or 2 byte words of data. The first nibble is the operation, as described in the table below. The second, third, and fourth nibbles are used to denote registers, immediate values, addresses or arguments for an operation. This limitation means we can have 16 operations (0x0 to 0xF), 16 registers (0x0 to 0xF) and 256 addresses (0x00 to 0xFF). Thus, the maximum stack depth is 256 because it indexes from 0x00 to 0xFF. The maximum program size is 256 instructions so that every operation is accessible by the CHNG operation and reachable by JMP operations.

operation nibble 0 nibble 1 nibble 2 nibble 3 notes
NOOP 0x0       do nothing
LOAD 0x1 reg1 imm imm load imm into reg1
ADD 0x2 reg1 reg2 reg3 reg1 = reg2 + reg3
SUB 0x3 reg1 reg2 reg3 reg1 = reg2 - reg3
CMP 0x4 reg1 reg2 reg3 same as SUB
CHNG 0x5 reg1 address address change instruction @address to reg1
PUSH 0x6 reg1     push reg1 onto the stack
POP 0x7 reg1     pop the stack into reg1
JMPL 0x8 reg1 address address jump pc to @address if value at reg1 < 0
JMPG 0x9 reg1 address address jump pc to @address if value at reg1 > 0
JMPE 0xA reg1 address address jump pc to @address if value at reg1 == 0
JMPU 0xB   address address jump pc to @address unconditionally
READ 0xC reg1     read 16 bits from FIFO into reg1
PRNS 0xD     0 to print char, 1 to print int print off the stack until a 0x0000 is reached
PRNT 0xE reg1     print chars from nibble 2 and 3 of program @reg1 until a 0x0000 is reached
HALT 0xF       exit the program

Assembler

The “assembler” is just a simple python script that takes hex characters from a text file and writes them out in binary to a new output file. The proper usage is:

$ python3 assemble.py inputFile outputFile

The script is pretty simple. Let’s take a look.

with open(sys.argv[1], "r") as f:
    data = "".join(line.rstrip()[0:4] for line in f)

This opens the input file, strips out all the whitespace, and concatenates all the operations together. It only takes the first 4 characters from each line, the rest of the characters on each line can be used as comments.

    if len(data) % 4 != 0:
        print("Error: input file contains incomplete or malformed operations")
        sys.exit(1)

If the number of characters isn’t a multiple of 4 then one of the commands is not properly formed and the script should exit.

    print("-----   PROGRAM DUMP   -----")
    print("INS   REG1  REG2  REG3   IMM")
    print("----  ----  ----  ----  ----")
    for i in range(0, len(data), 4):
        print(f"{op_pairs[data[i]]}   r{data[i+1]}    r{data[i+2]}"
              f"    r{data[i+3]}    #{data[i+2:i+4]}")

The commands are interpreted and dumped to the terminal for verification by the user. The op_pairs dictionary maps the operation nibble to its name.

    data = bytearray.fromhex(data)
    data[0::2], data[1::2] = data[1::2], data[0::2]  # swap word endianness

Finally, the result is actually assembled. The data is turned into a bytearray from the hex string first. Then, at least on my system, the bytes in each byte pair need to be swapped to be properly read by the VM. The resulting file can be given to the vm and run.

Examples

Assembler

You can assemble the example program by using the existing pipenv environment and compiling the demo program as follows.

$ cd vm/asm
$ pipenv shell
$ ./assemble.py demo.vm output.vm
-----   PROGRAM DUMP   -----
INS   REG1  REG2  REG3   IMM
----  ----  ----  ----  ----
LOAD   r1    r0    rA    #0A
LOAD   r2    r0    r1    #01
LOAD   rF    r0    r0    #00
LOAD   rE    r0    r0    #00
SUB    r0    r1    r2    #12
JMPE   r0    r0    rC    #0C
SUB    r0    r0    r2    #02
ADD    rD    rE    r0    #E0
PUSH   rF    r0    r0    #00
PUSH   rD    r0    r0    #00
PRNS   r0    r0    r1    #01
JMPG   r0    r0    r6    #06
LOAD   rA    r1    r2    #12
PRNT   rA    r0    r0    #00
READ   r3    r0    r0    #00
CHNG   r3    r0    r0    #00
JMPU   r0    r0    r0    #00
HALT   r0    r0    r0    #00
NOOP   r0    r6    r8    #68
NOOP   r0    r6    r5    #65
NOOP   r0    r6    rc    #6c
NOOP   r0    r6    rc    #6c
NOOP   r0    r6    rf    #6f
NOOP   r0    r2    r0    #20
NOOP   r0    r7    r7    #77
NOOP   r0    r6    rf    #6f
NOOP   r0    r7    r2    #72
NOOP   r0    r6    rc    #6c
NOOP   r0    r6    r4    #64
NOOP   r0    r2    r1    #21
NOOP   r0    r0    r0    #00

Example Program

With your program assembled, open up one terminal window and run the following command. The VM uses stderr for debug information so we will pipe it to /dev/null and get just the program output.

$ ./vm /tmp/fifo asm/output.vm 2>/dev/null
8
7
6
5
4
3
2
1

hello world!
waiting to read 16 bits from FIFO /tmp/fifo...

Now the program is waiting to receive two hex characters into the FIFO at /tmp/fifo. You can echo four characters into the pipe and they will be interpreted as binary data by the VM and read into the program. The code is converting ascii characters to their equivalent bytes, do not actually pipe binary data into the VM. Open up a second terminal window and enter the following command.

$ echo FFFF > /tmp/fifo

In this case we’re piping FFFF into the program which writes 0xFFFF as an instruction into index 0 of the program and then unconditionally jumps to index 0. Since we told it to write 0xFFFF the program will halt. We could also write 110F and the program will execute again instead loading 15 into reg1, looping 15 times and printing 13 times instead of 10 and 8. Or we could write anything we’d like and the program will be re-run with it as the first instruction. Writing B00D will cause the program to skip the loop and immediately jump to printing hello world!.

References