A bytecode file [extension .vvm or .vvm.gz] is a binary program or library module for the Vanilla Virtual Machine.
The Vanilla Virtual Machine operates [conceptually] on a higher level than, say, the JVM. It deals directly with things like local variables, types and closures - rather than requiring direct manipulation of the stack by VVM bytecode. This has several advantages:
- Code is denser because it is higher level
  - This means files are smaller and [raw bytecode] programs are smaller in memory
- Compilers for arbitrary languages can determine a module's information from its binary representation without having to parse its source code.
- It can - at the VM level - be optimised to a greater extent based on information available only at runtime
- The specification is smaller, more understandable and easier to maintain
- It is simpler to create a compiler targeting VVM code
- It is simpler to create a new VVM
But it also poses several difficulties, each of which has a mitigating counterpoint:
- Bytecode is more directly related to the source code, which is often undesirable in closed source projects
  - This can easily be countered by automated code mangling - making the bytecode less understandable [harder to reverse-engineer] without affecting its functionality or efficiency
- It must still be able to represent code compiled from a huge group of different languages
  - This will simply add some complexity to the specification - it is not really easier or simpler to make such languages inter-operate on a lower level VM
- It is a more complicated process to optimise inside the VM
  - This is more a switch of such complexity from compiler to VM than additional complexity
Each VVM bytecode file corresponds to exactly one source module/file. It is the equivalent of a .class file in Java. All binary values mentioned herein are little endian, unless otherwise specified, and any text is encoded as UTF-8.
If the file starts with a '#', the first line [up to \n] of the file must be ignored.
"VVM1" should be the first 4 characters in the file [after the optional script line]
The rest of the file conforms to a structure - similar to an XML document, though simpler. LISP users will be familiar with its textual representation:
- (i like (poo in a (bucket)))
Brackets enclose a section, and the first member of each is the op-code or type of the section. For example, a function return operation with the argument 33 can be represented by (return (int32-constant 33)).
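As an illustration of the bracketed form, the following Python sketch reads such text into nested lists, so that the first element of each list is the op-code or type of the section. The function names `tokenize` and `parse` are hypothetical, and the sketch assumes atoms never contain spaces or brackets [quoted strings, for instance, are not handled]; the real assembler syntax may differ.

```python
def tokenize(text):
    """Split the text into '(' , ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text):
    """Parse one bracketed section into nested Python lists of strings."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected )")
        if token != "(":
            return token  # an atom; kept as a string
        items = []
        while True:
            if pos >= len(tokens):
                raise ValueError("missing )")
            if tokens[pos] == ")":
                pos += 1
                return items
            items.append(read())

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing input after section")
    return result


if __name__ == "__main__":
    section = parse("(return (int32-constant 33))")
    print(section[0])  # op-code of the outer section
    print(section[1])  # the nested argument section
```

Atoms are deliberately left as strings here, since deciding how an argument such as 33 is typed depends on the enclosing op-code.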
This textual list-based approach is used as input to the assembler and output from the disassembler. However, it is not dense enough for distribution, so actual bytecode files are encoded as follows:
- [Single byte section type/opcode] [32-bit section size, if required by type] [the section's data or sub-sections]
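A minimal Python sketch of this layout, for sections that carry a size word, might look like the following. The byte values 0x00 [module] and 0x01 [type] come from the Op-Codes and Sections table; everything else is an assumption made for illustration: the helper names are hypothetical, the size word is taken to be an unsigned little-endian count of the bytes that follow it, and the empty type section is a placeholder rather than a valid type definition.

```python
import struct

MODULE = 0x00  # "module" section, has a size word
TYPE = 0x01    # "type" section, has a size word


def encode_section(opcode, payload=b""):
    """Encode [opcode byte][32-bit little-endian size][payload].

    Assumption: the size word counts only the payload bytes.
    """
    return bytes([opcode]) + struct.pack("<I", len(payload)) + payload


def decode_section(data, offset=0):
    """Decode one sized section; return (opcode, payload, next_offset)."""
    opcode = data[offset]
    (size,) = struct.unpack_from("<I", data, offset + 1)
    start = offset + 5  # 1 opcode byte + 4 size bytes
    end = start + size
    if end > len(data):
        raise ValueError("section runs past the end of the data")
    return opcode, data[start:end], end


if __name__ == "__main__":
    # A module section wrapping one (empty, placeholder) type section.
    encoded = encode_section(MODULE, encode_section(TYPE))
    print(encoded.hex())
    opcode, payload, _ = decode_section(encoded)
    print(opcode, decode_section(payload))
```

Because sub-sections are simply concatenated inside the parent's data, a reader can skip any sized section it does not understand by jumping ahead by the size word.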
Op-Codes and Sections table
|Byte Value||Size word?||Name||Description|
|0x00||Yes||module||Encapsulates the entire module|
|0x01||Yes||type||An abstract type definition|