API Reference: https://api.binary.ninja/binaryninja.architecture-module.html#binaryninja.architecture.Architecture
The architecture plugin is where we define things such as:
- endianness
- max_instr_length
- default_int_size
- instr_alignment
- stack_pointer
```python
class CoolVMArch(Architecture):
    name = "coolvm"
    endianness = Endianness.BigEndian
    default_int_size = 1
    # The attribute Binary Ninja reads is max_instr_length, as listed above.
    max_instr_length = 4
    instr_alignment = 4
    stack_pointer = "sp"
```
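To picture what a fixed 4-byte, 4-byte-aligned, big-endian instruction stream looks like, here is a minimal standalone sketch. It assumes each instruction can be read as one 32-bit big-endian word, which these notes do not spell out, and `split_instructions` is a hypothetical helper, not part of the plugin or of Binary Ninja.

```python
import struct

INSTR_SIZE = 4  # mirrors max_instr_length and instr_alignment above


def split_instructions(data: bytes, base_addr: int = 0):
    """Yield (address, 32-bit big-endian word) for each aligned 4-byte slot."""
    if base_addr % INSTR_SIZE != 0:
        raise ValueError("base address is not 4-byte aligned")
    # Ignore a trailing partial instruction rather than guessing its contents.
    usable = len(data) - (len(data) % INSTR_SIZE)
    for offset in range(0, usable, INSTR_SIZE):
        (word,) = struct.unpack_from(">I", data, offset)  # ">" means big-endian
        yield base_addr + offset, word


# Nine bytes: two full instructions plus one stray byte that is skipped.
words = list(split_instructions(bytes.fromhex("0102030405060708ff")))
```

How the 32 bits are divided into opcode and operands is left to the [[Disassembler]].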
# Registers
We can also define the registers. In `CoolVM`, the `init_program` function sets the registers to 0.
```c
int64_t init_program(struct program_struct* arg1)
{
    arg1->sp = 0
    arg1->sp = malloc(bytes: 0x400)
    int64_t rax_5
    if (arg1->sp != 0) {
        arg1->reg1 = 0
        arg1->reg2 = 0
        arg1->reg3 = 0
        arg1->reg4 = 0
        arg1->exit = 0
        rax_5 = 0
    } else {
        rax_5 = 1
    }
    return rax_5
}
```
We can see there are 4 general purpose registers, an exit register (if this value is 1, the VM exits), a stack pointer (with 0x400 bytes of stack space), and, while not shown in this function, a pc register.
For each register, we create a `RegisterInfo` object (https://api.binary.ninja/binaryninja.architecture-module.html?highlight=registerinfo#binaryninja.architecture.RegisterInfo), which defines the name and size, as well as whether it is a sub-register. `CoolVM`, however, does not have any sub-registers.
```python
regs = {}
regs['sp'] = RegisterInfo("sp", 1)
regs['pc'] = RegisterInfo("pc", 1)
regs['exit'] = RegisterInfo("exit", 1)
# General purpose registers r1 through r4
for x in range(1, 5):
    reg_name = f"r{x}"
    regs[reg_name] = RegisterInfo(reg_name, 1)
```
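As a quick sanity check of what that table ends up containing, the sketch below rebuilds it outside Binary Ninja. The `RegisterInfo` here is a stand-in namedtuple that mirrors only the first three constructor fields, so treat it as an approximation of the real class.

```python
from collections import namedtuple

# Stand-in for binaryninja.RegisterInfo so the snippet runs without Binary Ninja.
# The offset field defaults to 0, as it would for a full-width register.
RegisterInfo = namedtuple(
    "RegisterInfo", ["full_width_reg", "size", "offset"], defaults=[0]
)

regs = {}
for name in ("sp", "pc", "exit"):
    regs[name] = RegisterInfo(name, 1)
for x in range(1, 5):  # r1 through r4
    regs[f"r{x}"] = RegisterInfo(f"r{x}", 1)

# A register is a sub-register when it points at a different full-width register.
sub_registers = [n for n, info in regs.items() if info.full_width_reg != n]
```

Every entry names itself as its full-width register, which is what "no sub-registers" means in practice.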
# Intrinsics
We can make the instructions `read`, `print`, and `exit` intrinsics, so that when binja decompiles these instructions they are treated as black-box functions.
API Reference: https://api.binary.ninja/binaryninja.architecture-module.html?highlight=registerinfo#binaryninja.architecture.IntrinsicInfo
An intrinsic can have inputs and outputs. In the case of `CoolVM`, `read` and `print` both take one argument as input. `read` modifies the register passed as input, but we don't need to specify that to binja. Lastly, we create a `printStr` intrinsic, which is used in the [[Workflow]] that combines multiple single-character `print` instructions into one `printStr` instruction.
```python
intrinsics = {
    "read": IntrinsicInfo([Type.char()], []),
    "print": IntrinsicInfo([Type.char()], []),
    "exit": IntrinsicInfo([], []),
    "printStr": IntrinsicInfo([], [])
}
```
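The idea behind `printStr` can be illustrated with a toy model. The sketch below works on plain `(name, argument)` tuples rather than on Binary Ninja IL, so it shows only the merging logic; `merge_prints` and the tuple format are hypothetical and are not how the [[Workflow]] is actually implemented.

```python
def merge_prints(ops):
    """Collapse runs of ("print", ch) into a single ("printStr", text)."""
    merged, run = [], []

    def flush():
        # Emit whatever characters were collected since the last non-print op.
        if len(run) > 1:
            merged.append(("printStr", "".join(run)))
        elif run:
            merged.append(("print", run[0]))  # a lone print stays as it is
        run.clear()

    for name, arg in ops:
        if name == "print":
            run.append(arg)
        else:
            flush()
            merged.append((name, arg))
    flush()
    return merged


ops = [("print", "h"), ("print", "i"), ("read", "r1"), ("print", "!"), ("exit", None)]
result = merge_prints(ops)
```

Any instruction other than `print` ends the current run, so output order relative to `read` and `exit` is preserved.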
# Disassembling
Next, we create an instance of our disassembler, defined in [[Disassembler]], in the `__init__` function of the class. The disassembler is used for both disassembling and lifting instructions.
```python
def __init__(self):
    # Let the base Architecture class finish its own setup first.
    super().__init__()
    self.disassembler = CoolVMDisassembler()
```
Binja has two functions related to disassembling:
- `get_instruction_info`
- `get_instruction_text`
## get_instruction_info
API Reference: https://api.binary.ninja/binaryninja.architecture-module.html?highlight=registerinfo#binaryninja.architecture.InstructionInfo
An `InstructionInfo` holds the size of the instruction and any branching information. Since `CoolVM` has a fixed instruction length of 4, we can hard-code that. The disassembler returns two values, `tokens` and `branch_conds`. The tokens aren't used in this function, but the branch conditions are. `branch_conds` is a list of `BranchInfo` objects, a class we define ourselves. The branch types are as follows:
```python
class BranchType(enum.IntEnum):
    UnconditionalBranch = 0
    FalseBranch = 1
    TrueBranch = 2
    CallDestination = 3
    FunctionReturn = 4
    SystemCall = 5
    IndirectBranch = 6
    ExceptionBranch = 7
    UnresolvedBranch = 127
    UserDefinedBranch = 128
```
Some branch types do not use a target (like `FunctionReturn`), so `target` defaults to `None`:
```python
class BranchInfo:
    def __init__(self, _type, target=None):
        self.type = _type
        self.target = target
```
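As a rough sketch of how a disassembler might fill `branch_conds`, the snippet below maps a few instruction kinds to `BranchInfo` lists. The kind labels (`"jmp"`, `"jcc"`, `"ret"`) and the `branches_for` helper are hypothetical, since the real `CoolVM` opcodes live in [[Disassembler]], and `BranchType` is redefined locally as a subset so the example runs on its own.

```python
import enum

INSTR_SIZE = 4  # fixed CoolVM instruction length


class BranchType(enum.IntEnum):
    # Local subset of the enum shown above, so no Binary Ninja import is needed.
    UnconditionalBranch = 0
    FalseBranch = 1
    TrueBranch = 2
    FunctionReturn = 4


class BranchInfo:
    def __init__(self, _type, target=None):
        self.type = _type
        self.target = target


def branches_for(kind, addr, dest=None):
    """Return the branches for one instruction; `kind` is a made-up label."""
    if kind == "jmp":
        return [BranchInfo(BranchType.UnconditionalBranch, dest)]
    if kind == "jcc":
        # Taken edge goes to dest, fall-through edge to the next instruction.
        return [
            BranchInfo(BranchType.TrueBranch, dest),
            BranchInfo(BranchType.FalseBranch, addr + INSTR_SIZE),
        ]
    if kind == "ret":
        return [BranchInfo(BranchType.FunctionReturn)]  # no target needed
    return []  # ordinary instructions do not end a basic block


conds = branches_for("jcc", 0x10, 0x40)
```

A conditional jump reports both edges, which is what lets binja split the basic block and draw both arms in graph view.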
So the final code for the function is:
```python
def get_instruction_info(self, data, addr):
    _, branch_conds = self.disassembler.disas(data, addr)
    instr_info = InstructionInfo(4)
    for branch_info in branch_conds:
        if branch_info.target is not None:
            instr_info.add_branch(branch_info.type, branch_info.target)
        else:
            instr_info.add_branch(branch_info.type)
    return instr_info
```
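One thing the function above does not handle is being given fewer than 4 bytes. A possible guard, sketched here with a recording stand-in instead of the real `InstructionInfo`, is to return `None` when a full instruction is not available. `FakeInstructionInfo` and `build_instruction_info` are illustrative names, and branches are passed as plain `(type, target)` tuples to keep the example self-contained.

```python
INSTR_SIZE = 4


class FakeInstructionInfo:
    """Stand-in that records add_branch calls; not the Binary Ninja class."""

    def __init__(self, length):
        self.length = length
        self.branches = []

    def add_branch(self, branch_type, target=None):
        self.branches.append((branch_type, target))


def build_instruction_info(data, branch_conds):
    # Not enough bytes for a whole instruction: signal "cannot decode".
    if len(data) < INSTR_SIZE:
        return None
    info = FakeInstructionInfo(INSTR_SIZE)
    for branch_type, target in branch_conds:
        if target is not None:
            info.add_branch(branch_type, target)
        else:
            info.add_branch(branch_type)
    return info


full = build_instruction_info(
    b"\x00\x01\x02\x03", [("TrueBranch", 0x40), ("FalseBranch", 0x14)]
)
short = build_instruction_info(b"\x00\x01", [])
```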
## get_instruction_text
This function is how binja displays the tokens in the linear and graph views: it provides the actual text of the instruction as well as the type of each token. Our disassembler does the heavy lifting and returns the correct tokens. The second value this function returns should be the instruction size, which for `CoolVM` is always `4`.
```python
def get_instruction_text(self, data, addr):
    tokens, _ = self.disassembler.disas(data, addr)
    return tokens, 4
```
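To give a feel for what the token list looks like, here is a standalone sketch using stand-in classes. `Token` and `TokenType` only imitate the shape of Binary Ninja's `InstructionTextToken` and `InstructionTextTokenType` (the enum values are arbitrary), and the `mov` mnemonic is a hypothetical example rather than a confirmed `CoolVM` instruction.

```python
import enum
from dataclasses import dataclass


class TokenType(enum.Enum):
    """Stand-in for a few InstructionTextTokenType members."""
    InstructionToken = enum.auto()
    TextToken = enum.auto()
    RegisterToken = enum.auto()
    OperandSeparatorToken = enum.auto()
    IntegerToken = enum.auto()


@dataclass
class Token:
    """Stand-in for InstructionTextToken: a type plus the text to display."""
    type: TokenType
    text: str
    value: int = 0


def tokens_for_mov_imm(reg, imm):
    # One token per visual element, so the UI can colour each part separately.
    return [
        Token(TokenType.InstructionToken, "mov"),
        Token(TokenType.TextToken, " "),
        Token(TokenType.RegisterToken, reg),
        Token(TokenType.OperandSeparatorToken, ", "),
        Token(TokenType.IntegerToken, hex(imm), imm),
    ]


tokens = tokens_for_mov_imm("r1", 0x41)
line = "".join(t.text for t in tokens)
```

Concatenating the token texts gives the line the user sees, while the token types drive highlighting and navigation.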
# Lifting
Lifting is how you go from disassembly to decompilation. In binja, you write the basic `LowLevelIL` for each instruction. Binja will propagate that information up to `MediumLevelIL` and `HighLevelIL`, as well as provide `Pseudo-C`. The actual lifting is implemented in our [[Lifter]]; however, the Arch class has a function that calls our lifter, so we also need to add the lifter class to our `__init__` function.
## get_instruction_low_level_il
This calls the lifter and then returns the instruction size.
```python
def get_instruction_low_level_il(self, data, addr, il):
    self.lifter.lift(data, addr, il)
    return 4
```
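To show the general shape of what a lifter appends to `il`, here is a minimal sketch with a recording stand-in. `RecordingIL` imitates only a handful of the builder methods on Binary Ninja's `LowLevelILFunction` (`reg`, `add`, `set_reg`, `append`) and returns nested tuples instead of real IL expressions; the `add dst, src` instruction and `lift_add` are hypothetical and not taken from the [[Lifter]].

```python
class RecordingIL:
    """Stand-in for LowLevelILFunction: builds nested tuples, not real IL."""

    def __init__(self):
        self.instructions = []

    def reg(self, size, name):
        return ("reg", size, name)

    def add(self, size, a, b):
        return ("add", size, a, b)

    def set_reg(self, size, name, value):
        return ("set_reg", size, name, value)

    def append(self, expr):
        self.instructions.append(expr)


def lift_add(il, dst, src):
    """Lift a made-up `add dst, src` as dst = dst + src on 1-byte registers."""
    il.append(il.set_reg(1, dst, il.add(1, il.reg(1, dst), il.reg(1, src))))
    return 4  # instruction length in bytes


il = RecordingIL()
length = lift_add(il, "r1", "r2")
```

Each instruction becomes a small expression tree, and the returned length tells binja where the next instruction starts.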
# Code
The full code can be found at [coolvm_binja/arch.py at master · thisusernameistaken/coolvm_binja (github.com)](https://github.com/thisusernameistaken/coolvm_binja/blob/master/arch.py)
```python
from binaryninja import (
    Architecture,
    Endianness,
    InstructionInfo,
    IntrinsicInfo,
    RegisterInfo,
    Type
)
from .disassembler import CoolVMDisassembler
from .disassembler import CoolVMLifter


class CoolVMArch(Architecture):
    name = "coolvm"
    endianness = Endianness.BigEndian
    default_int_size = 1
    max_instr_length = 4
    instr_alignment = 4
    stack_pointer = "sp"

    # All registers are one byte wide; there are no sub-registers.
    regs = {}
    regs['sp'] = RegisterInfo("sp", 1)
    regs['pc'] = RegisterInfo("pc", 1)
    regs['exit'] = RegisterInfo("exit", 1)
    for x in range(1, 5):
        reg_name = f"r{x}"
        regs[reg_name] = RegisterInfo(reg_name, 1)

    intrinsics = {
        "read": IntrinsicInfo([Type.char()], []),
        "print": IntrinsicInfo([Type.char()], []),
        "exit": IntrinsicInfo([], []),
        "printStr": IntrinsicInfo([], [])
    }

    def __init__(self):
        # Let the base Architecture class finish its own setup first.
        super().__init__()
        self.disassembler = CoolVMDisassembler()
        self.lifter = CoolVMLifter()

    def get_instruction_info(self, data, addr):
        _, branch_conds = self.disassembler.disas(data, addr)
        instr_info = InstructionInfo(4)
        for branch_info in branch_conds:
            if branch_info.target is not None:
                instr_info.add_branch(branch_info.type, branch_info.target)
            else:
                instr_info.add_branch(branch_info.type)
        return instr_info

    def get_instruction_text(self, data, addr):
        tokens, _ = self.disassembler.disas(data, addr)
        return tokens, 4

    def get_instruction_low_level_il(self, data, addr, il):
        self.lifter.lift(data, addr, il)
        return 4
```