The lifter class takes each instruction and translates it into architecture-independent IL using the `LowLevelIL` API (https://api.binary.ninja/binaryninja.lowlevelil-module.html#binaryninja.lowlevelil.LowLevelILFunction).
Following the same format as the disassembler, this class sets up a dictionary that maps each opcode to the function used to lift it.
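```python
from binaryninja.lowlevelil import ILRegister, LLIL_TEMP, LowLevelILLabel

# `Instruction` is the decoder class from the disassembler section.


class CoolVMLifter():
    def __init__(self):
        # Map each opcode to [mnemonic, lifting method]
        self.instructions = {
            0: ["mov", self.mov],
            1: ["push", self.push],
            2: ["pop", self.pop],
            3: ["sub", self.sub],
            4: ["jnz", self.jnz],
            5: ["jnzb", self.jnzb],
            6: ["print", self.print],
            7: ["read", self.read],
            8: ["exit", self.exit],
            9: ["xor", self.xor],
            10: ["ene", self.ene],
        }

    def lift(self, data, addr, il):
        # Decode the raw bytes, then let the opcode's handler append LLIL to `il`
        instr = Instruction(data)
        mnem, func = self.instructions[instr.opcode]
        return func(instr, addr, il)
```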
# Mov
Operand 3 of the instruction is always a 1-byte value. Create an IL constant for it and assign it to operand 1 (a register).
```python
def mov(self, instr, addr, il):
    # op1 = op3 (1-byte immediate)
    op3_const = il.const(1, instr.op3)
    il_mov = il.set_reg(1, instr.op1, op3_const)
    il.append(il_mov)
```
# Push
Binary Ninja's IL already implements stack push and pop, so you can just call `push` with the correct IL register.
```python
def push(self, instr, addr, il):
    # Push the 1-byte register named by operand 2
    il_reg = il.reg(1, instr.op2)
    il.append(il.push(1, il_reg))
```
## Lifted Example
```c
char var_2_1 = 0x67
```
# Pop
To pop, first build a `pop` expression and then assign the popped byte to the destination register (operand 1).
```python
def pop(self, instr, addr, il):
    # op1 = pop() (1 byte)
    il_pop = il.pop(1)
    il_set = il.set_reg(1, instr.op1, il_pop)
    il.append(il_set)
```
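Because the stack is last-in, first-out, a program that prints a string by popping and printing one character at a time has to push the characters in reverse. That is why the prompt bytes in the decompilation below appear back to front before the `print` calls. A minimal plain-Python illustration (the helper names are hypothetical and not part of the lifter):

```python
def push_string(stack, text):
    """Push text one byte per push so that successive pops yield it in order."""
    for ch in reversed(text):
        stack.append(ord(ch))


def pop_string(stack, count):
    """Pop count bytes, as a run of pop + print instructions would."""
    return "".join(chr(stack.pop()) for _ in range(count))


stack = []
push_string(stack, "Cool")
print([hex(b) for b in stack])  # ['0x6c', '0x6f', '0x6f', '0x43']
print(pop_string(stack, 4))     # Cool
```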
# Sub
Read the first two register operands, subtract operand 1 from operand 2, and store the result in `r4`.
```python
def sub(self, instr, addr, il):
    # r4 = op2 - op1
    il_op1 = il.reg(1, instr.op1)
    il_op2 = il.reg(1, instr.op2)
    il_sub = il.sub(1, il_op2, il_op1)
    il.append(il.set_reg(1, "r4", il_sub))
```
# JNZ
For the jump-if-not-zero instruction, read register `r4`, create a constant `0`, and compare the two. Then create a constant for the target we jump to if `r4` is not zero.
Conditionals such as `if_expr` take the expression holding the condition plus two labels, one for the true branch and one for the false branch. After a label is marked, the IL expressions appended next make up that branch. On the true branch we append the jump. On the false branch we append nothing, because all it has to do is fall through to the next instruction. The target constant must be 2 bytes wide: the binary is based at 0x1000, so instruction addresses no longer fit in a single byte.
```python
def jnz(self, instr, addr, il):
    # Condition: r4 != 0
    il_reg_zero = il.reg(1, "r4")
    il_zero = il.const(1, 0)
    # 2-byte target, since addresses start at 0x1000
    target = il.const(2, addr + instr.op3)
    cond = il.compare_not_equal(1, il_reg_zero, il_zero)
    t = LowLevelILLabel()
    f = LowLevelILLabel()
    il.append(il.if_expr(cond, t, f))
    # True branch: take the jump
    il.mark_label(t)
    il.append(il.jump(target))
    # False branch: nothing to append, fall through to the next instruction
    il.mark_label(f)
```
# JNZB
Exactly the same as JNZ, except the target address goes backwards: add 4 to reach the next instruction, then go back by operand 3.
```python
def jnzb(self, instr, addr, il):
    # Condition: r4 != 0
    il_reg_zero = il.reg(1, "r4")
    il_zero = il.const(1, 0)
    # Backward target: next instruction (addr + 4) minus op3
    target = il.const(2, addr - instr.op3 + 4)
    cond = il.compare_not_equal(1, il_reg_zero, il_zero)
    t = LowLevelILLabel()
    f = LowLevelILLabel()
    il.append(il.if_expr(cond, t, f))
    # True branch: jump backwards
    il.mark_label(t)
    il.append(il.jump(target))
    # False branch: fall through
    il.mark_label(f)
```
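The two branch handlers differ only in how the target is computed. The helpers below (hypothetical names) reproduce that arithmetic in plain Python, assuming 4-byte instructions as the JNZB description implies, and show why a 1-byte constant cannot hold a target once the binary is based at 0x1000.

```python
INSTR_SIZE = 4  # assumed instruction size, per "add 4 to get to the next instruction"


def jnz_target(addr, op3):
    """Forward branch: addr + op3, as the jnz handler computes it."""
    return addr + op3


def jnzb_target(addr, op3):
    """Backward branch: step to the next instruction, then go back op3 bytes."""
    return addr + INSTR_SIZE - op3


def const_size(value):
    """Smallest byte width that can hold a non-negative value."""
    return max(1, (value.bit_length() + 7) // 8)


print(hex(jnzb_target(0x1014, 12)))  # 0x100c
print(const_size(0x1020))            # 2
```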
# Print
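Print and read are lifted almost identically: each is a single IL intrinsic call. Print takes one input, the register in operand 2. Intrinsics referenced by name must also be declared in the architecture's `intrinsics` dictionary.
```python
def print(self, instr, addr, il):
    # print(op2): intrinsic with no outputs and one input
    il_op = il.reg(1, instr.op2)
    il.append(il.intrinsic([], "print", [il_op]))
```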
## Lifted Example
```c
print(0x79)
```
# Read
Read uses operand 1. It creates a temporary register to receive the output of the read intrinsic, then copies that temporary into operand 1.
```python
def read(self, instr, addr, il):
    # Allocate a fresh LLIL temporary register for the intrinsic's output
    temp = LLIL_TEMP(il.temp_reg_count)
    temp_il = ILRegister(il.arch, temp)
    il.append(il.intrinsic([temp_il], "read", []))
    # op1 = temp
    il.append(il.set_reg(1, instr.op1, il.reg(1, temp)))
```
## Lifted Example
```c
r2_13 = read()
```
# Exit
Exit is an intrinsic with no parameters. It is followed by a `no_ret` expression to tell Binary Ninja that execution does not continue past it.
```python
def exit(self, instr, addr, il):
    il.append(il.intrinsic([], "exit", []))
    # Nothing follows exit
    il.append(il.no_ret())
```
## Lifted Example
```c
exit()
```
# Xor
Xor is similar to mov in that it combines an IL register and an IL constant: operand 1 is XORed with the 1-byte constant in operand 3, and the result is written back to operand 1.
```python
def xor(self, instr, addr, il):
    # op1 = op1 ^ op3
    il_reg = il.reg(1, instr.op1)
    op3_const = il.const(1, instr.op3)
    il_xor = il.xor_expr(1, il_reg, op3_const)
    il_expr = il.set_reg(1, instr.op1, il_xor)
    il.append(il_expr)
```
## Lifted Example
```c
char r2_17 = r2_16 ^ 0xb
```
# ENE
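Exit-if-not-equal is lifted like a conditional with two branches, except that instead of jumping, the true branch calls the exit intrinsic (followed by `no_ret`, as in Exit).
```python
def ene(self, instr, addr, il):
    # Condition: op1 != op2
    il_op1 = il.reg(1, instr.op1)
    il_op2 = il.reg(1, instr.op2)
    cond = il.compare_not_equal(1, il_op1, il_op2)
    t = LowLevelILLabel()
    f = LowLevelILLabel()
    il.append(il.if_expr(cond, t, f))
    # True branch: exit, which does not return
    il.mark_label(t)
    il.append(il.intrinsic([], "exit", []))
    il.append(il.no_ret())
    # False branch: fall through to the next instruction
    il.mark_label(f)
```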
## Lifted Example
```c
if (0x7f != r2_17) {
exit()
}
```
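Taken together, the handlers above pin down the VM's semantics. As a cross-check for lifted output, here is a hypothetical plain-Python reference interpreter of the same opcode table. It assumes instructions are already decoded into `(opcode, op1, op2, op3)` tuples, that registers are named by strings, and that every instruction is 4 bytes; the real encoding lives in the disassembler and may differ.

```python
INSTR_SIZE = 4  # assumed instruction size


class VMExit(Exception):
    """Raised by exit, or by ene when its operands differ."""


def run(program, base=0x1000, inputs=b""):
    """Execute decoded instructions and return the bytes written by print."""
    regs = {"r4": 0}     # register name -> byte value
    stack = []           # VM stack of bytes
    out = bytearray()    # output of print
    feed = iter(inputs)  # bytes handed out by read
    pc = base
    try:
        while True:
            index = (pc - base) // INSTR_SIZE
            if not 0 <= index < len(program):
                raise IndexError(f"pc {pc:#x} is outside the program")
            opcode, op1, op2, op3 = program[index]
            next_pc = pc + INSTR_SIZE
            if opcode == 0:            # mov: op1 = op3
                regs[op1] = op3 & 0xFF
            elif opcode == 1:          # push op2
                stack.append(regs[op2])
            elif opcode == 2:          # pop into op1
                regs[op1] = stack.pop()
            elif opcode == 3:          # sub: r4 = op2 - op1
                regs["r4"] = (regs[op2] - regs[op1]) & 0xFF
            elif opcode == 4:          # jnz: forward if r4 != 0
                if regs["r4"] != 0:
                    next_pc = pc + op3
            elif opcode == 5:          # jnzb: backward if r4 != 0
                if regs["r4"] != 0:
                    next_pc = pc + INSTR_SIZE - op3
            elif opcode == 6:          # print op2
                out.append(regs[op2])
            elif opcode == 7:          # read into op1
                regs[op1] = next(feed)
            elif opcode == 8:          # exit
                raise VMExit
            elif opcode == 9:          # xor: op1 ^= op3
                regs[op1] ^= op3 & 0xFF
            elif opcode == 10:         # ene: exit if op1 != op2
                if regs[op1] != regs[op2]:
                    raise VMExit
            else:
                raise ValueError(f"unknown opcode {opcode}")
            pc = next_pc
    except VMExit:
        pass
    return bytes(out)


hello = [
    (0, "r1", None, ord("h")),  # mov r1, 'h'
    (6, None, "r1", 0),         # print r1
    (0, "r1", None, ord("i")),  # mov r1, 'i'
    (6, None, "r1", 0),         # print r1
    (8, None, None, 0),         # exit
]
print(run(hello))  # b'hi'
```

Running a tiny program through both the interpreter and the lifter is a cheap way to catch operand-order mistakes such as the `sub` direction or the JNZB offset.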
# Code
The complete lifter is at [coolvm_binja/lifter.py at master · thisusernameistaken/coolvm_binja (github.com)](https://github.com/thisusernameistaken/coolvm_binja/blob/master/lifter.py)
# Decompilation
After lifting, we can see that the program pushes a string onto the stack one character at a time and then prints it out character by character.
It then pushes a key onto the stack. For each key byte, it reads one byte of input, XORs it with 0xb, and compares the result with that key byte. If they are not equal it exits; otherwise it moves on to the next check.
By implementing a [[Workflow]] we can add another analysis phase to Binary Ninja. This phase can detect runs of consecutive print instructions and outline them into a new function called `prints` that takes the entire string, which makes the output easier to read. Without that workflow, the HLIL looks like this:
```c
int64_t sub_1000() __noreturn
{
char var_1 = 0xa
char var_2 = 0x3f
char var_3 = 0x64
char var_4 = 0x72
char var_5 = 0x6f
char var_6 = 0x77
char var_7 = 0x73
char var_8 = 0x73
char var_9 = 0x61
char var_a = 0x70
char var_b = 0x20
char var_c = 0x65
char var_d = 0x68
char var_e = 0x74
char var_f = 0x20
char var_10 = 0x73
char var_11 = 0x27
char var_12 = 0x74
char var_13 = 0x61
char var_14 = 0x68
char var_15 = 0x57
char var_16 = 0x20
char var_17 = 0x2e
char var_18 = 0x30
char var_19 = 0x2e
char var_1a = 0x31
char var_1b = 0x20
char var_1c = 0x6e
char var_1d = 0x6f
char var_1e = 0x69
char var_1f = 0x73
char var_20 = 0x72
char var_21 = 0x65
char var_22 = 0x56
char var_23 = 0x20
char var_24 = 0x4d
char var_25 = 0x56
char var_26 = 0x20
char var_27 = 0x6c
char var_28 = 0x6f
char var_29 = 0x6f
char var_2a = 0x43
print(0x43)
print(0x6f)
print(0x6f)
print(0x6c)
print(0x20)
print(0x56)
print(0x4d)
print(0x20)
print(0x56)
print(0x65)
print(0x72)
print(0x73)
print(0x69)
print(0x6f)
print(0x6e)
print(0x20)
print(0x31)
print(0x2e)
print(0x30)
print(0x2e)
print(0x20)
print(0x57)
print(0x68)
print(0x61)
print(0x74)
print(0x27)
print(0x73)
print(0x20)
print(0x74)
print(0x68)
print(0x65)
print(0x20)
print(0x70)
print(0x61)
print(0x73)
print(0x73)
print(0x77)
print(0x6f)
print(0x72)
print(0x64)
print(0x3f)
print(0xa)
char var_1_1 = 0x76
char var_2_1 = 0x67
char var_3_1 = 0x64
char var_4_1 = 0x64
char var_5_1 = 0x68
char var_6_1 = 0x54
char var_7_1 = 0x72
char var_8_1 = 0x7f
char var_9_1 = 0x7f
char var_a_1 = 0x6e
char var_b_1 = 0x79
char var_c_1 = 0x7b
char var_d_1 = 0x54
char var_e_1 = 0x6e
char var_f_1 = 0x79
char var_10_1 = 0x6a
char var_11_1 = 0x54
char var_12_1 = 0x78
char var_13_1 = 0x66
char var_14_1 = 0x7d
char var_15_1 = 0x70
char var_16_1 = 0x6d
char var_17_1 = 0x7f
char var_18_1 = 0x68
char var_19_1 = 0x7b
if (0x7b != (read() ^ 0xb))
exit()
if (0x68 != (read() ^ 0xb))
exit()
if (0x7f != (read() ^ 0xb))
exit()
if (0x6d != (read() ^ 0xb))
exit()
if (0x70 != (read() ^ 0xb))
exit()
if (0x7d != (read() ^ 0xb))
exit()
if (0x66 != (read() ^ 0xb))
exit()
if (0x78 != (read() ^ 0xb))
exit()
if (0x54 != (read() ^ 0xb))
exit()
if (0x6a != (read() ^ 0xb))
exit()
if (0x79 != (read() ^ 0xb))
exit()
if (0x6e != (read() ^ 0xb))
exit()
if (0x54 != (read() ^ 0xb))
exit()
if (0x7b != (read() ^ 0xb))
exit()
if (0x79 != (read() ^ 0xb))
exit()
if (0x6e != (read() ^ 0xb))
exit()
if (0x7f != (read() ^ 0xb))
exit()
if (0x7f != (read() ^ 0xb))
exit()
if (0x72 != (read() ^ 0xb))
exit()
if (0x54 != (read() ^ 0xb))
exit()
if (0x68 != (read() ^ 0xb))
exit()
if (0x64 != (read() ^ 0xb))
exit()
if (0x64 != (read() ^ 0xb))
exit()
if (0x67 != (read() ^ 0xb))
exit()
if (0x76 != (read() ^ 0xb))
exit()
char var_1_2 = 0xa
char var_2_2 = 0x21
char var_3_2 = 0x74
char var_4_2 = 0x63
char var_5_2 = 0x65
char var_6_2 = 0x72
char var_7_2 = 0x72
char var_8_2 = 0x6f
char var_9_2 = 0x63
char var_a_2 = 0x20
char var_b_2 = 0x73
char var_c_2 = 0x27
char var_d_2 = 0x74
char var_e_2 = 0x61
char var_f_2 = 0x68
char var_10_2 = 0x54
print(0x54)
print(0x68)
print(0x61)
print(0x74)
print(0x27)
print(0x73)
print(0x20)
print(0x63)
print(0x6f)
print(0x72)
print(0x72)
print(0x65)
print(0x63)
print(0x74)
print(0x21)
print(0xa)
exit()
noreturn
}
```
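Since XOR is its own inverse, each check `key != (read() ^ 0xb)` passes only when the input byte equals `key ^ 0xb`. A short plain-Python sketch that decodes the printed prompt and inverts the checks, using byte values copied in order from the listing above:

```python
XOR_KEY = 0x0B  # constant applied by the xor instruction in every check

# Arguments of the first run of print() calls, in order.
prompt_bytes = [
    0x43, 0x6f, 0x6f, 0x6c, 0x20, 0x56, 0x4d, 0x20, 0x56, 0x65, 0x72, 0x73,
    0x69, 0x6f, 0x6e, 0x20, 0x31, 0x2e, 0x30, 0x2e, 0x20, 0x57, 0x68, 0x61,
    0x74, 0x27, 0x73, 0x20, 0x74, 0x68, 0x65, 0x20, 0x70, 0x61, 0x73, 0x73,
    0x77, 0x6f, 0x72, 0x64, 0x3f, 0x0a,
]

# Left-hand constants of the 25 comparisons, in the order they are performed.
key_bytes = [
    0x7b, 0x68, 0x7f, 0x6d, 0x70, 0x7d, 0x66, 0x78, 0x54, 0x6a, 0x79, 0x6e,
    0x54, 0x7b, 0x79, 0x6e, 0x7f, 0x7f, 0x72, 0x54, 0x68, 0x64, 0x64, 0x67,
    0x76,
]

prompt = bytes(prompt_bytes).decode("ascii")
# A check passes iff input ^ XOR_KEY == key, i.e. input == key ^ XOR_KEY
password = bytes(b ^ XOR_KEY for b in key_bytes).decode("ascii")

print(prompt, end="")  # Cool VM Version 1.0. What's the password?
print(password)        # pctf{vms_are_pretty_cool}
```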
With the [[Workflow]] applied and the dead instructions removed:
![[hlil_wf.png]]
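The workflow itself runs inside Binary Ninja's analysis pipeline and rewrites IL, which is not shown in this section. Only the detection half of the idea is sketched here in plain Python: each lifted call is modeled as a hypothetical `(name, argument)` pair, and runs of single-character `print` calls are collapsed into one `prints` call.

```python
def outline_prints(calls, min_run=2):
    """Replace runs of >= min_run consecutive print calls with one prints call."""
    result = []
    pending = []  # character codes of the current run of print calls

    def flush():
        if len(pending) >= min_run:
            result.append(("prints", "".join(chr(c) for c in pending)))
        else:
            result.extend(("print", c) for c in pending)
        pending.clear()

    for name, arg in calls:
        if name == "print":
            pending.append(arg)
        else:
            flush()
            result.append((name, arg))
    flush()
    return result


calls = [("print", 0x48), ("print", 0x69), ("read", None), ("print", 0x21), ("exit", None)]
print(outline_prints(calls))
# [('prints', 'Hi'), ('read', None), ('print', 33), ('exit', None)]
```

Leaving isolated single `print` calls untouched (via `min_run`) is a choice of this sketch, not necessarily what the real workflow does.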
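A simple snippet for the Python console: select a range of "dead code" instructions, then run it to allow dead store elimination on the first variable of each selected HLIL instruction.
```python
from binaryninja.enums import DeadStoreElimination

start, end = current_selection
# HLIL instruction index corresponding to the start of the selection
idx = current_function.get_llil_at(start).hlil.instr_index
address = start
while address < end:
    hlil = current_hlil[idx]
    # AllowDeadStoreElimination (value 2): let Binary Ninja drop this store
    hlil.vars[0].dead_store_elimination = DeadStoreElimination.AllowDeadStoreElimination
    address = hlil.address
    idx += 1
```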