Lifter - librondo.so

The lifter class takes each instruction and translates it into architecture-independent IL using the `LowLevelIL` API (https://api.binary.ninja/binaryninja.lowlevelil-module.html#binaryninja.lowlevelil.LowLevelILFunction). Following the same format as the disassembler, this class sets up a dictionary that maps each opcode to the function used to lift it.

```python
from binaryninja.lowlevelil import LowLevelILLabel, ILRegister, LLIL_TEMP

class CoolVMLifter():
    def __init__(self):
        # opcode -> [mnemonic, lifting function]
        self.instructions = {
            0: ["mov",self.mov],
            1: ["push",self.push],
            2: ["pop",self.pop],
            3: ["sub",self.sub],
            4: ["jnz",self.jnz],
            5: ["jnzb",self.jnzb],
            6: ["print",self.print],
            7: ["read",self.read],
            8: ["exit",self.exit],
            9: ["xor",self.xor],
            10:["ene",self.ene],
        }

    def lift(self,data,addr,il):
        # Instruction decodes the raw bytes (defined elsewhere in the plugin)
        instr = Instruction(data)
        mnem, func = self.instructions[instr.opcode]
        return func(instr,addr,il)
```

# Mov

Operand 3 of the instruction is always a 1-byte value. Create an IL constant for it and set operand 1 (a register) to that value.

```python
def mov(self,instr,addr,il):
    op3_const = il.const(1,instr.op3)               # 1-byte immediate
    il_mov = il.set_reg(1,instr.op1,op3_const)      # op1 = op3
    il.append(il_mov)
```

# Push

Binary Ninja IL already implements stack push and pop, so you can just call push with the correct IL register.

```python
def push(self,instr,addr,il):
    il_reg = il.reg(1,instr.op2)
    il.append(il.push(1,il_reg))
```

## Lifted Example

```c
char var_2_1 = 0x67
```

# Pop

To pop off the stack, first pop a byte and then set an IL register to the popped value.

```python
def pop(self,instr,addr,il):
    il_pop = il.pop(1)
    il_set = il.set_reg(1,instr.op1,il_pop)
    il.append(il_set)
```

# Sub

Grab the first two registers, subtract the first from the second, then store the result in `r4`.

```python
def sub(self,instr,addr,il):
    il_op1 = il.reg(1,instr.op1)
    il_op2 = il.reg(1,instr.op2)
    il_sub = il.sub(1,il_op2,il_op1)                # op2 - op1
    il.append(il.set_reg(1,"r4",il_sub))
```

# JNZ

For the jump-if-not-zero instruction, grab register `r4`, create a constant for `0`, and compare the two.
Define a constant for the target we will jump to if `r4` is not zero. For conditionals like `if_expr` we pass the expression that holds the condition and then two labels, the true label and the false label.

Then we mark the true label; the IL expressions appended after a marked label are what executes on that branch. After that we mark the false label, but we don't need to append any IL expressions after it: on the true branch we want to jump, while on the false branch we do nothing except fall through to the next instruction.

For the target IL constant, it's important to make it size 2 so the jump can reach the address space we based the code at, since an address like 0x1000 needs 2 bytes.

```python
def jnz(self,instr,addr,il):
    il_reg_zero = il.reg(1,"r4")
    il_zero = il.const(1,0)
    target = il.const(2,addr+instr.op3)             # forward jump target
    cond = il.compare_not_equal(1,il_reg_zero,il_zero)
    t = LowLevelILLabel()
    f = LowLevelILLabel()
    il.append(il.if_expr(cond,t,f))
    il.mark_label(t)                                # true branch: jump
    il.append(il.jump(target))
    il.mark_label(f)                                # false branch: fall through
```

# JNZB

Exactly the same as JNZ, except the target address goes backwards. Add 4 to get to the next instruction and then go back.

```python
def jnzb(self,instr,addr,il):
    il_reg_zero = il.reg(1,"r4")
    il_zero = il.const(1,0)
    target = il.const(2,addr-instr.op3 + 4)         # backward jump target
    cond = il.compare_not_equal(1,il_reg_zero,il_zero)
    t = LowLevelILLabel()
    f = LowLevelILLabel()
    il.append(il.if_expr(cond,t,f))
    il.mark_label(t)
    il.append(il.jump(target))
    il.mark_label(f)
```

# Print

The print and read implementations are much the same: each calls an IL intrinsic. Print takes one input, operand 2.
```python
def print(self,instr,addr,il):
    il_op = il.reg(1,instr.op2)
    il.append(il.intrinsic([],"print",[il_op]))     # no outputs, one input
```

## Lifted Example

```c
print(0x79)
```

# Read

Read uses operand 1: it creates a temporary register to hold the output of the read intrinsic, then sets operand 1 to that temporary register.

```python
def read(self,instr,addr,il):
    temp = LLIL_TEMP(il.temp_reg_count)
    temp_il = ILRegister(il.arch, temp)
    il.append(il.intrinsic([temp_il],"read",[]))    # one output, no inputs
    il.append(il.set_reg(1,instr.op1,il.reg(1,temp)))
```

## Lifted Example

```c
r2_13 = read()
```

# Exit

Exit uses an intrinsic with no parameters. It is followed by a `no_ret` expression to tell Binary Ninja that nothing executes after it.

```python
def exit(self,instr,addr,il):
    il.append(il.intrinsic([],"exit",[]))
    il.append(il.no_ret())
```

## Lifted Example

```c
exit()
```

# Xor

Xor is similar to mov: we have both an IL register and an IL constant.

```python
def xor(self,instr,addr,il):
    il_reg = il.reg(1,instr.op1)
    op3_const = il.const(1,instr.op3)
    il_xor = il.xor_expr(1,il_reg,op3_const)
    il_expr= il.set_reg(1,instr.op1,il_xor)         # op1 = op1 ^ op3
    il.append(il_expr)
```

## Lifted Example

```c
char r2_17 = r2_16 ^ 0xb
```

# ENE

Exit-if-not-equal behaves like a conditional with two branches, except that instead of jumping we call the exit intrinsic.

```python
def ene(self,instr,addr,il):
    il_op1 = il.reg(1,instr.op1)
    il_op2 = il.reg(1,instr.op2)
    cond = il.compare_not_equal(1,il_op1,il_op2)
    t = LowLevelILLabel()
    f = LowLevelILLabel()
    il.append(il.if_expr(cond,t,f))
    il.mark_label(t)                                # true branch: exit
    il.append(il.intrinsic([],"exit",[]))
    il.mark_label(f)                                # false branch: fall through
```

## Lifted Example

```c
if (0x7f != r2_17)
{
    exit()
}
```

# Code

The full lifter can be found at [coolvm_binja/lifter.py at master · thisusernameistaken/coolvm_binja (github.com)](https://github.com/thisusernameistaken/coolvm_binja/blob/master/lifter.py)

# Decompilation

After lifting, we can see that the program pushes a string to the stack one character at a time and then prints each character out.
It then pushes a key to the stack and, for each character in the key, reads a byte, xors that input with 0xb, and compares the result to the corresponding key byte on the stack. If the two are not equal it exits; otherwise it moves on to the next check.

By implementing a [[Workflow]] we can add another analysis phase to Binary Ninja. This phase can be used to detect multiple print instructions in a row and outline them into a new function called `prints` that takes the entire string, which makes the output easier to read.

```c
int64_t sub_1000() __noreturn
{
    char var_1 = 0xa
    char var_2 = 0x3f
    char var_3 = 0x64
    char var_4 = 0x72
    char var_5 = 0x6f
    char var_6 = 0x77
    char var_7 = 0x73
    char var_8 = 0x73
    char var_9 = 0x61
    char var_a = 0x70
    char var_b = 0x20
    char var_c = 0x65
    char var_d = 0x68
    char var_e = 0x74
    char var_f = 0x20
    char var_10 = 0x73
    char var_11 = 0x27
    char var_12 = 0x74
    char var_13 = 0x61
    char var_14 = 0x68
    char var_15 = 0x57
    char var_16 = 0x20
    char var_17 = 0x2e
    char var_18 = 0x30
    char var_19 = 0x2e
    char var_1a = 0x31
    char var_1b = 0x20
    char var_1c = 0x6e
    char var_1d = 0x6f
    char var_1e = 0x69
    char var_1f = 0x73
    char var_20 = 0x72
    char var_21 = 0x65
    char var_22 = 0x56
    char var_23 = 0x20
    char var_24 = 0x4d
    char var_25 = 0x56
    char var_26 = 0x20
    char var_27 = 0x6c
    char var_28 = 0x6f
    char var_29 = 0x6f
    char var_2a = 0x43
    print(0x43)
    print(0x6f)
    print(0x6f)
    print(0x6c)
    print(0x20)
    print(0x56)
    print(0x4d)
    print(0x20)
    print(0x56)
    print(0x65)
    print(0x72)
    print(0x73)
    print(0x69)
    print(0x6f)
    print(0x6e)
    print(0x20)
    print(0x31)
    print(0x2e)
    print(0x30)
    print(0x2e)
    print(0x20)
    print(0x57)
    print(0x68)
    print(0x61)
    print(0x74)
    print(0x27)
    print(0x73)
    print(0x20)
    print(0x74)
    print(0x68)
    print(0x65)
    print(0x20)
    print(0x70)
    print(0x61)
    print(0x73)
    print(0x73)
    print(0x77)
    print(0x6f)
    print(0x72)
    print(0x64)
    print(0x3f)
    print(0xa)
    char var_1_1 = 0x76
    char var_2_1 = 0x67
    char var_3_1 = 0x64
    char var_4_1 = 0x64
    char var_5_1 = 0x68
    char var_6_1 = 0x54
    char var_7_1 = 0x72
    char var_8_1 = 0x7f
    char var_9_1 = 0x7f
    char var_a_1 = 0x6e
    char var_b_1 = 0x79
    char var_c_1 = 0x7b
    char var_d_1 = 0x54
    char var_e_1 = 0x6e
    char var_f_1 = 0x79
    char var_10_1 = 0x6a
    char var_11_1 = 0x54
    char var_12_1 = 0x78
    char var_13_1 = 0x66
    char var_14_1 = 0x7d
    char var_15_1 = 0x70
    char var_16_1 = 0x6d
    char var_17_1 = 0x7f
    char var_18_1 = 0x68
    char var_19_1 = 0x7b
    if (0x7b != (read() ^ 0xb)) exit()
    if (0x68 != (read() ^ 0xb)) exit()
    if (0x7f != (read() ^ 0xb)) exit()
    if (0x6d != (read() ^ 0xb)) exit()
    if (0x70 != (read() ^ 0xb)) exit()
    if (0x7d != (read() ^ 0xb)) exit()
    if (0x66 != (read() ^ 0xb)) exit()
    if (0x78 != (read() ^ 0xb)) exit()
    if (0x54 != (read() ^ 0xb)) exit()
    if (0x6a != (read() ^ 0xb)) exit()
    if (0x79 != (read() ^ 0xb)) exit()
    if (0x6e != (read() ^ 0xb)) exit()
    if (0x54 != (read() ^ 0xb)) exit()
    if (0x7b != (read() ^ 0xb)) exit()
    if (0x79 != (read() ^ 0xb)) exit()
    if (0x6e != (read() ^ 0xb)) exit()
    if (0x7f != (read() ^ 0xb)) exit()
    if (0x7f != (read() ^ 0xb)) exit()
    if (0x72 != (read() ^ 0xb)) exit()
    if (0x54 != (read() ^ 0xb)) exit()
    if (0x68 != (read() ^ 0xb)) exit()
    if (0x64 != (read() ^ 0xb)) exit()
    if (0x64 != (read() ^ 0xb)) exit()
    if (0x67 != (read() ^ 0xb)) exit()
    if (0x76 != (read() ^ 0xb)) exit()
    char var_1_2 = 0xa
    char var_2_2 = 0x21
    char var_3_2 = 0x74
    char var_4_2 = 0x63
    char var_5_2 = 0x65
    char var_6_2 = 0x72
    char var_7_2 = 0x72
    char var_8_2 = 0x6f
    char var_9_2 = 0x63
    char var_a_2 = 0x20
    char var_b_2 = 0x73
    char var_c_2 = 0x27
    char var_d_2 = 0x74
    char var_e_2 = 0x61
    char var_f_2 = 0x68
    char var_10_2 = 0x54
    print(0x54)
    print(0x68)
    print(0x61)
    print(0x74)
    print(0x27)
    print(0x73)
    print(0x20)
    print(0x63)
    print(0x6f)
    print(0x72)
    print(0x72)
    print(0x65)
    print(0x63)
    print(0x74)
    print(0x21)
    print(0xa)
    exit()
    noreturn
}
```

And with the [[Workflow]] applied, as well as the dead instructions removed:

![[hlil_wf.png]]

A simple snippet that lets you select a range of "dead code" instructions and mark their variables for dead store elimination:

```python
# Run from the Binary Ninja scripting console with the instructions selected.
start = current_selection[0]
idx = current_function.get_llil_at(start).hlil.instr_index
address = start
while address < current_selection[1]:
    hlil = current_hlil[idx]
    # 2 = allow dead store elimination for this variable
    hlil.vars[0].dead_store_elimination = 2
    address = hlil.address
    idx += 1
```
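
Going back to the key check in the decompilation: each `if` compares a key byte against `read() ^ 0xb`, and XOR is its own inverse, so XORing the compared constants with `0xb` should give back the input the program expects. The following is a minimal sketch of that arithmetic outside Binary Ninja. The constants are copied from the `if` checks in the order they run; the names `KEY` and `recover_password` are made up for this illustration and are not part of the plugin.

```python
# Constants compared against `read() ^ 0xb`, in the order the checks execute
# (copied from the decompiled `if` statements).
KEY = [
    0x7b, 0x68, 0x7f, 0x6d, 0x70, 0x7d, 0x66, 0x78, 0x54,
    0x6a, 0x79, 0x6e, 0x54, 0x7b, 0x79, 0x6e, 0x7f, 0x7f,
    0x72, 0x54, 0x68, 0x64, 0x64, 0x67, 0x76,
]

def recover_password(key, xor_byte=0x0B):
    """Invert `input ^ xor_byte == key_byte` for every position.

    Hypothetical helper for illustration only.
    """
    return bytes(b ^ xor_byte for b in key).decode("ascii")

password = recover_password(KEY)
print(password)
```

Because the checks read input in order, the recovered bytes are already in the order they need to be typed; no reversal is required even though the key was pushed to the stack back to front.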