BinaryView - librondo.so

API Reference: https://api.binary.ninja/binaryninja.binaryview-module.html#binaryninja.binaryview.BinaryView

The `BinaryView` is the loader: it parses the binary file format, maps code and data regions, and defines any types associated with the custom arch. The loader inherits from `BinaryView`, and there are some members and functions that need to be implemented.

In the `__init__` function, call `BinaryView`'s init and set an attribute `raw`, which is the raw, unmapped `BinaryView`.

```python
class CoolVMLoader(BinaryView):
    name = "coolvm"
    long_name = "coolvm loader"

    def __init__(self, data):
        # super() already binds self, so it must not be passed again
        super().__init__(file_metadata=data.file, parent_view=data)
        self.raw = data
```

To decide which loader to use for a given binary file, binja goes through all registered loaders and calls their `is_valid_for_data()` function. If this function returns `True`, the loader will be selectable and can be used to map the binary file. The data passed to this function is the raw `BinaryView`. With this, we can determine whether the binary file is valid in a few ways, like:
- `data.read(offset,size)` - Grab any bytes and compare them to a known header
- `data.file.filename` - Grab the file name and compare the extension.

In this case, the `CoolVM` file both ends in `.cool` and has a header of `COOL`.

```python
...
    @classmethod
    def is_valid_for_data(cls, data):
        return data.read(0, 4) == b"COOL"
...
```

The last needed piece of a loader is the actual logic for mapping sections/segments. This is done in an `init()` function.

# Parsing a header
If the custom architecture's file format has a specific header that you can parse to learn where sections/segments are mapped, or a symbol table to parse (like an ELF), you can use the `BinaryReader` class to parse the header more conveniently and get the correct values. It can also be wise to define a type and apply it to the header; for example, this is what the ELF loader does. However, this is not needed for `CoolVM`.
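
The header-parsing idea can be sketched outside binja with Python's standard `struct` module instead of `BinaryReader`. This is only an illustration under stated assumptions: the header layout (a 4-byte signature followed by an 8-byte big-endian entry point) and the names `HEADER`, `Header`, and `parse_header` are hypothetical, and `CoolVM` itself only has the `COOL` magic.

```python
import struct
from typing import NamedTuple

MAGIC = b"COOL"  # header signature named in the text

# Hypothetical layout, for illustration only: a 4-byte signature followed
# by a signed 8-byte big-endian entry point (12 bytes in total).
HEADER = struct.Struct(">4sq")


class Header(NamedTuple):
    signature: bytes
    entry_point: int


def parse_header(raw: bytes) -> Header:
    """Unpack the hypothetical header from the start of the raw file bytes."""
    if len(raw) < HEADER.size:
        raise ValueError("file too short for header")
    signature, entry_point = HEADER.unpack_from(raw, 0)
    if signature != MAGIC:
        raise ValueError("bad signature")
    return Header(signature, entry_point)


# Build a sample file in memory: header followed by a few code bytes.
sample = MAGIC + (0x1000).to_bytes(8, "big") + b"\x00" * 4
header = parse_header(sample)
print(header.signature, hex(header.entry_point))
```

Inside a loader, the same bytes could come from `data.read(0, HEADER.size)` on the raw view.
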
![[elf_type.png]]

To create a type, you can use the `StructureBuilder` API (https://api.binary.ninja/binaryninja.types-module.html?highlight=structurebuilder#binaryninja.types.StructureBuilder).

```python
with StructureBuilder.builder(bv, "test_t") as struct:
    struct.packed = True
    struct.append(Type.array(Type.char(), 0x4), "signature")
    struct.append(Type.int(8), "entry_point")
```

This adds the struct to the `BinaryView`:

```c
struct test_t __packed
{
    char signature[0x4];
    int64_t entry_point;
};
```

You can then use `bv.define_data_var(offset,struct)` to apply the struct at an offset. To grab the values within the struct, you can then do the following:

```python
dv = bv.get_data_var_at(offset)
dv.value['signature']
```

`.value` returns a dictionary where the struct member name is the key and the associated bytes are the value.

# Mapping Sections/Segments
## Segments
In order to map data from the binary file to a virtual address, we need to map the segment into memory. BinaryNinja has two functions for adding segments, `add_auto_segment` and `add_user_segment`. The difference between the two is that anything created with the `*user*` term can be undone, whereas `*auto*` is permanent. So when creating the segments through the loader, it's best to use `add_auto_segment`. A `BinaryView` requires at least one segment mapped.

API Reference: https://api.binary.ninja/binaryninja.binaryview-module.html?highlight=add_auto_segment#binaryninja.binaryview.BinaryView.add_auto_segment

The parameters required are:

```python
add_auto_segment(start,length,data_start,data_length,flags)
```

- `start` is the virtual address the segment starts at.
- `length` is the entire size of the segment. This can be larger than the actual data length, and the extra bytes will be padded with nulls.
- `data_start` is the offset in the raw binary file
- `data_length` is the number of bytes in the raw file until the end of the segment
- `flags` is one or more `SegmentFlag` values https://api.binary.ninja/binaryninja.enums-module.html#binaryninja.enums.SegmentFlag

```python
class SegmentFlag(enum.IntEnum):
    SegmentExecutable = 1
    SegmentWritable = 2
    SegmentReadable = 4
    SegmentContainsData = 8
    SegmentContainsCode = 16
    SegmentDenyWrite = 32
    SegmentDenyExecute = 64
```

## Sections
API Reference: https://api.binary.ninja/binaryninja.binaryview-module.html?highlight=add_auto_section#binaryninja.binaryview.BinaryView.add_auto_section

While sections are NOT required in order to load a custom file format, they can be pretty helpful, not only to a reverser trying to understand where in a binary they are, but to Binja's auto-analysis as well. Specifying the `SectionSemantics` helps binja know how to treat the data there. For example, if the section specifies `ReadOnlyCodeSectionSemantics`, then binja knows to disassemble the bytes as instructions, whereas with `ReadOnlyDataSectionSemantics` it will not.

```python
class SectionSemantics(enum.IntEnum):
    DefaultSectionSemantics = 0
    ReadOnlyCodeSectionSemantics = 1
    ReadOnlyDataSectionSemantics = 2
    ReadWriteDataSectionSemantics = 3
    ExternalSectionSemantics = 4
```

The parameters are:

```python
add_auto_section(name,start,length,semantics)
```

For `CoolVM`, the following code bases the binary at 0x1000.

```python
def init(self):
    # Skip the 4-byte "COOL" header and map the rest of the file at 0x1000
    self.add_auto_segment(
        0x1000, len(self.raw)-4, 4, len(self.raw)-4,
        SegmentFlag.SegmentReadable|SegmentFlag.SegmentContainsCode|SegmentFlag.SegmentExecutable
    )
    self.add_auto_section(".code", 0x1000, len(self.raw)-4, SectionSemantics.ReadOnlyCodeSectionSemantics)
    return True
```

The printStr [[Workflow]] appends stack strings to the raw binary view file so that they can be mapped and used in decompilation. We add a delimiter before writing the strings.
This way, the loader knows where the actual code ends and where the stack strings begin.

```python
end = len(self.raw)
if (b"_"*0x10) in self.raw[::]:
    end = self.raw[::].index(b"_"*0x10)
```

# Setting the Architecture
The loader is also where we specify the architecture; this will be the custom arch created in [[Architecture Plugin]].

```python
self.platform = Architecture['coolvm'].standalone_platform
self.arch = Architecture['coolvm']
```

The platform is a way to distinguish between things like Linux and Windows, since x86 Linux has some differences from x86 Windows (calling convention, syscalls). In this case, we don't need to create multiple platforms, so we will just use the default.

# Endianness
If your architecture is big endian, then you will also need to tell binja this in the `BinaryView`.

```python
def perform_get_default_endianness(self):
    return Endianness.BigEndian
```

# Code
Can be found at [coolvm_binja/view.py at master · thisusernameistaken/coolvm_binja (github.com)](https://github.com/thisusernameistaken/coolvm_binja/blob/master/view.py)

```python
from binaryninja import (
    Architecture,
    BinaryView,
    Endianness,
    SegmentFlag,
    SectionSemantics
)

class CoolVMLoader(BinaryView):
    name = "coolvm"
    long_name = "coolvm loader"

    def __init__(self, data):
        # super() already binds self, so it must not be passed again
        super().__init__(file_metadata=data.file, parent_view=data)
        self.raw = data

    @classmethod
    def is_valid_for_data(cls, data):
        # A CoolVM file starts with the 4-byte magic "COOL"
        return data.read(0, 4) == b"COOL"

    def perform_get_default_endianness(self):
        return Endianness.BigEndian

    def init(self):
        self.platform = Architecture['coolvm'].standalone_platform
        self.arch = Architecture['coolvm']
        # The code ends at the stack-string delimiter, if the workflow wrote one
        end = len(self.raw)
        if (b"_"*0x10) in self.raw[::]:
            end = self.raw[::].index(b"_"*0x10)
        # Skip the 4-byte header and map the code at 0x1000
        self.add_auto_segment(
            0x1000, end-4, 4, end-4,
            SegmentFlag.SegmentReadable|SegmentFlag.SegmentContainsCode|SegmentFlag.SegmentExecutable
        )
        self.add_auto_section(".code", 0x1000, end-4, SectionSemantics.ReadOnlyCodeSectionSemantics)
        return True
```
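
The segment arithmetic in `init()` can be checked outside binja with plain `bytes`. The sketch below mirrors the loader's logic (skip the 4-byte header, stop at the 16-underscore delimiter if present, base the code at 0x1000) and returns the four numeric arguments that would be passed to `add_auto_segment`. The names `SegmentArgs` and `code_segment_args` are hypothetical helpers, not part of the plugin or of the Binary Ninja API.

```python
from typing import NamedTuple

HEADER_SIZE = 4          # length of the b"COOL" magic skipped by the loader
BASE_ADDRESS = 0x1000    # virtual address the loader maps the code at
DELIMITER = b"_" * 0x10  # marker written before the appended stack strings


class SegmentArgs(NamedTuple):
    start: int
    length: int
    data_start: int
    data_length: int


def code_segment_args(raw: bytes) -> SegmentArgs:
    """Compute the numeric add_auto_segment arguments for the code region."""
    end = raw.find(DELIMITER)
    if end == -1:
        # No delimiter: the code runs to the end of the file
        end = len(raw)
    size = end - HEADER_SIZE
    return SegmentArgs(BASE_ADDRESS, size, HEADER_SIZE, size)


# A file with 8 code bytes, then the delimiter and one appended string.
code = b"\x01" * 8
with_strings = b"COOL" + code + DELIMITER + b"hello\x00"
print(code_segment_args(with_strings))
```

With or without the appended strings, the code segment keeps the same length, which is the point of writing the delimiter.
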