结合 ctf赛题 学习一下 unicorn 引擎的基础使用。
参考文章
unicorn中文文档
hxp CTF 2017 Fibonacci
运行该二进制程序时,我们发现打印flag
的速度越来越慢:
因此必须对该程序进行优化,在合理时间内打印出flag
利用ida_pro
反编译看一下程序:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
| __int64 __fastcall main(int a1, char **a2, char **a3) { char *v3; int v4; __int64 v5; char v6; __int64 v7; char v8; int v10[7];
v3 = &unk_4007E1; v4 = 0; setbuf(stdout, 0LL); printf("The flag is: "); while ( 1 ) { LODWORD(v5) = 0; do { v10[0] = 0; FIB(v4 + v5, v10); v8 = v7; v5 = v7 + 1; } while ( v5 != 8 ); v4 += 8; if ( (v10[0] << v8) == v6 ) break; ++v3; _IO_putc(v6 ^ (LOBYTE(v10[0]) << v8), stdout); } _IO_putc('\n', stdout); return 0LL; }
|
再来看一下FIB
函数:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
| __int64 __fastcall FIB(int a1, _DWORD *a2) { int v3; __int64 result; unsigned int v5; unsigned int v6;
if ( a1 ) { if ( a1 == 1 ) { result = FIB(0LL, a2); } else { v3 = FIB((a1 - 2), a2); result = v3 + FIB((a1 - 1), a2); } v5 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333; v6 = v5 + ((result - ((result >> 1) & 0x55555555)) & 0x33333333) + ((v5 + ((result - ((result >> 1) & 0x55555555)) & 0x33333333)) >> 4); *a2 ^= ((BYTE1(v6) & 0xF) + (v6 & 0xF) + ((((v6 >> 8) & 0xF0F0F) + (v6 & 0xF0F0F0F)) >> 16)) & 1; } else { *a2 ^= 1u; result = 1LL; } return result; }
|
我们看到程序在不断递归调用FIB
,而递归函数的缺点显而易见:
递归由于是函数调用自身,而函数调用是有时间和空间的消耗的:每一次函数调用,都需要在内存栈中分配空间以保存参数、返回地址以及临时变量,而往栈中压入数据和弹出数据都需要时间。->效率
递归中很多计算都是重复的,由于其本质是把一个问题分解成两个或者多个小问题,多个小问题存在相互重叠的部分,则存在重复计算,如fibonacci斐波那契数列
的递归实现。->效率
因此需要考虑优化问题,若选择重构代码则太过复杂,且易产生bug
,而使用unicorn
就能很好地避免这个问题。
1. 利用unicorn运行目标程序
首先先将需要的库导入:
1 2 3 4 5 6 7 8 9 10
| from unicorn import * from unicorn.x86_const import * import struct def read_file(name): with open(name,'rb') as f: return f.read() def u32(data): return struct.unpack("I",data)[0] def p32(num): return struct.pack("I",num)
|
记住python3
下二进制文件的打开形式为‘rb’,否则会报如下错误:
1
| UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 24: invalid start byte
|
首先实例化一个基于X86
架构的模式为64位
的模拟器:
1
| mu = Uc(UC_ARCH_X86, UC_MODE_64)
|
接着开辟两段空间,一段用于储存程序代码,另一段用于模拟栈,申请大小均为10241024(即*1MB):
1 2 3 4 5 6 7 8 9 10
| prog_base = 0x400000 prog_size = 1024*1024 stack_base = 0x0 stack_size = 1024*1024
mu.mem_map(prog_base, prog_size) mu.mem_map(stack_base, stack_size)
mu.mem_write(prog_base, read_file("./fibonacci")) mu.reg_write(UC_X86_REG_RSP, stack_base+stack_size-1)
|
设置程序的加载地址和结束地址,分别为0x4004E0
和0x400575
1 2 3
| start = 0x4004E0 end = 0x400575 mu.emu_start(start, end)
|
我们运行一下脚本,发现报错,说是读到非法的地址:
1 2 3 4 5 6
| Traceback (most recent call last): File "try.py", line 28, in <module> mu.emu_start(start, end) File "/home/fuzz/.local/lib/python3.8/site-packages/unicorn/unicorn.py", line 525, in emu_start raise UcError(status) unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)
|
那么是走到哪一步报错了呢?我们需要添加hook
将成功执行的代码和其大小给打印出来:
1 2 3
| def hook_code(mu, address, size, user_data): print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size)) mu.hook_add(UC_HOOK_CODE, hook_code)
|
再次运行:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
| fuzz@fuzz-virtual-machine:~/unicorn-ctf$ python3 try.py >>> Tracing instruction at 0x4004e0, instruction size = 0x1 >>> Tracing instruction at 0x4004e1, instruction size = 0x1 >>> Tracing instruction at 0x4004e2, instruction size = 0x2 >>> Tracing instruction at 0x4004e4, instruction size = 0x5 >>> Tracing instruction at 0x4004e9, instruction size = 0x2 >>> Tracing instruction at 0x4004eb, instruction size = 0x4 >>> Tracing instruction at 0x4004ef, instruction size = 0x7 Traceback (most recent call last): File "try.py", line 34, in <module> mu.emu_start(start, end) File "/home/fuzz/.local/lib/python3.8/site-packages/unicorn/unicorn.py", line 525, in emu_start raise UcError(status) unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)
|
可以得知,程序在执行如下code
报错:
1
| .text:00000000004004EF mov rdi, cs:stdout ; stream
|
因为我们未在空间里引入glibc
,所以直接跳过这些函数就好了,除此之外还需跳过的code
有:
1 2 3
| .text:00000000004004F6 call _setbuf .text:0000000000400502 call _printf .text:000000000040054F mov rsi, cs:stdout ; fp
|
可以通过改写rip
进行跳过:
1
| mu.reg_write(UC_X86_REG_RIP, address+size)
|
当然我们对打印函数也应当进行处理:
1 2 3
| .text:0000000000400558 movsx edi, dil ; c .text:000000000040055C add rbp, 1 .text:0000000000400560 call __IO_putc
|
__IO_putc
将寄存器rdi
内的元素进行输出,我们可以对其进行改写:
1 2 3
| c = mu.reg_read(UC_X86_REG_RDI) print(chr(c)) mu.reg_write(UC_X86_REG_RIP, address+size)
|
因此hook_code
有:
1 2 3 4 5 6 7 8 9 10
| instructions_skip_list = [0x4004EF,0x4004F6,0x400502,0x40054F] def hook_code(mu, address, size, user_data):
if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size) elif address == 0x400560: c = mu.reg_read(UC_X86_REG_RDI) print(chr(c)) mu.reg_write(UC_X86_REG_RIP, address+size)
|
再次执行:
1 2 3 4 5
| fuzz@fuzz-virtual-machine:~/unicorn-ctf$ python3 try.py h x p
|
我们发现打印flag
依旧很慢,因此就需要开始我们的优化工作
2. 优化程序,提高速度
很显然许多参数以及返回值一样的FIB
函数被重复调用了多次,因此我们需要对这些被重复调用过的函数进行记录,当参数一样时,不再经过中间的计算,而直接给出返回值。
如何保存这些成对的值?
- 在函数开始的时候,我们可以检查参数对应的值是否已经被dict记录
- 如果是,直接返回这个key-value就行,只需将返回值写入到
RAX
中,同时设置RIP
为RET
指令的值,退出这个函数。不能在fabonacci
函数内直接跳转到RET
,因为这条指令已经被HOOK了,所以我们跳转到main
中的ret
。
- 如果dict中没有出现参数和对应的值,将参数添加到dict中。
- 当退出函数的时候,保存返回值。可以从我们的栈结构中读取参数和返回值。
因此我们需要在执行到FIB
函数时进行判断,在FIB
返回的时候进行更新,构造如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
| Fib_entry = 0x400670 Fib_return = [0x4006F1, 0x400709]
stack = [] d = {}
def hook_code(mu, address, size, user_data):
if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size) elif address == 0x400560: c = mu.reg_read(UC_X86_REG_RDI) print(chr(c)) mu.reg_write(UC_X86_REG_RIP, address+size) elif address == Fib_entry: arg0 = mu.reg_read(UC_X86_REG_RDI) r_rsi = mu.reg_read(UC_X86_REG_RSI) arg1 = u32(mu.mem_read(r_rsi,4))
if (arg0,arg1) in d: (ret_rax, ret_ref) = d[(arg0,arg1)] mu.reg_write(UC_X86_REG_RAX, ret_rax) mu.mem_write(r_rsi, p32(ret_ref)) mu.reg_write(UC_X86_REG_RIP,0x400582) else: stack.append((arg0,arg1,r_rsi)) elif address in Fib_return: (arg0, arg1, r_rsi) = stack.pop() ret_rax = mu.reg_read(UC_X86_REG_RAX) ret_ref = u32(mu.mem_read(r_rsi,4)) d[(arg0, arg1)] = (ret_rax, ret_ref)
|
再次运行脚本:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
| fuzz@fuzz-virtual-machine:~/unicorn-ctf$ python3 exp.py h x p { F 1 b 0 n 4 c C i _ n u m Z _ 4 r 3 _ T 0 O _ 3 a 5 Y }
|
3. 完整EXP
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
| from unicorn import * from unicorn.x86_const import *
import struct def read_file(name): with open(name,'rb') as f: return f.read() def u32(data): return struct.unpack("I",data)[0] def p32(num): return struct.pack("I",num)
mu = Uc(UC_ARCH_X86, UC_MODE_64)
prog_base = 0x400000 stack_base = 0x0 stack_size = 1024*1024
mu.mem_map(prog_base, 1024*1024) mu.mem_map(stack_base, stack_size)
mu.mem_write(prog_base, read_file("./fibonacci")) mu.reg_write(UC_X86_REG_RSP, stack_base+stack_size-1)
start = 0x4004E0 end = 0x400575
instructions_skip_list = [0x4004EF,0x4004F6,0x400502,0x40054F]
Fib_entry = 0x400670 Fib_return = [0x4006F1, 0x400709]
stack = [] d = {}
def hook_code(mu, address, size, user_data):
if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size) elif address == 0x400560: c = mu.reg_read(UC_X86_REG_RDI) print(chr(c)) mu.reg_write(UC_X86_REG_RIP, address+size) elif address == Fib_entry: arg0 = mu.reg_read(UC_X86_REG_RDI) r_rsi = mu.reg_read(UC_X86_REG_RSI) arg1 = u32(mu.mem_read(r_rsi,4))
if (arg0,arg1) in d: (ret_rax, ret_ref) = d[(arg0,arg1)] mu.reg_write(UC_X86_REG_RAX, ret_rax) mu.mem_write(r_rsi, p32(ret_ref)) mu.reg_write(UC_X86_REG_RIP,0x400582) else: stack.append((arg0,arg1,r_rsi)) elif address in Fib_return: (arg0, arg1, r_rsi) = stack.pop() ret_rax = mu.reg_read(UC_X86_REG_RAX) ret_ref = u32(mu.mem_read(r_rsi,4)) d[(arg0, arg1)] = (ret_rax, ret_ref)
mu.hook_add(UC_HOOK_CODE, hook_code) mu.emu_start(start, end)
|