Reversing Wings Of Vi/Clickteam Fusion

In my last post, I mentioned how I had been reverse engineering Wings Of Vi for the past month. Part of this included digging into the decompilation output from Ghidra and figuring out how the game works. I left it out of my previous post because there was too much to discuss and postponed it for another post. Well, here’s that post. I was going to talk about my workflows and my overall journey to get where I am today, but I think that’d still be way too long. More importantly, I don’t even remember some of the earlier steps of my journey. Instead, I’ll be splitting this post into two parts: the first part will talk about what I discovered myself and the second part will talk about discoveries after I had some help.

Part 1: Solo Expedition

For the first couple of weeks, I spent almost all of my free time banging my head against the wall, making hypotheses on what some code did based on Ghidra decompilation output and iteratively refining my understanding. I went through a ton of the code and annotated what I thought things were doing, labelled function names, etc. until I was nearing my limit for what I could figure out. I’ll present what I found as I was nearing that limit. Keep in mind, I had no idea if any of this was really correct. After all, it’s not like there was any way I could truly verify any of this.

Firstly, the game logic is run by what I called UpdateObjects. The game would iterate over these UpdateObjects and do various things commanded by them. Here’s what a typical UpdateObject looked like:

typedef struct {
  short         size;  // add this to the address of the current UpdateObject to get the next
  short         code1; // index into a jump table. usually it's 2
  short         code2; // index into a second jump table.
  char          unknown[2];
  short         index; // index into an Entity[]
  unsigned char flags1;
  unsigned char flags2;
  // rest is unknown. no defined size for this struct because it's variable length.
  // minimum size is unknown, but is at least bigger than size up to flags2.
  // often, there are "Expression"s at either offset 0x14 or offset 0x16 though.
} UpdateObject;

Let’s go through how a typical UpdateObject might’ve been used. Using code1, a function from a jump table is called and the UpdateObject is passed in. Using code2, another function from another jump table is called and the UpdateObject is passed in. Let’s take a specific case where code1 = 2 and code2 = 3. The resulting function after the two jump tables will:

  • Find an Entity using the index
  • Compute some values using an Expression
  • Set the Entity->y to that value

Actually, it’s a bit more complicated than that, but let’s focus on the UpdateObjects for now. The idea is that code1 and code2 encode a certain procedure for the UpdateObject and the index is what’s acted upon the procedure. The rest of the UpdateObject contains data on values (either directly stored or computed) that the procedure needs to act.

Furthermore, UpdateObjects may not always be run. Let me give some pseudocode for what was happening:

void fun1(int, int) {
  while (someCondition) {
    UOChainHeader* header = ...;
    UpdateObject* uo = header->updateObjects[0];
    bool shouldExecute = ShouldExecuteJumpTable[uo->code1](uo);
    if (shouldExecute) {
      Globals->numUOInChain = header->numUOInChain;
      ProcessUOChain(uo);
    }
  }
}

void ProcessUOChain(UpdateObject* uo) {
  int i = 0;
  UpdateObject* curr = uo;
  while (i++ < Globals->numUOInChain) {
    UOJumpTable1[uo->code1](uo);
    uo = (UpdateObject*)((int)uo + uo->size);
  }
}

Basically, fun1 would iterate through UOChainHeaders, which contained contiguous chains of UpdateObjects. The first UpdateObject contained a condition (usually dynamically computed) that signals whether to continue executing the rest of the chain. It turns out this is actually slightly wrong, but it’s accurate enough for now.

Let’s go back to how values are dynamically computed in UpdateObjects because this is fairly important. An UpdateObject may contain what I eventually called an ExpressionHeader. To compute the result of an ExpressionHeader, a global variable is set to the pointer of the start of the ExpressionHeader and then a function that I called getMagicNumber is called. getMagicNumber iterates through the payload of the ExpressionHeader and maintains a stack in a another global variable and operates on the stack. The payload within an ExpressionHeader contained Expressions which were variable length and iterated through a size field contained in the first few bytes. Using codes in these Expressions, we’d index into jump tables and operate on the stack. For example, an Expression with certain codes may contain an additional payload with information on how to index into the Entity array and find a certain value like an x coordinate and push it onto the stack. An Expression with another code may have a payload to index into a list of global values and push it onto the stack. Another Expression may compute the succeeding Expression, which will push a value onto the stack and then pop two values from the stack, and then add them. Other Expressions may even be used to compute sub-Expressions.

That was maybe a bit confusing, so I think it would be really useful to illustrate with by transforming the Expressions into something more legible. We don’t want to think about the variable length aspect of Expressions and we’d like to transform the codes into something more semantically meaningful. Let’s represent some Expressions in JSON:

[
// Get a variable from this entity, push it onto the stack
{ name: "GetEntityVar", entityIndex: 56, varName: "health" },
// Compute the next Expression. Then, subtract the top two values on the stack
{ name: "Subtract" },
// Get this global variable, push it onto the stack
{ name: "GetGlobalVar", index: 12 }
]

Much clearer, don’t you think? Much of reverse engineering software is transforming data into something with meaningful semantics and this is no exception. Obviously, this is a lot easier said than done. You should’ve seen me bang my head against the wall for days trying to figure this single function out.

There are a lot more details on not only what I’ve talked about, but also things I haven’t talked about. I’m choosing to omit them because I think what I’ve discussed gives a clear enough picture of how the game operates at a high level: the game is actually like an interpreter. It processes directives through the form of UpdateObjects and Expressions and updates the game using those directives. This naturally leads us to ask: “Where are these directives?”

Part 2: Hitting my limit

So the hunt for where the UpdateObjects were stored began. The executable for the game was pretty large (90MB) and there wasn’t anything else in the installation directory, so I figured the data should be within the executable itself. Furthermore, I should be able to see where in the code this data is being loaded. I did find it, but it was part of a huge function that Ghidra had trouble decompiling and it seemed to do other things from a DLL that was also packed in the executable, extracted into a temp directory, loaded, and then used to transform the data.

Long story short, I was unable to figure out how to take the data from within the executable and transform it to what’s used during the game’s runtime. During my weeks reversing the game, I had realized that the game was made with a tool called Clickteam Fusion. Desperate, I searched for tools to help with reversing Clickteam Fusion games and I found a tool called Anaconda. Honestly, I had my doubts about this tool because it was really poorly organized: copies of files, executables directly in the Git tree, build artifacts floating around, etc. After wrangling with getting it installed and changing some settings, I actually managed get some output though. It gave me a file that to open in the Clickteam editor but it wouldn’t load correctly. I wasn’t planning on continuing my reversing efforts in the Clickteam editor, but I wanted to confirm that this tool wasn’t giving me completely bad output so I got into contact with one of the authors of Anaconda and he very graciously gave me some tips to get it loaded. And… it actually worked. Well, I was still getting some errors when loading the file and there were clearly artifacts, but it was recognizable. Great, now that I had confirmation it mostly worked, I started digging through the Anaconda source code to see how it worked its magic.

Wow. I was right about so many things. I mean, of course we had different names for the same thing, but it was clear that their Event referred to the UOChainHeader I talked about and their Actions corresponded to UpdateObjects. Hell, we both settled on the same name for Expressions! It’s hard to put into words how I felt reading that code. I had spent the better part of a month working blindly, unable to truly verify if any of my ideas were correct. Seeing confirmation made me so happy.

In order to make sure I understood how the game was loaded (at least the Events), I wrote a C program to read the file that closely followed the Anaconda code. I think my code does a pretty good job of documenting how Clickteam games pack their data, so go read my code if you’re interested in it.

The most trouble I had was figuring how to decoding and decompressing some parts of the data. In fact, I still don’t really understand how to decode some of the data, and neither does Anaconda. It just calls some C++ function that’s clearly the output of some decompiler (probably IDA). I couldn’t even manage to do that though. The decoding algorithm required a key that I was too lazy to figure out how to generate. I knew it was generated deterministically based on the contents of the executable, but I just hooked onto the key generation function and saved its output. Then, I tried to naively copy and paste the decoding algorithm from Ghidra’s output and use it in my function, but it just didn’t work. I tried to use Compiler Explorer and modifying the algorithm bit by bit to match the right assembly, but this was still really difficult because I don’t know what compiler was used, flags, etc. It would be really difficult to make it match. So I said screw it and just decided to copy/paste the assembly code as inline GCC assembly.

After wrestling with translating Intel assembly to AT&T assembly and compiling my program in 32-bit, I checksummed my data and confirmed this actually worked.

Anaconda contained another tool called Chowdren that was supposed to take a Clickteam game and transform it into C++. It didn’t work for Wings Of Vi even after I tried patching the errors that were popping up, but there was clearly something to it. I began to wonder how the author knew enough about the Clickteam runtime to build such a tool. I asked the author of Anaconda, but he said that he didn’t actually write Chowdren. I ended up finding another tool called CTFAK which seemed to be an updated rewrite of Anaconda alongside an active community working to decompile Clickteam games. I asked them if there was any documentation on the Clickteam runtime or if they knew anything about it and it turns out, the source code for the runtime is just sitting in the Clickteam Editor’s install directory.

What the fuck? Are you telling me I’ve been working tirelessly on decoding bytecode piece by piece when there was perfectly legible source code out there?

Well, okay. It’s not actually the source code for my runtime exactly, but it’s pretty damn close. It contained source code for the Android runtime in Java and XNA runtime in C#, so it wasn’t exactly the Windows runtime I was using, but all of the logic was clearly preserved when I compared the decompilation to the high level source. Furthermore, Clickteam published header files for some of their runtime structs. It’s not super easy to tell what the low-level structs look like from Java or C# code, so that supplementary material was extremely helpful. Furthermore, the person who gave this information even gave a CE structure file that represented most of the runtime headers!

Well, I’ll be damned. That person also told me that I should’ve immediately asked around communities when I start reverse engineering projects. I probably will take that advice in the future but for this project in particular, I’m glad I didn’t do that. Staring at the disassembly and decompilation for hours and figuring things out that way is a skill I wouldn’t have learned otherwise.

Now that I have the source code, how right was I about how the game ran? Pretty fuckin’ right! I’m a BEAST! Almost all of what I described above (before reading the source) was accurate. I didn’t describe a lot of other things, but those were accurate as well. Literally the only thing that I was wrong about was the conditions and it was a relatively minor mistake. It turns out, an Event (a.k.a a UOChainHeader) has multiple Conditions and Actions. I mistook Conditions and Actions for both being UpdateObjects. To be fair, they actually are the exact same, except a Condition has an extra short id at the end. That’s why I thought the Expression was either at offset 0x14 or 0x16. Additionally, I thought only a single Condition was checked before running the Actions. It turns out, there can be multiple Conditions within a single Event. The Ghidra output made that really unclear because there were tons of nesting, loops and gotos in the function.

Out of the things I regarded as having figured out, I was almost always correct. There were plenty of things that I just didn’t get around to figuring out though, mostly because it took so much time. Either way, this gave me a lot of confidence that my methods for reversing binaries were effective and that the biggest optimization to my methods currently is to get faster at it.

Conclusion

So what now? I’ve more or less satiated my desire to figure the game out. I mean, I basically have access to the source code.

I’ve been spending some time creating a modding API by allowing hooking onto certain functions (e.g. action handlers, condition evaluators, etc.), and giving access to functions in the game like looking up an entity or getting a global variable, but I don’t think this is a great way to mod Clickteam games ultimately. Sure, I can get things done like this, but intercepting an action handler and deciding whether it should be run is not that easy. You have to figure out if you have the right action, which requires you to understand what the action does. Doing anything more than just blocking an action from running can be really difficult. Since the game is essentially an interpreter, the best way to mod the game will probably be to modify what’s being interpreted. My interest ends here though. I’m interested in reverse engineering, not making a modding tool for an out-of-date, obscure game engine.

This post isn’t meant to be as informative as the previous post, so sorry if you made it all the way here and didn’t get much out of it. I just wanted to share my progress with low-level software reverse engineering. I’ve made strides in this area over the past month and a half, and I’m proud of how far I’ve come.

Leave a Reply

Your email address will not be published. Required fields are marked *