Sunday, December 29, 2024

Fixing a Bug the Hard Way (NME5 Part 1)

The day was April 2, 2024. Team SCU had just unveiled a new mystery download link for any and all who dared peek inside. The few brave souls who clicked open the executable file were greeted by not one, not two, but ten thousand blades of grass blowing in the wind...


That's right, my game engine was now on Version 5.0! Woohoo! And following that announcement was of course a playable demo for Team SCU's stealth video game.

So what did the new version mean exactly? It meant I finally decided to increase the number to 5.0!

Oh, and there was a pretty huge change, too. Our story begins eight months prior...

Lost in the Woods

It was an ordinary Wednesday afternoon in Camp Troidia when three hikers journeyed into the woods. "We'll head this way. Just keep walking," said the leader.

Not long had passed before the hikers wondered if they were headed in the right direction. Suddenly they heard a sequence of snapping sounds, growing fiercer, then stopping with a heavy thud. It was as if a tree had fallen in the forest, and everyone was around to hear it! Yet, within the resulting plume of dust, there was no toppled timber in sight. "Just keep walking," said the leader.

Hours passed, and the party grew concerned. Where were all the birds and ferns that should be here? The hikers turned a corner and stopped with a jolt. There was a vast angular ravine beneath them, and they'd nearly stepped over the edge! But with the blink of an eye, the ravine was gone. The ground was flat as far as they could see. "Just keep walking," said the leader.

As the sun descended into the horizon, the hikers knew they were lost. In the final moments of light, they pulled out their Book of Maps. "That's strange. The cover just says 'of Maps' now. I guess the other word must've rubbed off in my pack." They flipped through the pages trying to find something relevant, but all the important panels had become scrambled and indecipherable.

"Wait, where's my arm?!" By now it was too late. Within seconds the hikers were consumed... by darkness!

When the search party arrived the next day, they found only two bodies. The leader was nowhere in sight.

"Oh I'm right here," a mysterious voice interjected, startling the two sleeping hikers awake. "We're totally fine. I'm just invisible right now. Once in a blue moon, Troid's engine doesn't load an image properly for some reason, so weird visual artifacts can happen."

Yep. That's annoying. But the occurrence is almost too infrequent to care. Well, almost...

Finding the Trail

Building a new feature into my engine is a huge commitment, often spanning several months to a year, so I must always choose my next steps wisely. But in Summer 2023, there were suddenly too many important paths forward, and no deadlines to point me toward any in particular. I was feeling as lost as those fictional hikers...

So I organized my thoughts into a full summary of my engine, highlighting the good, the bad, and the ugly. The hope was that this outline would make it obvious to me how to proceed. Spoiler: it worked! Let's walk through some of the shortcomings to discover the pattern I saw:

"Making games in NME means mostly just writing code, with minimal support for visual editors."
"It'd be nice to have model animations back, using standard formats instead of my custom system."

It takes a lot of code to make a game scene work, and as the one who writes the code, I become the de facto author of many core aspects of the game. In other words, I'm wearing too many hats! The more hats I can offload to others, the better.

However, I'm very picky about data formats I will support in NME. For one, I never want developers to feel they must pay money for 3rd-party software to get the most out of my engine. This rules out the most popular 2D skeletal animation tools out there like Spine and Spriter Pro.

A sequence of forces and procedural reactions allows these abstract shapes to glide around believably. I think it's some kind of bear's head maybe? Who knows.

If anyone knows of a robust free solution for playing with that kind of data, let me know! For now all these 2D figures will continue to be animated with my code and/or with more hand-drawn frames by the artists. In my head I was pondering the idea of projecting 3D formats into 2D space so that we could leverage programs like Blender to make game assets. Hey, that reminds me of another thing...

"I believe NME has a lot to offer as a 3D engine, but certain modules restrict it to 2D for now."

If I wish to start supporting 3D models in my engine, I might as well go all in! NME isn't too far off from going full 3D, but it would certainly take some time. The biggest barrier would be my own understanding of graphics APIs, as NME 4.4 uses only the most basic OpenGL features possible to achieve its results. (Note: a graphics API is how games display things to the screen, and OpenGL is the one I've always used). Which reminds me of another thing...

"The graphics capabilities are not reaching their full potential. NME runs well on older hardware, but someday I'd like to take better advantage of modern GPU architectures. I'm behind the times."

Over two decades behind, to be more specific! Well, in a sense. I'm somewhat ashamed to admit that all my games had been using something called Immediate Mode for everything.

Sorry, I cut some corners on this one...

What is Immediate Mode? Pretend we're in a restaurant. I'm a chef and you're a waiter. The customer just ordered a bowl of rice from their phone, so I call you over to deliver it to them.

Now, I could just scoop the rice into a bowl and hand it to you, but that would be too sophisticated! Instead, I reach into the cooker and start handing over one grain at a time. I've gotten weirdly fast at this, but you'll still need some patience. I don't provide you with a bowl, and I don't even tell you in advance how much rice was ordered, so you first have to guess what size receptacle to use.

If you guess too small, the customer has to receive their order in multiple batches. But guess too large and that oversized bowl could've been used for something else, so go do some dishes! Anyway that's what Immediate Mode is like for your graphics driver. Many grains. Much improvisation.

In my defense, NME's games don't exactly have much data to send, so Immediate Mode can actually make a lot of sense as it's very flexible. The first room of Prime 2D renders 1640 triangles (from 3258 vertices), which might sound like a lot, but nowadays a single "low-detail character" in a typical console game might already have 10,000-40,000 triangles (source).
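To put the rice analogy into numbers, here's a toy model in Python. It's a sketch, not how any real driver works: `ImmediateDriver` and `BufferedDriver` are hypothetical stand-ins that only count calls and batches, and the scratch-buffer size the immediate driver "guesses" is an arbitrary assumption. Plugging in the 3258 vertices from that first room of Prime 2D:

```python
"""Toy model of the rice-bowl analogy: per-vertex (Immediate Mode style)
submission versus handing the driver one pre-sized buffer.
Nothing here calls OpenGL; both driver classes are hypothetical."""

from dataclasses import dataclass, field


@dataclass
class ImmediateDriver:
    """Receives one vertex per call and must guess a scratch-buffer size."""
    guessed_capacity: int
    calls: int = 0
    batches: int = 0
    _pending: list = field(default_factory=list, repr=False)

    def begin(self) -> None:
        self.calls += 1
        self._pending.clear()

    def vertex(self, v) -> None:
        # One "grain of rice" per call.
        self.calls += 1
        self._pending.append(v)
        if len(self._pending) == self.guessed_capacity:
            self._flush()  # guessed too small: ship a partial batch now

    def end(self) -> None:
        self.calls += 1
        if self._pending:
            self._flush()  # ship whatever is left over

    def _flush(self) -> None:
        self.batches += 1
        self._pending.clear()


@dataclass
class BufferedDriver:
    """Receives the whole vertex array at once, size known up front."""
    calls: int = 0
    batches: int = 0
    allocated: int = 0

    def upload_and_draw(self, vertices: list) -> None:
        self.calls += 1
        self.allocated = len(vertices)  # an exact-size "bowl"
        self.batches += 1


def draw_immediate(vertices, guessed_capacity: int) -> ImmediateDriver:
    drv = ImmediateDriver(guessed_capacity)
    drv.begin()
    for v in vertices:
        drv.vertex(v)
    drv.end()
    return drv


def draw_buffered(vertices) -> BufferedDriver:
    drv = BufferedDriver()
    drv.upload_and_draw(list(vertices))
    return drv
```

With a guessed capacity of 1024, `draw_immediate(range(3258), 1024)` makes 3260 calls (begin, one per vertex, end) and delivers the order in 4 batches, while `draw_buffered(range(3258))` makes a single call with an exact-size allocation. Real Immediate Mode code often sends texture coordinates and colors per vertex too, so the true call count would be a few times higher.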

Unfortunately, going beyond OpenGL's Immediate Mode is a daunting task for a hobbyist like me, even if the concepts are simple to use once understood. Poorly named terms like Vertex Array Objects, Vertex Buffer Objects, and Element Buffer Objects (all different things) can easily scare someone away. Still, there was no excuse to put it off anymore. By procrastinating on this next layer, I was preventing NME from displaying detailed 3D models or tons of particles, not to mention that Immediate Mode isn't even available on mobile devices.
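Those three names are less scary once you see the data they hold. The sketch below is plain Python with no OpenGL calls, and the helper names are hypothetical: a vertex buffer (VBO) stores each vertex once as packed floats, an element buffer (EBO) stores small integer indices saying which vertices make up each triangle, and a VAO would simply remember the layout (assumed here to be a 16-byte stride, with x, y at offset 0 and u, v at offset 8). For a single sprite quad:

```python
"""Toy illustration of what an Element Buffer Object buys you: vertices are
stored once and triangles refer to them by index. No OpenGL calls here;
all names are hypothetical."""

import struct


def build_indexed(triangle_list):
    """Deduplicate a flat list of per-corner vertices (3 per triangle).

    Returns (unique_vertices, indices): unique_vertices would go into a
    vertex buffer (VBO) and indices into an element buffer (EBO)."""
    unique, indices, seen = [], [], {}
    for v in triangle_list:
        if v not in seen:
            seen[v] = len(unique)
            unique.append(v)
        indices.append(seen[v])
    return unique, indices


def pack_vertices(vertices):
    """Interleave (x, y, u, v) as little-endian 32-bit floats, 16 bytes each."""
    return b"".join(struct.pack("<4f", *v) for v in vertices)


def pack_indices(indices):
    """Pack indices as unsigned 16-bit ints, a common EBO element type."""
    return struct.pack(f"<{len(indices)}H", *indices)


# A sprite quad drawn as two triangles; each vertex is (x, y, u, v).
QUAD = [
    (0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 1.0, 0.0), (1.0, 1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), (0.0, 1.0, 0.0, 1.0),
]
```

Drawn as two raw triangles, `pack_vertices(QUAD)` takes 96 bytes, while `build_indexed(QUAD)` yields 4 unique vertices plus the indices `[0, 1, 2, 0, 2, 3]`, which pack into 64 + 12 = 76 bytes. That's a small win for one sprite, but it grows for meshes where many triangles share corners.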

Hey, speaking of other platforms...

"NME is Windows-only. Native support for other platforms (Apple, Linux, web) would be nice."

I've tried to keep NME as cross-platform as possible, so there are only a few key locations in the engine that still require Windows. One of these locations relates to spinning up multiple OpenGL graphics threads, for tasks like loading several images at the same time. OpenGL was never designed for multi-threading, so this aspect required me to read through incomplete documentation of "WGL functions", discover best practices through old or obscure message boards, and do some plain old trial and error. Which reminds me of another thing...

"There are two phantom bugs that are incredibly rare but have been haunting me for years."

One of those bugs... is the one where images don't always load correctly! When tens of thousands of players tried one of NME's games, the bug was bound to show up a few times.

Did I look into it? Absolutely. The problem happens only with images, so I meticulously studied every image-related shared-state access. I experimented with extra command flushes and synchronization guards, but I wasn't able to narrow it down. It's possible I was hitting an obscure behavior of OpenGL context sharing, WGL functions, or maybe just certain drivers or hardware, but still, I couldn't shake the feeling that it was somehow my fault. I had no idea how to tiptoe around it, other than making load times longer by reducing image submissions to a single thread, just to stop this once-in-a-blue-moon phenomenon. I didn't want to do that, and my time was spread thin, so the bug continued to haunt me.
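For anyone unfamiliar with this class of bug, here is the general shape of a synchronization guard, modeled in Python rather than OpenGL. `threading.Event` stands in for a GPU fence (in OpenGL 3.2 and later that role is played by `glFenceSync` and friends), and `TextureSlot` and `load_all` are hypothetical names. The point is the ordering: the loader may only signal after the upload finishes, and the renderer may only sample after it sees the signal. It's an illustration of the pattern, not NME's actual loader or a diagnosis of the real bug.

```python
"""Toy model of a loader/render handoff guarded by a per-texture fence.
threading.Event stands in for a GPU fence; all names are hypothetical."""

import threading
from concurrent.futures import ThreadPoolExecutor


class TextureSlot:
    def __init__(self, name: str):
        self.name = name
        self.pixels = None
        self.fence = threading.Event()  # set only after the upload completes

    def upload(self, pixels: bytes) -> None:
        # Loader thread: write the data first...
        self.pixels = pixels
        # ...then signal. Signaling before the write finishes is the kind of
        # ordering mistake that only shows up once in a blue moon.
        self.fence.set()

    def sample(self, timeout: float = 5.0) -> bytes:
        # Render thread: never touch the pixels until the fence is signaled.
        if not self.fence.wait(timeout):
            raise TimeoutError(f"{self.name} never finished loading")
        return self.pixels


def load_all(names, load_fn, workers: int = 4):
    """Load several images in parallel while the render side waits on fences."""
    slots = {n: TextureSlot(n) for n in names}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for n in names:
            # Default argument binds this iteration's slot to the lambda.
            pool.submit(lambda s=slots[n]: s.upload(load_fn(s.name)))
        # Sampling can begin while uploads are still in flight.
        drawn = [slots[n].sample() for n in names]
    return drawn
```

For example, `load_all(["grass", "tree", "hiker"], lambda n: n.encode())` returns the three byte strings in order, no matter which worker finishes first.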

But... given all of the above shortcomings of my engine, and the path I just walked through... there was another way. Another way to make everything better.

I could switch to Vulkan.

Beneath the Crust

Vulkan (Vk for short) is a modern alternative to OpenGL. All the crumbs for this fell into place so neatly. Working backward:

Vulkan was designed with multi-threading in mind, so my phantom bug would surely go away. I would need fewer OS-specific calls to initiate that, so the codebase would be a step closer to being cross-platform (assuming MoltenVK stays strong). I'd be forced out of Immediate Mode and pulled closer to present-day GPU architecture, now with fancy techniques like Compute Shaders available at my fingertips. NME would take a big step toward eventual 3D capabilities, which would also add future potential to connect with many standard game asset formats.
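Here's roughly what "designed with multi-threading in mind" looks like in practice. In Vulkan a command pool must not be used from two threads at once, so a common pattern is to give each thread its own pool, let every thread record its own command buffers without locks, and have a single thread submit them all. The Python sketch below only mimics that structure: lists stand in for `VkCommandPool` and `VkCommandBuffer`, nothing talks to a GPU, and `build_frame` and the draw-command tuples are hypothetical.

```python
"""Toy model of Vulkan-style parallel command recording: one pool per worker,
lock-free recording, then single-threaded submission in a fixed order.
Plain lists stand in for command pools/buffers; no Vulkan calls."""

from concurrent.futures import ThreadPoolExecutor


class CommandPool:
    """Touched by exactly one task, so it needs no lock (mirroring the rule
    that a VkCommandPool must not be used from two threads at once)."""

    def __init__(self, owner: int):
        self.owner = owner
        self.buffers = []

    def allocate(self):
        buf = []
        self.buffers.append(buf)
        return buf


def record_chunk(pool: CommandPool, sprites):
    """Record draw commands for one slice of the scene into a fresh buffer."""
    cmd = pool.allocate()
    for sprite in sprites:
        cmd.append(("bind_texture", sprite))
        cmd.append(("draw_quad", sprite))
    return cmd


def build_frame(sprites, threads: int = 4):
    # Split the scene into contiguous slices so submission keeps scene order.
    size = max(1, -(-len(sprites) // threads))  # ceiling division
    chunks = [sprites[i:i + size] for i in range(0, len(sprites), size)]
    pools = [CommandPool(owner=i) for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=threads) as ex:
        # Each task gets its own pool, so recording never contends for a lock.
        buffers = list(ex.map(record_chunk, pools, chunks))
    # Single-threaded submission in a deterministic order.
    queue = []
    for buf in buffers:
        queue.extend(buf)
    return queue
```

Because each pool is only ever touched by one task, recording needs no locking, and the submitted order stays deterministic regardless of which worker finishes first. In OpenGL, by contrast, each loading thread needs its own context plus sharing rules between contexts, which is exactly where the WGL trial and error came in.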

Don't get me wrong: OpenGL is fine. Its latest flavors are still relevant and can handle the above. But my particular old-school use of it finally rang clear in my head as the #1 thing holding my engine back. I had two options: learn more advanced OpenGL features, or switch to a newer API.

It wasn't a decision to make lightly. I'd be dropping support for older hardware, and if Vulkan lost its traction in the coming years I'd have taken a big misstep. It's significantly more fine-grained than OpenGL, but that's exactly the kind of control I wanted. It seemed I was ready for such a challenge. It was time for NetMission to overcome its greatest weakness and join the world of cutting-edge graphics!

Luckily this switch would probably only take me a month or two, right? Everything would be up and running in no time. What could go wrong?

TO BE CONTINUED IN PART 2...