Tightening all the screws

For the past week I have been focusing exclusively on optimizations.

Work on the 2.0 unstable builds has progressed without much attention to performance profiling. However, it became clear that even very basic maps (like Doom E1M1) suffered from unacceptable FPS dips, so I decided to get to the bottom of what was going on.

Many things in Doomsday’s map rendering are redundant and/or overly CPU-focused. There is no getting around these inefficiencies without rewriting much of the renderer from the ground up so that as much processing as possible happens on the GPU. However, profiling revealed many simple issues in commonly used routines. Together these added up to quite a performance hit.

I ended up applying the following optimizations:

  • Avoid constructing and destroying objects unnecessarily. This was sometimes happening behind the scenes in the form of temporary variables and Unicode string conversions.
  • Avoid repeated memory allocations (more of a problem on Windows than macOS). For example, it is better to reuse previously allocated objects if they’re needed hundreds of times per frame. One particular offender here was arrays used during triangle mesh writing.
  • Avoid slow lookups from scripting-oriented data structures. Native code should use native structs, particularly if the data is never modified (e.g., sprite definitions).
  • Use appropriate Qt containers. For instance, an array of pointers is usually faster to manage with a QVector than with a QList. Also, QHash is faster than QMap if the data doesn’t need to be ordered by key.
  • Avoid repeating costly lookups. Certain objects like sprite animations, materials and textures are looked up very often during a frame.
  • Make simple getters inline. The PIMPL idiom is great and all, but for the most heavily used getters, it helps to make the necessary data inlineable.
  • Avoid inefficient patterns. For instance, there should usually be a way to query an object in such a way that failure only leads to a null pointer instead of an exception.
  • Read-only operations should always happen via const APIs so that it is clear that no one needs to prepare for changes.
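
To make two of these points concrete, here is a minimal sketch in standard C++ (not Qt, and not actual Doomsday code). The names MeshWriter, SpriteRegistry, SpriteDef and tryFind are made up for illustration. The first class reuses one vertex array across calls instead of allocating a fresh one every time; the second offers a lookup that returns a null pointer on failure rather than throwing.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vertex type; the real renderer's vertex format is not shown here.
struct Vertex { float x, y, z; };

// Reuses a single scratch array between calls, so writing hundreds of small
// meshes per frame does not hit the memory allocator hundreds of times.
class MeshWriter {
public:
    // Fills the scratch array with `count` placeholder vertices and
    // returns how many vertices were written.
    std::size_t writeFan(std::size_t count) {
        scratch_.clear(); // drops the elements but keeps the allocated capacity
        for (std::size_t i = 0; i < count; ++i) {
            scratch_.push_back({float(i), 0.f, 0.f});
        }
        return scratch_.size();
    }

    std::size_t capacity() const { return scratch_.capacity(); }

private:
    std::vector<Vertex> scratch_;
};

// Hypothetical read-only sprite definition stored as a plain native struct.
struct SpriteDef { int frameCount; };

class SpriteRegistry {
public:
    void add(const std::string& name, SpriteDef def) { defs_[name] = def; }

    // Const, non-throwing query: a missing sprite yields nullptr, so callers
    // on a hot path pay for a pointer check instead of exception handling.
    const SpriteDef* tryFind(const std::string& name) const {
        auto it = defs_.find(name);
        return it != defs_.end() ? &it->second : nullptr;
    }

private:
    // Unordered (hash-based) map, since ordering by key is not needed.
    std::unordered_map<std::string, SpriteDef> defs_;
};
```

A caller would keep one MeshWriter alive for the whole frame and write something like `if (const SpriteDef* def = registry.tryFind(name)) { ... }` on the lookup side. Whether this pays off in a given spot is something only profiling can tell.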

So, how much of an impact did all this have? Below are FPS measurements from my profiling location:

Build   FPS
2129    47.5
2136    66

That is +39%. Naturally the bottlenecks in the renderer are somewhat different depending on the map and number of objects, but the improvement should be perceptible everywhere.