A tale of model definition bug hunting

After updating all code that deals with DED Model definitions to use the new Record based storage, I was slightly disappointed — although not surprised — to see that loading up a resource pack and trying to use it would result in a crash.

This is a pattern that I’ve come to accept as the norm: after working on a set of significant changes, my initial design and plan are shown to be valid, but I have to spend a couple of hours tracking down bugs that, in the end, prove to be trivial. In other words, one can’t expect to write bug free code.

The typo

Since I started without knowing what the problem actually was, I essentially reviewed a large portion of the model setup code and as a nice bonus got reacquainted with this part of the engine. This in turn gave me several ideas how to streamline and clean it up: particularly the separation between the definitions and the runtime model data structure used by the renderer is too fuzzy; the renderer shouldn’t be looking up values from the definition namespace when it draws a model. If one puts effort into constructing a separate data structure that is runtime-optimized, it really should contain all the information that is needed at runtime.

The first problem I found was trivial: a typo in the name of a model definition variable, causing a fatal error. This is the penalty of adding dynamic behavior: errors can be detected only at runtime. Fortunately, typos are easy to fix, and with a little effort, names could (and should) be put behind string constants so the compiler ensures they’re correct.

The oversight

This fixed the crash, however now I was facing a different problem. All model animations were broken, with enemies shown as corpses sliding across the ground. This got me stumped for some time. Digging into the runtime model setup, I couldn’t find a single mistake in updating it to use the new APIs. Eventually I concluded that the error must be in the part of code that I had altered the most: the DED parser. Since model animations were broken, and they largely depend on copied definitions, perhaps the copy mechanism was the culprit?

In the old days, one could simply calculate the ordinal of a model definition by looking at the pointer addresses (since everything was stored in simple C POD structs), however with Doomsday Script objects that is not possible. Therefore, I had added a special __order__ variable to each definition, storing its ordinal. The second problem I found was that when making copies of model definitions, the __order__ variable was being copied along with the rest of the values, messing up the perceived order of the definitions. Fixing this was not difficult, as Record already provides a mechanism for handling either all members or only the “non-underscore” members — double underscores being a convention to mark variables as hidden/private members (idea borrowed from Python).

Triumphant after noticing and fixing this oversight, I again fired up the engine. The animations were still broken, with my fix having no apparent effect.

The lingering defect

I had already been printing out the parsed model definitions as text, and all the values seemed to check out. But after singling out the definitions of a particular monster, I noticed something weird: all its submodels were using the “corpse” frame. In fact, turned out all the copied model definitions shared identical submodels. This is why everything was sliding around as corpses: that’s what the submodel definitions were telling the engine.

Understanding why this was happening calls for a brief explanation of Doomsday Script. Data manipulated by scripts is stored in Record objects, which contain a set of Variables, each of which has a Value. Values may in turn refer to Records, to create hierarchical structures (subrecords). When a script expression is evaluated, the values of the identified variables are looked up and possibly duplicated. Duplication is essential because each variable owns its own value, and changing the value of one variable shouldn’t affect any other variable.

When one makes a function call in Doomsday Script, the parameters of the call are passed by value to the called function, unless the Value in question refers to a Record, in which case only a pointer to the record is passed. This is crucial because there is currently no way to force certain parameters to be passed by value and others by reference, and it is almost always more beneficial to just deal with references to records rathan than duplicating them in full. For instance, this way one can have a function that takes a record as a parameter and modifies the record in some way, without making any copies along the way.

What does this have to do with copying model definitions? The submodels of a model definition are stored in a Doomsday Script array, with each element of the array being a reference to a Record. When the model was copied, the submodels array was duplicated, however all the copied elements still point to the original model’s submodels.

Now, why would duplicating an array cause the contained subrecords to not be duplicated? This happened because the aforementioned “pass records by reference” mechanism was implemented incorrectly. Any time a RecordValue was being duplicated, the copy was made as a reference to the original record. This was breaking the behavior promised by the Value base class, which is to always make a perfect duplicate. When using polymorphism, like I’m doing with Doomsday Script Value classes, it is crucial that the derived classes behave in the way promised by the base class, as any third parties should remain agnostic of the actual class of the object and only manipulate it according to the assumptions made at the level of the base class.

Earlier, we had already applied a workaround that ensures that subrecords would be fully duplicated when making copies of Record instances. However, as seen here, RecordValue may also be used in arrays and dictionaries, and when these are duplicated, the special workaround wasn’t been applied.

To fix the issue properly, I made RecordValue duplicate itself as all other values do and only apply the pass-by-reference exception when evaluating expressions. This also made the previous workaround unnecessary.

And what do you know, the models were now working correctly!

This is what a model definition now looks like as a Record (at runtime):

   __order__: 479
       flags: 0
       group: 0
          id: 
   interMark: 0
  interRange: [ 0, 1 ]
         off: 0
      offset: [ 0, 0, 0 ]
      resize: 0
       scale: [ 1, 1, 1 ]
    selector: 0
shadowRadius: 0
    skinTics: 0
      sprite: 
 spriteFrame: 0
       state: MUMMY_LOOK1
         sub: [
                     alpha: 0
                 blendMode: 0
                  filename: Models:Actors/Golem/Golem.md2
                     flags: 0x20000
                     frame: idle01
                frameRange: 0
                    offset: [ 0, 0, 0 ]
                      parm: 0
               selSkinMask: 7
              selSkinShift: 0
                  selSkins: [ 1, 1, 1, 1, 1, 1, 0, 0 ]
                     shiny: 0
                shinyColor: [ 1, 1, 1 ]
                shinyReact: 1
                 shinySkin: 
                      skin: 0
              skinFilename: 
                 skinRange: 0 ]