77
88## Motivation
99
10- "Flag" and "bit" are synonymous in programming: they both mean a single micron
10+ "Flag" and "bit" are synonymous in programming -- they both mean a single micron
1111of data that can be in one of two states. We call those "true" and "false", or
1212sometimes "set" and "cleared". I'll use all of these interchangeably. "Dirty
1313bit" is an equally <span name =" specific " >common</span > name for this pattern,
@@ -24,32 +24,32 @@ bit](http://en.wikipedia.org/wiki/Dirty_bit).
2424
2525Many games have something called a * scene graph* . This is a big data structure
2626that contains all of the objects in the world. The rendering engine uses this to
27- determine where on screen to draw stuff.
27+ determine where to draw stuff on the screen.
2828
2929At its simplest, a scene graph is just a flat list of objects. Each object has a
30- model or some other graphic primitive, and a <span
30+ model, or some other graphic primitive, and a <span
3131name="transform">* transform* </span >. The transform describes the object's
32- position, rotation, and scale in the world. To move or turn an object, we just
32+ position, rotation, and scale in the world. To move or turn an object, we simply
3333change its transform.
3434
3535<aside name =" transform " >
3636
37- The mechanics of * how* this transform is stored and manipulated is unfortunately
37+ The mechanics of * how* this transform is stored and manipulated are unfortunately
3838out of scope here. The comically abbreviated summary is that it's a 4x4 matrix.
39- You can make a single transform that combines two transforms -- for example
39+ You can make a single transform that combines two transforms -- for example,
4040translating and then rotating an object -- by multiplying the two matrices.
4141
4242How and why that works is left as an exercise for the reader.
4343
4444</aside >
4545
4646When the renderer draws an object, it takes the object's model, applies the
47- transform to it, and then renders it there in the world. If we just had a scene
48- * bag* and not a scene * graph* that would be it and life would be simple.
47+ transform to it, and then renders it there in the world. If we had a scene
48+ * bag* and not a scene * graph* , that would be it, and life would be simple.
4949
5050However, most scene graphs are <span name =" hierarchical " >* hierarchical* </span >.
5151An object in the graph may have a parent object that it is anchored to. In that
52- case, its transform is relative to the * parent's* position, and isn't its
52+ case, its transform is relative to the * parent's* position and isn't its
5353absolute position in the world.
5454
5555For example, imagine our game world has a pirate ship at sea. Atop the ship's
@@ -70,7 +70,7 @@ This way, when a parent object moves, its children move with it automatically.
7070If we change the local transform of the ship, the crow's nest, pirate, and
7171parrot go along for the ride. It would be a total <span
7272name="slide">headache</span > if, when the ship moved, we had to manually adjust
73- the transforms of everything on it to keep them from sliding off.
73+ the transforms of all the objects on it to keep them from sliding off.
7474
7575<aside name =" slide " >
7676
@@ -85,7 +85,7 @@ transform*. To render an object, we need to know its *world transform*.
8585
8686### Local and world transforms
8787
88- Calculating an object's world transform is pretty straightforward: you just walk
88+ Calculating an object's world transform is pretty straightforward -- you just walk
8989its parent chain starting at the root all the way down to the object, combining
9090transforms as you go. In other words, the parrot's world transform is:
9191
@@ -100,12 +100,12 @@ transforms are equivalent.
100100</aside >
101101
102102We need the world transform for every object in the world every frame, so even
103- though it's just a handful of matrix multiplications per model, it's on the hot
103+ though there are only a handful of matrix multiplications per model, it's on the hot
104104code path where performance is critical. Keeping them up to date is tricky
105105because when a parent object moves, that affects the world transform of itself
106106and all of its children, recursively.
107107
108- The simplest approach is to just calculate transforms on the fly while
108+ The simplest approach is to calculate transforms on the fly while
109109rendering. Each frame, we recursively traverse the scene graph starting at the
110110top of the hierarchy. For each object, we calculate its world transform right
111111then and draw it.
@@ -118,9 +118,9 @@ they haven't changed is a waste.
118118### Cached world transforms
119119
120120The obvious answer is to * cache* it. In each object, we store its local
121- transform and its derived world transform. When we render, we just use the
121+ transform and its derived world transform. When we render, we simply use the
122122precalculated world transform. If the object never moves, the cached transform
123- is always up to date and everything's happy.
123+ is always up-to-date and everything's happy.
124124
125125When an object * does* move, the simple approach is to refresh its world
126126transform right then. But don't forget the hierarchy! When a parent moves, we
@@ -149,7 +149,7 @@ by the renderer. We calculated the parrot's world transform *four* times, but it
149149only got rendered once.
150150
151151The problem is that a world transform may depend on several local transforms.
152- Since we recalculate immediately each time * one* of those changes, we end up
152+ Since we recalculate immediately each time * one* of the transforms changes, we end up
153153recalculating the same transform multiple times when more than one of the local
154154transforms it depends on changes in the same frame.
155155
@@ -163,14 +163,14 @@ need it to render.
163163
164164<aside name =" decoupling " >
165165
166- It's interesting how much of software architecture is just intentionally
166+ It's interesting how much of software architecture is intentionally
167167engineering a little slippage.
168168
169169</aside >
170170
171171To do this, we add a flag to each object in the graph. When the local transform
172172changes, we set it. When we need the object's world transform, we check the
173- flag. If it's set, we calculate the world transform then and clear the flag. The
173+ flag. If it's set, we calculate the world transform and then clear the flag. The
174174flag represents, "Is the world transform out of date?" For reasons that aren't
175175entirely clear, the traditional name for this "out-of-date-ness" is "dirty".
176176Hence: * a dirty flag* .
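
Here is a minimal sketch of that set, check, and clear cycle for a single object. It is not the chapter's real scene graph code: `Offset` is a hypothetical one-dimensional stand-in for a transform, `LazyObject` and its fixed `origin_` are invented for illustration, and `recalcCount_` exists only to make the saved work visible. It also ignores the hierarchy entirely, so it assumes nothing above the object ever moves.

```cpp
#include <cassert>

// Hypothetical stand-in for a real transform: just a 1D offset.
struct Offset {
  explicit Offset(float x = 0.0f) : x(x) {}
  float x;
};

class LazyObject {
public:
  explicit LazyObject(Offset origin) : origin_(origin) {}

  // Changing the primary data only marks the derived data as stale.
  void setLocal(Offset local) {
    local_ = local;
    dirty_ = true;
  }

  // The derived data is refreshed on demand, and only if it is stale.
  const Offset& world() {
    if (dirty_) {
      world_ = Offset(origin_.x + local_.x);
      dirty_ = false;
      ++recalcCount_;
    }
    return world_;
  }

  int recalcCount() const { return recalcCount_; }

private:
  Offset origin_;        // Assumed never to change in this sketch.
  Offset local_;         // Primary data.
  Offset world_;         // Derived data (the cache).
  bool dirty_ = true;    // Starts stale: nothing has been calculated yet.
  int recalcCount_ = 0;  // Bookkeeping for the example only.
};

// Three moves in a row, but only one recalculation when the value is needed.
void exampleUsage() {
  LazyObject parrot(Offset(10.0f));
  parrot.setLocal(Offset(1.0f));
  parrot.setLocal(Offset(2.0f));
  parrot.setLocal(Offset(3.0f));
  assert(parrot.world().x == 13.0f);
  assert(parrot.recalcCount() == 1);
}
```

The moves themselves cost almost nothing. All of the work happens in `world()`, and only when the flag says the cached value can't be trusted.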
@@ -180,8 +180,8 @@ example, the game ends up doing:
180180
181181<img src =" images/dirty-flag-update-good.png " alt =" By deferring until all moves are done, we only recalculate once. " />
182182
183- That's the best you could hope to do: the world transform for each affected
184- object is calculated exactly once. With just a single bit of data, this pattern
183+ That's the best you could hope to do -- the world transform for each affected
184+ object is calculated exactly once. With only a single bit of data, this pattern
185185does a few things for us:
186186
187187 * It collapses modifications to multiple local transforms along an object's
@@ -212,7 +212,7 @@ Dirty flags are applied to two kinds of work: *calculation* and
212212the derived data is time-consuming or otherwise costly.
213213
214214In our scene graph example, the process is slow because of the amount of math to
215- perform. When using this pattern for synchronization on the other hand, it's
215+ perform. When using this pattern for synchronization, on the other hand, it's
216216more often that the derived data is * somewhere else* -- either on disk or over
217217the network on another machine -- and simply getting it from point A to point B
218218is what's expensive.
@@ -225,14 +225,14 @@ There are a couple of other requirements too:
225225 yourself always needing that derived data after every single modification
226226 to the primary data, this pattern can't help.
227227
228- * ** It should be hard to incrementally update.** Let's say the
228+ * **It should be hard to update incrementally.** Let's say the
229229 pirate ship in our game can only carry so much booty. We need to
230230 know the total weight of everything in the hold. We
231231 * could* use this pattern and have a dirty flag for the total weight. Every
232232 time we add or remove some loot, we set the flag. When we need the
233233 total, we add up all of the booty and clear the flag.
234234
235- But a simpler solution is to just * keep a running total* . When we add or
235+ But a simpler solution is to * keep a running total* . When we add or
236236 remove an item, just add or remove its weight from the current total. If
237237 we can "pay as we go" like this and keep the derived data updated, then
238238 that's often a better choice than using this pattern and calculating the
@@ -255,7 +255,7 @@ hacks.
255255Even after you've convinced yourself this pattern is a good fit, there are a few
256256wrinkles that can cause you some discomfort.
257257
258- ### There is a cost to deferring too long
258+ ### There is a cost to deferring for too long
259259
260260This pattern defers some slow work until the result is actually needed, but when
261261it is, it's often needed * right now* . But the reason we're using this pattern to
@@ -292,7 +292,7 @@ system too much by saving all the time.
292292
293293This mirrors the different garbage collection strategies in systems that
294294automatically manage memory. Reference counting frees memory the second it's no
295- longer needed, but burns CPU time updating ref counts eagerly every time
295+ longer needed, but it burns CPU time updating ref counts eagerly every time
296296references are changed.
297297
298298Simple garbage collectors defer reclaiming memory until it's really needed, but
@@ -321,13 +321,13 @@ cache invalidation and naming things."
321321</aside >
322322
323323Miss it in one place, and your program will incorrectly use stale derived data.
324- This leads to confused players and very hard to track down bugs . When you use
324+ This leads to confused players and bugs that are very hard to track down. When you use
325325this pattern, you'll have to take care that any code that modifies the primary
326326state also sets the dirty flag.
327327
328328One way to mitigate this is by encapsulating modifications to the primary data
329329behind some interface. If anything that can change the state goes through a
330- single narrow API, you can set the dirty bit there and rest assured that it
330+ single narrow API, you can set the dirty flag there and rest assured that it
331331won't be missed.
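
As a rough sketch of that idea, here is a class where every mutator funnels through one private helper. The `Ship` class, its methods, and the distance calculation are all hypothetical and chosen only to keep the example small; the square root stands in for derived data that is actually expensive to produce.

```cpp
#include <cassert>
#include <cmath>

class Ship {
public:
  // Every way of changing the primary data goes through markDirty().
  void moveTo(float x, float y) { x_ = x; y_ = y; markDirty(); }
  void nudge(float dx, float dy) { x_ += dx; y_ += dy; markDirty(); }

  // Derived data: recalculated only when something above has changed.
  float distanceFromPort() {
    if (dirty_) {
      distance_ = std::sqrt(x_ * x_ + y_ * y_);
      dirty_ = false;
    }
    return distance_;
  }

private:
  // The one place the flag gets set, so no mutator can forget it.
  void markDirty() { dirty_ = true; }

  float x_ = 0.0f;
  float y_ = 0.0f;
  float distance_ = 0.0f;
  bool dirty_ = true;
};

void exampleUsage() {
  Ship ship;
  ship.moveTo(3.0f, 4.0f);
  assert(ship.distanceFromPort() == 5.0f);
  ship.nudge(3.0f, 4.0f);
  assert(ship.distanceFromPort() == 10.0f);
}
```

Because the position fields are private, outside code has no way to change them without the flag being set along the way.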
332332
333333### You have to keep the previous derived data in memory
@@ -355,7 +355,7 @@ Like many optimizations, then, this pattern <span name="trade">trades</span>
355355memory for speed. In return for keeping the previously calculated data in
356356memory, you avoid having to recalculate it when it hasn't changed. This
357357trade-off makes sense when the calculation is slow and memory is cheap. When
358- you've got more time than memory on your hands, it's better to just calculate it
358+ you've got more time than memory on your hands, it's better to calculate it
359359as needed.
360360
361361<aside name =" trade " >
@@ -367,7 +367,7 @@ Conversely, compression algorithms make the opposite trade-off: they optimize
367367
368368## Sample Code
369369
370- Let's assume we've met the surprisingly long list of requirements, and see how
370+ Let's assume we've met the surprisingly long list of requirements and see how
371371the pattern looks in code. As I mentioned before, the actual math behind
372372transform matrices is beyond the humble aims of this book, so I'll just
373373encapsulate that in a class whose implementation you can presume exists
@@ -391,13 +391,13 @@ parent. It has a mesh which is the actual graphic for the object. (We'll allow
391391their children.) Finally, each node has a possibly empty collection of child
392392nodes.
393393
394- With this, a "scene graph" is really just a single root ` GraphNode ` whose
394+ With this, a "scene graph" is really only a single root ` GraphNode ` whose
395395children (and grandchildren, etc.) are all of the objects in the world:
396396
397397^code scene-graph
398398
399399In order to render a scene graph, all we need to do is traverse that tree of
400- nodes starting at the root and call the following function for each node's mesh
400+ nodes, starting at the root, and call the following function for each node's mesh
401401with the right world transform:
402402
403403^code render
@@ -422,8 +422,8 @@ parent chain to calculate world transforms because we calculate as we go while
422422walking * down* the chain.
423423
424424We calculate the node's world transform and store it in ` world ` , then we render
425- the mesh if we have one. Finally, we recurse into the child nodes, passing in
426- * this* node's world transform. All in all, it's nice tight, simple recursive
425+ the mesh, if we have one. Finally, we recurse into the child nodes, passing in
426+ * this* node's world transform. All in all, it's nice, tight, simple recursive
427427method.
428428
429429To draw an entire scene graph, we kick off the process at the root node:
@@ -432,29 +432,29 @@ To draw an entire scene graph, we kick off the process at the root node:
432432
433433### Let's get dirty
434434
435- So this code does the right thing -- renders all the meshes in the right place
435+ So this code does the right thing -- it renders all the meshes in the right place
436436-- but it doesn't do it efficiently. It's calling ` local_.combine(parentWorld) `
437437on every node in the graph, every frame. Let's see how this pattern fixes that.
438438First, we need to add two fields to ` GraphNode ` :
439439
440440^code dirty-graph-node
441441
442- The ` world_ ` field caches the previously- calculated world transform, and
442+ The ` world_ ` field caches the previously calculated world transform, and
443443` dirty_ ` , of course, is the dirty flag. Note that the flag starts out ` true ` .
444- When we create a new node, we haven't calculated it's world transform yet, so at
445- birth it's already out of sync with the local transform.
444+ When we create a new node, we haven't calculated its world transform yet. At
445+ birth, it's already out of sync with the local transform.
446446
447447The only reason we need this pattern is because objects can * move* , so let's add
448448support for that:
449449
450450^code set-transform
451451
452452The important part here is that it sets the dirty flag too. Are we forgetting
453- anything? Right: the child nodes!
453+ anything? Right -- the child nodes!
454454
455455When a parent node moves, all of its children's world coordinates are
456- invalidated too. But here we aren't setting their dirty flags. We * could* do
457- that, but that's recursive and slow. Instead we'll do something clever when we
456+ invalidated too. But here, we aren't setting their dirty flags. We * could* do
457+ that, but that's recursive and slow. Instead, we'll do something clever when we
458458go to render. Let's see:
459459
460460<span name =" branch " ></span >
@@ -464,12 +464,12 @@ go to render. Let's see:
464464<aside name =" branch " >
465465
466466There's a subtle assumption here that the ` if ` check is faster than a matrix
467- multiply. Intuitively, you would think it is: surely testing a single bit is
467+ multiply. Intuitively, you would think it is; surely testing a single bit is
468468faster than a bunch of floating point arithmetic.
469469
470470However, modern CPUs are fantastically complex. They rely heavily on
471471* pipelining* -- queueing up a series of sequential instructions. A branch like
472- our ` if ` here can cause a * branch misprediction* and force the CPU lose cycles
472+ our ` if ` here can cause a * branch misprediction* and force the CPU to lose cycles
473473refilling the pipeline.
474474
475475The <a href =" data-locality.html " class =" pattern " >Data Locality</a > chapter has
@@ -479,7 +479,7 @@ up like this.
479479</aside >
480480
481481This is similar to the original naïve implementation. The key changes are that
482- we check to see if the node is dirty before calculating the world transform, and
482+ we check to see if the node is dirty before calculating the world transform and
483483we store the result in a field instead of a local variable. When the node is
484484clean, we skip ` combine() ` completely and use the old but still correct ` world_ `
485485value.
@@ -489,8 +489,8 @@ be `true` if any node above this node in the parent chain was dirty. In much the
489489same way that ` parentWorld ` updates the world transform incrementally as we
490490traverse down the hierarchy, ` dirty ` tracks the dirtiness of the parent chain.
491491
492- This lets us avoid having to actually recursively set each child's ` dirty_ ` flag
493- in ` setTransform() ` . Instead, we just pass the parent's dirty flag down to its
492+ This lets us avoid having to recursively set each child's ` dirty_ ` flag
493+ in ` setTransform() ` . Instead, we pass the parent's dirty flag down to its
494494children when we render and look at that too to see if we need to recalculate
495495the world transform.
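
To make the flow concrete, here is a self-contained sketch of the whole thing. It borrows the names used above (`GraphNode`, `setTransform()`, `render()`, `local_`, `world_`, `dirty_`, `parentWorld`, and `combine()`), but everything else is an assumption made for illustration: `Offset` is a one-dimensional stand-in for a real transform, there is no mesh, the children are stored in a `std::vector` of non-owning pointers, and `addChild()`, `worldX()`, and `recalcs()` exist only so the behavior can be observed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 4x4 matrix: just a 1D offset.
struct Offset {
  explicit Offset(float x = 0.0f) : x(x) {}
  Offset combine(const Offset& parent) const { return Offset(x + parent.x); }
  float x;
};

class GraphNode {
public:
  explicit GraphNode(Offset local) : local_(local) {}

  void addChild(GraphNode* child) {
    assert(child != nullptr);
    children_.push_back(child);
  }

  // Only this node's flag is set. The children are not touched here.
  void setTransform(Offset local) {
    local_ = local;
    dirty_ = true;
  }

  void render(const Offset& parentWorld, bool dirty) {
    // Dirty if this node moved, or if anything above it in the chain did.
    dirty |= dirty_;
    if (dirty) {
      world_ = local_.combine(parentWorld);
      dirty_ = false;
      ++recalcs_;
    }

    // A real renderer would draw this node's mesh with world_ here.

    for (GraphNode* child : children_) {
      child->render(world_, dirty);
    }
  }

  float worldX() const { return world_.x; }
  int recalcs() const { return recalcs_; }

private:
  Offset local_;
  Offset world_;
  bool dirty_ = true;  // New nodes start out of sync.
  int recalcs_ = 0;    // Bookkeeping for the example only.
  std::vector<GraphNode*> children_;  // Non-owning pointers.
};
```

With a ship, a crow's nest, and a parrot chained together, rendering twice without moving anything recalculates each world transform only once. Moving the ship and rendering again recalculates all three exactly once more, even though only the ship's own flag was ever set.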
496496
@@ -520,15 +520,15 @@ This pattern is fairly specific, so there are only a couple of knobs to twiddle:
520520
521521 * * If the calculation is time-consuming, it can cause a noticeable pause.*
522522 Postponing the work until the player is expecting to see the result can
523- affect their gameplay experience. Often, it's fast enough that this
523+ affect their gameplay experience. It's often fast enough that this
524524 isn't a problem, but if it is, you'll have to do the work earlier.
525525
526526* ** At well-defined checkpoints:**
527527
528- Sometimes there is a point in time or the progression of the game where it's
528+ Sometimes, there is a point in time in the progression of the game where it's
529529 natural to do the deferred processing. For example,
530530 we may want to save the game only when the pirate sails into port. Or the
531- sync point may not be part of the game mechanics. We may just want to hide the
531+ sync point may not be part of the game mechanics. We may want to hide the
532532 work behind a loading screen or a cut scene.
533533
534534 * * Doing the work doesn't impact the user experience.* Unlike the previous
@@ -558,7 +558,7 @@ This pattern is fairly specific, so there are only a couple of knobs to twiddle:
558558 </aside >
559559
560560 * * You can tune how often the work is performed.* By adjusting the timer
561- interval you can ensure it happens as frequently (or infrequently) as
561+ interval, you can ensure it happens as frequently (or infrequently) as
562562 you want.
563563
564564 * * You can do more redundant work.* If the primary state only changes a