Commit c151352

Dirty Flag Markdown file changes
Proofreading edits to the Dirty Flag Markdown file.
1 parent 91bb72d commit c151352

1 file changed, 49 additions and 49 deletions

book/dirty-flag.markdown

Lines changed: 49 additions & 49 deletions
@@ -7,7 +7,7 @@
 
 ## Motivation
 
-"Flag" and "bit" are synonymous in programming: they both mean a single micron
+"Flag" and "bit" are synonymous in programming -- they both mean a single micron
 of data that can be in one of two states. We call those "true" and "false", or
 sometimes "set" and "cleared". I'll use all of these interchangeably. "Dirty
 bit" is an equally <span name="specific">common</span> name for this pattern,
@@ -24,32 +24,32 @@ bit](http://en.wikipedia.org/wiki/Dirty_bit).
 
 Many games have something called a *scene graph*. This is a big data structure
 that contains all of the objects in the world. The rendering engine uses this to
-determine where on screen to draw stuff.
+determine where to draw stuff on the screen.
 
 At its simplest, a scene graph is just a flat list of objects. Each object has a
-model or some other graphic primitive, and a <span
+model, or some other graphic primitive, and a <span
 name="transform">*transform*</span>. The transform describes the object's
-position, rotation, and scale in the world. To move or turn an object, we just
+position, rotation, and scale in the world. To move or turn an object, we simply
 change its transform.
 
 <aside name="transform">
 
-The mechanics of *how* this transform is stored and manipulated is unfortunately
+The mechanics of *how* this transform is stored and manipulated are unfortunately
 out of scope here. The comically abbreviated summary is that it's a 4x4 matrix.
-You can make a single transform that combines two transforms -- for example
+You can make a single transform that combines two transforms -- for example,
 translating and then rotating an object -- by multiplying the two matrices.
 
 How and why that works is left as an exercise for the reader.
 
 </aside>
 
 When the renderer draws an object, it takes the object's model, applies the
-transform to it, and then renders it there in the world. If we just had a scene
-*bag* and not a scene *graph* that would be it and life would be simple.
+transform to it, and then renders it there in the world. If we had a scene
+*bag* and not a scene *graph*, that would be it, and life would be simple.
 
 However, most scene graphs are <span name="hierarchical">*hierarchical*</span>.
 An object in the graph may have a parent object that it is anchored to. In that
-case, its transform is relative to the *parent's* position, and isn't its
+case, its transform is relative to the *parent's* position and isn't its
 absolute position in the world.
 
 For example, imagine our game world has a pirate ship at sea. Atop the ship's
@@ -70,7 +70,7 @@ This way, when a parent object moves, its children move with it automatically.
 If we change the local transform of the ship, the crow's nest, pirate, and
 parrot go along for the ride. It would be a total <span
 name="slide">headache</span> if, when the ship moved, we had to manually adjust
-the transforms of everything on it to keep them from sliding off.
+the transforms of all the objects on it to keep them from sliding off.
 
 <aside name="slide">
 
@@ -85,7 +85,7 @@ transform*. To render an object, we need to know its *world transform*.
 
 ### Local and world transforms
 
-Calculating an object's world transform is pretty straightforward: you just walk
+Calculating an object's world transform is pretty straightforward -- you just walk
 its parent chain starting at the root all the way down to the object, combining
 transforms as you go. In other words, the parrot's world transform is:
 
@@ -100,12 +100,12 @@ transforms are equivalent.
 </aside>
 
 We need the world transform for every object in the world every frame, so even
-though it's just a handful of matrix multiplications per model, it's on the hot
+though there are only a handful of matrix multiplications per model, it's on the hot
 code path where performance is critical. Keeping them up to date is tricky
 because when a parent object moves, that affects the world transform of itself
 and all of its children, recursively.
 
-The simplest approach is to just calculate transforms on the fly while
+The simplest approach is to calculate transforms on the fly while
 rendering. Each frame, we recursively traverse the scene graph starting at the
 top of the hierarchy. For each object, we calculate its world transform right
 then and draw it.
@@ -118,9 +118,9 @@ they haven't changed is a waste.
 ### Cached world transforms
 
 The obvious answer is to *cache* it. In each object, we store its local
-transform and its derived world transform. When we render, we just use the
+transform and its derived world transform. When we render, we only use the
 precalculated world transform. If the object never moves, the cached transform
-is always up to date and everything's happy.
+is always up-to-date and everything's happy.
 
 When an object *does* move, the simple approach is to refresh its world
 transform right then. But don't forget the hierarchy! When a parent moves, we
@@ -149,7 +149,7 @@ by the renderer. We calculated the parrot's world transform *four* times, but it
 only got rendered once.
 
 The problem is that a world transform may depend on several local transforms.
-Since we recalculate immediately each time *one* of those changes, we end up
+Since we recalculate immediately each time *one* of the transforms changes, we end up
 recalculating the same transform multiple times when more than one of the local
 transforms it depends on changes in the same frame.
 
@@ -163,14 +163,14 @@ need it to render.
 
 <aside name="decoupling">
 
-It's interesting how much of software architecture is just intentionally
+It's interesting how much of software architecture is intentionally
 engineering a little slippage.
 
 </aside>
 
 To do this, we add a flag to each object in the graph. When the local transform
 changes, we set it. When we need the object's world transform, we check the
-flag. If it's set, we calculate the world transform then and clear the flag. The
+flag. If it's set, we calculate the world transform and then clear the flag. The
 flag represents, "Is the world transform out of date?" For reasons that aren't
 entirely clear, the traditional name for this "out-of-date-ness" is "dirty".
 Hence: *a dirty flag*.
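
A minimal sketch of the set, check, and clear cycle this hunk describes, under loud assumptions: the "transform" is reduced to a single `double` offset, the parent's world offset is fixed at construction, and `LazyObject`, `setLocal()`, `world()`, and `recalcCount()` are hypothetical names rather than the chapter's sample code.

```cpp
#include <cassert>

// Hypothetical object whose world offset is derived from a fixed parent
// offset plus its own local offset (a toy stand-in for 4x4 matrices).
class LazyObject {
 public:
  LazyObject(double parentWorld, double local)
      : parentWorld_(parentWorld),
        local_(local),
        world_(0.0),
        dirty_(true),  // nothing has been calculated yet
        recalcCount_(0) {}

  // Changing the primary data only marks the derived data as out of date.
  void setLocal(double local) {
    local_ = local;
    dirty_ = true;
  }

  // The derived data is recalculated only when it is requested and the
  // flag is set; otherwise the cached value is returned.
  double world() {
    if (dirty_) {
      world_ = parentWorld_ + local_;  // the costly step in a real engine
      dirty_ = false;
      ++recalcCount_;
    }
    assert(!dirty_);  // the cache is current whenever it is returned
    return world_;
  }

  int recalcCount() const { return recalcCount_; }

 private:
  double parentWorld_;
  double local_;
  double world_;
  bool dirty_;
  int recalcCount_;
};
```

With this shape, several `setLocal()` calls in a row followed by one `world()` call cost a single recalculation, which is the collapsing behavior the surrounding text is after.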
@@ -180,8 +180,8 @@ example, the game ends up doing:
 
 <img src="images/dirty-flag-update-good.png" alt="By deferring until all moves are done, we only recalculate once." />
 
-That's the best you could hope to do: the world transform for each affected
-object is calculated exactly once. With just a single bit of data, this pattern
+That's the best you could hope to do -- the world transform for each affected
+object is calculated exactly once. With only a single bit of data, this pattern
 does a few things for us:
 
 * It collapses modifications to multiple local transforms along an object's
@@ -212,7 +212,7 @@ Dirty flags are applied to two kinds of work: *calculation* and
 the derived data is time-consuming or otherwise costly.
 
 In our scene graph example, the process is slow because of the amount of math to
-perform. When using this pattern for synchronization on the other hand, it's
+perform. When using this pattern for synchronization, on the other hand, it's
 more often that the derived data is *somewhere else* -- either on disk or over
 the network on another machine -- and simply getting it from point A to point B
 is what's expensive.
@@ -225,14 +225,14 @@ There are a couple of other requirements too:
 yourself always needing that derived data after every single modification
 to the primary data, this pattern can't help.
 
-* **It should be hard to incrementally update.** Let's say the
+* **It should be hard to update incrementally.** Let's say the
 pirate ship in our game can only carry so much booty. We need to
 know the total weight of everything in the hold. We
 *could* use this pattern and have a dirty flag for the total weight. Every
 time we add or remove some loot, we set the flag. When we need the
 total, we add up all of the booty and clear the flag.
 
-But a simpler solution is to just *keep a running total*. When we add or
+But a simpler solution is to *keep a running total*. When we add or
 remove an item, just add or remove its weight from the current total. If
 we can "pay as we go" like this and keep the derived data updated, then
 that's often a better choice than using this pattern and calculating the
@@ -255,7 +255,7 @@ hacks.
 Even after you've convinced yourself this pattern is a good fit, there are a few
 wrinkles that can cause you some discomfort.
 
-### There is a cost to deferring too long
+### There is a cost to deferring for too long
 
 This pattern defers some slow work until the result is actually needed, but when
 it is, it's often needed *right now*. But the reason we're using this pattern to
@@ -292,7 +292,7 @@ system too much by saving all the time.
 
 This mirrors the different garbage collection strategies in systems that
 automatically manage memory. Reference counting frees memory the second it's no
-longer needed, but burns CPU time updating ref counts eagerly every time
+longer needed, but it burns CPU time updating ref counts eagerly every time
 references are changed.
 
 Simple garbage collectors defer reclaiming memory until it's really needed, but
@@ -321,13 +321,13 @@ cache invalidation and naming things."
 </aside>
 
 Miss it in one place, and your program will incorrectly use stale derived data.
-This leads to confused players and very hard to track down bugs. When you use
+This leads to confused players and bugs that are very hard to track down. When you use
 this pattern, you'll have to take care that any code that modifies the primary
 state also sets the dirty flag.
 
 One way to mitigate this is by encapsulating modifications to the primary data
 behind some interface. If anything that can change the state goes through a
-single narrow API, you can set the dirty bit there and rest assured that it
+single narrow API, you can set the dirty flag there and rest assured that it
 won't be missed.
 
 ### You have to keep the previous derived data in memory
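
A possible sketch of the "single narrow API" advice from the hunk above, applied to the synchronization flavor of the pattern. Everything here is an assumption for illustration: `Settings`, `set()`, `get()`, `saveIfDirty()`, and `saveCount()` are hypothetical names, and the expensive write to disk is replaced by a counter.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical settings store. The map is private, so set() is the only
// way to modify it, and therefore the only place that has to set the flag.
class Settings {
 public:
  Settings() : dirty_(false), saveCount_(0) {}

  // The one mutation path: every change to the primary data sets the flag.
  void set(const std::string& key, const std::string& value) {
    assert(!key.empty());
    values_[key] = value;
    dirty_ = true;
  }

  // Reads never touch the flag. Unknown keys yield an empty string.
  std::string get(const std::string& key) const {
    auto it = values_.find(key);
    return it == values_.end() ? std::string() : it->second;
  }

  // Stand-in for a costly write to disk: it only counts how often a real
  // save would have happened, then clears the flag.
  void saveIfDirty() {
    if (!dirty_) return;
    ++saveCount_;
    dirty_ = false;
  }

  int saveCount() const { return saveCount_; }

 private:
  std::map<std::string, std::string> values_;
  bool dirty_;
  int saveCount_;
};
```

Because callers cannot reach `values_` directly, there is no second code path that could change the state and forget the flag.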
@@ -355,7 +355,7 @@ Like many optimizations, then, this pattern <span name="trade">trades</span>
 memory for speed. In return for keeping the previously calculated data in
 memory, you avoid having to recalculate it when it hasn't changed. This
 trade-off makes sense when the calculation is slow and memory is cheap. When
-you've got more time than memory on your hands, it's better to just calculate it
+you've got more time than memory on your hands, it's better to calculate it
 as needed.
 
 <aside name="trade">
@@ -367,7 +367,7 @@ Conversely, compression algorithms make the opposite trade-off: they optimize
 
 ## Sample Code
 
-Let's assume we've met the surprisingly long list of requirements, and see how
+Let's assume we've met the surprisingly long list of requirements and see how
 the pattern looks in code. As I mentioned before, the actual math behind
 transform matrices is beyond the humble aims of this book, so I'll just
 encapsulate that in a class whose implementation you can presume exists
@@ -391,13 +391,13 @@ parent. It has a mesh which is the actual graphic for the object. (We'll allow
 their children.) Finally, each node has a possibly empty collection of child
 nodes.
 
-With this, a "scene graph" is really just a single root `GraphNode` whose
+With this, a "scene graph" is really only a single root `GraphNode` whose
 children (and grandchildren, etc.) are all of the objects in the world:
 
 ^code scene-graph
 
 In order to render a scene graph, all we need to do is traverse that tree of
-nodes starting at the root and call the following function for each node's mesh
+nodes, starting at the root, and call the following function for each node's mesh
 with the right world transform:
 
 ^code render
@@ -422,8 +422,8 @@ parent chain to calculate world transforms because we calculate as we go while
 walking *down* the chain.
 
 We calculate the node's world transform and store it in `world`, then we render
-the mesh if we have one. Finally, we recurse into the child nodes, passing in
-*this* node's world transform. All in all, it's nice tight, simple recursive
+the mesh, if we have one. Finally, we recurse into the child nodes, passing in
+*this* node's world transform. All in all, it's nice, tight, simple recursive
 method.
 
 To draw an entire scene graph, we kick off the process at the root node:
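
The `^code` snippets are not part of this diff, so here is a rough sketch of the unoptimized traversal the hunk describes. It is not the book's listing: `Transform` is a toy 2D offset whose `combine()` simply adds offsets, a mesh is an `int` id with a negative value meaning "no mesh", and `renderMesh()` records positions in a global `drawn` vector instead of drawing. Only `GraphNode`, `local_`, `combine()`, `parentWorld`, and `world` come from the prose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the chapter's Transform class (assumption): a 2D offset.
struct Transform {
  double x;
  double y;

  static Transform origin() {
    Transform t = {0.0, 0.0};
    return t;
  }

  // Adding offsets plays the role of multiplying matrices.
  Transform combine(const Transform& other) const {
    Transform t = {x + other.x, y + other.y};
    return t;
  }
};

// Records where each mesh would have been drawn (hypothetical helper).
std::vector<Transform> drawn;

void renderMesh(int meshId, const Transform& world) {
  (void)meshId;  // a real renderer would look up and draw the mesh
  drawn.push_back(world);
}

class GraphNode {
 public:
  // A negative mesh id means the node has nothing to draw itself.
  GraphNode(int mesh, Transform local) : mesh_(mesh), local_(local) {}

  void addChild(GraphNode* child) {
    assert(child != nullptr);
    children_.push_back(child);
  }

  void render(const Transform& parentWorld) {
    // Recomputed for every node on every call, whether or not it moved.
    Transform world = local_.combine(parentWorld);
    if (mesh_ >= 0) renderMesh(mesh_, world);
    for (std::size_t i = 0; i < children_.size(); ++i) {
      children_[i]->render(world);
    }
  }

 private:
  int mesh_;
  Transform local_;
  std::vector<GraphNode*> children_;  // not owned in this sketch
};
```

Kicking off the process would then look like `root.render(Transform::origin());`, with each child receiving its parent's freshly computed world transform on the way down.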
@@ -432,29 +432,29 @@ To draw an entire scene graph, we kick off the process at the root node:
 
 ### Let's get dirty
 
-So this code does the right thing -- renders all the meshes in the right place
+So this code does the right thing -- it renders all the meshes in the right place
 -- but it doesn't do it efficiently. It's calling `local_.combine(parentWorld)`
 on every node in the graph, every frame. Let's see how this pattern fixes that.
 First, we need to add two fields to `GraphNode`:
 
 ^code dirty-graph-node
 
-The `world_` field caches the previously-calculated world transform, and
+The `world_` field caches the previously calculated world transform, and
 `dirty_`, of course, is the dirty flag. Note that the flag starts out `true`.
-When we create a new node, we haven't calculated it's world transform yet, so at
-birth it's already out of sync with the local transform.
+When we create a new node, we haven't calculated it's world transform yet. At
+birth, it's already out of sync with the local transform.
 
 The only reason we need this pattern is because objects can *move*, so let's add
 support for that:
 
 ^code set-transform
 
 The important part here is that it sets the dirty flag too. Are we forgetting
-anything? Right: the child nodes!
+anything? Right -- the child nodes!
 
 When a parent node moves, all of its children's world coordinates are
-invalidated too. But here we aren't setting their dirty flags. We *could* do
-that, but that's recursive and slow. Instead we'll do something clever when we
+invalidated too. But here, we aren't setting their dirty flags. We *could* do
+that, but that's recursive and slow. Instead, we'll do something clever when we
 go to render. Let's see:
 
 <span name="branch"></span>
@@ -464,12 +464,12 @@ go to render. Let's see:
 <aside name="branch">
 
 There's a subtle assumption here that the `if` check is faster than a matrix
-multiply. Intuitively, you would think it is: surely testing a single bit is
+multiply. Intuitively, you would think it is; surely testing a single bit is
 faster than a bunch of floating point arithmetic.
 
 However, modern CPUs are fantastically complex. They rely heavily on
 *pipelining* -- queueing up a series of sequential instructions. A branch like
-our `if` here can cause a *branch misprediction* and force the CPU lose cycles
+our `if` here can cause a *branch misprediction* and force the CPU to lose cycles
 refilling the pipeline.
 
 The <a href="data-locality.html" class="pattern">Data Locality</a> chapter has
@@ -479,7 +479,7 @@ up like this.
 </aside>
 
 This is similar to the original naïve implementation. The key changes are that
-we check to see if the node is dirty before calculating the world transform, and
+we check to see if the node is dirty before calculating the world transform and
 we store the result in a field instead of a local variable. When the node is
 clean, we skip `combine()` completely and use the old but still correct `world_`
 value.
@@ -489,8 +489,8 @@ be `true` if any node above this node in the parent chain was dirty. In much the
 same way that `parentWorld` updates the world transform incrementally as we
 traverse down the hierarchy, `dirty` tracks the dirtiness of the parent chain.
 
-This lets us avoid having to actually recursively set each child's `dirty_` flag
-in `setTransform()`. Instead, we just pass the parent's dirty flag down to its
+This lets us avoid having to recursively set each child's `dirty_` flag
+in `setTransform()`. Instead, we pass the parent's dirty flag down to its
 children when we render and look at that too to see if we need to recalculate
 the world transform.
 
@@ -520,15 +520,15 @@ This pattern is fairly specific, so there are only a couple of knobs to twiddle:
 
 * *If the calculation is time-consuming, it can cause a noticeable pause.*
 Postponing the work until the player is expecting to see the result can
-affect their gameplay experience. Often, it's fast enough that this
+affect their gameplay experience. It's often fast enough that this
 isn't a problem, but if it is, you'll have to do the work earlier.
 
 * **At well-defined checkpoints:**
 
-Sometimes there is a point in time or the progression of the game where it's
+Sometimes, there is a point in time in the progression of the game where it's
 natural to do the deferred processing. For example,
 we may want to save the game only when the pirate sails into port. Or the
-sync point may not be part of the game mechanics. We may just want to hide the
+sync point may not be part of the game mechanics. We may want to hide the
 work behind a loading screen or a cut scene.
 
 * *Doing the work doesn't impact the user experience.* Unlike the previous
@@ -558,7 +558,7 @@ This pattern is fairly specific, so there are only a couple of knobs to twiddle:
 </aside>
 
 * *You can tune how often the work is performed.* By adjusting the timer
-interval you can ensure it happens as frequently (or infrequently) as
+interval, you can ensure it happens as frequently (or infrequently) as
 you want.
 
 * *You can do more redundant work.* If the primary state only changes a
