-
Notifications
You must be signed in to change notification settings - Fork 61
struct Av1Block: narrow field types
#1460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…MODES` table with a `const fn` w/ a `match`
…f a `match` since it optimizes better
Note that `rav1d_msac_decode_symbol_adapt16` already truncates its return to `< 16`, so the check is also a no-op here.
… handle negatives
randomPoison
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, though I'm a bit worried about all of the unwraps added thanks to using InRange in various places. Are you sure that adding all those checks doesn't degrade performance? Can you confirm that they're optimized out?
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, though I'm a bit worried about all of the
unwraps added thanks to usingInRangein various places. Are you sure that adding all those checks doesn't degrade performance? Can you confirm that they're optimized out?
They generally should be. I'll go add the // Elided comments. Ones like Av1BlockInterRefIndex::new(N).unwrap() or Av1BlockInterRefIndex::new(N + rav1d_msac_decode_bool_adapt(...) as i8).unwrap() are elided since it knows N and the range of a bool. Some others using rav1d_msac_decode_symbol_adapt8 or rav1d_msac_decode_symbol_adapt16 are elided because they truncate to 8/16.
I think Av1BlockInterRefIndex::new(frame_hdr.skip_mode.refs[0]).unwrap(), doesn't, but we can also make frame_hdr.skip_mode.refs use an InRange type (didn't want to do too much just in this PR).
I also measured performance, and there's basically no difference (just noise, < 0.1% different).
The benchmark script found that the merge commit 8eb6932 (#951) was one of the largest perf regressions (+4.7% on 80 threads, +4.3% on 150 threads). While it touched a bunch of things, one significant part was
f.frame_thread.b, aDisjointMut<Vec<Av1Block>>.Av1Blockis currently 32 bytes, but could be 24 bytes. This PR narrows all of the field types (recursively) as much as possible so we know their more precise range. This will help us then re-arrange the layout (Av1Blockis not passed to asm) and get it down to 24 bytes (this PR itself is perf neutral, although it should remove a bunch of bounds checks). Plus, this PR just makes things more idiomatic. Moreover, I think it may be possible to switch fromDisjointMuttoRelaxedatomics forAv1Block. It should also help makeAv1Block: FromZeroes, which will let me optimize the.resizewithnew_zeroed_slice.