Skip to content

Conversation

@eadf
Copy link
Contributor

@eadf eadf commented Jul 13, 2025

Added some modest optimizations for .count() of all the iterators

Results

With a benchmark of 100_000 random bits, on my M1 Mac:

  • Vob<usize>::iter().count() took 31_472 ps, now it takes 314 ps (100 times faster)
  • Vob<usize>::iter_set_bits(..).count() took 178_860 ns, now it takes 198 ns (900 times faster)
  • Vob::iter_storage().count() has been updated too (now it just does math instead of counting).

Changes

Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rand::Rng methods in the benchmark code
Added tests and benchmarks for the new iter*.count() functionality

Issues that needs resolve:

### move count_set_bits() (delay to next semver update)
We could move count_set_bits() into the iterator if IterSetBits and IterUnsetBits were merged into one BitsIterator. This could be done without any runtime penalties, and we'd skip some duplicated boilerplate:

pub struct BitsIterator<'a, T: 'a, const BIT_STATE: bool> {...}

We could do this is stages, for now we could simply copy&paste like this (to reduce risk of new bugs):

fn next(&mut self) -> Option<usize> {
    if BIT_STATE {
      // old IterSetBits::next() code
    } else {
      // old IterUnsetBits::next() code 
    }
}

Remove benchmark

Once the gain is confirmed, the new count_bits() benchmark can be removed because it simply tests range.count(), and that's like benchmarking std::Range.

Possibly add a new helper method

for n % bits_per_block::<T>(), just like block_offset() does n / bits_per_block::<T>()

### Tidy up the .filter(|_|true)
I use iterator.filter(|_| true).count() to trigger the old .count() behavior in the tests, maybe there are better ways to do that.

### count_ones_usize()
count_ones_usize() could safely use unwrap_unchecked() (or it could even safely use as)

(link #86)

@eadf eadf changed the title Optimized iter*().count() for the iterators. [DON'T MERGE YET] Optimized .count() for the iterators. [WIP] Jul 13, 2025
@ltratt
Copy link
Member

ltratt commented Jul 14, 2025

Could you update the benchmarks so that they all use the same units please? My pea brain misreads comparisons when the units change!

eadf added a commit to eadf/vob that referenced this pull request Jul 14, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality

(softdevteam#87)
eadf added a commit to eadf/vob that referenced this pull request Jul 14, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality

(softdevteam#87)
src/lib.rs Outdated
if range.start >= range.end {
return 0;
}
let start_word = block_offset::<T>(range.start);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In vob we use off not word. So this would be start_off (etc).

src/lib.rs Outdated

if start_word == end_word {
// Range entirely within one word
let word = self.vec[start_word];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vob calls this b.

src/lib.rs Outdated
/// Convenience function that calls T::count_ones() and converts the result to usize
/// (The conversion is always safe even if T is u128 and usize is u16)
#[inline(always)]
fn count_ones_usize<T: PrimInt>(value: T) -> usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to call the count_ones std lib method directly rather than use this wrapper. The try_from is mildly ugly in source code, but it'll be optimised away, so it's not worth worrying about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it'll be optimised away, so it's not worth worrying about.

I was sceptical about that, so i tested it out. It's actually true (on non-embedded platforms anyways)

@ltratt
Copy link
Member

ltratt commented Jul 15, 2025

I like the direction of this! My comments are mostly shallow and making sure this code feels in keeping with the rest of Vob's source, but the general idea clearly works as-is.

We could move count_set_bits() into the iterator if IterSetBits and IterUnsetBits were merged into one BitsIterator

This would be a semver breakage. It might be worth doing after the next release, but it would be nice to get the optimisation out to people first before we think about breaking the API.

I use iterator.filter(|_| true).count() to trigger the old .count() behavior in the tests, maybe there are better ways to do that.

I don't have a better idea!

count_ones_usize() could safely use unwrap_unchecked() (or it could even safely use as)

rustc will almost certainly optimise the unwrap away entirely on the platforms we know and care about (certainly, in all similar cases I've seen, and checked, it has done so, though obviously one can not 100% guarantee this), so the try_from is good, because if someone ever tries this on a platform with different integer constraints it will break in the right way.

eadf added a commit to eadf/vob that referenced this pull request Jul 15, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality

(softdevteam#87)
@eadf
Copy link
Contributor Author

eadf commented Jul 15, 2025

This would be a semver breakage

Yes, you are right about that. But it is unlikely that anybody stores an IterSetBits as just that type, instead of a impl Iterator. But i agree, let's wait with that change.

rustc will almost certainly optimise the unwrap away entirely

Not only that, the try_from() is removed too. At least on 64bits machines.
I didn't know that, as you said - we learn every day.

@ltratt
Copy link
Member

ltratt commented Jul 15, 2025

But it is unlikely that anybody stores an IterSetBits as just that type, instead of a impl Iterator

I quite often store iterator types, so I know one user who'll be affected ;)

src/lib.rs Outdated
/// Counts the number of set bits.
/// This method assumes the range is processed with process_range()
fn count_set_bits(&self, range: Range<usize>) -> usize {
use std::convert::TryFrom;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go at the top of the file alongside the other imports.

@ltratt
Copy link
Member

ltratt commented Jul 15, 2025

One minor comment then I think we're ready to merge.

@eadf
Copy link
Contributor Author

eadf commented Jul 15, 2025

One minor comment then I think we're ready to merge.

And the other issues? i.e remove count_bits() bench and helper method for n % bits_per_block::<T>()?

eadf added a commit to eadf/vob that referenced this pull request Jul 15, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality

(softdevteam#87)
eadf added a commit to eadf/vob that referenced this pull request Jul 15, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality
@ltratt
Copy link
Member

ltratt commented Jul 16, 2025

I think we're ready. Please undraft?

eadf added a commit to eadf/vob that referenced this pull request Jul 16, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality
@eadf
Copy link
Contributor Author

eadf commented Jul 16, 2025

Ok, I'll remove the benches i mentioned earlier and won't not add a modulus helper method.

@eadf eadf marked this pull request as ready for review July 16, 2025 17:09
eadf added a commit to eadf/vob that referenced this pull request Jul 16, 2025
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality
Added a private count_set_bits() method that counts set bits in O(number of words)
Fixed the deprecated rng methods in the benchmark
Added tests and benchmarks for the new .count() functionality
@ltratt
Copy link
Member

ltratt commented Jul 16, 2025

Thanks!

@ltratt ltratt added this pull request to the merge queue Jul 16, 2025
@ltratt ltratt changed the title Optimized .count() for the iterators. [WIP] Optimized .count() for the iterators. Jul 16, 2025
Merged via the queue into softdevteam:master with commit 3c512e0 Jul 16, 2025
2 checks passed
@eadf eadf deleted the iter_count branch July 19, 2025 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants