Skip to content

Conversation

@ltratt
Copy link
Member

@ltratt ltratt commented Jul 7, 2025

The main commit here is f9372b4.

ltratt added 2 commits July 7, 2025 08:49
This stuff mostly dates from when I was trying to get my head around
what such iterators should be. Some of the comments are of negligible
use; some are so blindingly obvious that I'm not sure why I ever put
them in!
@vext01 vext01 added this pull request to the merge queue Jul 7, 2025
Merged via the queue into softdevteam:master with commit 29e6231 Jul 7, 2025
2 checks passed
@eadf
Copy link
Contributor

eadf commented Jul 7, 2025

Maybe we should add:

impl<T: Debug + PrimInt> ExactSizeIterator for IterSetBits<'_, T> {}
impl<T: Debug + PrimInt> ExactSizeIterator for IterUnsetBits<'_, T> {}

While touching the iterators anyway..

It should be safe since the range is Range<usize>

@ltratt
Copy link
Member Author

ltratt commented Jul 7, 2025

I'm not totally sure of ExactSizeIterator's guarantees: note that we do change self.range, but we only ever make it smaller. If that is allowable, then I'd happily take a PR to implement ExactSizeIterator.

@eadf
Copy link
Contributor

eadf commented Jul 7, 2025

hm, maybe it's a bad idea. It might be an expensive operation that seldom is needed.

@ltratt
Copy link
Member Author

ltratt commented Jul 7, 2025

I don't think it would add expense. I don't know whether it's very useful, or not, though; but if it is useful, and it's safe for us to do so, I'm happy to see a PR for it.

@ratmice
Copy link
Collaborator

ratmice commented Jul 7, 2025

I'm not totally sure of ExactSizeIterator's guarantees: note that we do change self.range, but we only ever make it smaller. If that is allowable

Is that just the normal size reduction by one corresponding to the consuming of an item from the iterator?
Because that is expected for ExactSizeIterator, (also I see that Iter implements it, but IterSetBits and IterUnsetBits do not).

also out of curiousity are the iterators fused? (never return Some after returning None) it seemed like they might not be if that loop can fall-through in the Bits iterators.

  1. if they are this is potentially another trait that could be implemented
  2. Curiously reading through the ExactSizeIterator docs I'm uncertain the expected behavior for a non-fused iterator.

Since the docs kind of imply that the iterator should be fused

ExactSizeIterator::len

The implementation ensures that the iterator will return exactly len() more times a Some(T) value, before returning None.

So I'm curious what a non-fused iterator should return, the number of Some items remaining until the next None I suppose. (As opposed to the number of Some items remaining in total which in some cases might be more useful for sparse collections?).

Edit: initially my search failed me, but I did manage to find the following, which indicates it should return
the length until the first None.
https://users.rust-lang.org/t/whats-the-exactsizeiterator-meaning-for-a-non-fusediterator/101781

@eadf
Copy link
Contributor

eadf commented Jul 7, 2025

Regarding the size_hint() operation complexity it would be O(words).
word::count_ones() for words 'in between' and a bit mask + count_ones()for words at start and end.
I was worried that that was a cost any iteration had to pay, but it it seems to only apply if you do .collect() and explicit calls to size_hint(). I'll see what i can do for a PR.

And for fused iterators, i think IterSetBits should emulate

let field : Vec<bool> = ...
let range = 1..4;

field.iter()
    .enumerate()
    .skip(range.start)
    .take(range.end - range.start)
    .filter(|(_, &b)| b)
    .map(|(i, _)| i)

And that iterator is fused (to the best of my knowledge).

@ltratt
Copy link
Member Author

ltratt commented Jul 8, 2025

@ratmice makes a good point (which I hadn't considered). Here are the docs:

Many Iterators don’t know how many times they will iterate, but some do. If an iterator knows how many times it can iterate, providing access to that information can be useful.

That's not the case here. iter_set_bits has no idea how many times it will return Some other than it will be 0..=len() (both inclusive). So Iter can be ExactSizeIterator but IterSetBits and IterUnsetBits can't.

@eadf
Copy link
Contributor

eadf commented Jul 8, 2025

iter_set_bits has no idea how many times it will return Some

But if iter_set_bits is fused, then we can figure that out:

size_hint() just need to do something like this (pseudo code)

fn count_set_bits_in_range(&self, range: Range<usize>) -> usize {
    let start_word = range.start / 64;
    let end_word = range.end / 64;
    
    if start_word == end_word {
        // Range entirely within one word
        let mut word = self.words[start_word];
        let start_bit = range.start % 64;
        let end_bit = range.end % 64;
        
        // Shift out unwanted bits directly
        word <<= start_bit;           // Remove bits before start
        word >>= start_bit + (64 - end_bit);  // Remove bits after end
        return word.count_ones() as usize;
    }
    
    let mut count = 0;
    
    // First word: shift out bits before start_bit
    let start_bit = range.start % 64;
    let first_word = self.words[start_word] >> start_bit;
    count += first_word.count_ones() as usize;
    
    // Middle words (unchanged)
    for word_idx in (start_word + 1)..end_word {
        count += self.words[word_idx].count_ones() as usize;
    }
    
    // Last word: shift out bits after end_bit
    if end_word < self.words.len() {
        let end_bit = range.end % 64;
        let last_word = self.words[end_word] << (64 - end_bit);
        count += last_word.count_ones() as usize;
    }
    
    count
} 

The question is if we are willing to pay that O(number-of-words) cost.

@ltratt
Copy link
Member Author

ltratt commented Jul 8, 2025

I agree that it works, but I'm not sure there's a use case which is important enough to bear this (surprising) cost: I think most users would expect the cost to be O(1). [Another way of putting this is that the only way to know how far long the journey will be is to complete the journey then go back and tell everyone how long it will be :)]

@eadf
Copy link
Contributor

eadf commented Jul 8, 2025

Oh, the journey is not O(max bit range) but i get your point.

But on the other hand, if the Vob is used as a non-manifold-vertex indicator in a very large 3d mesh, and someone does
let my_bad_indices : Vec<usize> = aVob::iter_set_bits().colllect() they get a Vec of Mega/Gigabytes capacity instead of just a few bytes. But that's the story of any tool, there will always be ways to mess up and create a foot-gun.

However, one thing we get out of this that we should probably add the Fused trait and a test for "is iterator fused" otherwise the premise "behaves like a Vec<bool>" is broken.

@ltratt
Copy link
Member Author

ltratt commented Jul 8, 2025

Dumb question: how does ExactSizeIterator optimise vob::iter_set_bits().collect().

Good point about FusedIterator I must admit that I didn't even know this exists! I'll raise a PR shortly.

@eadf
Copy link
Contributor

eadf commented Jul 8, 2025

how does ExactSizeIterator optimise vob::iter_set_bits().collect()

.collect() first creates a Vec::with_capacity(vob.size_hint()) and then populates it from the iterator, so if the hint is large, the capacity is large.

@ltratt
Copy link
Member Author

ltratt commented Jul 8, 2025

I did not know that. Perhaps it's better not to implement size_hint? Will that cause it to preallocate 0 bytes?

@eadf
Copy link
Contributor

eadf commented Jul 8, 2025

We could always implement size_hint() as:

  fn size_hint(&self) -> (usize, Option<usize>) {
           // Conservative estimate - maybe small heuristic or just (0, None)
           let conservative = (self.vob.len() / 100).max(64);
           (0, Some(conservative))
    }

But i just had an idea, why not have the best of both worlds?

 pub fn iter_set_bits<const EXACT: bool = false, R>(&self, range: R) -> IterSetBits<T>
    where
        R: RangeBounds<usize>,
    {
    ...
    

So normally we just use the current, fast, variant with the same syntax as before.

If the user uses EXACT=true we calculate the exact size_hint().

We could not implement ExacSizeIterator in this scenario, as the return type should remain the same in both cases.
And the vob.iter_set_bits::<true,_>() syntax is bit awkward, but only used when really needed.

@ltratt
Copy link
Member Author

ltratt commented Jul 8, 2025

Is there much use in this if we can't implement ExactSizeIterator?

@eadf
Copy link
Contributor

eadf commented Jul 9, 2025

You're right, maybe it is not worth the code-bloat. At least if the worst over-capacity allocation cases are fixed with a more conservative size_hint().

@ltratt ltratt deleted the double_ended branch August 14, 2025 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants