Skip to content

Conversation

@tarzanek
Copy link
Contributor

@tarzanek tarzanek commented Oct 7, 2025

No description provided.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Oct 7, 2025
@tarzanek
Copy link
Contributor Author

tarzanek commented Oct 7, 2025

see comments in #4570

public BitIntsHolder reduce(Collection<SuggestResultCollector> collectors) {
BitIntsHolder reduced = documentIds;
for (SuggestResultCollector collector : collectors) {
documentIds.or(collector.documentIds); //TODO fix as per https://github.com/apache/lucene/pull/766/files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be addressed for this PR ? I don't like adding TODOs in the code.

Copy link
Contributor

@idodeclare idodeclare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Lubos. A couple of comments and questions, mostly concerning paging handling

collector = TopScoreDocCollector.create(totalHits, Short.MAX_VALUE);
searcher.search(query, collector);
collectorManager = new TopScoreDocCollectorManager(totalHits, Short.MAX_VALUE);
hits = searcher.search(query, collectorManager).scoreDocs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new version is compelling this slightly intrusive revision in order to use CollectorManager instead of Collector, may we rework at this time to call search() only once whether paging is true or false? That is, avoid the double search() on line 210 and 217 when paging is false (and go back to only setting hits once)?

Also that way we could make sure search() only happens prior to the Statistics.report() on line 212; in the current code, the Statistics are not reflecting the search() happening on line 217.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also do:

        if (!paging && totalHits > hitsPerPage * cachePages) {

to really only do another search when we are over cache (which would really be a corner case or cli usage)

and of course we can have another log in case a full scan is needed, which with above should be really rare (or non existent, not many people go over all pages we cache for - which is 5 pages)

searcher.search(query, collector);
totalHits = collector.getTotalHits();
hits = searcher.search(query, collectorManager).scoreDocs;
totalHits = searcher.count(query);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formerly totalhits would have had its accuracy bounded by the Short.MAX_VALUE value of totalHitsThreshold (on the TopScoreDocCollector). For very large indexes, is there any possible performance penalty getting the entire, "fully accurate" int value of totalHits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should come from index statistics, so it should be fast even for large indexes as per docs
however if statistics will be bad or nonexistent it will be a full scan, yes

@tarzanek tarzanek force-pushed the lucene9 branch 2 times, most recently from 22d176d to 3b81806 Compare December 8, 2025 19:35
@lkosco
Copy link

lkosco commented Dec 9, 2025

@idodeclare would those changes work?

@tarzanek
Copy link
Contributor Author

tarzanek commented Dec 9, 2025

@vladak I guess if @idodeclare will be OK, we can merge this

@tarzanek tarzanek changed the title Lucene 9 Lucene 9 upgrade Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants