-
Notifications
You must be signed in to change notification settings - Fork 791
Lucene 9 upgrade #4867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Lucene 9 upgrade #4867
Conversation
|
see comments in #4570 |
| public BitIntsHolder reduce(Collection<SuggestResultCollector> collectors) { | ||
| BitIntsHolder reduced = documentIds; | ||
| for (SuggestResultCollector collector : collectors) { | ||
| documentIds.or(collector.documentIds); //TODO fix as per https://github.com/apache/lucene/pull/766/files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be addressed for this PR ? I don't like adding TODOs in the code.
idodeclare
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Lubos. A couple of comments and questions, mostly concerning paging handling
| collector = TopScoreDocCollector.create(totalHits, Short.MAX_VALUE); | ||
| searcher.search(query, collector); | ||
| collectorManager = new TopScoreDocCollectorManager(totalHits, Short.MAX_VALUE); | ||
| hits = searcher.search(query, collectorManager).scoreDocs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the new version is compelling this slightly intrusive revision in order to use CollectorManager instead of Collector, may we rework at this time to call search() only once whether paging is true or false? That is, avoid the double search() on line 210 and 217 when paging is false (and go back to only setting hits once)?
Also that way we could make sure search() only happens prior to the Statistics.report() on line 212; in the current code, the Statistics are not reflecting the search() happening on line 217.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also do:
if (!paging && totalHits > hitsPerPage * cachePages) {
to really only do another search when we are over cache (which would really be a corner case or cli usage)
and of course we can have another log in case a full scan is needed, which with above should be really rare (or non existent, not many people go over all pages we cache for - which is 5 pages)
| searcher.search(query, collector); | ||
| totalHits = collector.getTotalHits(); | ||
| hits = searcher.search(query, collectorManager).scoreDocs; | ||
| totalHits = searcher.count(query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formerly totalhits would have had its accuracy bounded by the Short.MAX_VALUE value of totalHitsThreshold (on the TopScoreDocCollector). For very large indexes, is there any possible performance penalty getting the entire, "fully accurate" int value of totalHits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should come from index statistics, so it should be fast even for large indexes as per docs
however if statistics will be bad or nonexistent it will be a full scan, yes
22d176d to
3b81806
Compare
in searchIndex
|
@idodeclare would those changes work? |
|
@vladak I guess if @idodeclare will be OK, we can merge this |
No description provided.