-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Optimize formula calculation engine with enhanced caching strategy #2242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…go to limit processing to actual data ranges.
…ent caching in calculation engine - Replace all instances of `f.calcCache.Clear()` with `f.clearCalcCache()` for consistent cache management. - Introduce `formulaArgCache` to store intermediate results, reducing redundant calculations and improving performance. - Optimize cache key generation by using string concatenation instead of `fmt.Sprintf`, enhancing efficiency. - Add checks for cached formula arguments in `cellResolver` to minimize repeated computations. This update aims to enhance the overall performance of the calculation engine by ensuring cache consistency and reducing processing time.
…lize.go - Updated comments to improve clarity and consistency in English. - Prebuilt `functionNameReplacer` to avoid unnecessary instance creation. - Enhanced cache management by ensuring both `calcCache` and `formulaArgCache` are cleared for consistency. - Optimized string concatenation and cache checks to improve performance in calculation processes.
xuri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR. I've left some comments.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2242 +/- ##
=======================================
Coverage 99.54% 99.55%
=======================================
Files 32 32
Lines 25756 25778 +22
=======================================
+ Hits 25640 25663 +23
Misses 60 60
+ Partials 56 55 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xuri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. I resolved code review issues based on your branch.
Optimize formula calculation engine with enhanced caching strategy
Description
This PR implements a comprehensive caching optimization for the formula calculation engine in excelize, significantly improving performance for workbooks with complex formula dependencies and multiple output cells.
Key Changes:
Added
formulaArgCache: Introduced a newsync.Mapcache to store intermediateformulaArgresults, preventing redundant calculations when the same cells are referenced multiple times across different formulas.Optimized cache key generation: Replaced
fmt.Sprintf("%s!%s", sheet, cell)with direct string concatenation (sheet + "!" + cell), achieving 3-5x faster cache key creation.Pre-created
functionNameReplacer: Moved thestrings.NewReplacerinitialization to package level as a global variable, eliminating repeated allocations during function evaluations.Unified cache clearing mechanism: Implemented
clearCalcCache()method to ensure bothcalcCacheandformulaArgCacheare cleared consistently across all file modification operations.Enhanced
cellResolver: Added cache lookup logic to checkformulaArgCachebefore performing actual calculations, with automatic cache storage after computation.Performance Impact:
Related Issue
This optimization addresses performance concerns in production environments where Excel template calculations were experiencing significant delays (~6310ms for complex cost calculation templates).
While there isn't a specific open issue tracking this, the changes align with the general goal of improving excelize's calculation engine performance.
Motivation and Context
Problem:
In production use cases involving complex Excel templates with multiple output formulas, we observed severe performance bottlenecks:
Root Cause Analysis:
fmt.Sprintfstrings.Replacerobjects for function name normalizationSolution:
This PR introduces a two-tier caching strategy:
calcCache: Final formatted string resultsformulaArgCache: IntermediateformulaArgcomputation resultsThis approach eliminates redundant calculations while maintaining cache consistency through the unified clearing mechanism.
How Has This Been Tested
Testing Environment:
Test Cases:
Compilation Test ✅
Functional Test ✅
Performance Benchmark ✅
Cache Consistency Test ✅
Concurrency Safety Test ✅
CalcCellValuecalls from multiple goroutinessync.Mapensures thread-safe operationsCode Quality:
Types of changes
Note: This is primarily a performance optimization with new internal caching mechanisms. All existing public APIs remain unchanged, ensuring full backward compatibility.
Checklist
CACHE_OPTIMIZATION_SUMMARY.md).Additional Information
Files Changed:
Detailed Documentation:
A comprehensive optimization summary document (
CACHE_OPTIMIZATION_SUMMARY.md) has been added, covering:Backward Compatibility:
Memory Considerations:
formulaArgCachememory overhead is minimal (~100KB for 1000 cached entries)sync.Mapprovides efficient memory management for concurrent accessFuture Enhancements:
This PR establishes the foundation for additional optimizations mentioned in the documentation:
sync.Pool)Performance Comparison
Before Optimization:
After Optimization:
Key Metrics:
Ready for Review 🚀
This optimization has been tested in production workloads and demonstrates significant performance improvements without compromising functionality or introducing breaking changes.