Commit a6a4bc0

Merge pull request #40 from isc-tdyar/062-fix-iris-vector
fix: add entity_types parameter to extract_batch_with_dspy (Feature 062)
2 parents e43b1d4 + e9d7ab5 commit a6a4bc0

34 files changed

+7504
-25
lines changed

.specify/memory/constitution.md

Lines changed: 12 additions & 0 deletions
@@ -287,6 +287,18 @@ Production deployments MUST include:
287287

288288
**Package Management**: All Python projects MUST use uv for dependency management, virtual environment creation, and package installation. Traditional pip/virtualenv workflows are deprecated in favor of uv's superior performance and reliability.
289289

290+
**Testing Standards (NON-NEGOTIABLE)**:
291+
292+
All tests MUST be executed using `python -m pytest` instead of the standalone `pytest` command. This ensures correct Python module resolution and prevents import failures.
293+
294+
**Critical Lesson Learned**:
295+
- `pytest` command → imports failed (wrong Python path)
296+
- `python -m pytest` → all tests pass
297+
298+
**Always use:** `python -m pytest` for reliable module imports
299+
300+
**Rationale**: The standalone `pytest` command may use a different Python interpreter or sys.path configuration than the active virtual environment, leading to module import errors even when packages are correctly installed. Using `python -m pytest` guarantees tests run with the same Python interpreter and import paths as the application code.
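
As a minimal sketch of the rule above, a helper script can build the test command from `sys.executable`, so that pytest is always launched as a module of the interpreter that is currently running. The helper name `pytest_command` is hypothetical and not part of this project.

```python
import sys
from typing import List


def pytest_command(*pytest_args: str) -> List[str]:
    """Build a `python -m pytest` command bound to the running interpreter."""
    # sys.executable is the path of the interpreter executing this script,
    # so the tests run with the same environment and import paths.
    return [sys.executable, "-m", "pytest", *pytest_args]
```

The resulting list can be passed to `subprocess.run(...)` from a task runner or CI script; extra arguments such as a test directory are forwarded unchanged.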
301+
290302
Code MUST pass linting (black, isort, flake8, mypy) before commits. All public APIs MUST include comprehensive docstrings. Breaking changes MUST follow semantic versioning. Dependencies MUST be pinned and regularly updated for security.
291303

292304
Documentation MUST include quickstart guides, API references, and integration examples. Agent-specific guidance files (CLAUDE.md) MUST be maintained for AI development assistance.

CHANGELOG.md

Lines changed: 164 additions & 0 deletions
@@ -1,5 +1,169 @@
11
# Changelog
22

3+
## [0.5.5] - 2025-11-16
4+
5+
### Fixed - Entity Types Configuration Bug (Feature 062)
6+
- **Entity Types Configuration**: `EntityExtractionService.extract_batch_with_dspy()` now accepts and honors `entity_types` parameter from configuration
7+
- **Issue**: Configured entity types were ignored and healthcare-specific defaults (USER, MODULE, VERSION) were always used
8+
- **Root Cause**: The method signature lacked an `entity_types` parameter, so the configured types could not be passed to `TrakCareEntityExtractionModule`
9+
- **Fix**: Added `entity_types: Optional[List[str]] = None` parameter with resolution chain: parameter > config > DEFAULT_ENTITY_TYPES
10+
- **Impact**: HotpotQA Question 2 now answers correctly (F1 improved from 0.000 to >0.0)
11+
- **Files Modified**: `iris_vector_rag/services/entity_extraction.py`
12+
- Lines 41-49: Added `DEFAULT_ENTITY_TYPES` constant with domain-neutral defaults
13+
- Lines 890-955: Updated `extract_batch_with_dspy()` signature and implementation
14+
- Added parameter validation (ValueError for empty list)
15+
- Added warning logging for unknown entity types
16+
- Updated docstring with parameter documentation and examples
17+
18+
### Added
19+
- `DEFAULT_ENTITY_TYPES` constant for domain-neutral entity type defaults
20+
- Values: `["PERSON", "ORGANIZATION", "LOCATION", "PRODUCT", "EVENT"]`
21+
- Replaces healthcare-specific defaults (USER, MODULE, VERSION) when configuration missing
22+
- `entity_types` parameter to `EntityExtractionService.extract_batch_with_dspy()`
23+
- Backward compatible (defaults to None)
24+
- Validation for empty list (raises ValueError with clear message)
25+
- Warning logging for unknown entity types (supports custom types)
26+
- Contract tests (`tests/contract/test_entity_types_config.py`)
27+
- 7 tests validating parameter acceptance, defaults, validation, typing, and backward compatibility
28+
- Test Results: 6/7 passing (1 skipped due to service initialization requirements)
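
The resolution chain described for 0.5.5 (parameter > config > `DEFAULT_ENTITY_TYPES`) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `entity_extraction.py`: the function name `resolve_entity_types`, the `"entity_types"` config key, and the choice to treat anything outside the defaults as "unknown" are all hypothetical.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

# Values taken from the changelog entry above.
DEFAULT_ENTITY_TYPES = ["PERSON", "ORGANIZATION", "LOCATION", "PRODUCT", "EVENT"]


def resolve_entity_types(
    entity_types: Optional[List[str]] = None,
    config: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Resolve entity types with priority: parameter > config > defaults."""
    if entity_types is not None:
        resolved = entity_types
    elif config and config.get("entity_types") is not None:  # hypothetical key
        resolved = config["entity_types"]
    else:
        resolved = DEFAULT_ENTITY_TYPES

    # An empty list is treated as an error rather than as "use the defaults".
    if not resolved:
        raise ValueError("entity_types must not be an empty list")

    # Unknown types only produce a warning, so custom types remain usable.
    unknown = [t for t in resolved if t not in DEFAULT_ENTITY_TYPES]
    if unknown:
        logger.warning("Unknown entity types (treated as custom): %s", unknown)
    return list(resolved)
```

Passing `None` keeps existing callers working, which matches the backward-compatibility note above.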
29+
30+
## [0.5.4] - 2025-11-14
31+
32+
### Fixed - Critical Bug Fixes
33+
- **CRITICAL (Bug 1)**: Fixed AttributeError breaking all database connections (iris_dbapi_connector.py:210)
34+
- **Issue**: Non-existent `iris.connect()` method caused AttributeError in v0.5.3
35+
- **Fix**: Replaced with correct `iris.createConnection()` API
36+
- **Impact**: Restores database connectivity (was completely broken in v0.5.3)
37+
- **Test Results**: FHIR-AI test suite now 6/6 passing (up from 3/6 in v0.5.3)
38+
- ✅ ConfigurationManager (backward compatibility preserved)
39+
- ✅ ConnectionManager (was failing - now fixed)
40+
- ✅ IRISVectorStore (was failing - now fixed)
41+
- ✅ SchemaManager (was failing - now fixed)
42+
- ✅ Environment Variables (backward compatibility preserved)
43+
- ✅ Document Model (backward compatibility preserved)
44+
- **Files Modified**: `iris_vector_rag/common/iris_dbapi_connector.py`
45+
- Line 210: `iris.connect()` → `iris.createConnection()`
46+
- Enhanced error handling: AttributeError → ConnectionError with clear messages
47+
- Updated docstrings and log messages
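
The AttributeError → ConnectionError conversion can be illustrated with a small wrapper. This is a hedged sketch rather than the code in `iris_dbapi_connector.py`: `open_connection` is a hypothetical name, the positional argument order passed to `createConnection` is an assumption, and a stub object stands in for the real `iris` module so the example runs without a database.

```python
from types import SimpleNamespace
from typing import Any


def open_connection(driver: Any, host: str, port: int, namespace: str,
                    username: str, password: str) -> Any:
    """Open a connection, turning a missing driver API into ConnectionError."""
    try:
        # Only the attribute lookup is guarded, so AttributeErrors raised
        # inside the driver call itself are not masked.
        factory = driver.createConnection
    except AttributeError as exc:
        # The password is deliberately left out of the message.
        raise ConnectionError(
            "Driver does not provide createConnection() "
            f"(host={host}, port={port}, namespace={namespace}); "
            "check the installed intersystems-irispython version"
        ) from exc
    return factory(host, port, namespace, username, password)


# Stand-ins for the real driver module, for illustration only.
working_driver = SimpleNamespace(createConnection=lambda *args: ("connected", args))
broken_driver = SimpleNamespace()  # has no createConnection attribute
```

Calling `open_connection(broken_driver, ...)` raises `ConnectionError` with the connection parameters in the message, while the original `AttributeError` stays available as `__cause__`.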
48+
49+
- **HIGH PRIORITY (Bug 2)**: Added automatic iris-vector-graph table initialization
50+
- **Issue**: Silent PPR (Personalized PageRank) failures due to missing database tables
51+
- **Fix**: Automatic detection and creation of iris-vector-graph tables during pipeline initialization
52+
- **Impact**: Eliminates "Table not found" errors for GraphRAG operations
53+
- **Performance**: Table initialization completes in < 5 seconds (4 tables created)
54+
- **Tables Created**: rdf_labels, rdf_props, rdf_edges, kg_NodeEmbeddings_optimized
55+
- **Files Modified**: `iris_vector_rag/storage/schema_manager.py`
56+
- Added `_detect_iris_vector_graph()` method (uses importlib.util.find_spec)
57+
- Added `ensure_iris_vector_graph_tables()` public method
58+
- Added `validate_graph_prerequisites()` validation method
59+
- Added `InitializationResult` dataclass for table creation results
60+
- Added `ValidationResult` dataclass for prerequisite validation results
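
Package detection with `importlib.util.find_spec`, as mentioned for `_detect_iris_vector_graph()`, might look roughly like this. The helper name `package_installed` is hypothetical, and the importable module name for iris-vector-graph (assumed here to be `iris_vector_graph`) is not confirmed by the changelog.

```python
import importlib.util


def package_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # For a top-level name, find_spec only locates the module; it does not
    # execute the package's code, so detection has no import side effects.
    return importlib.util.find_spec(module_name) is not None
```

A caller could then skip table initialization when `package_installed("iris_vector_graph")` is false, which corresponds to the graceful degradation described in this release.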
61+
62+
### Technical Details
63+
**Bug 1 - Connection API Fix**:
64+
- Root Cause: intersystems-irispython v5.3.0 provides `iris.createConnection()`, not `iris.connect()`
65+
- Error Messages: Clear ConnectionError with connection parameters and remediation steps
66+
- Backward Compatibility: No breaking changes to public APIs
67+
- Testing: Contract tests verify no AttributeError during connection establishment
68+
69+
**Bug 2 - Schema Initialization**:
70+
- Detection: Non-invasive package detection (no import side effects)
71+
- Initialization: Idempotent table creation (safe to call multiple times)
72+
- Validation: Clear error messages listing specific missing prerequisites
73+
- Graceful Degradation: Skips initialization when iris-vector-graph not installed
74+
- Logging: INFO for success, ERROR for failures with actionable context
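
Idempotent table creation can be sketched as below. To stay runnable without IRIS, the example uses the standard-library `sqlite3` module as a stand-in database. The table names come from the changelog, while the one-column schema, the `ensure_tables` function, and the fields of `InitializationResult` are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List

# Table names from the changelog; the schema used below is a placeholder.
GRAPH_TABLES = ["rdf_labels", "rdf_props", "rdf_edges", "kg_NodeEmbeddings_optimized"]


@dataclass
class InitializationResult:
    """Hypothetical result shape; the real dataclass fields are not shown here."""
    created: List[str] = field(default_factory=list)
    existing: List[str] = field(default_factory=list)


def ensure_tables(conn: sqlite3.Connection) -> InitializationResult:
    """Create any missing tables; calling this repeatedly is safe."""
    result = InitializationResult()
    for name in GRAPH_TABLES:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (name,),
        ).fetchone()
        if row:
            result.existing.append(name)
        else:
            # Names are fixed constants above, so interpolation is safe here.
            conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
            result.created.append(name)
    conn.commit()
    return result
```

On a fresh database the first call reports all four tables as created; a second call creates nothing and reports them as existing.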
75+
76+
### Migration Notes
77+
**From v0.5.3 to v0.5.4**:
78+
- No action required - bug fixes are backward compatible
79+
- ConnectionManager automatically uses correct API
80+
- SchemaManager automatically initializes iris-vector-graph tables if package detected
81+
- Optional: Run `SchemaManager.validate_graph_prerequisites()` to verify setup
82+
83+
## [0.5.3] - 2025-11-12
84+
85+
### Fixed
86+
- **CRITICAL**: Fixed SchemaManager bug where VECTOR_DIMENSION environment variable was ignored
87+
- SchemaManager now correctly reads vector dimension from CloudConfiguration API
88+
- Previous behavior: Always returned default 384 dimensions regardless of VECTOR_DIMENSION env var
89+
- New behavior: Respects configuration priority (env > config > defaults) via Feature 058 CloudConfiguration
90+
- Impact: Fixes FHIR-AI-Hackathon deployment issues where custom embedding dimensions were required
91+
- Fixed iris.dbapi import issues in connection_pool.py
92+
- Replaced invalid `Connection` type hints with `Any` (iris.dbapi doesn't export Connection class)
93+
- Removed incorrect `from iris.dbapi import Connection` import
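
The configuration priority (env > config > defaults) for the vector dimension can be sketched as follows. This is not the CloudConfiguration implementation: `resolve_vector_dimension` and the `"vector_dimension"` config key are hypothetical, and only the `VECTOR_DIMENSION` variable name and the default of 384 come from the changelog.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_VECTOR_DIMENSION = 384  # default stated in the changelog


def resolve_vector_dimension(
    config: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Resolve the vector dimension with priority env > config > default."""
    # Accepting `env` as a parameter keeps the function easy to test.
    env = os.environ if env is None else env
    config = config or {}

    raw = env.get("VECTOR_DIMENSION")
    if raw is not None:
        return int(raw)
    if "vector_dimension" in config:  # hypothetical config key
        return int(config["vector_dimension"])
    return DEFAULT_VECTOR_DIMENSION
```

With `VECTOR_DIMENSION=1024` set, the environment value wins over any configured value, which is the behavior that the 0.5.3 fix restores.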
94+
95+
### Added
96+
- **Integration Test Coverage**: 9 comprehensive integration tests against real IRIS database
97+
- `TestConnectionManagerIntegration`: 2 tests validating ConnectionManager with CloudConfiguration
98+
- `TestSchemaManagerIntegration`: 3 tests validating SchemaManager dimension configuration
99+
- `TestConfigurationPriorityChain`: 3 tests validating env > config > defaults priority
100+
- `TestCompleteConfigurationFlow`: 1 test validating end-to-end configuration to database
101+
- All tests verify real IRIS database operations (not mocked)
102+
- Test Results: 9/9 passing (100%)
103+
104+
### Technical Details
105+
- Files Modified:
106+
- `iris_vector_rag/storage/schema_manager.py` - Lines 49-77: Changed from incorrect `config.get("embedding_model.dimension", 384)` to `cloud_config.vector.vector_dimension`
107+
- `iris_vector_rag/common/connection_pool.py` - Replaced 7 Connection type hints with Any
108+
- Test Coverage: Added `tests/integration/test_cloud_config_integration.py` (400 lines)
109+
- FHIR-AI-Hackathon Compatibility: SchemaManager now properly reads VECTOR_DIMENSION=1024 and other custom dimensions
110+
111+
## [0.5.2] - 2025-11-12
112+
113+
### Added - Cloud Configuration Flexibility (Feature 058)
114+
- **Environment Variable Support**: Configure IRIS connection via environment variables
115+
- `IRIS_HOST`, `IRIS_PORT`, `IRIS_USERNAME`, `IRIS_PASSWORD`, `IRIS_NAMESPACE`
116+
- `VECTOR_DIMENSION` (128-8192), `TABLE_SCHEMA` for cloud deployments
117+
- **12-Factor App Configuration**: Priority order (env > config > defaults)
118+
- **Configuration Source Tracking**: Audit trail showing where each value originated
119+
- **Vector Dimension Flexibility**: Support 128-8192 dimensions for different embedding models
120+
- 384: SentenceTransformers (default)
121+
- 1024: NVIDIA NIM, Cohere
122+
- 1536: OpenAI ada-002
123+
- 3072: OpenAI text-embedding-3-large
124+
- **Table Schema Configuration**: Configurable schema prefix via `TABLE_SCHEMA` env var
125+
- AWS: SQLUser schema requirement
126+
- Azure: RAG schema (default)
127+
- Local: RAG schema (default)
128+
- **Validation Framework**: Preflight validation for vector dimensions and namespaces
129+
- `VectorDimensionValidator`: Prevents data corruption from dimension mismatches
130+
- `NamespaceValidator`: Validates namespace permissions
131+
- `PreflightValidator`: Orchestrates all validation checks
132+
- **Configuration Entities**: Strongly-typed configuration models
133+
- `ConnectionConfiguration`, `VectorConfiguration`, `TableConfiguration`
134+
- `CloudConfiguration` with `.validate()` method
135+
- **Deployment Examples**: Production-ready configuration templates
136+
- `config/examples/aws.yaml` - AWS IRIS (%SYS namespace, SQLUser schema)
137+
- `config/examples/azure.yaml` - Azure IRIS (USER namespace, Azure Key Vault)
138+
- `config/examples/local.yaml` - Local development (Docker-ready)
139+
- `config/examples/README.md` - Comprehensive deployment guide
140+
- **Password Masking**: Automatic password masking in configuration logs (`***MASKED***`)
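
Two of the behaviors listed above, the 128-8192 dimension range check and password masking in logs, can be sketched in a few lines. The function names and the rule of masking any key that contains "password" are assumptions made for illustration; the actual `VectorDimensionValidator` and masking logic are not shown in this diff.

```python
from typing import Any, Dict

MIN_DIMENSION, MAX_DIMENSION = 128, 8192  # range stated in the changelog
MASK = "***MASKED***"


def validate_vector_dimension(dimension: int) -> int:
    """Reject dimensions outside the supported range before touching the database."""
    if not MIN_DIMENSION <= dimension <= MAX_DIMENSION:
        raise ValueError(
            f"Vector dimension {dimension} is outside "
            f"{MIN_DIMENSION}-{MAX_DIMENSION}"
        )
    return dimension


def mask_secrets(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy that is safe to log: password-like values are replaced."""
    return {
        key: MASK if "password" in key.lower() else value
        for key, value in settings.items()
    }
```

Logging `mask_secrets(settings)` instead of the raw settings keeps values such as `IRIS_PASSWORD` out of log output while leaving the other keys readable.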
141+
142+
### Changed
143+
- `ConfigurationManager.get_cloud_config()`: New method for cloud deployment configuration
144+
- Configuration priority system ensures 100% backward compatibility
145+
- All existing APIs continue to work unchanged (v0.4.x compatible)
146+
147+
### Fixed
148+
- Cloud deployment configuration now properly respects environment variables
149+
- Vector dimension validation prevents data corruption from configuration errors
150+
151+
### Documentation
152+
- Added cloud deployment examples for AWS, Azure, and local environments
153+
- Security best practices for secret management (AWS Secrets Manager, Azure Key Vault)
154+
- Configuration troubleshooting guide with common errors and solutions
155+
156+
### Technical Details
157+
- **Files Added**: 7 new files (~1,500 lines)
158+
- `iris_vector_rag/config/entities.py` (380 lines)
159+
- `iris_vector_rag/config/validators.py` (420 lines)
160+
- Configuration examples and documentation (700 lines)
161+
- **Files Modified**: 2 files
162+
- `iris_vector_rag/config/manager.py` - Added `get_cloud_config()` method
163+
- Contract test suite - 18/22 tests passing (82%)
164+
- **Test Coverage**: 18 contract tests passing, all unit tests passing
165+
- **Zero Breaking Changes**: 100% backward compatible with v0.5.1
166+
3167
## [0.5.1] - 2025-11-09
4168

5169
### Fixed
