Open
Description
@FIX_PATHWAY - API Contract Violation
Current Behavior: export_for_llamaindex() returns Dict[str, Any] but callers expect Document objects
Impact: Breaking integration with semantic-search-service - requires manual conversion step
Root Cause:
- File: claude_parser/export/llamaindex.py:36
- Function returns dicts with {'text': str, 'metadata': dict}
- Callers use VectorStoreIndex.from_documents(), which expects Document objects
Evidence from LlamaIndex Docs:
# Correct pattern (from llama_index docs):
from llama_index.core import Document, VectorStoreIndex

documents = [
    Document(text="content", metadata={"key": "value"}),
    Document(text="more", metadata={"key": "value"}),
]
index = VectorStoreIndex.from_documents(documents)  # ONE LINE

@FRAMEWORK_GATE Required Fix
Change _extract_document to return Document objects:
# Quoted annotation: Document is only imported inside the function body
def _extract_document(msg: Dict[str, Any]) -> "Document":
    """Transform message to LlamaIndex Document object

    @FRAMEWORK_FIRST: Return framework types, not dicts
    """
    from llama_index.core import Document  # Optional dependency

    return Document(
        text=get_text(msg),
        metadata={
            'speaker': msg.get('type', 'unknown'),
            'uuid': msg.get('uuid', ''),
            'timestamp': msg.get('timestamp', ''),
            'session_id': msg.get('sessionId', ''),
        },
    )

Update return type annotation:
def export_for_llamaindex(
    jsonl_path: str,
    batch_size: Optional[int] = None,
) -> "Union[List[Document], Iterator[List[Document]]]":  # Not Dict!

Update docstring with TRUE one-liner example:
"""Export conversation as LlamaIndex Document objects

Returns:
    List[Document] or Iterator[List[Document]] - Ready for VectorStoreIndex.from_documents()

Example - TRUE ONE-LINER for long-term memory:
    from llama_index.core import VectorStoreIndex
    from claude_parser.export import export_for_llamaindex

    # ONE line to create searchable memory:
    index = VectorStoreIndex.from_documents(
        export_for_llamaindex("conversation.jsonl")
    )
"""

@TDD_GATE - Test Requirements
Update test to verify Document objects:
def test_export_returns_document_objects():
    from llama_index.core import Document
    from claude_parser.export import export_for_llamaindex

    result = export_for_llamaindex("test.jsonl")
    # Default (non-batch) mode should return a plain list of Document objects
    assert isinstance(result, list)
    assert all(isinstance(doc, Document) for doc in result)
    assert hasattr(result[0], 'text')
    assert hasattr(result[0], 'metadata')

@VERIFICATION_GATE - Acceptance Criteria
- _extract_document returns Document objects (not dicts)
- Return type annotation updated to List[Document]
- Docstring shows one-liner example
- Tests verify Document objects returned
- Integration test: VectorStoreIndex.from_documents(result) works
- No breaking changes to batch mode
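
The batch-mode criterion can be checked against a small model of the intended contract. This is a minimal sketch, not the parser's actual implementation: it assumes `batch_size=None` means "return one list" and an integer means "yield lists of at most that size". `batched` and `export_docs` are hypothetical names, and plain strings stand in for Document objects so the snippet runs without llama_index installed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar, Union

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def export_docs(
    docs: Iterable[T], batch_size: Optional[int] = None
) -> Union[List[T], Iterator[List[T]]]:
    """Return one list when batch_size is None, otherwise an iterator of lists."""
    if batch_size is None:
        return list(docs)
    if batch_size < 1:
        # Validate eagerly so a bad value fails at the call site
        raise ValueError("batch_size must be a positive integer")
    return batched(docs, batch_size)
```

Under these assumptions only the element type changes (dict to Document); the list-versus-iterator shape that batch-mode callers rely on stays the same.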
@DECISION_GATE - Why Document Objects?
DECISION: Export framework types, not primitives
RATIONALE:
- Enables true one-liner: VectorStoreIndex.from_documents(export_for_llamaindex(path))
- Type-safe integration
- Follows LlamaIndex patterns
- Optional dependency (import inside function)
- Zero boilerplate for callers
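
The "import inside function" point could also be paired with a clearer error for users who have not installed the optional dependency. A rough sketch, assuming a hypothetical helper named `lazy_import` (not part of claude_parser or LlamaIndex):

```python
import importlib
from typing import Any


def lazy_import(module_name: str, attr: str, install_hint: str) -> Any:
    """Import module_name only when called and return the named attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a hint so the missing optional dependency is obvious
        raise ImportError(
            f"{module_name} is required for this export; {install_hint}"
        ) from exc
    return getattr(module, attr)
```

Inside `_extract_document` this might be called as `Document = lazy_import("llama_index.core", "Document", "pip install llama-index-core")`; the install hint is an assumption about the package name and should be checked against the project's packaging.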
ALTERNATIVE REJECTED: Keep dicts + document conversion step
- Requires manual conversion: [Document(**d) for d in result]
- Not a "one-liner" anymore
- Violates @FRAMEWORK_FIRST principle
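
For reference, the conversion step of the rejected alternative would look roughly like this in caller code. The `Document` dataclass below is only a stand-in for `llama_index.core.Document`, assumed to take `text` and `metadata` keyword arguments as in the docs excerpt above, so the snippet runs without the library installed; the sample data is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for llama_index.core.Document (assumption: text + metadata only)."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Shape the current export returns, per the root-cause notes
result: List[Dict[str, Any]] = [
    {"text": "hello", "metadata": {"speaker": "user"}},
    {"text": "hi there", "metadata": {"speaker": "assistant"}},
]

# The extra step every caller would need if the export kept returning dicts
documents = [Document(**d) for d in result]
```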
@MEMORY_UPDATE_GATE
After fix, update project navigator:
LlamaIndex | EXISTS | Document-export-API | search:"export_for_llamaindex"
Integration | FIXED | One-liner-memory-creation | search:"VectorStoreIndex.from_documents"
Priority: High - Blocking semantic-search integration
Framework: LNCA v4.1
Pathway: @FIX_PATHWAY
Estimated LOC: <10 lines changed