Open
Description
@FIX_PATHWAY - Hook Content Leaking into Exports
Current Behavior: filter_pure_conversation() returns user messages containing <session-start-hook> and <user-prompt-submit-hook> content
Impact: Pollutes semantic search with hook instructions instead of pure conversation
Root Cause:
- File: claude_parser/filtering/filters.py:35
- Only checks the isVisibleInTranscriptOnly flag
- Doesn't check for hook content embedded in regular user messages
- Hook tags appear in user messages WITHOUT the flag set
Evidence:
# Message gets through filter but contains hooks:
type=user
isVisibleInTranscriptOnly=None # Not marked!
text="<session-start-hook>FEATURE_IMPLEMENTATION:..."
Test Results:
Total messages: 424
After filter_pure_conversation: 202
❌ First message has hook content: "<session-start-hook>..."
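The leak can be reproduced in miniature. The sketch below is an illustration only: `flag_only_filter` is a hypothetical stand-in for a check that looks at `isVisibleInTranscriptOnly` and nothing else, not the actual code in filters.py, and the message dicts are invented for the example.

```python
# Hypothetical stand-in for a flag-only check; NOT the real filters.py code.
def flag_only_filter(messages):
    """Keep every message whose isVisibleInTranscriptOnly flag is unset or falsy."""
    return [m for m in messages if not m.get("isVisibleInTranscriptOnly")]


# Invented sample messages mirroring the evidence above.
messages = [
    # Flagged hook message: correctly dropped.
    {"type": "user", "isVisibleInTranscriptOnly": True,
     "text": "<session-start-hook>flagged hook"},
    # Unflagged hook message: slips through.
    {"type": "user", "isVisibleInTranscriptOnly": None,
     "text": "<session-start-hook>FEATURE_IMPLEMENTATION:..."},
    # Ordinary conversation: should be kept.
    {"type": "user", "isVisibleInTranscriptOnly": None,
     "text": "How do I parse a JSONL file?"},
]

kept = flag_only_filter(messages)
leaked = [m for m in kept if "<session-start-hook>" in m["text"]]
print(f"kept={len(kept)} leaked={len(leaked)}")
```

One unflagged hook message survives the filter, which matches the symptom reported in the test results.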
@PATTERN_SMELL_GATE - Why Manual Checks?
SMELL DETECTED: Manual string checking when hooks have structured markers
QUESTION: Do hook messages have consistent patterns we can detect?
- <session-start-hook>
- <user-prompt-submit-hook>
- <post-tool-use-hook>
- <system-reminder>
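If these tags really are consistent, a single compiled pattern could detect them. This is a sketch under that assumption: the tag names come from the list above, while `HOOK_TAG_RE` and `contains_hook_tag` are hypothetical names that do not exist in claude_parser.

```python
import re

# Tag names are taken from this issue; the regex and helper are illustrative.
HOOK_TAG_RE = re.compile(
    r"<(?:session-start-hook|user-prompt-submit-hook"
    r"|post-tool-use-hook|system-reminder)>"
)


def contains_hook_tag(text: str) -> bool:
    """Return True if any known hook opening tag appears anywhere in text."""
    return HOOK_TAG_RE.search(text) is not None
```

A regex keeps the tag list in one place and matches a tag wherever it appears in the message. Plain substring checks behave the same way for a fixed list this small, so either approach should work.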
@FRAMEWORK_GATE - Required Fix
Add content-based filtering to filter_pure_conversation:
from typing import Iterator, List

def filter_pure_conversation(messages: List) -> Iterator:
    """Filter pure conversation - exclude tool operations and system messages"""
    from ..messages.utils import get_text, is_hook_message

    # Hook content patterns to exclude
    HOOK_PATTERNS = [
        '<session-start-hook>',
        '<user-prompt-submit-hook>',
        '<post-tool-use-hook>',
        '<system-reminder>',
        '<command-name>',          # CLI commands
        '<local-command-stdout>',  # CLI output
    ]

    def is_pure_conversation(msg):
        # Must be user or assistant
        if msg.get('type') not in ['user', 'assistant']:
            return False
        # Skip meta messages
        if msg.get('is_meta', False):
            return False
        # Skip compact summaries
        if msg.get('isCompactSummary', False):
            return False
        # Skip hook messages using util
        if is_hook_message(msg):
            return False
        # NEW: Check content for hook patterns
        text = get_text(msg)
        if any(pattern in text for pattern in HOOK_PATTERNS):
            return False
        return True

    return filter(is_pure_conversation, messages)
@TDD_GATE - Test Requirements
Add test to verify hook content excluded:
from claude_parser.filtering.filters import filter_pure_conversation
from claude_parser.messages.utils import get_text

def test_filter_excludes_hook_content():
    """Hook content in regular messages should be filtered"""
    messages = [
        {'type': 'user', 'text': 'normal user message'},
        {'type': 'user', 'text': '<session-start-hook>hook content here'},
        {'type': 'user', 'text': '<user-prompt-submit-hook>more hooks'},
        {'type': 'assistant', 'text': 'normal assistant response'},
    ]
    filtered = list(filter_pure_conversation(messages))
    assert len(filtered) == 2
    # '<hook>' never occurs in a real tag, so match the shared '-hook>' suffix instead
    assert all('-hook>' not in get_text(msg) for msg in filtered)
@VERIFICATION_GATE - Acceptance Criteria
- Hook content patterns detected in message text
- Messages with hook tags excluded from filter
- Test verifies no hook content passes through
- Export test: export_for_llamaindex returns zero hook content
- Regression test: Normal conversation still passes through
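The "zero hook content" criterion is easier to debug if the check reports which pattern leaked and where. A minimal sketch, assuming the six tags named in this issue are the complete set; `find_hook_leaks` is a hypothetical helper, not part of claude_parser.

```python
from typing import Iterable, List, Tuple

# The six tags named in this issue; assumed to be the complete set.
HOOK_PATTERNS = (
    "<session-start-hook>",
    "<user-prompt-submit-hook>",
    "<post-tool-use-hook>",
    "<system-reminder>",
    "<command-name>",
    "<local-command-stdout>",
)


def find_hook_leaks(texts: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (index, pattern) for every text that still contains a hook tag."""
    return [
        (index, pattern)
        for index, text in enumerate(texts)
        for pattern in HOOK_PATTERNS
        if pattern in text
    ]
```

For the export test, calling it on the `text` of each exported document should return an empty list. For the regression test, running it over known-clean conversation text should also return an empty list, while the messages themselves are still present in the filtered output.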
@DECISION_GATE - Pattern Detection vs Flags
DECISION: Use content pattern detection + flags
RATIONALE:
- Flags alone insufficient (hooks appear in unflagged messages)
- Content patterns are consistent and well-defined
- Small set of patterns (6 tags)
- Fast string contains check
ALTERNATIVE REJECTED: Only use flags
- Incomplete: Hook content appears without flags set
- Leaks hook instructions into exports
Integration Test Command
# Verify no hooks in export:
python3 -c "
from claude_parser.export import export_for_llamaindex
from claude_parser.discovery import discover_claude_files
docs = export_for_llamaindex(str(list(discover_claude_files())[0]))
# '<hook>' never occurs in a real tag; match the '-hook>' suffix shared by the hook tags
hook_found = any('-hook>' in doc.text for doc in docs)
print(f'Hook content found: {hook_found}')  # Should be False
"
Priority: High - Polluting semantic search indexes
Framework: LNCA v4.1
Pathway: @FIX_PATHWAY
Estimated LOC: <15 lines changed