levi42x commented Nov 20, 2025

Fixes #3659

This PR fixes the issue where copyright signs using bracket notation [C] and [c] weren't being detected.

Implementation Steps

  1. Created test cases using the examples from the issue
  2. Ran tests - both failed (brackets were stripped, leaving just C or c)
  3. Added normalization in prepare_text_line() to convert [C] and [c] to (c) before brackets get removed
  4. Re-ran tests - both now pass.
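
The ordering in step 3 can be illustrated with a minimal sketch. This is not the actual code of prepare_text_line(): the helper names below (`strip_square_brackets`, `normalize_bracketed_copyright_sign`, `prepare_line_sketch`) are hypothetical stand-ins, and the bracket stripping is reduced to a plain character replacement on the assumption that the real function does something comparable later in its pipeline.

```python
def strip_square_brackets(line):
    # Hypothetical stand-in for the generic bracket removal step:
    # square brackets become spaces, so "[C]" degrades to a bare "C".
    return line.replace('[', ' ').replace(']', ' ')


def normalize_bracketed_copyright_sign(line):
    # Rewrite the bracketed sign to the canonical "(c)" form.
    return line.replace('[C]', '(c)').replace('[c]', '(c)')


def prepare_line_sketch(line, normalize=True):
    # Normalization has to run first; once the brackets are gone,
    # nothing distinguishes the copyright sign from a stray letter.
    if normalize:
        line = normalize_bracketed_copyright_sign(line)
    line = strip_square_brackets(line)
    # Collapse the runs of whitespace left behind by the replacements.
    return ' '.join(line.split())


if __name__ == '__main__':
    sample = 'Copyright [c] 2023 Example Company'
    print(prepare_line_sketch(sample, normalize=False))
    print(prepare_line_sketch(sample))
```

With `normalize=False` the sketch reproduces the failure mode from step 2, where only the bare letter survives; with the default it keeps a recognizable `(c)` sign.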

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Testing

Before Fix (Failed Tests)

(myenv) PS D:\scancode-dev\scancode-toolkit> python -m pytest tests\cluecode\test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_C_uppercase tests\cluecode\test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_c_lowercase -vs
========================================================================================= test session starts =========================================================================================
platform win32 -- Python 3.10.1, pytest-7.4.4, pluggy-1.6.0 -- D:\scancode-dev\scancode-toolkit\myenv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\scancode-dev\scancode-toolkit
configfile: pyproject.toml
collecting 2 items
2 tests selected, 0 tests skipped.
collected 2 items

tests/cluecode/test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_C_uppercase FAILED
tests/cluecode/test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_c_lowercase FAILED

============================================================================================== FAILURES ===============================================================================================
______________________________________________________________ TestTextPreparation.test_prepare_text_line_normalizes_bracket_C_uppercase ______________________________________________________________

self = <test_copyrights_basic.TestTextPreparation testMethod=test_prepare_text_line_normalizes_bracket_C_uppercase>

def test_prepare_text_line_normalizes_bracket_C_uppercase(self):
    # Issue #3659: [C] should be normalized to (c)
    cp = '[C] The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved'
    result = prepare_text_line(cp)
    assert result == '(c) The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved'

E AssertionError: assert 'C The Regent...ghts Reserved' == '(c) The Rege...ghts Reserved'
E - (c) The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved
E ? ^^^
E + C The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved
E ? ^

tests\cluecode\test_copyrights_basic.py:75: AssertionError
______________________________________________________________ TestTextPreparation.test_prepare_text_line_normalizes_bracket_c_lowercase ______________________________________________________________

self = <test_copyrights_basic.TestTextPreparation testMethod=test_prepare_text_line_normalizes_bracket_c_lowercase>

def test_prepare_text_line_normalizes_bracket_c_lowercase(self):
    # Issue #3659: [c] should be normalized to (c)
    cp = 'Copyright [c] 2023 Example Company'
    result = prepare_text_line(cp)
    assert result == 'Copyright (c) 2023 Example Company'

E AssertionError: assert 'Copyright c ...ample Company' == 'Copyright (c...ample Company'
E - Copyright (c) 2023 Example Company
E ? - -
E + Copyright c 2023 Example Company

tests\cluecode\test_copyrights_basic.py:81: AssertionError
========================================================================================== warnings summary ===========================================================================================
conftest.py:94
D:\scancode-dev\scancode-toolkit\conftest.py:94: PytestDeprecationWarning: The hookimpl pytest_collection_modifyitems uses old-style configuration options (marks or attributes).
Please use the pytest.hookimpl(trylast=True) decorator instead
to configure the hooks.
See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
@pytest.mark.trylast

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================================= short test summary info =======================================================================================
FAILED tests/cluecode/test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_C_uppercase - AssertionError: assert 'C The Regent...ghts Reserved' == '(c) The Rege...ghts Reserved'
FAILED tests/cluecode/test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_c_lowercase - AssertionError: assert 'Copyright c ...ample Company' == 'Copyright (c...ample Company'
==================================================================================== 2 failed, 1 warning in 1.29s =====================================================================================

After Fix (Passing Tests)

(myenv) PS D:\scancode-dev\scancode-toolkit> python -m pytest tests\cluecode\test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_C_uppercase tests\cluecode\test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_c_lowercase -vs
=================================================== test session starts ===================================================
platform win32 -- Python 3.10.1, pytest-7.4.4, pluggy-1.6.0 -- D:\scancode-dev\scancode-toolkit\myenv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\scancode-dev\scancode-toolkit
configfile: pyproject.toml
collecting 2 items
2 tests selected, 0 tests skipped.
collected 2 items

tests/cluecode/test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_C_uppercase PASSED
tests/cluecode/test_copyrights_basic.py::TestTextPreparation::test_prepare_text_line_normalizes_bracket_c_lowercase PASSED

==================================================== warnings summary =====================================================
conftest.py:94
D:\scancode-dev\scancode-toolkit\conftest.py:94: PytestDeprecationWarning: The hookimpl pytest_collection_modifyitems uses old-style configuration options (marks or attributes).
Please use the pytest.hookimpl(trylast=True) decorator instead
to configure the hooks.
See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
@pytest.mark.trylast

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 2 passed, 1 warning in 1.28s ===============================================

Signed-off-by: Shekhar Suman <levi42x@gmail.com>

…ection

- Normalize [C] and [c] before bracket removal in prepare_text_line()
- Add tests for both [C] and [c] variants

Signed-off-by: Shekhar <shekharsuman0397@gmail.com>
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @levi42x
See comments for some minor changes and we're good to merge.

    assert result == '(c) The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved'

def test_prepare_text_line_normalizes_bracket_c_lowercase(self):
    cp = 'Copyright [c] 2023 Example Company'
Member

Could you use real copyright examples here?

Author

Just one clarification: do you mean a real example taken from an open source project (such as "Copyright [c] 2008 IBM Corporation"), or a real example drawn from a slightly larger file?

# normalize copyright signs, quotes and spacing around them
.replace('"Copyright', '" Copyright')

# normalize [C] and [c] to (c) before bracket removal
Member

This comment is not required; the comment above is explanatory enough. Please remove the extra spaces too.

Signed-off-by: Shekhar Suman <levi42x@gmail.com>
Author

levi42x commented Dec 17, 2025

Hi @AyanSinhaMahapatra,
I’ve implemented the suggested updates. The changes are finalized and ready for merge.

Development

Successfully merging this pull request may close these issues.

Invalid copyright not detected
