
Commit f779e1e

Update api.rst
Clarify use of `hdr_info` parameter.
1 parent: c136f93


1 file changed (+1, -1 lines)

docs/src/api.rst

@@ -18,7 +18,7 @@ API

 :arg list pages: optional, the pages to consider for output (caution: specify 0-based page numbers). If omitted all pages are processed.

-:arg hdr_info: optional. Use this if you want to provide your own header detection logic. This may be a callable or an object having a method named `get_header_id`. It must accept a text span (a span dictionary as contained in `extractDict <https://pymupdf.readthedocs.io/en/latest/textpage.html#span-dictionary>`_) and has optional access to the owning `Page <https://pymupdf.readthedocs.io/en/latest/page.html>`_ object. It must return a string "" or up to 6 "#" characters followed by 1 space. If omitted, a full document scan will be performed to find the most popular font sizes and derive header levels based on this. For instance, avoid any headers by specifying `hdr_info=lambda s: ""`.
+:arg hdr_info: optional. Use this if you want to provide your own header detection logic. This may be a callable or an object having a method named `get_header_id`. It must accept a text span (a span dictionary as contained in `extractDict <https://pymupdf.readthedocs.io/en/latest/textpage.html#span-dictionary>`_) and a keyword parameter "page" (which is the owning `Page <https://pymupdf.readthedocs.io/en/latest/page.html>`_ object). It must return a string "" or up to 6 "#" characters followed by 1 space. If omitted, a full document scan will be performed to find the most popular font sizes and derive header levels based on them. To completely avoid this behavior specify `hdr_info=lambda s, page=None: ""` or `hdr_info=False`.

 :arg bool write_images: when encountering images or vector graphics, PNG images will be created from the respective page area and stored in the folder of the document. Markdown references will be generated pointing to these images. Any text contained in these areas will not be included in the text output (but appear as part of the images). Therefore, if your document has text written on full page images, make sure to set this parameter to `False`.
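To illustrate the contract in the revised `hdr_info` text, here is a minimal Python sketch of a custom header detector. It assumes only what the docstring states (a span dictionary as input, a keyword parameter `page`, and a return value of "" or 1 to 6 "#" characters followed by one space), plus the fact that PyMuPDF span dictionaries carry the font size under the `"size"` key. The names `FontSizeHeaders`, `no_headers` and `is_valid_header_id`, and the size thresholds, are hypothetical and not part of the library.

```python
import re


class FontSizeHeaders:
    """Hypothetical header detector: maps font sizes (in points) to header levels."""

    def __init__(self, levels):
        # levels: list of (min_font_size, header_level) pairs, e.g. [(20, 1), (16, 2)].
        # Sort largest threshold first so the biggest fonts get the top levels.
        self.levels = sorted(levels, reverse=True)

    def get_header_id(self, span, page=None):
        # `span` is a span dict as produced by extractDict; `page` is the owning
        # Page object, accepted as a keyword parameter but not used here.
        size = span.get("size", 0)
        for min_size, level in self.levels:
            if size >= min_size:
                # Clamp to 6, the maximum number of "#" characters allowed.
                return "#" * min(level, 6) + " "
        return ""  # not a header: emit as body text


def no_headers(span, page=None):
    """Plain callable form: never produce a header."""
    return ""


def is_valid_header_id(value):
    """Check a return value against the documented format: "" or 1-6 '#' plus one space."""
    return value == "" or re.fullmatch(r"#{1,6} ", value) is not None
```

Usage, assuming the documented function is `pymupdf4llm.to_markdown`: `pymupdf4llm.to_markdown("input.pdf", hdr_info=FontSizeHeaders([(20, 1), (16, 2), (13, 3)]))`. Supplying any detector, such as `no_headers`, replaces the full-document font-size scan; per the updated text, `hdr_info=False` has the same header-suppressing effect as `no_headers`.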
