@@ -127,3 +127,97 @@ Check out [our site](https://example.com).
127127- Migrating forum content to modern platforms
128128- Converting archived discussions
129129- Importing user-generated content from legacy systems
130+
131+ ## HtmlToDjot
132+
133+ Converts HTML to Djot markup. Useful for importing content from a CMS, a WYSIWYG editor, or scraped web pages.
134+
135+ ``` php
136+ use Djot\Converter\HtmlToDjot;
137+
138+ $converter = new HtmlToDjot();
139+ $djot = $converter->convert($html);
140+ ```
141+
142+ **Conversion Table:**
143+
144+ | HTML | Djot Output |
145+ | ------| -------------|
146+ | ` <strong> ` , ` <b> ` | ` *bold* ` |
147+ | ` <em> ` , ` <i> ` | ` _italic_ ` |
148+ | ` <u> ` , ` <ins> ` | ` {+underline+} ` |
149+ | ` <s> ` , ` <del> ` , ` <strike> ` | ` {-deleted-} ` |
150+ | ` <mark> ` | ` {=highlighted=} ` |
151+ | ` <sup> ` | ` ^superscript^ ` |
152+ | ` <sub> ` | ` ~subscript~ ` |
153+ | ` <code> ` | `` `code` `` |
154+ | ` <pre><code> ` | ` ```code block``` ` |
155+ | ` <a href="..."> ` | ` [text](url) ` |
156+ | ` <img src="..." alt="..."> ` | ` ![alt](src) ` |
157+ | ` <h1> ` - ` <h6> ` | ` # ` - ` ###### ` |
158+ | ` <p> ` | Paragraph |
159+ | ` <blockquote> ` | ` > quote ` |
160+ | ` <ul> ` , ` <ol> ` | ` - item ` / ` 1. item ` |
161+ | ` <hr> ` | ` --- ` |
162+ | ` <br> ` | ` \ ` (hard break) |
163+ | ` <table> ` | Djot table syntax |
164+ | ` <dl> ` , ` <dt> ` , ` <dd> ` | Definition list |
165+ | ` <span class="x"> ` | ` [text]{.x} ` |
166+ | ` <figure> ` + ` <figcaption> ` | Image with caption |
167+
168+ **File Operations:**
169+
170+ ``` php
171+ // Convert file and get result
172+ $djot = $converter->convertFile('/path/to/file.html');
173+ ```
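
For bulk imports, the single-file call above can be wrapped in a small loop. The following is a minimal sketch, not part of the library: `convertDirectory()` is a hypothetical helper, and it takes the converter as a callable so that it only assumes "a path goes in, Djot markup comes out", which is what `convertFile()` is shown doing above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not provided by the library): converts every *.html
 * file in $sourceDir and writes a .djot file with the same base name into
 * $targetDir.
 *
 * $convertFile is any callable that takes a file path and returns Djot
 * markup, for example:
 *     fn (string $path) => $converter->convertFile($path)
 *
 * @return list<string> paths of the files that were written
 */
function convertDirectory(string $sourceDir, string $targetDir, callable $convertFile): array
{
    // Create the target directory (recursively) if it does not exist yet.
    if (!is_dir($targetDir) && !mkdir($targetDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $targetDir");
    }

    $written = [];
    // glob() returns false on error; treat that like "no files".
    foreach (glob(rtrim($sourceDir, '/') . '/*.html') ?: [] as $htmlPath) {
        $target = rtrim($targetDir, '/') . '/' . basename($htmlPath, '.html') . '.djot';
        file_put_contents($target, $convertFile($htmlPath));
        $written[] = $target;
    }

    return $written;
}
```

Passing a callable rather than the converter object keeps the helper independent of the library, so it can be exercised with a stub in tests.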
174+
175+ **Example:**
176+
177+ ``` php
178+ $html = <<<'HTML'
179+ <article>
180+ <h1>Welcome</h1>
181+ <p>This is <strong>important</strong> and <em>emphasized</em>.</p>
182+ <ul>
183+ <li>First item</li>
184+ <li>Second item</li>
185+ </ul>
186+ <blockquote>A famous quote</blockquote>
187+ <pre><code class="language-php">echo "Hello";</code></pre>
188+ </article>
189+ HTML;
190+
191+ $djot = $converter->convert($html);
192+ ```
193+
194+ Output:
195+ ```` djot
196+ # Welcome
197+
198+ This is *important* and _emphasized_.
199+
200+ - First item
201+ - Second item
202+
203+ > A famous quote
204+
205+ ```php
206+ echo "Hello";
207+ ```
208+ ````
209+
210+ **Behavior:**
211+ - Strips `<script>`, `<style>`, and `<noscript>` tags
212+ - Normalizes whitespace (runs of spaces and newlines collapse to a single space)
213+ - Preserves whitespace inside `<pre>` blocks
214+ - Detects code language from `class="language-xxx"` attribute
215+ - Converts `<span>` with class/id to Djot span syntax
216+ - Handles nested lists
217+ - Handles tables with headers
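
Two of the rules in the behavior list, whitespace normalization and language detection, can be illustrated in isolation. This is a standalone sketch of the idea only; both function names are hypothetical and the library's actual implementation may differ. A converter following the list above would skip the first function for text inside `<pre>`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration: collapse every run of whitespace
 * (spaces, tabs, newlines) into a single space.
 */
function normalizeWhitespace(string $text): string
{
    // preg_replace() returns null only on a regex error; fall back to the input.
    return preg_replace('/\s+/', ' ', $text) ?? $text;
}

/**
 * Hypothetical illustration: extract "xxx" from a class attribute that
 * contains a "language-xxx" token, or return null if there is none.
 * Requires PHP 8.0+ for str_starts_with().
 */
function detectCodeLanguage(string $classAttribute): ?string
{
    $prefix = 'language-';
    foreach (preg_split('/\s+/', trim($classAttribute)) as $class) {
        if (str_starts_with($class, $prefix) && strlen($class) > strlen($prefix)) {
            return substr($class, strlen($prefix));
        }
    }

    return null;
}
```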
218+
219+ **Use Cases:**
220+ - Importing content from WordPress or another CMS
221+ - Converting WYSIWYG editor output
222+ - Web scraping and content extraction
223+ - Migrating HTML documentation to Djot