Commit a371cbb

Add HtmlToDjot
1 parent 77d6e23 commit a371cbb

3 files changed: +819 -0 lines changed

docs/converters.md

Lines changed: 94 additions & 0 deletions
@@ -127,3 +127,97 @@ Check out [our site](https://example.com).
- Migrating forum content to modern platforms
- Converting archived discussions
- Importing user-generated content from legacy systems

## HtmlToDjot

Converts HTML to Djot markup. Useful for importing content from a CMS, a WYSIWYG editor, or scraped web pages.

```php
use Djot\Converter\HtmlToDjot;

$converter = new HtmlToDjot();
$djot = $converter->convert($html);
```

**Conversion Table:**

| HTML | Djot Output |
|------|-------------|
| `<strong>`, `<b>` | `*bold*` |
| `<em>`, `<i>` | `_italic_` |
| `<u>`, `<ins>` | `{+underline+}` |
| `<s>`, `<del>`, `<strike>` | `{-deleted-}` |
| `<mark>` | `{=highlighted=}` |
| `<sup>` | `^superscript^` |
| `<sub>` | `~subscript~` |
| `<code>` | `` `code` `` |
| `<pre><code>` | ` ```code block``` ` |
| `<a href="...">` | `[text](url)` |
| `<img src="..." alt="...">` | `![alt](src)` |
| `<h1>` - `<h6>` | `#` - `######` |
| `<p>` | Paragraph |
| `<blockquote>` | `> quote` |
| `<ul>`, `<ol>` | `- item` / `1. item` |
| `<hr>` | `---` |
| `<br>` | `\` (hard break) |
| `<table>` | Djot table syntax |
| `<dl>`, `<dt>`, `<dd>` | Definition list |
| `<span class="x">` | `[text]{.x}` |
| `<figure>` + `<figcaption>` | Image with caption |

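
To make the inline rows of the table concrete, here is a minimal sketch of how such a mapping could be applied by walking a DOM tree. It is not the `HtmlToDjot` implementation: `inlineHtmlToDjot` is a hypothetical helper built on PHP's bundled DOM extension, it covers only a few inline tags, and it skips escaping and character-encoding handling, so treat it as ASCII-only.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration only; not part of the Djot library.
function inlineHtmlToDjot(string $html): string
{
    // Opening and closing delimiters per tag, taken from the conversion table.
    $delimiters = [
        'strong' => ['*', '*'],
        'b'      => ['*', '*'],
        'em'     => ['_', '_'],
        'i'      => ['_', '_'],
        'mark'   => ['{=', '=}'],
        'sup'    => ['^', '^'],
        'sub'    => ['~', '~'],
    ];

    $doc = new DOMDocument();
    // The wrapper <div> gives the fragment a single, easy-to-find root.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_NOERROR | LIBXML_NOWARNING);
    $root = $doc->getElementsByTagName('div')->item(0);

    $render = function (DOMNode $node) use (&$render, $delimiters): string {
        if ($node instanceof DOMText) {
            return $node->nodeValue ?? '';
        }

        // Render children first so nested markup is wrapped inside-out.
        $inner = '';
        foreach ($node->childNodes as $child) {
            $inner .= $render($child);
        }

        if ($node instanceof DOMElement) {
            $tag = strtolower($node->tagName);
            if ($tag === 'a') {
                return '[' . $inner . '](' . $node->getAttribute('href') . ')';
            }
            if (isset($delimiters[$tag])) {
                [$open, $close] = $delimiters[$tag];
                return $open . $inner . $close;
            }
        }

        // Unknown tags: keep the text, drop the markup.
        return $inner;
    };

    return $render($root);
}

echo inlineHtmlToDjot('This is <strong>important</strong> and <a href="https://example.com">a link</a>.'), "\n";
// prints: This is *important* and [a link](https://example.com).
```

A lookup table keeps the tag-to-delimiter rules in one place, which mirrors how the conversion table itself is laid out.
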
**File Operations:**

```php
// Read an HTML file and return its Djot conversion
$djot = $converter->convertFile('/path/to/file.html');
```

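
For bulk imports, the single-file call can be wrapped in a small loop. The sketch below is an assumption-laden illustration, not a library feature: `convertDirectory` is a hypothetical helper, the `.djot` output extension is an arbitrary choice, and the converter is passed in as a callable so the function depends only on the `convertFile()` behavior shown above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the Djot library.
 * $convertFile is any callable that takes an HTML file path and returns Djot text.
 *
 * @return string[] paths of the files written
 */
function convertDirectory(string $sourceDir, string $targetDir, callable $convertFile): array
{
    // Create the target directory if it does not exist yet.
    if (!is_dir($targetDir) && !mkdir($targetDir, 0777, true) && !is_dir($targetDir)) {
        throw new RuntimeException("Cannot create directory: $targetDir");
    }

    $written = [];
    foreach (glob(rtrim($sourceDir, '/') . '/*.html') ?: [] as $htmlPath) {
        // page.html becomes page.djot (the extension is an assumption).
        $djotPath = rtrim($targetDir, '/') . '/' . pathinfo($htmlPath, PATHINFO_FILENAME) . '.djot';
        if (file_put_contents($djotPath, $convertFile($htmlPath)) === false) {
            throw new RuntimeException("Cannot write file: $djotPath");
        }
        $written[] = $djotPath;
    }

    return $written;
}

// Usage with the converter from this section:
// convertDirectory('/path/to/html', '/path/to/djot', fn (string $path): string => $converter->convertFile($path));
```
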
**Example:**

```php
$html = <<<'HTML'
<article>
<h1>Welcome</h1>
<p>This is <strong>important</strong> and <em>emphasized</em>.</p>
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
<blockquote>A famous quote</blockquote>
<pre><code class="language-php">echo "Hello";</code></pre>
</article>
HTML;

$djot = $converter->convert($html);
```

Output:
````djot
# Welcome

This is *important* and _emphasized_.

- First item
- Second item

> A famous quote

```php
echo "Hello";
```
````

**Behavior:**
- Strips `<script>`, `<style>`, and `<noscript>` tags
- Normalizes whitespace (runs of spaces and newlines become a single space)
- Preserves whitespace inside `<pre>` blocks
- Detects code language from `class="language-xxx"` attribute
- Converts `<span>` with class/id to Djot span syntax
- Handles nested lists
- Handles tables with headers

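
Two of the behaviors above, whitespace normalization and language detection, are easy to picture in isolation. The following is a minimal sketch under stated assumptions: `normalizeWhitespace` and `detectCodeLanguage` are hypothetical names, and the library's internals may well differ.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration; not the library's actual code.

/** Collapse runs of spaces, tabs, and newlines into a single space. */
function normalizeWhitespace(string $text): string
{
    return preg_replace('/\s+/', ' ', $text) ?? $text;
}

/** Return "php" for class="language-php", or null when no such class exists. */
function detectCodeLanguage(string $classAttribute): ?string
{
    foreach (preg_split('/\s+/', trim($classAttribute)) ?: [] as $class) {
        // 9 is the length of the "language-" prefix.
        if (str_starts_with($class, 'language-') && strlen($class) > 9) {
            return substr($class, 9);
        }
    }

    return null;
}

echo normalizeWhitespace("First   item\n\n  second"), "\n"; // prints: First item second
var_dump(detectCodeLanguage('hljs language-php'));          // string(3) "php"
```

Normalization of this kind would be skipped for text inside `<pre>`, where whitespace is preserved.
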
**Use Cases:**
- Importing content from WordPress or another CMS
- Converting WYSIWYG editor output
- Web scraping and content extraction
- Migrating HTML documentation to Djot
