DocSharp

DocSharp is a pure C# library to convert between document formats without Office interop or native dependencies.

The following packages are currently available:

  • DocSharp.Binary: convert Office 97-2003 binary documents (doc, xls, ppt) to OpenXML documents (docx, xlsx, pptx). This is a fork of the abandoned b2xtranslator project, with critical fixes applied.
    Note: pre-97 formats and XLSB differ substantially from these formats and are not supported. For Excel files, you can also consider the ExcelDataReader library, which can read at least plain values from more file types.
  • DocSharp.Docx: convert DOCX to RTF, HTML, Markdown and plain text (.txt). Possible applications include generating Open XML documents in C# and exporting for other editors/services, or loading Microsoft Word documents in a RichTextBox / RichEditBox control.
  • DocSharp.Markdown: convert Markdown to DOCX or RTF using custom Markdig renderers.

Packages can be installed via NuGet.

The optional extra packages DocSharp.ImageSharp and DocSharp.SystemDrawing allow converting otherwise unsupported image formats (e.g. GIF / TIFF for DOCX -> RTF or WMF / EMF / TIFF for DOCX -> MD).
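
As a rough sketch, the packages named above can be added to a project with the .NET CLI. This assumes the .NET SDK is installed, that the command is run from a project directory, and that the NuGet package IDs match the names used in this README; install only the packages you need.

```shell
# Core converters (package IDs assumed to match the names listed above).
dotnet add package DocSharp.Binary     # doc/xls/ppt -> docx/xlsx/pptx
dotnet add package DocSharp.Docx       # DOCX -> RTF, HTML, Markdown, plain text
dotnet add package DocSharp.Markdown   # Markdown -> DOCX or RTF

# Optional image converters; pick the one that fits the target platform.
dotnet add package DocSharp.ImageSharp      # cross-platform, .NET 8+
dotnet add package DocSharp.SystemDrawing   # Windows only
```

Without a version argument, `dotnet add package` resolves the latest stable release, so pin a version with `--version` if reproducible builds matter.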

There is no common DOM to manipulate or generate documents; this library is mainly for conversion. Some helper methods on top of the Open XML SDK and format-specific writers are available, but they are mostly intended for internal use, although they could be extended or improved in the future.

You can also consider the following libraries for document creation and manipulation: the Open XML SDK itself, OfficeIMO, OpenXML-Office, ClosedXML, ShapeCrawler, QuestPDF, MigraDoc.

Supported features

  • Binary formats: most doc/xls/ppt features were supported by the original project, but exceptions occurred when using .NET (rather than .NET Framework) or loading specific documents. The most noticeable issues have been fixed, but more work is needed to make the library reliable; if you find other bugs, you are welcome to open an issue (please attach a sample file if the issue only occurs for specific documents).
  • DOCX, RTF, Markdown: supported elements vary depending on the input and output formats; see Supported features for an overview.

Requirements

  • .NET 8, 9 and 10 and .NET Framework 4.6.2 or later are supported, but testing is mostly done on .NET 8 and above.
  • DocSharp.SystemDrawing is Windows-only (.NET Framework or net*-windows), because System.Drawing.Common is only supported on Windows. DocSharp.ImageSharp is cross-platform but requires .NET 8+, because ImageSharp does not support .NET Framework.

Usage

You can refer to the project Wiki or sample app.

Roadmap

  • Implement reverse RTF to DOCX conversion (⌛ started)
  • Implement DOCX renderer using QuestPDF (⌛ started)
  • Support more elements and attributes, and fix issues on edge cases
  • Reduce code duplication, cleanup
  • Async functions/progress callback (some tasks such as downloading images referenced in Markdown may take some time)
  • Improve support for right-to-left and complex script languages

Credits

Dependencies:

Forked:

Others:

  • Html2OpenXml for image header decoding and unit conversions.
  • dwml_cs for Office Math (OMML) to LaTeX conversion.
  • addFormula2docx for Office Math (OMML) to MathML conversion.
  • OpenMcdf for a better understanding of the Microsoft Compound File format.

Used in the sample app or for internal tests/comparisons (not direct dependencies when installing packages):

License

DocSharp is licensed under the MIT license and can be used for both open source and commercial projects.

DocSharp.ImageSharp is licensed under the Apache 2.0 license. ImageSharp and VectSharp have their own licenses; please visit their repositories for more information.

If you find the library useful, adding a star is highly appreciated; stars help guide other developers towards helpful libraries and tools.
