---
title: .NET 8 & Tesseract OCR on Amazon Linux 2023 EC2 | Syncfusion
description: Install and configure .NET 8 and Tesseract OCR on an Amazon Linux 2023 EC2 instance to perform OCR on PDFs and images using the Syncfusion .NET OCR library.
control: PDF
documentation: UG
keywords: Assemblies
---

# Perform OCR with Tesseract on Amazon Linux 2023 EC2 using a .NET application

The [Syncfusion<sup>&reg;</sup> .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images in Linux applications with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

This guide walks through installing Tesseract OCR and its essential dependencies directly on an Amazon Linux 2023 (AL2023) EC2 instance. With this setup, you can deploy .NET applications that use OCR functionality, such as those built on Syncfusion PDF Processing with Tesseract, without Docker containers.

## Prerequisites

Before you begin, ensure that you have:

* An active Amazon Linux 2023 (AL2023) EC2 instance.
* SSH access to your EC2 instance.
* Basic familiarity with Linux command-line operations.

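Before running the steps, you can optionally confirm that the instance really runs AL2023. The following is a minimal sketch that reads the standard `/etc/os-release` file. The `is_al2023` function name is hypothetical, and the expected values (`ID="amzn"`, `VERSION_ID="2023"`) are assumptions to verify against your own image.

```shell
#!/bin/sh
# Sketch: succeed only if the given os-release file (default /etc/os-release)
# reports ID="amzn" and VERSION_ID="2023". These expected values are assumptions.
is_al2023() {
  release_file="${1:-/etc/os-release}"
  [ -r "$release_file" ] || return 1
  # "." searches PATH for bare file names, so make relative names explicit.
  case "$release_file" in
    */*) ;;
    *) release_file="./$release_file" ;;
  esac
  # Source the file in a subshell so its variables do not leak into the caller.
  (
    unset ID VERSION_ID
    . "$release_file"
    [ "${ID:-}" = "amzn" ] && [ "${VERSION_ID:-}" = "2023" ]
  )
}

# Usage on the EC2 instance:
# is_al2023 && echo "Amazon Linux 2023 detected" || echo "Not Amazon Linux 2023"
```
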
## Installation steps for .NET 8 and Tesseract OCR on Amazon Linux 2023 EC2

Execute the following commands sequentially in your EC2 instance's terminal. It is recommended to run these commands from the `/home/ec2-user` directory unless specified otherwise.

Step 1: **Update System Packages**: Start by making sure that all existing packages on your EC2 instance are up to date.

{% highlight bash tabtitle="Bash" %}

sudo yum update -y

{% endhighlight %}

Step 2: **Add Microsoft Package Repository**: To install the .NET SDK, add the Microsoft package repository. Because AL2023 is derived from Fedora, this guide uses the repository configuration published for Fedora 39.

{% highlight bash tabtitle="Bash" %}

sudo curl -o /etc/yum.repos.d/packages-microsoft-com-prod.repo https://packages.microsoft.com/config/fedora/39/prod.repo

{% endhighlight %}

Step 3: **Install .NET SDK**: Install the .NET 8.0 SDK with `yum`. The SDK is required to build and run your .NET application.

{% highlight bash tabtitle="Bash" %}

sudo yum install -y dotnet-sdk-8.0

{% endhighlight %}

Step 4: **Verify .NET SDK Installation**: Confirm that the .NET SDK is installed correctly by checking its version.

{% highlight bash tabtitle="Bash" %}

sudo dotnet --version

{% endhighlight %}

The output should be an SDK version number that begins with `8.0.`.

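To make this check scriptable, for example in an instance bootstrap script, the version string can be tested for the `8.0.` prefix. A minimal sketch, assuming a POSIX shell; `is_dotnet8_sdk` is a hypothetical helper name, not part of the .NET CLI.

```shell
#!/bin/sh
# Sketch: succeed if the version string (as printed by `dotnet --version`)
# belongs to the .NET 8.0 SDK line, that is, it starts with "8.0.".
is_dotnet8_sdk() {
  case "$1" in
    8.0.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the EC2 instance:
# if is_dotnet8_sdk "$(dotnet --version)"; then echo ".NET 8 SDK found"; else echo "Unexpected SDK version"; fi
```
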
Step 5: **Install `libgdiplus` Package**: `libgdiplus` is a Mono implementation of the GDI+ API and is often required by .NET applications for image processing. Run the following commands as a single block from the `/home/ec2-user` directory.

{% highlight bash tabtitle="Bash" %}

sudo yum groupinstall "Development Tools" -y
sudo yum install autoconf automake libtool gettext libjpeg-turbo-devel libpng-devel giflib-devel freetype-devel -y

git clone https://github.com/mono/libgdiplus.git
cd libgdiplus
./autogen.sh
make
sudo make install

{% endhighlight %}

Step 6: **Install `leptonica` Package**: Leptonica is an image processing and analysis library and a core dependency of Tesseract OCR. Run the following commands as a single block from the `/home/ec2-user` directory.

{% highlight bash tabtitle="Bash" %}

cd /home/ec2-user  # Return to the home directory; Step 5 ends inside the libgdiplus folder
sudo yum groupinstall "Development Tools" -y
sudo yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel -y
wget http://www.leptonica.org/source/leptonica-1.82.0.tar.gz
tar -xzf leptonica-1.82.0.tar.gz
cd leptonica-1.82.0
./configure
make
sudo make install
sudo ldconfig

{% endhighlight %}

Step 7: **Install `libpng` Package**: `libpng` is the official PNG reference library and is needed to handle PNG images, which are common in OCR workflows. Although `libpng-devel` was installed in the previous steps, building libpng 1.6.40 from source installs `libpng16.so.16` under `/usr/local/lib`, which is the file that Step 9 links to.

{% highlight bash tabtitle="Bash" %}

sudo yum groupinstall "Development Tools" -y
sudo yum install gcc make wget tar -y

cd /tmp  # Temporarily move to /tmp for the build
wget https://download.sourceforge.net/libpng/libpng-1.6.40.tar.gz
tar -xzf libpng-1.6.40.tar.gz
cd libpng-1.6.40
./configure
make
sudo make install

{% endhighlight %}

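Steps 9 and 10 assume that the libraries built in Steps 6 and 7 were installed in `/usr/local/lib`, which is where the default autotools prefix (`/usr/local`) puts them. Before creating the links, you can confirm their location. The sketch below uses a hypothetical `find_lib` helper; the list of directories to search is an assumption.

```shell
#!/bin/sh
# Sketch: print the first existing path for a library file, searching the
# given directories in order. Returns 1 if the file is not found anywhere.
find_lib() {
  lib_name="$1"
  shift
  for dir in "$@"; do
    if [ -e "$dir/$lib_name" ]; then
      printf '%s\n' "$dir/$lib_name"
      return 0
    fi
  done
  return 1
}

# Usage on the EC2 instance (directory list is an assumption):
# find_lib liblept.so.5 /usr/local/lib /usr/local/lib64 || echo "liblept.so.5 not found"
# find_lib libpng16.so.16 /usr/local/lib /usr/local/lib64 || echo "libpng16.so.16 not found"
```
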
Step 8: **Create Symbolic Link for libdl**: The .NET runtime often expects `libdl.so` to be directly accessible from its shared library path. Create a symbolic link from the system's `libdl.so.2` into the .NET runtime directory.

First, find the path of your installed .NET runtime version:

{% highlight bash tabtitle="Bash" %}

dotnet --list-runtimes

{% endhighlight %}

The output will be similar to the following (the version numbers on your instance may differ):

{% highlight text tabtitle="Output" %}

Microsoft.AspNetCore.App 8.0.x [/usr/lib64/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.x [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]

{% endhighlight %}

Now, create the symbolic link. Replace `8.0.17` with the exact version number shown for `Microsoft.NETCore.App` in the `dotnet --list-runtimes` output.

{% highlight bash tabtitle="Bash" %}

sudo ln -s /usr/lib64/libdl.so.2 /usr/lib64/dotnet/shared/Microsoft.NETCore.App/8.0.17/libdl.so

{% endhighlight %}

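Instead of copying the runtime version by hand, the target directory can be derived from the `dotnet --list-runtimes` output. This is a sketch under the assumption that each output line keeps the `Name Version [Path]` layout shown above; the `netcore8_runtime_dir` helper is hypothetical. It picks the highest `8.0.x` patch of `Microsoft.NETCore.App` and ignores preview builds.

```shell
#!/bin/sh
# Sketch: read `dotnet --list-runtimes` output on stdin and print the
# version directory of the newest Microsoft.NETCore.App 8.0.x runtime.
# Assumes each line has the form: Name Version [BasePath]
netcore8_runtime_dir() {
  awk '
    $1 == "Microsoft.NETCore.App" && $2 ~ /^8\.0\.[0-9]+$/ {
      split($2, parts, ".")
      patch = parts[3] + 0
      if (best == "" || patch > best_patch) {
        best = $2
        best_patch = patch
        # Drop the surrounding brackets from the base path.
        base = substr($3, 2, length($3) - 2)
      }
    }
    END {
      if (best == "") exit 1
      print base "/" best
    }
  '
}

# Usage on the EC2 instance:
# runtime_dir=$(dotnet --list-runtimes | netcore8_runtime_dir) &&
#   sudo ln -s /usr/lib64/libdl.so.2 "$runtime_dir/libdl.so"
```
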
Step 9: **Create Symbolic Link for `libpng16`**: Create a symbolic link for the `libpng16` library so that it is accessible from a common library path.

{% highlight bash tabtitle="Bash" %}

sudo ln -s /usr/local/lib/libpng16.so.16 /lib64/libpng16.so.16

{% endhighlight %}

Step 10: **Create Symbolic Link for `liblept`**: Similarly, create a symbolic link for the `liblept` library (Leptonica).

{% highlight bash tabtitle="Bash" %}

sudo ln -s /usr/local/lib/liblept.so.5 /lib64/liblept.so.5

{% endhighlight %}

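`ln -s` fails with a "File exists" error when something is already present at the link path, for example when you rerun the steps or when a distribution package already provides the library there. A minimal sketch of an idempotent alternative follows. The `link_if_missing` helper is hypothetical, and it needs root privileges (for example, in a shell opened with `sudo -i`) when the link path is in a system directory.

```shell
#!/bin/sh
# Sketch: create a symbolic link only when the link path is not already taken,
# so the symlink commands of Steps 8 to 10 can be rerun safely.
link_if_missing() {
  target="$1"
  link_path="$2"
  if [ ! -e "$target" ]; then
    echo "Target not found: $target" >&2
    return 1
  fi
  # -L also catches dangling links, which -e alone would miss.
  if [ -e "$link_path" ] || [ -L "$link_path" ]; then
    echo "Already present, skipping: $link_path"
    return 0
  fi
  ln -s "$target" "$link_path"
}

# Usage as root on the EC2 instance:
# link_if_missing /usr/local/lib/libpng16.so.16 /lib64/libpng16.so.16
# link_if_missing /usr/local/lib/liblept.so.5 /lib64/liblept.so.5
```
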
Step 11: **Implement the Project Code**: To add OCR functionality to your project, follow the [Perform OCR in Linux](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/working-with-ocr/linux) guide.

Step 12: **Set Permissions for Tesseract Binaries**: Navigate to the directory that contains your application's Tesseract binaries and grant read, write, and execute permissions to all users (mode `777`). This is required for the OCR process to work correctly. Important: Replace `bin/Debug/net8.0/runtimes/linux/native` with the actual path where the Syncfusion Tesseract binaries (for example, `libSyncfusionTesseract.so` and `liblept1753.so`) are located in your application output.

{% highlight bash tabtitle="Bash" %}

cd bin/Debug/net8.0/runtimes/linux/native  # Replace with the actual path to the Tesseract binaries
sudo chmod 777 libSyncfusionTesseract.so
sudo chmod 777 liblept1753.so

{% endhighlight %}

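If you are unsure where the binaries are located, you can search the build or publish output for them. A sketch, assuming the binary names mentioned above; `find_tesseract_libs` is a hypothetical helper, and other package versions may ship differently named files.

```shell
#!/bin/sh
# Sketch: list the Syncfusion Tesseract native binaries found anywhere under
# a build or publish directory (default: the current directory).
find_tesseract_libs() {
  search_root="${1:-.}"
  find "$search_root" -type f \( -name 'libSyncfusionTesseract.so' -o -name 'liblept1753.so' \) | sort
}

# Usage from the project directory (the mode mirrors Step 12):
# find_tesseract_libs bin | while IFS= read -r lib; do sudo chmod 777 "$lib"; done
```
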
Step 13: **Build and Run Your .NET Application**: Finally, build and publish your .NET application, then run it.

{% highlight bash tabtitle="Bash" %}

sudo dotnet build

sudo dotnet publish -c Release -o ./publish

cd publish

sudo dotnet PdfProcessingApi.dll --urls "http://0.0.0.0:5000"

{% endhighlight %}

Remember to replace `PdfProcessingApi.dll` with the name of your application's entry-point DLL.

By running the program, you will get a PDF document like the following. The output file is saved alongside the `Program.cs` file.

![OCR Linux Output](OCR-Images/OCR-output-image.png)

A complete working sample can be downloaded from [GitHub](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Linux).