How to Extract Text from PDF
Compared with other popular file formats, PDF makes it easier to protect a document from unauthorized use. But when an author wants to reuse the content of his or her own PDF document, this can be difficult, because the content cannot always be simply copied and pasted through the clipboard. This article presents a solution for extracting text from a PDF using C#.
How to Extract Text from PDF via C#
With the help of Spire.PDF, a professional PDF document creation component, users can extract text from a PDF using C#. Download Spire.PDF and install it on your system, then follow the steps below to extract the text.
Step 1 Create Project
Create a C# Windows Forms project in Visual Studio. Drag a button onto the form and add Spire.Pdf.dll as a reference. By default, Spire.Pdf.dll is installed under "C:\Program Files\e-iceblue\Spire.Pdf\Bin". Select the assembly Spire.Pdf.dll and click OK to add it to the project.
using System;
using System.IO;
using System.Drawing;
using System.Collections.Generic;
using System.Text;
using Spire.Pdf;
namespace ExtractPDFIMG
{
class Program
{
static void Main(string[] args)
{
}
}
}
Step 2 Load PDF File
Put the PDF file from which you want to extract text into the project folder, and use the code below to load it.
//Create a pdf document
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"..\..\Sample.pdf");
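The relative path above is resolved against the working directory of the running process, which is usually bin\Debug when the project is started from Visual Studio. The sketch below shows one way to check that the file exists before it is handed to LoadFromFile. The names PdfPathHelper and ResolveExisting are hypothetical helpers introduced for illustration; they are not part of Spire.PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of Spire.PDF.
public static class PdfPathHelper
{
    // Converts a (possibly relative) path to an absolute path
    // and verifies that a file exists at that location.
    public static string ResolveExisting(string path)
    {
        // Relative paths are resolved against the current working directory.
        string fullPath = Path.GetFullPath(path);

        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("PDF file not found: " + fullPath, fullPath);
        }

        return fullPath;
    }
}
```

With this helper in the project, the load call could be written as doc.LoadFromFile(PdfPathHelper.ResolveExisting(@"..\..\Sample.pdf")); so that a wrong path fails with a message naming the absolute location that was tried.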
Step 3 Extract Text from PDF
Spire.PDF makes extracting text from a PDF straightforward. The code below calls ExtractText() on each page of the document and collects the results in a StringBuilder.
StringBuilder buffer = new StringBuilder();
foreach (PdfPageBase page in doc.Pages)
{
buffer.Append(page.ExtractText());
}
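The loop above appends the text of each page directly after the previous one, so page boundaries are lost in the output. If they matter, one option is to collect the per-page strings first and join them with a marker line. The sketch below assumes the page texts have already been gathered into a collection; PageTextJoiner and the "--- Page N ---" marker format are hypothetical choices made for illustration, not part of Spire.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of Spire.PDF.
public static class PageTextJoiner
{
    // Joins per-page text into one string, writing a marker line before each page.
    public static string Join(IEnumerable<string> pageTexts)
    {
        StringBuilder buffer = new StringBuilder();
        int pageNumber = 0;

        foreach (string text in pageTexts)
        {
            pageNumber++;
            buffer.AppendLine("--- Page " + pageNumber + " ---");
            // Treat a null page as empty and drop trailing whitespace,
            // so every page ends with exactly one line break.
            buffer.AppendLine((text ?? string.Empty).TrimEnd());
        }

        return buffer.ToString();
    }
}
```

To use it with the loop above, add each page.ExtractText() result to a List<string> instead of appending it to the buffer, then pass the list to PageTextJoiner.Join.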
Step 4 Save Text Content
After extracting the text, save it to a .txt file. The code below uses File.WriteAllText from System.IO to write the contents of the buffer.
//save text
String fileName = "TextInPdf.txt";
File.WriteAllText(fileName, buffer.ToString());
Press F5 to start the project. A text file will be generated in the project's bin\Debug folder.
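The file lands in bin\Debug because "TextInPdf.txt" is a relative name and is resolved against the working directory of the process. A minimal sketch of a variant that writes to an explicitly chosen directory with an explicit encoding follows. TextOutput and Save are hypothetical names introduced for illustration; only standard System.IO calls are used.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; uses only standard .NET APIs.
public static class TextOutput
{
    // Writes content as UTF-8 into the given directory and
    // returns the absolute path of the file that was written.
    public static string Save(string directory, string fileName, string content)
    {
        // Creates the directory if it is missing; does nothing if it already exists.
        Directory.CreateDirectory(directory);

        string fullPath = Path.GetFullPath(Path.Combine(directory, fileName));
        File.WriteAllText(fullPath, content, Encoding.UTF8);
        return fullPath;
    }
}
```

For example, TextOutput.Save(AppDomain.CurrentDomain.BaseDirectory, "TextInPdf.txt", buffer.ToString()) would place the file next to the executable regardless of the working directory, and the returned path can be shown to the user.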
Original text:
Output Text:
Spire.PDF can also extract images from PDF files, and it supports saving the output images in most popular image formats, including PNG, JPEG, BMP, and TIFF.
More about Spire.PDF:
Spire.PDF can be used on the server side (ASP.NET or any other environment) or with Windows Forms applications.