To retrieve the textual content from a PDF document using PdfSharp in C#, follow these steps:
Install the PdfSharp NuGet package in your project.
Import the PdfSharp namespaces:
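using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO; // PdfReader and PdfDocumentOpenMode live here
Open the document and read the content stream of the first page:
PdfDocument document = PdfReader.Open("path/to/document.pdf", PdfDocumentOpenMode.ReadOnly);
// ContentReader.ReadContent is a static method; it returns the parsed
// content stream as a CSequence of operators and operands, not a string.
CSequence content = ContentReader.ReadContent(document.Pages[0]);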
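Note: What PdfSharp reads from a page is its raw content stream, in which text strings are interleaved with font, positioning, and drawing operators. To extract only the plain text, collect the string operands of the text-showing operators (Tj and TJ) rather than stripping unwanted characters with regular expressions. PdfSharp is aimed mainly at creating and modifying PDFs, so text set in fonts with custom encodings may not come out as readable characters.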
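Collecting the text operands could look roughly like the sketch below. It assumes the PdfSharp 1.5 content model (CSequence, COperator, CString). The class name PdfTextSketch and the method ExtractText are made up for illustration and are not part of PdfSharp, and the file path is the same placeholder used above.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

// Hypothetical helper for illustration; not part of PdfSharp.
public static class PdfTextSketch
{
    // Recursively walks a content object and yields the strings that
    // appear as operands of the text-showing operators Tj and TJ.
    public static IEnumerable<string> ExtractText(CObject obj)
    {
        if (obj is COperator op)
        {
            bool showsText = op.OpCode.Name == OpCodeName.Tj.ToString()
                          || op.OpCode.Name == OpCodeName.TJ.ToString();
            if (showsText)
            {
                foreach (CObject operand in op.Operands)
                    foreach (string s in ExtractText(operand))
                        yield return s;
            }
        }
        else if (obj is CSequence seq)
        {
            // Also covers CArray, which derives from CSequence
            // (TJ takes an array of strings and spacing numbers).
            foreach (CObject element in seq)
                foreach (string s in ExtractText(element))
                    yield return s;
        }
        else if (obj is CString str)
        {
            yield return str.Value;
        }
        // Numbers, names, and comments carry no text and are skipped.
    }

    public static void Main()
    {
        // Placeholder path; replace with a real file.
        PdfDocument document = PdfReader.Open("path/to/document.pdf", PdfDocumentOpenMode.ReadOnly);
        CSequence content = ContentReader.ReadContent(document.Pages[0]);

        // Fragments are joined without separators; word and line breaks
        // are not reconstructed in this sketch.
        Console.WriteLine(string.Join("", ExtractText(content)));
    }
}
```

The sketch only concatenates fragments in stream order. It does not interpret positioning operators, so spacing and reading order may differ from what the page shows; a library built for text extraction is likely a better fit when that matters.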
Asked: 2021-07-11 11:00:00 +0000
Last updated: Oct 03 '21