To convert a PDF file to an Excel file in C#, use a third-party library such as iTextSharp or Aspose.PDF to extract the data from the PDF, and then write that data to an Excel workbook. The general steps are:
1. Add the iTextSharp or Aspose.PDF library to the project (for example, as a NuGet package).
2. Load the PDF file into memory.
3. Extract the data from the PDF using the library's API.
4. Create a new Excel workbook and add the extracted data to it.
5. Save the Excel file to disk.
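The extraction step returns plain text, not a table: a PDF stores positioned text, and iTextSharp's `PdfTextExtractor` typically separates words with spaces rather than tabs. Some rule is therefore needed to turn each extracted line into cells. The sketch below assumes one common heuristic, treating a tab or a run of two or more whitespace characters as a column break. `PdfTextTable` and `ToRows` are hypothetical names, and the rule will misfire on PDFs whose columns are separated by only a single space.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PdfTextTable
{
    // Assumed heuristic: two or more whitespace characters, or a single tab,
    // mark a column break. Single spaces stay inside a cell ("Unit price").
    private static readonly Regex ColumnGap = new Regex(@"\s{2,}|\t");

    // Splits extracted PDF text into rows (one per non-blank line) of cells.
    public static List<string[]> ToRows(string extractedText)
    {
        var rows = new List<string[]>();
        foreach (var line in extractedText.Split('\n'))
        {
            var trimmed = line.Trim();          // also removes a trailing '\r'
            if (trimmed.Length == 0) continue;  // skip blank lines
            rows.Add(ColumnGap.Split(trimmed));
        }
        return rows;
    }
}
```

For example, `ToRows("Item  Qty\nWidget\t2")` yields two rows, `["Item", "Qty"]` and `["Widget", "2"]`, which can then be written to the worksheet cell by cell.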
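The following example uses iTextSharp to extract the text and Microsoft Office Interop to create the workbook, so Microsoft Excel must be installed on the machine that runs it: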
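```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Excel = Microsoft.Office.Interop.Excel;

public static class PdfToExcelConverter
{
    public static void ConvertPDFToExcel(string pdfFilePath, string excelFilePath)
    {
        // Load the PDF file into memory and extract the text of every page
        var text = new StringBuilder();
        using (var pdfReader = new PdfReader(pdfFilePath))
        {
            // iTextSharp page numbers are 1-based
            for (int i = 1; i <= pdfReader.NumberOfPages; i++)
            {
                text.Append(PdfTextExtractor.GetTextFromPage(pdfReader, i));
                // Keep the last line of one page from merging with the first line of the next
                text.Append('\n');
            }
        }

        // Create a new Excel workbook and add the extracted data to it
        var excelApp = new Excel.Application();
        Excel.Workbook workbook = null;
        try
        {
            workbook = excelApp.Workbooks.Add();
            var worksheet = (Excel.Worksheet)workbook.Worksheets[1];
            var lines = text.ToString().Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
            for (int row = 0; row < lines.Length; row++)
            {
                // Extracted text usually separates words with spaces, not tabs,
                // so with this split most lines end up entirely in column A
                var cells = lines[row].Split('\t');
                for (int col = 0; col < cells.Length; col++)
                {
                    // Excel rows and columns are 1-based
                    worksheet.Cells[row + 1, col + 1] = cells[col];
                }
            }

            // Save the Excel file to disk; an absolute path makes the location unambiguous
            workbook.SaveAs(Path.GetFullPath(excelFilePath));
        }
        finally
        {
            // Close Excel even if an error occurs, so no hidden Excel process is left running
            workbook?.Close(false);
            excelApp.Quit();
            Marshal.ReleaseComObject(excelApp);
        }
    }
}
```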
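The example above relies on Microsoft Office Interop, which needs Excel installed and is not recommended by Microsoft for unattended server-side use. One alternative, swapped in here rather than taken from the example, is to write the rows to a CSV file, which Excel opens directly. The minimal sketch below uses only the .NET base library; `CsvExport`, `EscapeField`, `ToCsv` and `Save` are hypothetical names, and the rows are assumed to be already split into cells.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class CsvExport
{
    // Quotes a field only when it contains a comma, quote, or line break
    // (RFC 4180 style); embedded quotes are doubled.
    public static string EscapeField(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Builds CSV text with one line per row and CRLF line endings.
    public static string ToCsv(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(EscapeField)));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }

    // Writes UTF-8 with a byte order mark so Excel recognizes the encoding.
    public static void Save(IEnumerable<string[]> rows, string csvFilePath)
    {
        File.WriteAllText(csvFilePath, ToCsv(rows), new UTF8Encoding(true));
    }
}
```

A CSV file holds only values, with no formatting or multiple sheets, and depending on regional settings Excel may expect a semicolon instead of a comma as the separator when opening it by double-click.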