To retrieve plain text from a PDF with PyPDF2, call extract_text() on a page object with no arguments. The method has no simple_text parameter; by default it already returns the page's text as a plain string, without font or layout information:
import PyPDF2

# Open in binary mode; the context manager closes the file afterwards
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # extract_text() returns the page's text as a plain string
    text = pdf_reader.pages[0].extract_text()
print(text)
This prints the text extracted from the first page of the PDF file. In PyPDF2 2.x and later, PdfReader and pdf_reader.pages[0] replace the older PdfFileReader and getPage(0), which were removed in PyPDF2 3.0.
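If text from the whole document is needed rather than just the first page, the same extract_text() call can be applied to every page. The following is a minimal sketch, assuming PyPDF2 2.x or later (where PdfReader and its pages attribute are available); the helper names join_page_text and read_pdf_text are made up for illustration and are not part of PyPDF2.

```python
def join_page_text(pages, separator="\n"):
    """Concatenate the text of every page-like object in `pages`."""
    # Each item only needs an extract_text() method, as PyPDF2 page objects have.
    # `or ""` guards against pages that yield no text (e.g. scanned images).
    return separator.join((page.extract_text() or "") for page in pages)


def read_pdf_text(path):
    """Return the plain text of all pages in the PDF at `path`."""
    # Imported here so join_page_text can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        return join_page_text(reader.pages)
```

Calling print(read_pdf_text('example.pdf')) would then print the text of all pages, separated by newlines. Pages that contain only scanned images typically yield an empty string, since PyPDF2 does not perform OCR.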