To retrieve plain text from a PDF with PyPDF2, call the extract_text() method on a page object with no arguments. It returns the page's text as a string, without font, position, or other layout details. Note that extract_text() does not take a simple_text parameter.

import PyPDF2

# Open the file in binary mode; the with block closes it afterwards
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    # Pages are zero-indexed, so pages[0] is the first page
    text = pdf_reader.pages[0].extract_text()

print(text)

This prints the text extracted from the first page of the PDF file. Older PyPDF2 releases spell the same steps PdfFileReader, getPage(0), and extractText(). Pages that contain only scanned images, with no text layer, typically yield little or no text.
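To get the text of the whole document rather than one page, the same call can be repeated for each page. The following is a minimal sketch, assuming a recent PyPDF2 release that provides PdfReader and its pages attribute; the helper names join_page_text and read_pdf_text are made up for illustration and are not part of PyPDF2.

```python
def join_page_text(pages, separator="\n"):
    """Join the text of every page into one string.

    `pages` is any iterable of objects that offer extract_text(),
    such as the `pages` attribute of a PyPDF2 PdfReader.
    """
    chunks = []
    for page in pages:
        # Guard against a page that yields no text at all
        chunks.append(page.extract_text() or "")
    return separator.join(chunks)


def read_pdf_text(path):
    """Hypothetical convenience wrapper: open `path` and return all page text."""
    # Imported here so join_page_text can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # Extract while the file is still open
        return join_page_text(reader.pages)
```

Calling read_pdf_text('example.pdf') would then return the text of all pages, separated by newlines. Keeping the joining logic apart from the file handling is just one possible design; it makes the loop easy to try out without a real PDF.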