What is the method to retrieve text from PyPDF2 without getting a detailed output?

method

asked 2023-06-24 07:43:51 +0000

qstack


1 Answer


answered 2023-06-24 08:11:02 +0000

david

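PyPDF2's extract_text() has no simple_text parameter; passing one raises a TypeError. Calling extract_text() with no arguments already returns the page's text as a plain string, with no formatting or other information attached. On current releases, use PdfReader and reader.pages in place of PdfFileReader and getPage(), which were deprecated and then removed in PyPDF2 3.0:

import PyPDF2

# Open the file in binary mode; the with-block closes it afterwards
with open('example.pdf', 'rb') as pdf_file:
    # PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # pages[0] replaces getPage(0); extract_text() needs no arguments
    text = pdf_reader.pages[0].extract_text()

print(text)

This prints the plain text of the first page of the PDF file. How closely the result matches the visual layout depends on how the PDF was produced.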
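If the text of the whole document is wanted rather than just the first page, the same call can be applied to every page. The following is a minimal sketch, assuming a PyPDF2 release that provides PdfReader; the helper names join_page_text and extract_all_text are hypothetical and not part of the library.

```python
def join_page_text(pages):
    """Concatenate the plain text of each page, one page per line block.

    `pages` is any iterable of objects exposing extract_text(),
    such as PyPDF2's reader.pages. (Hypothetical helper.)
    """
    # Guard against a page yielding no text by substituting an empty string
    return "\n".join(page.extract_text() or "" for page in pages)


def extract_all_text(path):
    """Return the plain text of every page in the PDF at `path`.

    (Hypothetical helper; assumes PyPDF2 with PdfReader is installed.)
    """
    # Imported here so join_page_text stays usable without PyPDF2
    import PyPDF2

    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        return join_page_text(reader.pages)
```

Calling extract_all_text('example.pdf') would then return one string covering the full document, which can be printed or written to a file as needed.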


Stats

Asked: 2023-06-24 07:43:51 +0000

Seen: 8 times

Last updated: Jun 24 '23
