How to use the corpus_segment function to transfer sentences to a separate column?

asked 2021-11-09 11:00:00 +0000

scrum


1 Answer


answered 2022-01-01 16:00:00 +0000

nofretete

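First, import the necessary modules (the Reuters corpus and the Punkt sentence tokenizer data must already be installed, for example via nltk.download):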

from nltk.corpus import reuters
from nltk import sent_tokenize
import pandas as pd

Load the file IDs of the Reuters Corpus:

fileids = reuters.fileids()

Use sent_tokenize to split each document's raw text into sentences, keeping the file ID each sentence came from:

rows = [(fid, sent) for fid in fileids for sent in sent_tokenize(reuters.raw(fid))]

Create a DataFrame from the rows:

df = pd.DataFrame(rows, columns=['fileid', 'text'])

As far as I know, NLTK does not provide a corpus_segment function (a function of that name exists in the R package quanteda), so look up the categories of each sentence's source file instead:

categories = {fid: reuters.categories(fid) for fid in fileids}
df['category'] = df['fileid'].map(categories)

This results in a DataFrame with three columns: fileid with the source document, text with the sentences, and category with the list of categories of that document.
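The idea of tagging every sentence with the categories of the document it came from can be tried without downloading any NLTK data. The sketch below is only an illustration under stated assumptions: the toy documents, their IDs, and their labels are made up, and split_sentences is a naive regex stand-in for sent_tokenize, not NLTK's tokenizer.

```python
import re

# Hypothetical toy corpus standing in for reuters.fileids(), reuters.raw()
# and reuters.categories(); the IDs, texts and labels are invented.
TOY_DOCS = {
    "doc/1": "Wheat prices rose. Exports fell sharply.",
    "doc/2": "The central bank cut rates.",
}
TOY_CATEGORIES = {
    "doc/1": ["grain", "wheat"],
    "doc/2": ["interest"],
}


def split_sentences(text):
    """Naive stand-in for nltk.sent_tokenize: split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_rows(docs, categories):
    """Return one row per sentence, tagged with its document's categories."""
    rows = []
    for doc_id, text in docs.items():
        for sentence in split_sentences(text):
            rows.append({
                "fileid": doc_id,
                "text": sentence,
                "category": categories[doc_id],
            })
    return rows


rows = sentence_rows(TOY_DOCS, TOY_CATEGORIES)
for row in rows:
    print(row["fileid"], "|", row["text"], "|", row["category"])
```

A list of dicts shaped like rows can be passed directly to pd.DataFrame(rows) to get one column per key.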


