What is the process to remove words from a DTM when using the textmineR package in R for performing LDA?

asked 2022-01-28 11:00:00 +0000

huitzilopochtli


1 Answer


answered 2023-01-28 00:00:00 +0000

pufferfish

A DTM built by textmineR is a sparse matrix (class dgCMatrix from the Matrix package) with documents as rows and terms as columns, so removing words amounts to dropping columns before fitting the model:

  1. Create the Document Term Matrix (DTM) with the CreateDtm() function. Its stopword_vec argument already removes stop words at this stage, and you can pass your own vector of words there.
  2. There is no need to transpose the DTM into a Term Document Matrix. The words are the columns of the DTM, so they can be filtered in place.
  3. Calculate the document frequency of each word, i.e. the number of documents in which it appears, with Matrix::colSums(dtm > 0). Plain colSums(dtm) gives total counts instead. TermDocFreq(dtm) returns both in one data frame (columns term_freq and doc_freq).
  4. Remove unwanted words by subsetting columns, for example dtm[, doc_freq >= threshold] for rare words or dtm[, !(colnames(dtm) %in% words_to_drop)] for a specific list. Note that removeSparseTerms() belongs to the tm package and works on tm's DocumentTermMatrix objects, not on textmineR's sparse matrices.
  5. It is usually worth dropping any documents left with no words afterwards: dtm[Matrix::rowSums(dtm) > 0, ].
  6. Do not normalize the DTM to relative frequencies. textmineR's LDA works on word counts, so the matrix should keep its raw counts.
  7. Perform LDA on the cleaned DTM using the FitLdaModel() function.
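
As a rough illustration of this workflow, here is a minimal sketch. The toy corpus, the threshold of 2 documents, the words_to_drop vector, and the choice of k = 2 are all made up for the example; adjust them to your data.

```r
library(textmineR)  # CreateDtm(), FitLdaModel()
library(Matrix)     # colSums()/rowSums() for sparse matrices

# Toy corpus, invented purely for illustration
docs <- c(
  d1 = "neural network training uses gradient descent",
  d2 = "gradient descent optimizes neural network weights",
  d3 = "topic models describe documents using latent topics",
  d4 = "latent topic models cluster documents by words",
  d5 = "network training data includes documents"
)

# Build the DTM (rows = documents, columns = terms);
# the default stopword_vec removes common English stop words
dtm <- CreateDtm(doc_vec = docs, doc_names = names(docs),
                 ngram_window = c(1, 1), verbose = FALSE)

# Document frequency: number of documents containing each term
doc_freq <- Matrix::colSums(dtm > 0)

# Hypothetical removal rules: rare terms plus a custom word list
threshold     <- 2
words_to_drop <- c("documents")
keep <- doc_freq >= threshold & !(colnames(dtm) %in% words_to_drop)

# Drop the unwanted columns, then any documents left empty
dtm_clean <- dtm[, keep, drop = FALSE]
dtm_clean <- dtm_clean[Matrix::rowSums(dtm_clean) > 0, , drop = FALSE]

# Fit LDA on the raw counts of the cleaned DTM
set.seed(123)
model <- FitLdaModel(dtm = dtm_clean, k = 2,
                     iterations = 200, burnin = 100,
                     calc_coherence = FALSE)
```

The fitted object is a list; model$phi holds the topic-by-word distributions and model$theta the document-by-topic distributions, so colnames(model$phi) is a quick way to confirm that the removed words are gone.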