After reading in my x train values, what is the way to preserve (save and reload) the DistilBertTokenizer tokenizer?

asked 2021-09-11 11:00:00 +0000

djk

1 Answer


answered 2022-04-03 10:00:00 +0000

devzero

To preserve the DistilBertTokenizer tokenizer, you can save it to disk using the Hugging Face Transformers save_pretrained() method (it is a Transformers method, not a PyTorch one). Here is an example:

from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

train_texts = [...]  # list of training texts

# Tokenize the training texts
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

# Save the tokenizer
tokenizer.save_pretrained('/path/to/tokenizer')

The save_pretrained() method writes the tokenizer configuration files and the vocabulary file to the specified directory. You can then load the tokenizer later by passing that same directory to the from_pretrained() method:

from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('/path/to/tokenizer')

This will load the tokenizer configuration and vocabulary from the directory, allowing you to tokenize new texts using the same vocabulary and settings as before.


