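torchtext's sentencepiece_numericalizer() outputs a generator of the SentencePiece model's indices corresponding to the tokens in the input sentence. From the generator, I can get the IDs.
My question is: how do I get the text back from the IDs after training?
For example: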
>>> sp_id_generator = sentencepiece_numericalizer(sp_model)
>>> list_a = ["sentencepiece encode as pieces", "examples to try!"]
>>> list(sp_id_generator(list_a))
[[9858, 9249, 1629, 1305, 1809, 53, 842],
[2347, 13, 9, 150, 37]]
How can I convert this output back to the original text (i.e. "sentencepiece encode as pieces", "examples to try!")?
Answered 2022-04-29 07:52:43
Torchtext does not implement this, but you can use the SentencePiece package directly. It can be installed from PyPI.
import sentencepiece as spm
sp = spm.SentencePieceProcessor(model_file='test/test_model.model')
sp.decode([9858, 9249, 1629, 1305, 1809, 53, 842])
https://datascience.stackexchange.com/questions/110454
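To decode the whole batch from the question rather than a single sentence, one option is to call decode once per ID list. The following is a minimal sketch, not part of torchtext or SentencePiece: `load_processor` and `decode_batch` are hypothetical helper names, and the model file is assumed to be the same one that produced the IDs, since IDs are only meaningful for the vocabulary that generated them.

```python
from typing import Iterable, List, Sequence


def load_processor(model_path: str):
    """Load a SentencePiece model (assumes `sentencepiece` is installed)."""
    # Imported lazily so decode_batch itself has no hard dependency.
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=model_path)


def decode_batch(processor, id_sequences: Iterable[Sequence[int]]) -> List[str]:
    """Decode each sequence of token IDs back into a string.

    `processor` is any object with a `decode(list_of_ids)` method,
    such as a SentencePieceProcessor.
    """
    return [processor.decode(list(ids)) for ids in id_sequences]
```

Calling `decode_batch(load_processor('test/test_model.model'), list(sp_id_generator(list_a)))` returns one string per ID list. SentencePiece normalizes text before encoding, so the decoded strings may differ slightly from the raw input.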