Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
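As a rough illustration of the Byte Pair Encoding idea named above, the following toy sketch learns merge rules by repeatedly joining the most frequent adjacent symbol pair. It is not the package's or YouTokenToMe's implementation: the function names `merge_pair`, `learn_bpe`, and `encode`, the tie-breaking rule, and the tiny word list are all assumptions made for this example, and it ignores details such as word-boundary markers, character coverage, and special tokens.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically smallest pair
        # (an arbitrary choice made here only to keep the result deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subwords by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Made-up training data for the example.
words = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(words, 4)
print(merges)                    # [('e', 's'), ('es', 't'), ('l', 'o'), ('lo', 'w')]
print(encode("lowest", merges))  # ['low', 'est']
```

The word "lowest" never appears in the training data, yet it is split into two subwords that do; this is the property that makes BPE useful for handling rare and unseen words.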
Version: | 0.1.3 |
Depends: | R (≥ 2.10) |
Imports: | Rcpp (≥ 0.11.5) |
LinkingTo: | Rcpp |
Published: | 2023-09-15 |
DOI: | 10.32614/CRAN.package.tokenizers.bpe |
Author: | Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), VK.com [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License)) |
Maintainer: | Jan Wijffels <jwijffels at bnosac.be> |
License: | MPL-2.0 |
URL: | https://github.com/bnosac/tokenizers.bpe |
NeedsCompilation: | yes |
Materials: | README NEWS |
In views: | NaturalLanguageProcessing |
CRAN checks: | tokenizers.bpe results |
Reference manual: | tokenizers.bpe.pdf |
Package source: | tokenizers.bpe_0.1.3.tar.gz |
Windows binaries: | r-devel: tokenizers.bpe_0.1.3.zip, r-release: tokenizers.bpe_0.1.3.zip, r-oldrel: tokenizers.bpe_0.1.3.zip |
macOS binaries: | r-release (arm64): tokenizers.bpe_0.1.3.tgz, r-oldrel (arm64): tokenizers.bpe_0.1.3.tgz, r-release (x86_64): tokenizers.bpe_0.1.3.tgz, r-oldrel (x86_64): tokenizers.bpe_0.1.3.tgz |
Old sources: | tokenizers.bpe archive |
Reverse suggests: | doc2vec, sentencepiece, textrecipes |
Please use the canonical form https://CRAN.R-project.org/package=tokenizers.bpe to link to this page.