Issues: huggingface/tokenizers
Enabling simpler flow of information from the Tokenizer to the trainer.
bug
Something isn't working
enhancement
New feature or request
Feature Request
Feature request: tokenizer training benchmarks
enhancement
New feature or request
#1206
opened Apr 4, 2023 by
sradc
added_tokens with bytemap characters in ByteLevel could not be decoded correctly
bug
Something isn't working
#1392
opened Nov 16, 2023 by
DOGEwbx
Support PyArrow arrays as tokenizer input
enhancement
New feature or request
good first issue
Good for newcomers
#1415
opened Dec 14, 2023 by
mariosasko
How to convert tokenizers.tokenizer to XXTokenizerFast in transformers?
planned
#1468
opened Mar 5, 2024 by
rangehow
Discrepancy Between GitHub Release and NPM Package Version & Missing Dependencies
#1489
opened Apr 10, 2024 by
superBertBerg
BPE Trainer doesn't respect the vocab_size parameter when dataset size is increased
#1514
opened Apr 25, 2024 by
Abhinay1997
UnigramTrainer: byte_fallback is false.
Feature Request
training
#1515
opened Apr 25, 2024 by
Moddus
Link to download the training text in docs/source/quicktour.rst is broken
#1526
opened May 9, 2024 by
14jdelap
Assign <unusedXX> tokens with special_tokens without growing vocab size
Feature Request
planned
#1473
opened Mar 17, 2024 by
jacobwjs