Expose num_added_tokens on Python side #146

Merged
mfuntowicz merged 2 commits into master from post_processor_added_tokens on Feb 14, 2020

mfuntowicz
Member

Exposes num_added_tokens on the Python side without the need to pass an Encoding to added_tokens.

This makes it possible to compute the maximum sentence length for single or pair inputs without actually needing an Encoding structure.
Since the number of added tokens is fixed and static during compilation, this allows more flexible use of the method.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
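
To illustrate the use case described above, here is a minimal Python sketch of how a fixed added-token count can be turned into a content-length budget. It does not call the real bindings: `FixedTemplateProcessor` and `max_content_length` are hypothetical names introduced for this example, and only the `num_added_tokens(is_pair)` shape is modelled on the method this PR discusses. The counts 2 and 3 assume a BERT-style template (`[CLS] A [SEP]` and `[CLS] A [SEP] B [SEP]`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedTemplateProcessor:
    """Hypothetical stand-in for a post-processor with a fixed template."""

    single_added: int  # special tokens added around a single sequence
    pair_added: int    # special tokens added around a pair of sequences

    def num_added_tokens(self, is_pair: bool) -> int:
        # No Encoding is needed: the count depends only on the template.
        return self.pair_added if is_pair else self.single_added


def max_content_length(processor, max_length: int, is_pair: bool = False) -> int:
    """Return how many tokens remain for content once special tokens are reserved."""
    budget = max_length - processor.num_added_tokens(is_pair)
    if budget < 0:
        raise ValueError("max_length is smaller than the number of added tokens")
    return budget


# Assumed BERT-style template: 2 special tokens for a single input, 3 for a pair.
bert_like = FixedTemplateProcessor(single_added=2, pair_added=3)
single_budget = max_content_length(bert_like, 512)
pair_budget = max_content_length(bert_like, 512, is_pair=True)
```

With a 512-token limit, this leaves 510 tokens for a single input and 509 for a pair, all computed before any text has been encoded.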

Member
n1t0 left a comment

I'm ok with the new API on the PostProcessor. I don't have an example that requires an Encoding anyway!

As for the way it's integrated into the BaseTokenizer on the Python side, I think it may be time to expose more of the individual parts of the pipeline, so they can be accessed from outside when needed. What do you think?

bindings/python/src/tokenizer.rs (outdated, resolved)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
mfuntowicz merged commit c4bac6a into master on Feb 14, 2020
mfuntowicz deleted the post_processor_added_tokens branch on February 14, 2020 at 10:55