[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing Decoder trait to be more composable. (#938) #1008

Merged
merged 2 commits into from
Jun 2, 2022
Merged

Conversation

Narsil
Copy link
Collaborator
@Narsil Narsil commented Jun 1, 2022
  • Changing Decoder trait to be more composable.

* Changing `Decoder` trait to be more composable.

Fix #872

* Fixing Python side.

* Fixing test.

* Updating cleanup signature, removing turbofish.
@Narsil Narsil requested a review from mishig25 June 1, 2022 12:22
@Narsil Narsil marked this pull request as draft June 1, 2022 12:22
@mishig25
Copy link
Contributor
mishig25 commented Jun 1, 2022

Got 2 questions:

  1. What is the difference between Changing Decoder trait to be more composable. #938 & Changing Decoder trait to be more composable. (#938) #1008 and why was Changing Decoder trait to be more composable. #938 reverted in the first place?
  2. Do you need to create tokenizers::decoders::Sequence?

@Narsil
Copy link
Collaborator Author
Narsil commented Jun 1, 2022

Got 2 questions:

1. What is the difference between [Changing `Decoder` trait to be more composable. #938](https://github.com/huggingface/tokenizers/pull/938) & [Changing `Decoder` trait to be more composable. (#938) #1008](https://github.com/huggingface/tokenizers/pull/1008) and why was [Changing `Decoder` trait to be more composable. #938](https://github.com/huggingface/tokenizers/pull/938) reverted in the first place?

This one is not breaking

2. Do you need to create `tokenizers::decoders::Sequence`?

Yes.

@Narsil Narsil marked this pull request as ready for review June 1, 2022 15:50
@HuggingFaceDocBuilderDev
Copy link
HuggingFaceDocBuilderDev commented Jun 1, 2022

The documentation is not available anymore as the PR was closed or merged.

fn decode(&self, tokens: Vec<String>) -> Result<String>;
fn decode(&self, tokens: Vec<String>) -> Result<String> {
let results = self.decode_chain(tokens)?;
Ok(results.join(""))
Copy link
Contributor
@mishig25 mishig25 Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this line cause any problem for the SequenceDecoder?
Specifically, one unlikekly edge-case I'm thinking is: one of the sequence decoders is another sequence decoder, and Ok(results.join("")) will cause an informaition loss (vec of str now just becomes a str)?

as a solution, we can implement fn decode for Sequence Decoder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK I think.

The parent Sequence would call the child Sequence.decode_chain NOT the decode function, so we're good, no ?

I could add a test to make sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh I see, so it is no problem in that case

Copy link
Contributor
@mishig25 mishig25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@Narsil Narsil merged commit 943b542 into main Jun 2, 2022
@Narsil Narsil deleted the decoder_trait branch June 2, 2022 12:43
Narsil added a commit that referenced this pull request Aug 23, 2022
* Changing `Decoder` trait to be more composable. (#938)

* Changing `Decoder` trait to be more composable.

Fix #872

* Fixing Python side.

* Fixing test.

* Updating cleanup signature, removing turbofish.

* Adding `Sequence` Decoder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants