Changing `Decoder` trait to be more composable. (#938) #1008

Narsil · 2022-06-01T12:22:26Z

Changing Decoder trait to be more composable.

* Changing `Decoder` trait to be more composable. Fix #872
* Fixing Python side.
* Fixing test.
* Updating cleanup signature, removing turbofish.

mishig25 · 2022-06-01T13:04:12Z

Got 2 questions:

1. What is the difference between #938 and #1008 (both titled "Changing `Decoder` trait to be more composable."), and why was #938 reverted in the first place?
2. Do you need to create `tokenizers::decoders::Sequence`?

Narsil · 2022-06-01T15:50:12Z

> Got 2 questions:
>
> 1. What is the difference between [Changing `Decoder` trait to be more composable. #938](https://github.com/huggingface/tokenizers/pull/938) & [Changing `Decoder` trait to be more composable. (#938) #1008](https://github.com/huggingface/tokenizers/pull/1008) and why was [Changing `Decoder` trait to be more composable. #938](https://github.com/huggingface/tokenizers/pull/938) reverted in the first place?

This one is not breaking.

> 2. Do you need to create `tokenizers::decoders::Sequence`?

Yes.

HuggingFaceDocBuilderDev · 2022-06-01T15:57:45Z

The documentation is not available anymore as the PR was closed or merged.

mishig25 · 2022-06-02T10:31:10Z

tokenizers/src/tokenizer/mod.rs

-    fn decode(&self, tokens: Vec<String>) -> Result<String>;
+    fn decode(&self, tokens: Vec<String>) -> Result<String> {
+        let results = self.decode_chain(tokens)?;
+        Ok(results.join(""))


Will this line cause any problem for the `Sequence` decoder?
Specifically, one unlikely edge case I'm thinking of: if one of the decoders in a sequence is itself a sequence decoder, will `Ok(results.join(""))` cause information loss (a vec of strings now just becomes a single string)?

As a solution, could we implement `fn decode` for the `Sequence` decoder?

It's OK I think.

The parent `Sequence` would call the child `Sequence`'s `decode_chain`, NOT its `decode` function, so we're good, no?

I could add a test to make sure.

ooh I see, so it is no problem in that case
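To make the point above concrete, here is a minimal, self-contained sketch of the idea. It is not the crate's actual code: the `Result` alias is a simplified stand-in, and `Replace`, `TrimFirst`, and `nested_example` are hypothetical toy names invented for illustration. Only the shape of `decode` / `decode_chain` follows the diff quoted in this thread, and the `Sequence` here is an assumption about how chaining could look.

```rust
use std::error::Error;

// Simplified stand-in for the crate's `Result` alias (assumption of this sketch).
type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

trait Decoder {
    // Composable step: tokens in, tokens out, so token boundaries survive.
    fn decode_chain(&self, tokens: Vec<String>) -> Result<Vec<String>>;

    // Default method as in the diff: join once, at the very end.
    fn decode(&self, tokens: Vec<String>) -> Result<String> {
        let results = self.decode_chain(tokens)?;
        Ok(results.join(""))
    }
}

// Hypothetical toy decoder: replaces a pattern inside every token.
struct Replace {
    from: String,
    to: String,
}

impl Decoder for Replace {
    fn decode_chain(&self, tokens: Vec<String>) -> Result<Vec<String>> {
        Ok(tokens
            .iter()
            .map(|t| t.replace(self.from.as_str(), self.to.as_str()))
            .collect())
    }
}

// Hypothetical toy decoder: trims leading whitespace from the first token only,
// so its result depends on token boundaries still being present.
struct TrimFirst;

impl Decoder for TrimFirst {
    fn decode_chain(&self, mut tokens: Vec<String>) -> Result<Vec<String>> {
        if let Some(first) = tokens.first_mut() {
            *first = first.trim_start().to_string();
        }
        Ok(tokens)
    }
}

// Sketch of a sequence decoder: it only ever calls `decode_chain` on its
// children, never `decode`, so a nested `Sequence` does not join early.
struct Sequence {
    decoders: Vec<Box<dyn Decoder>>,
}

impl Decoder for Sequence {
    fn decode_chain(&self, mut tokens: Vec<String>) -> Result<Vec<String>> {
        for decoder in &self.decoders {
            tokens = decoder.decode_chain(tokens)?;
        }
        Ok(tokens)
    }
}

// Builds a `Sequence` whose first child is itself a `Sequence`.
fn nested_example() -> Sequence {
    let inner_steps: Vec<Box<dyn Decoder>> = vec![Box::new(Replace {
        from: "_".to_string(),
        to: " ".to_string(),
    })];
    let inner = Sequence { decoders: inner_steps };
    let outer_steps: Vec<Box<dyn Decoder>> = vec![Box::new(inner), Box::new(TrimFirst)];
    Sequence { decoders: outer_steps }
}
```

With this layout, calling `decode_chain` on the outer sequence still yields one string per input token, and only the outermost `decode` call performs the final `join("")`. A test along these lines (nested sequence, check the vector length, then check the joined string) would be one way to pin down the edge case discussed above.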

mishig25

lgtm!

* Changing `Decoder` trait to be more composable. (#938)
* Changing `Decoder` trait to be more composable. Fix #872
* Fixing Python side.
* Fixing test.
* Updating cleanup signature, removing turbofish.
* Adding `Sequence` Decoder.

Changing Decoder trait to be more composable. (#938)

04895ab

* Changing `Decoder` trait to be more composable. Fix #872
* Fixing Python side.
* Fixing test.
* Updating cleanup signature, removing turbofish.

Narsil requested a review from mishig25 June 1, 2022 12:22

Narsil marked this pull request as draft June 1, 2022 12:22

Adding Sequence Decoder.

df6253e

Narsil marked this pull request as ready for review June 1, 2022 15:50

mishig25 reviewed Jun 2, 2022

mishig25 approved these changes Jun 2, 2022

Narsil merged commit 943b542 into main Jun 2, 2022

Narsil deleted the decoder_trait branch June 2, 2022 12:43
