Fix decode #1555

ArthurZucker · 2024-06-18T06:50:23Z

This reverts the previous breaking change.

It also adds a new ByteLevel normalizer, which replaces the ByteLevel pre_tokenizer.
Checked that we can add Chinese / Cyrillic tokens, which are properly encoded and decoded.
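
For readers unfamiliar with byte-level handling, here is a minimal std-only sketch of the GPT-2 style byte-to-character mapping that byte-level components are generally built on. It is not the code from this PR: the names `bytes_to_chars`, `byte_level_encode`, and `byte_level_decode` are hypothetical, and the sketch only illustrates why Chinese or Cyrillic text can round-trip once every UTF-8 byte has a printable stand-in.

```rust
use std::collections::HashMap;

/// Build a GPT-2 style table mapping each byte to a printable char.
/// Hypothetical helper for illustration, not the library's implementation.
fn bytes_to_chars() -> [char; 256] {
    let mut table = ['\0'; 256];
    let mut next = 0u32; // running offset for bytes that need a stand-in
    for b in 0u32..=255 {
        // Bytes that are already printable map to themselves.
        let printable = (33..=126).contains(&b)
            || (161..=172).contains(&b)
            || (174..=255).contains(&b);
        table[b as usize] = if printable {
            char::from_u32(b).unwrap()
        } else {
            // Every other byte is shifted to code point 256 + n.
            let c = char::from_u32(256 + next).unwrap();
            next += 1;
            c
        };
    }
    table
}

/// Replace every UTF-8 byte of `text` with its printable stand-in.
fn byte_level_encode(text: &str) -> String {
    let table = bytes_to_chars();
    text.bytes().map(|b| table[b as usize]).collect()
}

/// Invert the mapping; returns None on unknown chars or invalid UTF-8.
fn byte_level_decode(encoded: &str) -> Option<String> {
    let table = bytes_to_chars();
    let reverse: HashMap<char, u8> = table
        .iter()
        .enumerate()
        .map(|(b, c)| (*c, b as u8))
        .collect();
    let bytes: Option<Vec<u8>> = encoded.chars().map(|c| reverse.get(&c).copied()).collect();
    String::from_utf8(bytes?).ok()
}
```

Under these assumptions, calling `byte_level_decode(&byte_level_encode("嗎"))` should give back the original string, and the same holds for Cyrillic input, since the mapping is a bijection over all 256 byte values.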

Fixes #1392

* feat(ci): add trufflehog secrets detection
* fix(ci): remove unnecessary permissions

HuggingFaceDocBuilderDev · 2024-06-18T12:16:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker · 2024-06-28T09:39:22Z

tokenizers/src/tokenizer/mod.rs

+        let tokens = ids
+            .iter()
+            .filter_map(|id| {
+                self.added_vocabulary
+                    .id_to_token(*id, &self.model)
+                    .filter(|token| {
+                        !skip_special_tokens || !self.added_vocabulary.is_special_token(token)
+                    })
+            })
+            .collect::<Vec<_>>();
+
+        if let Some(decoder) = &self.decoder {
+            decoder.decode(tokens)


Reverted to what we originally had.
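
The restored logic in the hunk above can be sketched in isolation as follows. This is a simplified stand-in, not the library's code: `ToyVocab` and `toy_vocab` are hypothetical, the real method goes through `added_vocabulary` and `model`, and joining tokens with a space when no decoder is configured is an assumption, since the hunk is cut off before that branch.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the tokenizer's vocabulary lookup.
struct ToyVocab {
    id_to_token: HashMap<u32, String>,
    special: HashSet<String>,
}

impl ToyVocab {
    /// Map ids to tokens, drop unknown ids, and optionally drop special tokens.
    fn decode(&self, ids: &[u32], skip_special_tokens: bool) -> String {
        let tokens: Vec<&str> = ids
            .iter()
            .filter_map(|id| {
                self.id_to_token
                    .get(id)
                    .map(String::as_str)
                    // Keep the token unless we are skipping specials and it is one.
                    .filter(|token| !skip_special_tokens || !self.special.contains(*token))
            })
            .collect();
        // Assumption: with no decoder configured, tokens are joined by spaces.
        tokens.join(" ")
    }
}

/// Build a tiny vocabulary with one special token, for illustration only.
fn toy_vocab() -> ToyVocab {
    let mut id_to_token = HashMap::new();
    id_to_token.insert(0u32, "<s>".to_string());
    id_to_token.insert(1u32, "hello".to_string());
    id_to_token.insert(2u32, "world".to_string());
    let mut special = HashSet::new();
    special.insert("<s>".to_string());
    ToyVocab { id_to_token, special }
}
```

With this toy setup, `toy_vocab().decode(&[0, 1, 2], true)` drops `<s>` while passing `false` keeps it. Ids with no entry are silently skipped by `filter_map`, which mirrors the shape of the hunk.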

nathaniel-daniel and others added 11 commits June 17, 2024 14:45

Make USED_PARALLELISM atomic (#1532)

0f72528

Fixing for clippy 1.78 (#1548)

c910d73

feat(ci): add trufflehog secrets detection (#1551)

6d17c4b

* feat(ci): add trufflehog secrets detection
* fix(ci): remove unnecessary permissions

Switch from cached_download to hf_hub_download in tests (#1547)

b23d8c3

Fix 'dictionnary' typo (#1511)

b801eed

current state of affaiers

d59973a

something that works

34d7cf8

feature dependent test

3ff2016

nit about 嗎

948358f

Merge branch 'main' into fix-decode

17a286a

fix merge conflicts

b107253

ArthurZucker mentioned this pull request Jun 19, 2024

Breaking changes in v0.19.1 for tiktoken/llama3 #1512

Open

ArthurZucker added 3 commits June 28, 2024 09:29

update

487810a

actuallyfix it

ed3428a

revert and simplify

f53e514

ArthurZucker force-pushed the fix-decode branch from dd6e99b to f53e514 Compare June 28, 2024 07:36

update the test

0f72944

ArthurZucker marked this pull request as ready for review June 28, 2024 08:30

ArthurZucker added 3 commits June 28, 2024 10:53

add it

eff3eee

fix

de895de

stub

be0ae2e

ArthurZucker requested a review from Narsil June 28, 2024 09:38

ArthurZucker commented Jun 28, 2024
