Release 2.2

Major Features and Improvements

Breaking Changes

Bug Fixes and Other Changes

Update version

Thanks to our Contributors

Bug fixes

Force macOS builds to target OS X 10.9 so they can be installed on a wider range of macOS versions.

Release 2.2.0-rc1

Major Features and Improvements

Add op for solving max-spanning-tree (MST) problems. The code here is intended for NLP applications, but attempts to remain agnostic to particular NLP tasks (such as dependency parsing).
Add max_spanning_tree_gradient.
Add support for the 'preserve_unused_tokens' option in BertTokenizer.

Bug Fixes and Other Changes

Documentation updates.
Reorganize the BUILD file for keras layers.
Update model server testing. The test script now generates a model that integrates into tf serving's testing infra.
Remove unneeded heavy dependencies in regex_split library.
Turn TF text's ConstrainedSequence implementations into standalone callable functions.
Fix bug in ViterbiAnalysis computation triggered when not using transition_weights.
Remove testing_utils run_tf_function, which is now enabled by default.
Update patch params to work with Bazel >=1.0.0.
Remove circular dependencies by removing submodule imports from ragged package.
Prevent tf.Text from breaking when ragged_ops.py is missing from the released TF.

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Hyunwoo Cho

Minor Updates

BertTokenizer now accepts a string tensor for the vocab_lookup_table.

Bug Fixes

Update ICU data name so as to not conflict with core TF in model server.

Major Updates

Added SplitMergeTokenizer.
Add support for token offsets to BertTokenizer.

Minor Updates

Give BertTokenizer ability to read in a vocab file directly.
Migrate from std::string to tensorflow::tstring.
Many build script improvements.
Update ToDense layer with ragged support attribute.

Bug Fixes

Update SentencePiece to inherit from TokenizerWithOffsets.
Fix ICU data linking issue.

Bug Fixes

Missing regex character class used in BertTokenizer
Mac build issue

Bug Fixes

Missing regex character class used in BertTokenizer
Mac build issue

Major Updates

Added a regex_split op.
Fixed a bug in the case_fold_utf8 and normalize_utf8 ops where they were unable to locate the ICU data file.
Fixed a problem with the BertTokenizer where it was using merge_dims which is unreleased for the corresponding version of TensorFlow.
Updated the BertTokenizer to use regex_split to match the exact regex used by original BERT.
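
To illustrate the kind of splitting a regex_split op performs, here is a minimal pure-Python sketch built on the standard `re` module. It is not the library's implementation: the function name `regex_split_sketch` and its parameters are hypothetical, and the behavior shown (split on a delimiter pattern, optionally keep delimiters that match a second pattern) is an assumption about the general technique rather than a description of the released op.

```python
import re


def regex_split_sketch(text, delim_pattern, keep_delim_pattern=None):
    """Split text on delim_pattern; keep delimiters matching keep_delim_pattern.

    Hypothetical helper for illustration only, not the tf.Text op.
    """
    delim = re.compile(delim_pattern)
    keep = re.compile(keep_delim_pattern) if keep_delim_pattern else None
    tokens = []
    pos = 0
    for m in delim.finditer(text):
        # Emit the text between the previous delimiter and this one.
        if m.start() > pos:
            tokens.append(text[pos:m.start()])
        # Emit the delimiter itself only if it matches the keep pattern.
        if keep is not None and m.group() and keep.fullmatch(m.group()):
            tokens.append(m.group())
        pos = max(pos, m.end())
    # Emit any trailing text after the last delimiter.
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens


print(regex_split_sketch("hello, world!", r"[\s,!]", r"[,!]"))
```

Keeping punctuation while dropping whitespace is the pattern a BERT-style basic tokenizer relies on, which is why the two-pattern form is shown here.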

Major Updates

Added a regex_split op.
Fixed a bug in the case_fold_utf8 and normalize_utf8 ops where they were unable to locate the ICU data file.
Fixed a problem with the BertTokenizer where it was using merge_dims which is unreleased for the corresponding version of TensorFlow.
Updated the BertTokenizer to use regex_split to match the exact regex used by original BERT.

Please note that moving forward our releases and branches will match the major & minor versions of core TensorFlow. This should prevent future confusion. As such, this (previously 1.0) release is 2.0, and we will be skipping straight to 1.15 for the next 1.x release to support TF 1.15.

Major Updates:

SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
New ToDense Keras layer for RaggedTensor conversion
Pipeline for generating a Wordpiece Vocabulary has been added to tools.
New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
New BertTokenizer which mimics the preprocessing performed in the original BERT model.
New Detokenizer abstract class has been added to the TF.Text Tokenizer API.
Many previously released ops have been added to the TensorFlow Serving model server. Please see https://github.com/tensorflow/serving for more information.
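
As background for the Rouge-L metric mentioned in the list above, the following is a minimal pure-Python sketch of an LCS-based score. It is an illustration under stated assumptions, not the op's implementation: the names `lcs_length` and `rouge_l_sketch` are hypothetical, inputs are plain token lists, and the F-measure is computed as the unweighted harmonic mean of precision and recall, which may differ from the weighting the released op uses.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sketch(hypothesis, reference):
    """Return (f1, precision, recall) based on the LCS of two token lists.

    Hypothetical helper; assumes an unweighted harmonic mean for the F-measure.
    """
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_l_sketch(hyp, ref))
```

Because the score depends on the longest common subsequence rather than contiguous n-grams, it rewards in-order word overlap even when the matching words are not adjacent.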

Minor Updates:

API docs have received an update that should make finding relevant information easier.
Wordpiece: Add support for splitting unknown characters
Wordpiece: Add support for max characters per token
Wordshape: Fix finding of currency symbols.
Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
Build includes CC library targets. Useful for statically linking in TF.Text custom ops. Specifically useful for building into TF.Serving's model server.
Build environment: Updated to match core TF's update.
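
The Wordpiece items above concern how a single word is broken into vocabulary pieces. A minimal sketch of greedy longest-match-first splitting with a per-token character limit follows. Everything here is an assumption for illustration: the function name `wordpiece_sketch`, the `##` continuation prefix, the `[UNK]` token, and the parameter names are hypothetical and do not describe the library's actual signature, and the sketch does not cover the unknown-character splitting option.

```python
def wordpiece_sketch(word, vocab, unk="[UNK]", suffix="##", max_chars_per_token=100):
    """Greedy longest-match-first split of one word into vocabulary pieces.

    Hypothetical helper for illustration only, not the tf.Text tokenizer.
    """
    # Words longer than the limit are mapped to the unknown token outright.
    if len(word) > max_chars_per_token:
        return [unk]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = suffix + candidate  # mark non-initial pieces
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        # If no piece matches at this position, the whole word is unknown.
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "##aff", "##able"}
print(wordpiece_sketch("unaffable", vocab))
```

The character limit bounds the inner search, since the number of candidate substrings grows with the square of the word length.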

Releases: tensorflow/text

2.2.0 release

Release 2.2

Major Features and Improvements

Breaking Changes

Bug Fixes and Other Changes

Thanks to our Contributors

v2.2.0-rc2

Bug fixes

v2.2.0-rc1

Release 2.2.0-rc1

Major Features and Improvements

Bug Fixes and Other Changes

Thanks to our Contributors

v2.1.1

Minor Updates

Bug Fixes

v2.1.0-rc0

Major Updates

Minor Updates

Bug Fixes

v2.0.1

Bug Fixes

v1.15.1

Bug Fixes

v2.0.0

Major Updates

v1.15.0

Major Updates

v2.0.0-rc0