Releases: tensorflow/text

2.2.0 release

11 May 18:20

Release 2.2

Major Features and Improvements

Breaking Changes

Bug Fixes and Other Changes

  • Update version

Thanks to our Contributors

v2.2.0-rc2

10 Apr 22:59
Pre-release

Bug fixes

  • Force macOS builds to target OSX 10.9 so they can be installed on a wider range of macOS versions.

v2.2.0-rc1

17 Mar 20:52
Pre-release

Release 2.2.0-rc1

Major Features and Improvements

  • Add op for solving max-spanning-tree (MST) problems. The code here is intended for NLP applications, but attempts to remain agnostic to particular NLP tasks (such as dependency parsing).
  • Add max_spanning_tree_gradient.
  • Add support for the 'preserve_unused_tokens' option in BertTokenizer.
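
To illustrate what an option like 'preserve_unused_tokens' is for, here is a minimal pure-Python sketch. It is not the BertTokenizer implementation. It assumes the option keeps reserved vocabulary entries of the form [unusedN] intact instead of letting punctuation splitting break them apart. The function name, the regexes, and the simplified split rule are all hypothetical.

```python
import re

# Assumed shape of reserved vocabulary entries, e.g. "[unused0]".
UNUSED_TOKEN = re.compile(r"\[unused\d+\]")
# Simplified split rule (an assumption): runs of word characters,
# or single punctuation marks.
BASIC_SPLIT = re.compile(r"\w+|[^\w\s]")


def basic_tokenize(text, preserve_unused_tokens=False):
    """Split text into words and punctuation.

    When preserve_unused_tokens is True, "[unusedN]" markers are
    emitted as single tokens instead of being split at the brackets.
    """
    if not preserve_unused_tokens:
        return BASIC_SPLIT.findall(text)
    tokens = []
    position = 0
    for match in UNUSED_TOKEN.finditer(text):
        # Tokenize the text before the reserved marker as usual.
        tokens.extend(BASIC_SPLIT.findall(text[position:match.start()]))
        tokens.append(match.group())
        position = match.end()
    # Tokenize whatever follows the last marker.
    tokens.extend(BASIC_SPLIT.findall(text[position:]))
    return tokens
```

Without the flag, "[unused1]" would come out as three pieces ("[", "unused1", "]"), none of which would match the reserved vocabulary entry.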

Bug Fixes and Other Changes

  • Documentation updates.
  • Reorganize the BUILD file for keras layers.
  • Update model server testing. The test script now generates a model that integrates into TF Serving's testing infrastructure.
  • Remove unneeded heavy dependencies in regex_split library.
  • Turn TF text's ConstrainedSequence implementations into standalone callable functions.
  • Fix bug in ViterbiAnalysis computation triggered when not using transition_weights.
  • Remove testing_utils run_tf_function, which is now enabled by default.
  • Update patch params to work with Bazel >=1.0.0.
  • Remove circular dependencies by removing submodule imports from ragged package.
  • Prevent TF.Text from breaking when ragged_ops.py is not included in the released TF.
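
For readers unfamiliar with the Viterbi item above, the following is a minimal pure-Python sketch of Viterbi decoding in which the transition weights are optional. It is not the TF.Text implementation. The function name and the choice to treat missing transition weights as all zeros are assumptions made for illustration.

```python
def viterbi_decode(scores, transition_weights=None):
    """Return the highest-scoring tag sequence.

    scores[t][k] is the score of tag k at step t.
    transition_weights[i][j] is added when moving from tag i to tag j;
    None is treated as all zeros (an assumption of this sketch).
    """
    num_tags = len(scores[0])
    if transition_weights is None:
        transition_weights = [[0.0] * num_tags for _ in range(num_tags)]
    best = list(scores[0])  # best path score ending in each tag
    backpointers = []
    for step_scores in scores[1:]:
        new_best, pointers = [], []
        for tag in range(num_tags):
            # Pick the previous tag that maximizes the path score.
            prev = max(range(num_tags),
                       key=lambda p: best[p] + transition_weights[p][tag])
            new_best.append(
                best[prev] + transition_weights[prev][tag] + step_scores[tag])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Walk the backpointers from the best final tag.
    tag = max(range(num_tags), key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path
```

With no transition weights the result reduces to picking the best tag at each step independently, which makes that case a useful sanity check.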

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Hyunwoo Cho

v2.1.1

01 Feb 02:19

Minor Updates

  • BertTokenizer now accepts a string tensor for the vocab_lookup_table.

Bug Fixes

  • Update ICU data name so that it does not conflict with core TF in the model server.

v2.1.0-rc0

17 Dec 22:13
Pre-release

Major Updates

  • Added SplitMergeTokenizer.
  • Add support for token offsets to BertTokenizer.
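
As an illustration of what token offsets provide, here is a small pure-Python sketch of a whitespace tokenizer that returns each token together with its start and end position. It is not BertTokenizer. The helper name is hypothetical, and reporting offsets as byte positions in the UTF-8 encoding, with exclusive ends, is an assumption of this sketch.

```python
import re


def whitespace_tokenize_with_offsets(text):
    """Return (tokens, starts, ends) for whitespace-separated tokens.

    Offsets are byte positions in the UTF-8 encoding of text;
    each end offset is exclusive.
    """
    data = text.encode("utf-8")
    tokens, starts, ends = [], [], []
    # In a bytes pattern, \S excludes only ASCII whitespace, so
    # multi-byte characters are never cut in the middle.
    for match in re.finditer(rb"\S+", data):
        tokens.append(match.group().decode("utf-8"))
        starts.append(match.start())
        ends.append(match.end())
    return tokens, starts, ends
```

Offsets like these let a caller map each token back to the exact span of the original string, for example to highlight an entity in the source text.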

Minor Updates

  • Give BertTokenizer the ability to read in a vocab file directly.
  • Migrate from std::string to tensorflow::tstring.
  • Many build script improvements.
  • Update ToDense layer with ragged support attribute.

Bug Fixes

  • Update SentencePiece to inherit from TokenizerWithOffsets.
  • Fix ICU data linking issue.

v2.0.1

13 Nov 18:25

Bug Fixes

  • Fix a missing regex character class used in BertTokenizer.
  • Fix a Mac build issue.

v1.15.1

13 Nov 18:23

Bug Fixes

  • Fix a missing regex character class used in BertTokenizer.
  • Fix a Mac build issue.

v2.0.0

08 Nov 20:22

Major Updates

  • Added a regex_split op.
  • Fixed a bug in the case_fold_utf8 and normalize_utf8 ops where they were unable to locate the ICU data file.
  • Fixed a problem with the BertTokenizer where it was using merge_dims, which was not yet released in the corresponding version of TensorFlow.
  • Updated the BertTokenizer to use regex_split to match the exact regex used by original BERT.
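
The idea behind a regex-based split op can be sketched in pure Python as follows. This is not the TF.Text op: the function name is hypothetical, the parameter names are chosen for illustration, and the behavior (split at every delimiter match, keep a delimiter as its own token only if it matches a second pattern) is an assumption of this sketch.

```python
import re


def regex_split_sketch(text, delim_regex_pattern,
                       keep_delim_regex_pattern=None):
    """Split text at every match of delim_regex_pattern.

    A delimiter is kept as its own token only when it fully matches
    keep_delim_regex_pattern; otherwise it is dropped.
    """
    pieces = []
    position = 0
    for match in re.finditer(delim_regex_pattern, text):
        if match.end() == match.start():
            continue  # ignore empty matches
        if match.start() > position:
            pieces.append(text[position:match.start()])
        delim = match.group()
        if keep_delim_regex_pattern and re.fullmatch(
                keep_delim_regex_pattern, delim):
            pieces.append(delim)
        position = match.end()
    if position < len(text):
        pieces.append(text[position:])
    return pieces
```

For example, splitting on whitespace or punctuation while keeping only the punctuation turns "hello, world!" into four tokens, which is the kind of behavior a BERT-style basic tokenizer needs.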

v1.15.0

08 Nov 20:21

Major Updates

  • Added a regex_split op.
  • Fixed a bug in the case_fold_utf8 and normalize_utf8 ops where they were unable to locate the ICU data file.
  • Fixed a problem with the BertTokenizer where it was using merge_dims, which was not yet released in the corresponding version of TensorFlow.
  • Updated the BertTokenizer to use regex_split to match the exact regex used by original BERT.

v2.0.0-rc0

19 Oct 00:08

Please note that moving forward our releases and branches will match the major & minor versions of core TensorFlow. This should prevent future confusion. As such, this (previously 1.0) release is 2.0, and we will be skipping straight to 1.15 for the next 1.x release to support TF 1.15.

Major Updates:

  • SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
  • New ToDense Keras layer for RaggedTensor conversion
  • Pipeline for generating a Wordpiece Vocabulary has been added to tools.
  • New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
  • New BertTokenizer which mimics the preprocessing performed in the original BERT model.
  • New Detokenizer abstract class has been added to the TF.Text Tokenizer API.
  • Many previously released ops have been added to the TensorFlow Serving model server. Please see https://github.com/tensorflow/serving for more information.
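
For context on the Rouge-L metric mentioned above, here is a minimal pure-Python sketch of the underlying computation based on the longest common subsequence (LCS). It is not the TF.Text op. The function names are hypothetical, and combining precision and recall with an unweighted F1 is an assumption of this sketch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(hypothesis, reference):
    """Return (f_measure, precision, recall) from LCS overlap.

    The F-measure here is the plain harmonic mean of precision and
    recall (an assumption; weighted variants also exist).
    """
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    f_measure = 2 * precision * recall / (precision + recall)
    return f_measure, precision, recall
```

Because the LCS does not require the shared tokens to be adjacent, the score rewards matching word order without demanding exact n-gram overlap.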

Minor Updates:

  • API docs have received an update that should make finding relevant information easier.
  • Wordpiece: Add support for splitting unknown characters
  • Wordpiece: Add support for max characters per token
  • Wordshape: Fix finding of currency symbols.
  • Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
  • Build includes CC library targets. Useful for statically linking in TF.Text custom ops. Specifically useful for building into TF.Serving's model server.
  • Build environment: Updated to match core TF's update.
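
To show how a limit on characters per token interacts with Wordpiece segmentation, here is a minimal pure-Python sketch of greedy longest-match-first splitting. It is not the TF.Text implementation. The function name, the parameter names, the "##" suffix marker, and the "[UNK]" token are assumptions of this sketch, and it does not cover the separate option for splitting unknown characters.

```python
def wordpiece_tokenize(word, vocab, unknown_token="[UNK]",
                       max_chars_per_token=100, suffix_indicator="##"):
    """Greedily split one word into the longest matching vocab pieces.

    Words longer than max_chars_per_token, or words that cannot be
    fully segmented, map to a single unknown_token.
    """
    if len(word) > max_chars_per_token:
        return [unknown_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = suffix_indicator + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unknown_token]
        pieces.append(piece)
        start = end
    return pieces
```

The length cap keeps the inner loop from scanning very long inputs (such as URLs or base64 blobs) that are unlikely to segment into useful pieces anyway.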