Releases: explosion/spaCy
v3.7.5: Download sanitization, Typer compatibility, and a bugfix for linking gold entities
✨ New features and improvements
- Sanitize direct download for `spacy download` (#13313).
- Convert Cython properties to decorator syntax (#13390).
- Bump Weasel pin to allow v0.4.x (#13409).
- Improvements to the test suite (#13469, #13470).
- Bump Typer pin to allow v0.10.0 and above (#13471).
- Allow `typing-extensions<5.0.0` for Python < 3.8 (#13516).
🔴 Bug fixes
- #13400: Fix `use_gold_ents` behaviour for EntityLinker.
📖 Documentation and examples
- Make the file name for code listings stick to the top (#13379).
- Update the documentation of `MorphAnalysis` (#13433).
- Typo fixes in the documentation (#13466).
👥 Contributors
@danieldk, @honnibal, @ines, @JoeSchiff, @nokados, @Paillat-dev, @rmitsch, @schorfma, @strickvl, @svlandeg, @ynx0
v3.7.4: New textcat layers and fo/nn language extensions
✨ New features and improvements
- Improve NumPy 2.0 compatibility (#13103).
- Added language extensions for Faroese and Norwegian Nynorsk (#13116).
- Add new `TextCatReduce.v1` layer for text classification (#13181).
- Add new `TextCatParametricAttention.v1` layer for text classification (#13201).
- Use `build` module for creating model packages by default (#13109).
- Add support for code loading to the `benchmark speed` command (#13247).
- Extend lexical attributes for English with more numericals (#13106).
- Warn about reloading dependencies after downloading models (#13081).
🔴 Bug fixes
- #13259, #13304, #13321: Correctness fixes for multiprocessing support in `Language.pipe`.
- #13187: Typing and documentation fixes for `Doc`.
- #13086: Update `Tokenizer.explain` for special cases with whitespace.
- #13068: Fix displaCy span stacking.
- #13149: Add `spacy.TextCatBOW.v3` to use the fixed `SparseLinear` layer.
📖 Documentation and examples
- Many improvements and updates to the LLM documentation.
- Update `trf_data` examples and the transformer pipeline design section.
👥 Contributors
@adrianeboyd, @danieldk, @evornov, @honnibal, @ines, @lise-brinck, @ridge-kimani, @rmitsch, @shadeMe, @svlandeg
v3.7.2: Fixes for APIs and requirements
✨ New features and improvements
- Update `__all__` fields (#13063).
🔴 Bug fixes
- #13035: Remove Pathy requirement.
- #13053: Restore `spacy.cli.project` API.
- #13057: Support `Any` comparisons for `Token` and `Span`.
📖 Documentation and examples
- Many updates for `spacy-llm` including Azure OpenAI, PaLM, and Mistral support.
- Various documentation corrections.
👥 Contributors
v3.7.1: Bug fix for spacy.cli module loading
🔴 Bug fixes
- Revert lazy loading of CLI module for `spacy.info` to fix availability of `spacy.cli` following `import spacy` (#13040).
👥 Contributors
v3.7.0: Trained pipelines using Curated Transformers and support for Python 3.12
This release drops support for Python 3.6 and adds support for Python 3.12.
✨ New features and improvements
- Add support for Python 3.12 (#12979).
- Use the new library Weasel for spaCy projects functionality (#12769).
  - All `spacy project` commands should run as before, just now they're using Weasel under the hood. ⚠️ Remote storage is not yet supported for Python 3.12. Use Python 3.11 or earlier for remote storage.
- Extend to Thinc v8.2 (#12897).
- Extend `transformers` extra to `spacy-transformers` v1.3 (#13025).
- Support registered vectors (#12492).
- Add `--spans-key` option for CLI evaluation with `spacy benchmark accuracy` (#12981).
- Load the CLI module lazily for `spacy.info` (#12962).
- Add type stubs for `spacy.training.example` (#12801).
- Warn for unsupported pattern keys in dependency matcher (#12928).
- `Language.replace_listeners`: Pass the replaced listener and the `tok2vec` pipe to the callback in order to support `spacy-curated-transformers` (#12785).
- Always use `tqdm` with `disable=None` to disable output in non-interactive environments (#12979).
- Language updates:
- Package setup updates:
  - Update NumPy build constraints for NumPy 1.25+ (#12839). For Python 3.9+, it is no longer necessary to set build constraints while building binary wheels.
  - Refactor Cython profiling in order to disable profiling for Python 3.12 in the package setup, since Cython does not currently support profiling for Python 3.12 (#12979).
📦 Trained pipelines updates
The transformer-based `trf` pipelines have been updated to use our new Curated Transformers library through the Thinc model wrappers and pipeline component from spaCy Curated Transformers.
⚠️ Backwards incompatibilities
- Drop support for Python 3.6.
- Drop mypy checks for Python 3.7.
- Remove `ray` extra.
- `spacy project` has a few backwards incompatibilities due to the transition to the standalone library Weasel, which is not as tightly coupled to spaCy. Weasel produces warnings when it detects older spaCy-specific settings in your environment or project config.
  - Support for the `spacy_version` configuration key has been dropped.
  - Support for the `check_requirements` configuration key has been dropped due to the deprecation of `pkg_resources`.
  - The `SPACY_CONFIG_OVERRIDES` environment variable is no longer checked. You can set configuration overrides using `WEASEL_CONFIG_OVERRIDES`.
  - Support for the `SPACY_PROJECT_USE_GIT_VERSION` environment variable has been dropped.
  - Error codes are now Weasel-specific and do not follow spaCy error codes.
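For scripts that still export `SPACY_CONFIG_OVERRIDES`, the rename to `WEASEL_CONFIG_OVERRIDES` can be handled before launching project commands. The following is a minimal sketch, not part of spaCy or Weasel: the helper name `migrate_overrides` is hypothetical, and it only carries the old value across unchanged without interpreting it.

```python
import os
from typing import Dict, Mapping

OLD_VAR = "SPACY_CONFIG_OVERRIDES"   # no longer checked since v3.7.0
NEW_VAR = "WEASEL_CONFIG_OVERRIDES"  # variable named in the release notes


def migrate_overrides(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` with the old variable's value moved to the new one.

    Hypothetical helper for illustration. An explicitly set new variable
    always wins over the old one.
    """
    migrated = dict(env)
    old_value = migrated.pop(OLD_VAR, None)
    if old_value is not None and NEW_VAR not in migrated:
        migrated[NEW_VAR] = old_value
    return migrated


# Build the environment to hand to a child process, for example via
# subprocess.run([...], env=child_env).
child_env = migrate_overrides(os.environ)
```

Returning a copy rather than mutating `os.environ` keeps the change limited to the child process that receives it.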
📖 Documentation and examples
- New and updated documentation for large language models and spaCy Curated Transformers.
- Various documentation corrections and updates.
- New additions to the spaCy Universe:
  - Hobbit spaCy: NLP for Middle Earth
  - rolegal: a spaCy Package for Noisy Romanian Legal Document Processing
👥 Contributors
@adrianeboyd, @bdura, @connorbrinton, @danieldk, @davidberenstein1957, @denizcodeyaa, @eltociear, @evornov, @honnibal, @ines, @jmyerston, @koaning, @magdaaniol, @pdhall99, @ringohoffman, @rmitsch, @senisioi, @shadeMe, @svlandeg, @vinbo8, @wjbmattingly
v3.6.1: Support for Pydantic v2, find-function CLI and more
✨ New features and improvements
- Allow Pydantic v2 using transitional v1 support (#12888).
- Add `find-function` CLI for finding locations of registered functions (#12757).
- Add extra `spacy[cuda12x]` for `cupy-cuda12x` (#12890).
- Extend tests for `init config` and `train` CLI (#12173).
- Switch from `distutils` to `setuptools`/`sysconfig` (#12853).
🔴 Bug fixes
- #12817: Escape annotated HTML tags in displaCy span renderer.
- #12857: Display model's full base version string in incompatibility warning.
- #12882: Update `<br>` tags in displaCy.
📖 Documentation and examples
👥 Contributors
@adrianeboyd, @afriedman412, @arplusman, @bdura, @connorbrinton, @honnibal, @ines, @it176131, @pmbaumgartner, @rmitsch, @shadeMe, @svlandeg, @thomashacker, @victorialslocum, @x-tabdeveloping
v3.6.0: New span finder component and pipelines for Slovenian
✨ New features and improvements
- NEW: `span_finder` pipeline component to identify overlapping, unlabeled spans (#12507).
- Language updates:
- Add option to return scores separately keyed by component name with `spacy evaluate --per-component`, `Language.evaluate(per_component=True)` and `Scorer.score(per_component=True)` (#12540).
- Support custom token/lexeme attribute for vectors (#12625).
- Support `spancat_singlelabel` in `spacy debug data` CLI (#12749).
- Typing updates for `PhraseMatcher` and `SpanGroup` (#12642, #12714).
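As a rough illustration of the per-component scoring option, the sketch below evaluates a blank English pipeline with a rule-based `sentencizer` on a single toy example. It assumes spaCy v3.6 or later is installed. The text and the way the reference doc is produced are made up for illustration, so the scores themselves are not meaningful.

```python
import spacy
from spacy.training import Example

# Blank pipeline plus a rule-based component, so no trained model is required.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Toy evaluation data: the reference doc is simply the pipeline's own output.
text = "This is one sentence. This is another."
reference = nlp(text)
example = Example(nlp.make_doc(text), reference)

# With per_component=True, scores are grouped under component names instead
# of being merged into one flat dictionary. The result may also contain
# non-component entries, such as a speed measurement.
scores = nlp.evaluate([example], per_component=True)
for name, component_scores in scores.items():
    print(name, component_scores)
```

Grouping by component name can help when two components report a score under the same key, since a flat dictionary would keep only one of them.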
🔴 Bug fixes
- #12569: Require that all `SpanGroup` spans come from the current doc.
📦 Trained pipelines updates
We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.
Package | UPOS | Parser LAS | NER F |
---|---|---|---|
sl_core_news_sm | 96.9 | 82.1 | 62.9 |
sl_core_news_md | 97.6 | 84.3 | 73.5 |
sl_core_news_lg | 97.7 | 84.3 | 79.0 |
sl_core_news_trf | 99.0 | 91.7 | 90.0 |
- 🙏 Special thanks to @orglce for help with the new pipelines!
The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.
The Danish pipeline `da_core_news_trf` has been updated to use `vesteinn/DanskBERT` with performance improvements across the board.
⚠️ Backwards incompatibilities
- `SpanGroup` spans are now required to be from the same doc. When initializing a `SpanGroup`, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.
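The same-doc requirement can be sketched as follows, assuming spaCy v3.6 or later. The example texts and group names are invented, and the exception type caught here (`ValueError`) is an assumption about how the check surfaces rather than something stated in these notes.

```python
import spacy
from spacy.tokens import SpanGroup

nlp = spacy.blank("en")
doc_a = nlp("Berlin is a city in Germany.")
doc_b = nlp("Paris is a city in France.")

# All spans come from doc_a, so this group passes the new check.
places = SpanGroup(doc_a, name="places", spans=[doc_a[0:1], doc_a[5:6]])

# Mixing in a span from doc_b is expected to be rejected at initialization.
# Assumption: the check raises ValueError.
try:
    SpanGroup(doc_a, name="mixed", spans=[doc_a[0:1], doc_b[0:1]])
    mixed_rejected = False
except ValueError as err:
    mixed_rejected = True
    print(f"Rejected: {err}")
```

Code that previously collected spans from several docs into one group would need to build one `SpanGroup` per doc instead.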
📖 Documentation and examples
- Various documentation corrections and updates.
- New additions to spaCy Universe:
👥 Contributors
@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr
v3.5.4: Bug fixes for overrides with registered functions and sourced components with listeners
v3.3.3: Bug fixes for Pydantic and pip
This bug fix release is primarily to address Pydantic incompatibility with `typing_extensions>=4.6.0`.
✨ New features and improvements
- Huge speed improvements for `spancat`, in particular on GPU (~10x-30x faster) (#12577).
🔴 Bug fixes
- Add `typing_extensions` requirement due to Pydantic incompatibility with `typing_extensions>=4.6.0`.
- Remove `#egg` from download URLs due to future deprecation in `pip`.
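To check whether an environment has a `typing_extensions` version in the range named above, something like the following standard-library sketch could be used. The helper names are hypothetical, and the version parsing is deliberately simplified: it reads only the leading numeric components and ignores pre-release ordering. A real check would more likely rely on a dedicated version-parsing library. Whether a given environment actually breaks also depends on the installed Pydantic version, which this sketch does not inspect.

```python
from importlib import metadata
from typing import Optional, Tuple

# Lower bound of the range named in the release notes.
THRESHOLD = (4, 6, 0)


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts of a version string, padded to 3 items."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # stop at suffixes such as "0rc1"
            break
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_affected_range(version: str) -> bool:
    """True if the version is at or above the 4.6.0 threshold."""
    return parse_release(version) >= THRESHOLD


def installed_version(dist_name: str = "typing_extensions") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is not None:
    print(found, "in affected range:", in_affected_range(found))
```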
👥 Contributors
v3.2.6: Bug fixes for Pydantic and pip
This bug fix release is primarily to address Pydantic incompatibility with `typing_extensions>=4.6.0`.
✨ New features and improvements
- Huge speed improvements for `spancat`, in particular on GPU (~10x-30x faster) (#12577).
🔴 Bug fixes
- Add `typing_extensions` requirement due to Pydantic incompatibility with `typing_extensions>=4.6.0`.
- Remove `#egg` from download URLs due to future deprecation in `pip`.