SciSpaCy

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds tokenization rules on top of spaCy's rule-based tokenizer, and a custom sentence segmenter that adds sentence segmentation rules on top of spaCy's statistical sentence segmenter.

Installation

Installing scispacy requires two steps: installing the library and installing the models. To install the library, run:

pip install scispacy

To install a model, run:

pip install <model url>

Note: We strongly recommend that you use an isolated Python environment (such as virtualenv or conda) to install scispacy. Additionally, scispacy uses modern features of Python and as such is only available for Python 3.5 or greater.

Once you have completed the above steps and downloaded one of the models below, you can load SciSpaCy as you would any other spaCy model. For example:

import spacy
nlp = spacy.load("en_core_sci_sm")
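
As a slightly fuller sketch of what you can do once a model is loaded, the snippet below runs a sentence through the pipeline and collects its sentences and entities. It assumes that en_core_sci_sm (from the model list below) is the model you installed and that the pipeline sets sentence boundaries. The helper names load_pipeline and summarize are made up for this illustration and are not part of scispacy. If spaCy or the model is missing, the helper returns None instead of raising.

```python
def load_pipeline(model_name="en_core_sci_sm"):
    """Return the loaded pipeline, or None if spaCy or the model is not installed."""
    try:
        import spacy
        # spacy.load raises OSError when the named model cannot be found.
        return spacy.load(model_name)
    except (ImportError, OSError):
        return None


def summarize(nlp, text):
    """Run the pipeline on text and return (sentences, entities)."""
    doc = nlp(text)
    sentences = [sent.text for sent in doc.sents]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return sentences, entities


nlp = load_pipeline()
if nlp is None:
    print("Model not available; install scispacy and a model first.")
else:
    text = "Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron disease."
    sentences, entities = summarize(nlp, text)
    print(sentences)
    print(entities)
```

The Doc object returned by the pipeline is an ordinary spaCy Doc, so the usual attributes (tokens, sentences, entities) behave as they do for any other spaCy model.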

Available Models

en_core_sci_sm	A full spaCy pipeline for biomedical data.
en_core_sci_md	A full spaCy pipeline for biomedical data with a larger vocabulary and word vectors.
en_ner_craft_md	A spaCy NER model trained on the CRAFT corpus.
en_ner_jnlpba_md	A spaCy NER model trained on the JNLPBA corpus.
en_ner_bc5cdr_md	A spaCy NER model trained on the BC5CDR corpus.
en_ner_bionlp13cg_md	A spaCy NER model trained on the BIONLP13CG corpus.
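
Because pip-installed spaCy models are regular Python packages named after the model, one way to check which of the models above are present in the current environment is to look them up by package name. This is a minimal sketch using only the standard library. The function name installed_models is hypothetical and not part of scispacy.

```python
import importlib.util

# Model names as listed in the table above.
MODELS = [
    "en_core_sci_sm",
    "en_core_sci_md",
    "en_ner_craft_md",
    "en_ner_jnlpba_md",
    "en_ner_bc5cdr_md",
    "en_ner_bionlp13cg_md",
]


def installed_models(names=MODELS):
    """Return the subset of names that are importable as packages."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


print(installed_models())
```

Any name this returns can be passed to spacy.load in the same way as in the example above.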
