//- 💫 LANDING PAGE
include _includes/_mixins
+landing-header
h1.c-landing__title.u-heading-0
| Industrial-Strength#[br]
| Natural Language#[br]
| Processing
h2.c-landing__title.o-block.u-heading-3
span.u-text-label.u-text-label--light in Python
+grid.o-content.c-landing__blocks
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Fastest in the world
p
| spaCy excels at large-scale information extraction tasks.
| It's written from the ground up in carefully memory-managed
| Cython. In 2015, independent researchers found that spaCy
| offered the fastest syntactic parser in the world. If your
| application needs to process entire web dumps, spaCy is the
| library you want to be using.
+button("/usage/facts-figures", true, "primary")
| Facts & figures
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Get things done
p
| spaCy is designed to help you do real work: to build real
| products or gather real insights. The library respects
| your time, and tries to avoid wasting it. It's easy to
| install, and its API is simple and productive. We like to
| think of spaCy as the Ruby on Rails of Natural Language
| Processing.
+button("/usage", true, "primary")
| Get started
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Deep learning
p
| spaCy is the best way to prepare text for deep learning.
| It interoperates seamlessly with TensorFlow, PyTorch,
| scikit-learn, Gensim and the rest of Python's awesome AI
| ecosystem. With spaCy, you can easily construct linguistically
| sophisticated statistical models for a variety of NLP problems.
+button("/usage/training", true, "primary")
| Read more
.o-content
+grid
+grid-col("two-thirds")
+terminal("lightning_tour.py", "More examples", "/usage/spacy-101#lightning-tour").
# Install: pip install spacy && python -m spacy download en
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en')
# Process a document, of any size
with open('war_and_peace.txt') as f:
    text = f.read()
doc = nlp(text)
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
# Determine semantic similarities
doc1 = nlp(u'the fries were gross')
doc2 = nlp(u'worst fries ever')
print(doc1.similarity(doc2))
# Hook in your own deep learning models
# (load_my_model is a placeholder for your own component loader)
nlp.add_pipe(load_my_model(), before='parser')
+grid-col("third")
+h(2) Features
+list
+item Non-destructive #[strong tokenization]
+item #[strong Named entity] recognition
+item Support for #[strong #{LANG_COUNT}+ languages]
+item #[strong #{MODEL_COUNT} statistical models] for #{MODEL_LANG_COUNT} languages
+item Pre-trained #[strong word vectors]
+item Easy #[strong deep learning] integration
+item Part-of-speech tagging
+item Labelled dependency parsing
+item Syntax-driven sentence segmentation
+item Built-in #[strong visualizers] for syntax and NER
+item Convenient string-to-hash mapping
+item Export to numpy data arrays
+item Efficient binary serialization
+item Easy #[strong model packaging] and deployment
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy
+landing-banner("Convolutional neural network models", "New in v2.0")
p
| spaCy v2.0 features new neural models for #[strong tagging],
| #[strong parsing] and #[strong entity recognition]. The models have
| been designed and implemented from scratch specifically for spaCy, to
| give you an unmatched balance of speed, size and accuracy. A novel
| Bloom embedding strategy with subword features is used to support
| huge vocabularies in tiny tables. Convolutional layers with residual
| connections, layer normalization and maxout non-linearity are used,
| giving much better efficiency than the standard BiLSTM solution.
| Finally, the parser and NER use an imitation learning objective to
| deliver accuracy in line with the latest research systems,
| even when evaluated from raw text. With these innovations, spaCy
| v2.0's models are #[strong 10× smaller],
| #[strong 20% more accurate], and #[strong even cheaper to run] than
| the previous generation.
.o-block-small.u-text-right
+button("/models", true, "secondary-light") Download models
+landing-logos("spaCy is trusted by", logos)
+button(gh("spacy") + "/stargazers", false, "secondary", "small")
| and many more
+landing-logos("Featured on", features).o-block-small
+landing-banner("Prodigy: Radically efficient machine teaching", "From the makers of spaCy")
p
| Prodigy is an #[strong annotation tool] so efficient that data scientists can
| do the annotation themselves, enabling a new level of rapid
| iteration. Whether you're working on entity recognition, intent
| detection or image classification, Prodigy can help you
| #[strong train and evaluate] your models faster. Stream in your own examples or
| real-world data from live APIs, update your model in real time and
| chain models together to build more complex systems.
.o-block-small.u-text-right
+button("https://prodi.gy", true, "secondary-light") Try it out
.o-content
+grid
+grid-col("half")
+h(2) Benchmarks
p
| In 2015, independent researchers from Emory University and
| Yahoo! Labs showed that spaCy offered the
| #[strong fastest syntactic parser in the world] and that its
| accuracy was #[strong within 1% of the best] available
| (#[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al., 2015]).
| spaCy v2.0, released in 2017, is more accurate than any of
| the systems Choi et al. evaluated.
.o-inline-list
+button("/usage/facts-figures#benchmarks", true, "secondary") See details
+grid-col("half")
include usage/_facts-figures/_benchmarks-choi-2015