Stanford CoreNLP provides a set of natural language analysis tools written in Java. Given raw human language text, it can give the base forms of words and their parts of speech, identify whether they are names of companies, people, etc., normalize and interpret dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases or word dependencies, and indicate which noun phrases refer to the same entities. It was originally developed for English, but now also provides varying levels of support for (Modern Standard) Arabic, (mainland) Chinese, French, German, and Spanish.
Stanford CoreNLP is an integrated framework, which makes it easy to apply a set of language analysis tools to a piece of text. Starting from plain text, you can run all the tools with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications. The tools are stable and well tested, are widely used by groups in academia, industry, and government, and variously use rule-based, probabilistic machine learning, and deep learning components.
The Stanford CoreNLP code is written in Java and licensed under the GNU General Public License (v3 or later). Note that this is the full GPL, which allows many free uses, but not its use in proprietary software that you distribute to others.
Several times a year we distribute a new version of the software, which corresponds to a stable commit.
Between releases, you can always use the latest development version of our code.
Here are some instructions for using the latest code:
We sometimes provide updated jars here that contain the latest version of the code.
At present, our most recent released jar is the current version of the code, though you can always build the very latest from GitHub HEAD yourself.
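Compile the code with Ant:
cd CoreNLP ; ant
Then build a jar from the compiled classes, which creates stanford-corenlp.jar in the CoreNLP folder:
cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu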
If you run mvn package in the CoreNLP directory, it should run the tests and build the CoreNLP jar file.
To install stanford-corenlp-models-current.jar into your local Maven repository, you will need to set -Dclassifier=models. Here is the sample command for Spanish:
mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.0.0 -Dclassifier=models-spanish -Dpackaging=jar
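The Spanish command can presumably be adapted to other languages by swapping the language name. As a rough sketch, the helper below prints the corresponding `mvn install:install-file` command for a given language. `models_install_cmd` is a hypothetical name, not part of CoreNLP, and the sketch assumes that other models jars follow the same `stanford-<language>-corenlp-models-current.jar` naming pattern as the Spanish one, so check the name of the jar you actually downloaded. It only prints the command, so you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of CoreNLP): print the Maven command that
# installs a language-specific models jar into the local Maven repository.
# Usage: models_install_cmd <language> <directory-containing-jar> [version]
models_install_cmd() {
  language="$1"
  jar_dir="$2"
  version="${3:-4.0.0}"   # default taken from the Spanish sample command
  # Assumption: jar names follow the pattern of the Spanish sample jar.
  printf 'mvn install:install-file -Dfile=%s/stanford-%s-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=%s -Dclassifier=models-%s -Dpackaging=jar\n' \
    "$jar_dir" "$language" "$version" "$language"
}

# Example: reproduce the Spanish sample command.
models_install_cmd spanish /location/of
```

Once the printed command looks right, it can be pasted into a shell or piped to `sh`. Paths containing spaces would need extra quoting, which this sketch does not handle.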
The models jars that correspond to the latest code can be found in the table below.
Some of the larger (English) models, such as the shift-reduce parser and WikiDict, are not distributed with our default models jar. Using them requires downloading the English (extra) and English (kbp) jars. Resources for other languages require the corresponding models jar.
| Language | Model Jar | Last Updated |
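Downloaded models jars only take effect once they are on the Java classpath together with the main CoreNLP jar. A minimal sketch, assuming all jars have been placed in a single directory: the `build_classpath` helper and the directory layout are illustrative assumptions, not part of CoreNLP, and the colon separator is the Unix convention.

```shell
#!/bin/sh
# Hypothetical helper (not part of CoreNLP): join every .jar file in a
# directory into a colon-separated classpath string (Unix-style separator).
build_classpath() {
  jar_dir="$1"
  classpath=""
  for jar in "$jar_dir"/*.jar; do
    [ -e "$jar" ] || continue            # skip the literal glob when no jars exist
    classpath="${classpath:+$classpath:}$jar"
  done
  printf '%s\n' "$classpath"
}

# Example (assumed layout): print the classpath for jars in the current directory.
build_classpath .
```

The result can be passed to Java, for example `java -cp "$(build_classpath /path/to/jars)" edu.stanford.nlp.pipeline.StanfordCoreNLP -file input.txt`, where `/path/to/jars` and `input.txt` are placeholders.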
You can find releases of Stanford CoreNLP on Maven Central.
You can find more explanation and documentation on the Stanford CoreNLP homepage.
For information about making contributions to Stanford CoreNLP, see the file CONTRIBUTING.md.