Commit

Refactoring of all classes and packages, tokenizer (Apache Lucene removed) to be Android ready, and the single language concept towards multi-language. Some fixes. Some Java 1.7 compatible object-oriented and performance improvements. Tests 100% OK!
nunoachenriques committed Aug 29, 2017
1 parent a453763 commit 56e7e14
Showing 20 changed files with 1,083 additions and 489 deletions.
56 changes: 40 additions & 16 deletions README.md
@@ -4,16 +4,23 @@ VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon
and rule-based sentiment analysis tool that is _specifically attuned
to sentiments expressed in social media_.

-This is a fork with **API and package names breaking changes** of the
+This is an implementation of VADER in Java. It started as a fork of the
[Java port by Animesh Pandey](https://github.com/apanimesh061/VaderSentimentJava)
-of the
-[NLTK VADER sentiment analysis module](http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader)
-written in Python and optimized from the original.
-
-- The [NLTK](http://www.nltk.org/_modules/nltk/sentiment/vader.html)
-Python source code.
-- The [Original](https://github.com/cjhutto/vaderSentiment) Python
-source code by the paper's author C.J. Hutto.
+of the [NLTK VADER sentiment analysis module](http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader)
+written in Python ([NLTK VADER source code](http://www.nltk.org/_modules/nltk/sentiment/vader.html))
+from the [original project](https://github.com/cjhutto/vaderSentiment) by
+the paper's author C.J. Hutto. It is the same algorithm, delivered as an
+improved tool through extensive rewriting, with **relevant changes**:
+
+- Android ready.
+- API and package names breaking changes.
+- Java 1.7 compatible.
+- Performance improvements (e.g., `LinkedList` where its O() is better than
+that of `ArrayList`).
+
+**In progress**
+
+- Multi-language (refer to section [Languages](#languages)).

## Repository

@@ -51,22 +58,39 @@ https://github.com/nunoachenriques/vader-sentiment-analysis/releases

## Testing

-The tests from the original Java port are validated against the ground truth of
-the original Python (NLTK) implementation. The algorithm running is still the
-original implementation from Hutto & Gilbert in Python and ported to Java by
-Animesh Pandey.
+All tests are **100% OK** as expected!

```shell
./gradlew test
```

+## Languages
+
+To support several languages there's the `Language` interface
+(`text` subpackage) to be implemented and, eventually, the `Tokenizer` too.
+The **main effort** will be in all the research around the specific language
+significant words, idiomatic expressions, constant and empirical values.
+Moreover, a data set has to be produced and validated by humans as
+_ground truth_ for testing purposes.
+
+### English (Germanic family of languages)
+
+The tests from the original Java port are validated against the _ground truth_
+of the original Python (NLTK) implementation. The algorithm running is still the
+original implementation from Hutto & Gilbert in Python and originally ported to
+Java by Animesh Pandey with modifications by Nuno A. C. Henriques.
+
+### Portuguese (Italic family of languages)
+
+**TODO**
+
## Use case example

-As a Java library it will easily integrates with a bit of coding.
+As a Java library it will easily integrate with a bit of coding.

```java
...
-ArrayList<String> sentences = new ArrayList<String>() {{
+List<String> sentences = new LinkedList<String>() {{
add("VADER is smart, handsome, and funny.");
add("VADER is smart, handsome, and funny!");
add("VADER is very smart, handsome, and funny.");
@@ -86,7 +110,7 @@ ArrayList<String> sentences = new ArrayList<String>() {{
add("Today kinda sux! But I'll get by, lol");
}};

-SentimentAnalysis sa = new SentimentAnalysis();
+SentimentAnalysis sa = new SentimentAnalysis(new TokenizerEnglish(), new English());

for (String sentence : sentences) {
System.out.println(sentence);
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.1
+2.0.0
8 changes: 8 additions & 0 deletions build.gradle
@@ -15,6 +15,14 @@ repositories {
dependencies {
implementation fileTree(include: ['*.jar'], dir: 'src/main/dist')
testImplementation 'junit:junit:4.12'
+testImplementation 'org.apache.lucene:lucene-core:5.5.4'
+testImplementation 'org.apache.lucene:lucene-analyzers-common:5.5.4'
}
+
+test {
+// For comparison only, different results on Tokenizer vs. Lucene
+// but final ground truth results are the same (expected).
+exclude 'net/nunoachenriques/vader/text/Tokenizer*'
+}

// EXTRA PACKAGING FOR RELEASE DISTRIBUTION
Binary file removed src/main/dist/commons-lang-2.6.jar
Binary file not shown.
Binary file removed src/main/dist/lucene-analyzers-common-5.5.4.jar
Binary file not shown.
Binary file removed src/main/dist/lucene-core-5.5.4.jar
Binary file not shown.
39 changes: 39 additions & 0 deletions src/main/java/net/nunoachenriques/vader/Constant.java
@@ -0,0 +1,39 @@
/*
* Copyright 2017 Nuno A. C. Henriques [nunoachenriques.net]
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package net.nunoachenriques.vader;

/**
* VADER's constant values (e.g., NORMALIZE_SCORE_ALPHA_DEFAULT)
* for configuration and other uses throughout the algorithm.
*
* @author Nuno A. C. Henriques [nunoachenriques.net]
*/
class Constant {

// Private constructor to avoid instantiation.
private Constant() {
// Void!
}

// TODO check SentimentAnalysis for missing constants!

static final float NORMALIZE_SCORE_ALPHA_DEFAULT = 15.0f;
static final float ALL_CAPS_BOOSTER_SCORE = 0.733f;
static final float N_SCALAR = -0.74f;
static final float EXCLAMATION_BOOST = 0.292f;
static final float QUESTION_BOOST_COUNT_3 = 0.18f;
static final float QUESTION_BOOST = 0.96f;
}
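
For context, here is a minimal sketch of how constants like these are applied in the original Python VADER (C.J. Hutto's `vaderSentiment`): score normalization with an alpha term and punctuation emphasis for `!` and `?`. The class and method names (`ConstantUsageSketch`, `normalize`, `punctuationEmphasis`) are hypothetical and do not come from this commit. Whether `SentimentAnalysis` in this port applies the constants in exactly this way is an assumption.

```java
// Hypothetical illustration only: not part of the commit.
public class ConstantUsageSketch {

    // Values copied from Constant.java in this commit.
    static final float NORMALIZE_SCORE_ALPHA_DEFAULT = 15.0f;
    static final float EXCLAMATION_BOOST = 0.292f;
    static final float QUESTION_BOOST_COUNT_3 = 0.18f;
    static final float QUESTION_BOOST = 0.96f;

    // Maps an unbounded raw score into the open interval (-1, 1),
    // as the Python reference does: score / sqrt(score^2 + alpha).
    static float normalize(float score, float alpha) {
        return (float) (score / Math.sqrt(score * score + alpha));
    }

    // Emphasis from punctuation, following the Python reference:
    // each '!' adds EXCLAMATION_BOOST (at most 4 are counted);
    // 2 or 3 '?' add QUESTION_BOOST_COUNT_3 each, more than 3 add
    // a flat QUESTION_BOOST, and a single '?' adds nothing.
    static float punctuationEmphasis(String text) {
        int exclamations = 0;
        int questions = 0;
        for (char c : text.toCharArray()) {
            if (c == '!') {
                exclamations++;
            } else if (c == '?') {
                questions++;
            }
        }
        float boost = Math.min(exclamations, 4) * EXCLAMATION_BOOST;
        if (questions > 1) {
            boost += (questions <= 3)
                    ? questions * QUESTION_BOOST_COUNT_3
                    : QUESTION_BOOST;
        }
        return boost;
    }

    public static void main(String[] args) {
        float raw = 4.0f; // assumed raw valence sum, for illustration
        System.out.println(normalize(raw, NORMALIZE_SCORE_ALPHA_DEFAULT));
        System.out.println(punctuationEmphasis("VADER is smart, handsome, and funny!"));
    }
}
```

Keeping these values as named constants in one package-private class, as `Constant.java` does, makes the empirical numbers easy to find and to revisit per language.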