Sentiment and Emotion Lexicons

This page lists various word association lexicons that capture word-sentiment, word-emotion, and word-colour associations. They can be used for analysing emotions in text. See Terms of Use at the bottom of the page. Please see the Emotion Lexicons: Ethics and Data Statement before using a lexicon.
Contact: Saif M. Mohammad (saif.mohammad@nrc-cnrc.gc.ca)
Code:
Emotion Dynamics (Python): code to analyze emotions in text using emotion lexicons. The script generates a CSV file with a number of emotion features of the text, including metrics of utterance emotion dynamics. Associated Paper.
Released April 2022, this is the primary and official package to analyze text using the NRC Emotion Lexicon and the NRC VAD Lexicon.
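As a rough illustration of what lexicon-based analysis involves, here is a minimal sketch that counts the share of tokens associated with each category. It is not the Emotion Dynamics package or its API. The names `TOY_LEXICON`, `tokenize`, and `emotion_proportions` are hypothetical, and the lexicon entries are made up for the example rather than taken from any NRC lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> set of associated categories.
# These entries are illustrative only and are NOT copied from a real lexicon.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "afraid": {"fear", "negative"},
    "storm": {"fear", "negative"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def emotion_proportions(text, lexicon):
    """Return, for each category, the fraction of tokens associated with it."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter()
    for tok in tokens:
        for category in lexicon.get(tok, ()):
            counts[category] += 1
    return {category: c / len(tokens) for category, c in counts.items()}


result = emotion_proportions(
    "I was afraid of the storm but happy at home", TOY_LEXICON
)
print(result)
```

With the toy entries above, two of the ten tokens are tied to fear and one to joy, so the proportions are 0.2 and 0.1. A real analysis would load the full lexicon file and would likely need better tokenization and lemmatization.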
Manually Created Lexicons

These lexicons are created by manual annotation. The lexicons with real-valued scores are created using Best-Worst Scaling, producing fine-grained, yet highly reliable annotation values.
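This page does not spell out how Best-Worst Scaling responses become real-valued scores. One common counting procedure, assumed here, scores each term as the number of times it was picked as best minus the number of times it was picked as worst, divided by the number of times it was shown. The function name `bws_scores` and the example annotations are hypothetical.

```python
from collections import Counter


def bws_scores(annotations):
    """Compute counting-based Best-Worst Scaling scores.

    annotations: iterable of (items, best, worst), where `items` is the
    tuple of terms shown to an annotator and `best` / `worst` are the
    terms the annotator selected from that tuple.
    Returns a dict term -> score in [-1, 1].
    """
    seen, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)
        best[b] += 1
        worst[w] += 1
    return {term: (best[term] - worst[term]) / seen[term] for term in seen}


# Made-up annotations for illustration only.
annotations = [
    (("great", "good", "bad", "awful"), "great", "awful"),
    (("great", "okay", "bad", "awful"), "great", "awful"),
    (("good", "okay", "bad", "awful"), "good", "bad"),
]
scores = bws_scores(annotations)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:+.2f}")
```

Scores from this procedure fall between -1 and 1. For lexicons listed with a 0 to 1 range, a linear rescaling such as `(score + 1) / 2` would produce that range, though whether a given lexicon was built this way is an assumption.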
Large Manually Created Emotion and Sentiment Lexicons

| Lexicon | Version | # of Terms | Categories | Association Scores | Method of Creation |
|---|---|---|---|---|---|
| 1a. NRC Word-Emotion Association Lexicon (also called NRC Emotion lexicon or EmoLex). README. Explore the interactive visualization. Homepage of the Lexicon. Also available in over 40 other languages here. The sense-level annotations provided by individual annotators for the eight emotions can also be obtained. | 0.92 (2010) | 14,182 unigrams (words) | sentiments: emotions: | 0 (not associated) or 1 (associated) | Manual: By crowdsourcing. Domain: General |
| | | ~25,000 senses | | not associated, weakly, moderately, or strongly associated | |
| Papers: | | | | | |
| 1b. NRC Emotion Intensity Lexicon (aka Affect Intensity Lexicon), created using Best-Worst Scaling. | | | | | |
| 2. NRC Valence, Arousal, Dominance Lexicon, created using Best-Worst Scaling. | 1 (2018) | ~20,000 terms | Valence | 0 (lowest V/A/D) to 1 (highest V/A/D) | Manual: By crowdsourcing. Domain: General |
| Paper: | | | | | |
| 3. NRC WorryWords Lexicon. | 1 (2024) | ~44,000 terms | calmness--anxiety | -3 (highest calmness) to 3 (highest anxiety) | Manual: By crowdsourcing. Domain: General |
| Paper: | | | | | |
Manually Created Sentiment Composition Lexicons

| Lexicon | Version | # of Terms | Categories | Association Scores | Method of Creation |
|---|---|---|---|---|---|
| 1. Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA), aka SemEval-2016 General English Sentiment Modifiers Lexicon, created using Best-Worst Scaling (aka MaxDiff) | 1.0 (Feb. 2016) | ~3200 terms | sentiments: negative, positive | Real-valued score from -1 (most negative) to 1 (most positive) | Manual: By crowdsourcing and using Best-Worst Scaling. Domain: General |
| Papers: | | | | | |
| 2. SemEval-2015 English Twitter Sentiment Lexicon, created using Best-Worst Scaling (aka MaxDiff) | 1.0 (Feb. 2015) | ~1500 terms | sentiments: negative, positive | Real-valued score from -1 (most negative) to 1 (most positive) | Manual: By crowdsourcing and using Best-Worst Scaling. Domain: Twitter |
| Paper: | | | | | |
| 3. Sentiment Composition Lexicon of Opposing Polarity Phrases (SCL-OPP), aka SemEval-2016 English Twitter Mixed Polarity Lexicon, created using Best-Worst Scaling (aka MaxDiff) | 1.0 (Feb. 2016) | ~1200 terms | sentiments: negative, positive | Real-valued score from -1 (most negative) to 1 (most positive) | Manual: By crowdsourcing and using Best-Worst Scaling. Domain: Twitter |
| Paper: | | | | | |
Large Manually Created Word-Colour Association Lexicon

| Lexicon | Version | # of Terms | Categories | Association Scores | Method of Creation |
|---|---|---|---|---|---|
| | 0.92 (2011) | ~14,000 words | colours: black, blue, brown, green, grey, orange, purple, pink, red, white, yellow | 0 (not associated) or 1 (associated) | Manual: Crowdsourcing on Mechanical Turk. Domain: General |
| | | ~25,000 senses | | not, weakly, moderately, or strongly associated | |
| Papers: | | | | | |
Automatically Created Lexicons

These lexicons are automatically extracted from large amounts of text using co-occurrence information. For example, the Hashtag Emotion Lexicon is generated from tweets and the score for a word-emotion pair is a quantification of the word's tendency to co-occur with the emotion-word hashtag. These are usually much larger than manually created lexicons. They have higher coverage, especially of terms often seen in the corpus that the lexicon is extracted from. However, the emotion scores can be less accurate than those in the manually created lexicons above.
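To make "tendency to co-occur" concrete, the sketch below scores a word by how much more often it appears in texts carrying a target label than in all other texts, using a difference of pointwise mutual information values. This is one plausible formulation and is an assumption here, since this page does not give the exact formula. The function name `association_scores` and the tiny labelled corpus are hypothetical.

```python
import math
from collections import Counter


def association_scores(docs, target):
    """Score words by PMI(word, target) - PMI(word, not target).

    docs: list of (tokens, label) pairs, e.g. tweets labelled by the
    emotion-word hashtag they contained.
    The difference of the two PMI terms simplifies to
        log2( freq(w, target) * N_other / (freq(w, other) * N_target) )
    where N_* are total token counts in each group.
    Words missing from either group are skipped in this sketch, because
    their score would be infinite without smoothing.
    """
    in_target, in_other = Counter(), Counter()
    n_target = n_other = 0
    for tokens, label in docs:
        if label == target:
            in_target.update(tokens)
            n_target += len(tokens)
        else:
            in_other.update(tokens)
            n_other += len(tokens)
    scores = {}
    for word in in_target.keys() & in_other.keys():
        ratio = (in_target[word] * n_other) / (in_other[word] * n_target)
        scores[word] = math.log2(ratio)
    return scores


# Made-up labelled texts for illustration only.
docs = [
    (["so", "happy", "today"], "joy"),
    (["happy", "birthday"], "joy"),
    (["so", "sad", "today"], "sadness"),
    (["happy", "ending", "sad", "start"], "sadness"),
]
joy_scores = association_scores(docs, "joy")
print(joy_scores)
```

A positive score means the word is relatively more frequent in the target group, and a negative score means the opposite. Scores of this kind are unbounded, which is consistent with the infinite ranges listed for the automatically created lexicons.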
Large Automatically Generated Word-Emotion Association Lexicon

| Lexicon | Version | # of Terms | Categories | Association Scores | Method of Creation |
|---|---|---|---|---|---|
| 1. NRC Hashtag Emotion Lexicon. The Hashtag Emotion Corpus (aka Twitter Emotion Corpus, or TEC) used to create the lexicon. | 0.2 (2013) | 16,862 unigrams (words) | emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, trust | Real-valued score from 0 (not associated) to ∞ (maximally associated) | Automatic: From tweets with emotion word hashtags. Domain: Twitter |
| Papers: | | | | | |
Large Automatically Generated Word-Sentiment Association Lexicons

| Lexicon | Version | # of Terms | Categories | Association Scores | Method of Creation |
|---|---|---|---|---|---|
| 1. NRC Twitter Sentiment Lexicons (NRC Hashtag Sentiment Lexicons and Sentiment140 Lexicons) | | | | | |
| | 1.0 (2013) | 54,129 unigrams; 316,531 bigrams; 308,808 pairs | sentiments: negative, positive | Real-valued score from -∞ (most negative) to ∞ (most positive) | Automatic: From tweets with sentiment word hashtags. Domain: Twitter |
| b. NRC Hashtag Affirmative Context Sentiment Lexicon and NRC Hashtag Negated Context Sentiment Lexicon | 1.0 (2014) | Affirmative contexts: 36,357 unigrams; 159,479 bigrams. Negated contexts: 7,592 unigrams; 23,875 bigrams | sentiments: negative, positive | Real-valued score from -∞ (most negative) to ∞ (most positive) | Automatic: From tweets with sentiment word hashtags. Separate entries for affirmative and negated contexts. Domain: Twitter |
| c. Emoticon Lexicon aka Sentiment140 Lexicon (note that this is a sentiment lexicon drawn from emoticons, and is not an emotion lexicon) | 1.0 (2014) | 62,468 unigrams; 677,698 bigrams; 480,010 pairs | sentiments: negative, positive | Real-valued score from -∞ (most negative) to ∞ (most positive) | Automatic: From tweets with emoticons. Domain: Twitter |
| d. Sentiment140 Affirmative Context Lexicon and Sentiment140 Negated Context Lexicon | 1.0 (2014) | Affirmative contexts: 45,255 unigrams; 240,076 bigrams. Negated contexts: 9,891 unigrams; 34,093 bigrams | sentiments: negative, positive | Real-valued score from -∞ (most negative) to ∞ (most positive) | Automatic: From tweets with sentiment word hashtags. Separate entries for affirmative and negated contexts. Domain: Twitter |
| Papers (describing the four NRC Twitter Lexicons listed above): | | | | | |
| 2. Yelp and Amazon Sentiment Lexicons | | | | | |
| a. Yelp Restaurant Sentiment Lexicon | 1.0 (2014) | 39,274 entries for unigrams (includes affirmative and negated context entries); 276,651 entries for bigrams | sentiments: negative, positive | Real-valued score from -∞ (most negative) to ∞ (most positive) | Automatic: From customer reviews on Yelp.com. Domain: Restaurant |
| The Yelp Word–Aspect Association Lexicons are also made available. | | | | | |
| | 1.0 (2014) | 26,577 entries for unigrams (includes affirmative and negated context entries); 155,167 entries for bigrams | sentiments: negative, positive | Real-valued score from -∞ (most negative) to ∞ (most positive) | Automatic: From customer reviews on Amazon.com. Domain: Laptop |
| Paper (describing the Yelp and Amazon Lexicons): | | | | | |
| | 0.1 (2009) | 76,400 terms | sentiments: negative, positive | binary distinction: negative or positive | Automatic: Using the structure of a thesaurus and affixes. Domain: General |
| Paper: | | | | | |
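Several of the automatically generated lexicons above keep separate entries for affirmative and negated contexts. The sketch below shows one way such entries could be used: tokens that follow a negation word, up to the next punctuation mark, are tagged before lookup. The `_NEG` suffix, the short negator list, the function names, and the toy scores are all assumptions for illustration. Check each lexicon's README for its actual entry format.

```python
import re

# Illustrative, incomplete list of negation words (an assumption).
NEGATORS = {"not", "no", "never", "cannot"}
CLAUSE_END = re.compile(r"^[.,:;!?]$")


def mark_negated(tokens):
    """Append '_NEG' to tokens between a negator and the next punctuation."""
    marked = []
    negated = False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negated = False          # punctuation closes the negated context
            marked.append(tok)
        elif negated:
            marked.append(tok + "_NEG")
        else:
            marked.append(tok)
            low = tok.lower()
            if low in NEGATORS or low.endswith("n't"):
                negated = True       # following tokens are in negated context
    return marked


def sentiment_score(tokens, lexicon):
    """Sum lexicon scores over context-marked tokens (unknown tokens add 0)."""
    return sum(lexicon.get(tok, 0.0) for tok in mark_negated(tokens))


# Hypothetical scores, NOT taken from any released lexicon.
toy_lexicon = {"good": 1.0, "good_NEG": -1.0, "great": 1.5}
tokens = ["this", "is", "not", "a", "good", "laptop", ".", "great", "screen"]
print(mark_negated(tokens))
print(sentiment_score(tokens, toy_lexicon))
```

Keeping separate entries lets a word such as "good" contribute a different score when it is negated, instead of simply flipping the sign of its affirmative score.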
Links to commonly accessed resources:
Terms of use: