The NRC Emotion Intensity Lexicon (NRC-EIL)
Download the NRC Emotion Intensity Lexicon (Non-Commercial Use Only -- Research or Educational)
You may also be interested in these companion lexicons: NRC Emotion Lexicon and NRC Valence, Arousal, and Dominance Lexicon. (The full list of word-emotion, word-sentiment, and word-colour lexicons is available on the Lexicons page.)
Words can be associated with different intensities (or degrees) of an emotion. For example, most people will agree that the word condemn is associated with a greater degree of anger (or more anger) than the word irritate. However, annotating instances for fine-grained degrees of affect is substantially more difficult than categorical annotation: respondents face a greater cognitive load, and it is particularly hard to ensure consistency (both across responses by different annotators and within the responses produced by the same annotator). We created an affect intensity lexicon with real-valued scores of association using best-worst scaling. We refer to this lexicon as the NRC Emotion/Affect Intensity Lexicon. You can access a copy for non-commercial use by clicking on the download button above. (See terms of use at the bottom of this page.)
For a given word w and emotion e, the score ranges from 0 to 1, with a higher score indicating a stronger association between w and e.
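To illustrate how best-worst scaling annotations can yield real-valued scores between 0 and 1, here is a minimal Python sketch. It assumes the simple counting procedure commonly used with best-worst scaling (proportion of times a term is chosen as best minus the proportion of times it is chosen as worst) followed by a linear rescale from [-1, 1] to [0, 1]. The function name `bws_scores` and the annotation tuples are hypothetical and invented for illustration; they are not data from the lexicon.

```python
from collections import Counter


def bws_scores(annotations):
    """Turn best-worst annotations into scores in [0, 1].

    `annotations` is an iterable of (items, best, worst) triples, where
    `items` is the tuple of terms shown to an annotator, `best` is the term
    judged to convey the MOST of the emotion, and `worst` the LEAST.
    """
    best_counts = Counter()
    worst_counts = Counter()
    seen_counts = Counter()

    for items, best, worst in annotations:
        for item in items:
            seen_counts[item] += 1
        best_counts[best] += 1
        worst_counts[worst] += 1

    scores = {}
    for item, n_seen in seen_counts.items():
        # Raw score: %best - %worst, which lies in [-1, 1].
        raw = (best_counts[item] - worst_counts[item]) / n_seen
        # Linear rescale to [0, 1] (an assumption of this sketch).
        scores[item] = (raw + 1) / 2
    return scores


# Hypothetical annotations for the emotion "anger" (made up for this example).
terms = ("condemn", "irritate", "annoy", "table")
annotations = [
    (terms, "condemn", "table"),
    (terms, "condemn", "table"),
    (terms, "irritate", "table"),
    (terms, "condemn", "annoy"),
]

scores = bws_scores(annotations)
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term}\t{scores[term]:.3f}")
```

With these made-up judgments, condemn ends up with a higher anger score than irritate, matching the intuition described above. In practice each term appears in several different tuples, which is what lets a small number of comparative judgments produce a fine-grained ranking.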
Papers
Word Affect Intensities. Saif M. Mohammad. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf) BibTeX Presentation
This study has been approved by the NRC Research Ethics Board (NRC-REB) under protocol number 2017-98. REB review seeks to ensure that research projects involving humans as participants meet Canadian standards of ethics.
Practical and Ethical Considerations
Please see the papers below for ethical considerations involved in automatic emotion detection and the use of emotion lexicons. (These also act as the Ethics and Data Statements for the lexicon.)
Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis. Computational Linguistics. June 2022.
Paper (pdf) BibTeX Slides
Practical and Ethical Considerations in the Effective Use of Emotion and Sentiment Lexicons. Saif M. Mohammad. arXiv preprint arXiv:2011.03492. December 2020.
Paper (pdf) BibTeX
Python Code to Analyze Emotions in Text
There are many third-party software packages that can be used in conjunction with the NRC Emotion Lexicon to analyze emotion word use in text. We recommend Emotion Dynamics.
It is the primary package that we use to analyze text using the NRC Emotion Lexicon and the NRC VAD Lexicon. It can be used to generate a csv file with a number of emotion features pertaining to the text of interest, including metrics of utterance emotion dynamics.
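For readers who want to see the basic idea behind lexicon-based emotion features, here is a minimal standalone Python sketch. It does not use or reproduce the Emotion Dynamics package. It assumes the lexicon is a tab-separated file with three columns (word, emotion, score); that layout, the function names `load_lexicon` and `mean_intensity`, and the sample rows are assumptions made for illustration. The sample scores are placeholders, not values from the lexicon.

```python
import csv
import io
import re
from collections import defaultdict

# Placeholder rows in the ASSUMED layout: word <TAB> emotion <TAB> score.
# These scores are invented for the example, not taken from the lexicon.
SAMPLE = "condemn\tanger\t0.80\nirritate\tanger\t0.50\ndelight\tjoy\t0.90\n"


def load_lexicon(handle):
    """Read word/emotion/score rows into {word: {emotion: score}}."""
    lexicon = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        word, emotion, score = row
        try:
            lexicon[word][emotion] = float(score)
        except ValueError:
            continue  # skip a header row, if the file has one
    return lexicon


def mean_intensity(text, lexicon):
    """Average the intensity scores of lexicon words found in `text`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for token in re.findall(r"[a-z']+", text.lower()):
        for emotion, score in lexicon.get(token, {}).items():
            totals[emotion] += score
            counts[emotion] += 1
    return {emotion: totals[emotion] / counts[emotion] for emotion in totals}


# To read a downloaded copy instead, open the file (path is hypothetical):
#   with open("path/to/lexicon.txt", encoding="utf-8") as f:
#       lexicon = load_lexicon(f)
lexicon = load_lexicon(io.StringIO(SAMPLE))
features = mean_intensity("They condemn it; I irritate them with delight.", lexicon)
print(features)
```

This simple token match ignores negation, word sense, and inflected forms (for example, "condemned" would not match "condemn"), so the resulting averages should be read as rough indicators of emotion word use rather than as a measure of what a writer feels.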
Details
The lexicon has close to 10,000 entries for eight emotions that Robert Plutchik argued to be basic or universal. It includes common English terms as well as terms that are more prominent on social media platforms, such as Twitter. It includes terms that are associated with emotions to various degrees. For a given emotion, this even includes some terms that may not predominantly convey that emotion (or that convey an antonymous emotion), and yet tend to co-occur with terms that do. (Antonymous terms tend to co-occur with each other more often than chance, and are particularly problematic when one uses automatic co-occurrence-based statistical methods to capture word-emotion connotations.) Example entries from the lexicon are shown below.
The NRC EIL Lexicon has affect annotations for English words. Despite some cultural differences, it has been shown that a majority of affective norms are stable across languages. Thus, we provide versions of the lexicon in over 100 languages by translating the English terms using Google Translate (August 2022).
The lexicon is thus available for English and these languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Corsican, Croatian, Czech, Danish, Dutch, Esperanto, Estonian, Filipino, Finnish, French, Frisian, Gaelic, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Kinyarwanda, Korean, Kurmanji, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Odia, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Sanskrit, Serbian, Sesotho, Shona, Simplified Chinese, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Tatar, Telugu, Thai, Traditional Chinese, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Welsh, Xhosa, Yiddish, Yoruba, Zulu
Note that an earlier version included translations obtained in 2018. The current 2022 translations are markedly better. That said, some of the translations may still be incorrect or they may simply be transliterations of the original English terms.
Terms of use:
We will be happy to hear from you. For example:
We regularly collaborate with graduate students, post-docs, faculty, and research professionals from Computer Science, Psychology, Digital Humanities, Linguistics, Social Science, etc. Email: Dr. Saif M. Mohammad (saif.mohammad@nrc-cnrc.gc.ca, uvgotsaif@gmail.com)
Last updated: August 2022