Publications and Data
 

Publications are also listed on the Google Scholar page.

 

 

Publications and Data by Area (papers within each area are listed in reverse chronological order)

Lexical Semantics
antonymy, lexical contrast
metaphor
relational similarity
semantic distance
text summarization
textual inference
word-colour associations
word sense disambiguation

Emotion, Personality, Stance, and Sentiment Analysis
emotion analysis
best-worst annotations*
music from text
personality detection
sentiment analysis
sentiment analysis (Arabic)*
sentiment composition*
stance detection*

Publications for areas marked with a * are interspersed within sentiment analysis and emotion analysis; clicking on them leads to separate dedicated pages that present only the relevant information.

 


Emotion Analysis (joy, sadness, fear, optimism, anger, hope, etc.)

 

Data and System

Several word-emotion association lexicons (such as the NRC Emotion Lexicon), word-sentiment lexicons (such as the NRC Hashtag Sentiment Lexicon), and word-colour association lexicons are available here. For the NRC-Canada sentiment analysis system, go here.

Papers

WASSA-2017 Shared Task on Emotion Intensity. Saif M. Mohammad and Felipe Bravo-Marquez. In Proceedings of the EMNLP 2017 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), September 2017, Copenhagen, Denmark.
Paper (pdf)    BibTeX     Data and Shared Task    Presentation

Emotion Intensities in Tweets. Saif M. Mohammad and Felipe Bravo-Marquez. In Proceedings of the Sixth Joint Conference on Lexical and Computational Semantics (*Sem), August 2017, Vancouver, Canada.
Paper (pdf)    BibTeX     Data and Shared Task    AffectiveTweets package    Presentation

Word Affect Intensities. Saif M. Mohammad. arXiv preprint arXiv:1704.08798, April 2017.
Paper (pdf)   

Metaphor as a Medium for Emotion: An Empirical Study, Saif M. Mohammad, Ekaterina Shutova, and Peter Turney. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)   BibTeX    Presentation       Data and Interactive Visualization

Book Chapter

Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text. Saif M. Mohammad, Emotion Measurement, 2016.
Pre-print version     BibTeX
This is a survey of automatic methods for affect analysis.

Paper

Determining Word-Emotion Associations from Tweets by Multi-Label Classification. Felipe Bravo-Marquez, Eibe Frank, Saif Mohammad, and Bernhard Pfahringer. In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI'16), Omaha, Nebraska, USA.
Paper (pdf)    BibTeX    Data (scroll to section on this paper)

Interactive Visualization and Paper

Imagisaurus: An Interactive Visualizer of Valence and Emotion in the Roget’s Thesaurus. Saif M. Mohammad. In Proceedings of the EMNLP 2015 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), September 2015, Lisbon, Portugal.
Paper (pdf)    BibTeX    Interactive Visualization

Data

The NRC Emotion Lexicon is now available in over 20 languages.

Tutorial

Computational Analysis of Affect and Emotion in Language. Saif M. Mohammad and Cecilia Ovesdotter Alm. Tutorial at the 2015 Conference on Empirical Methods in Natural Language Processing, September 2015, Lisbon, Portugal.
Presentation       Annotated Bibliography       Extended Bibliography      Proposal

Visualization

Explore the interactive visualization for the NRC Word-Emotion Association Lexicon.

Symposium

My N is Ten Million: Using Social Media to Track Emotion, Mental Health, and Measure Personality Across Entire Populations. Gregory J Park, Saif M Mohammad, and Johannes C Eichstaedt. A symposium at the International Convention of Psychological Science (ICPS), March 2015, Amsterdam, The Netherlands.

Journal paper

Sentiment, Emotion, Purpose, and Style in Electoral Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Xiaodan Zhu, and Joel Martin. Information Processing and Management, Volume 51, Issue 4, July 2015, Pages 480–499.
Paper (pdf)    BibTeX     Annotated Data    Unannotated Data

Papers

Semantic Role Labeling of Emotions in Tweets. Saif M. Mohammad, Xiaodan Zhu, and Joel Martin, In Proceedings of the ACL 2014 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2014, Baltimore, MD.
Paper (pdf)    BibTeX     Annotated Data    Unannotated Data

Generating Music from Literature. Hannah Davis and Saif M. Mohammad, In Proceedings of the EACL Workshop on Computational Linguistics for Literature, April 2014, Gothenburg, Sweden.
Paper (pdf)   BibTeX    TransProse Website

Notable Press Mentions: The Physics arXiv Blog (March 20, 2014); TIME (May 7, 2014); PC World (May 15, 2014); Popular Science (May 14, 2014); io9 (May 12, 2014); LiveScience (May 11, 2014).

Journal Papers

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad and Svetlana Kiritchenko. Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013.
Paper (pdf)    BibTeX

Press Mention: article in MIT Technology Review
Also published in crowdsourcing.org.

Data

The NRC Word-Emotion Association Lexicon (also called EmoLex) is available here. Explore the interactive visualization.

Papers

Using Nuances of Emotion to Identify Personality, Saif M. Mohammad and Svetlana Kiritchenko, In Proceedings of the ICWSM Workshop on Computational Personality Recognition, July 2013, Boston, USA.
Paper (pdf)    BibTeX    Poster

Identifying Purpose Behind Electoral Tweets, Saif Mohammad, Svetlana Kiritchenko and Joel Martin, In Proceedings of the KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM-2013), August 2013, Chicago, USA.
Paper (pdf)    BibTeX      Annotated Data    Unannotated Data

Press Mention: article in TIME

Journal Papers

From Once Upon a Time to Happily Ever After: Tracking Emotions in Mail and Books, Saif Mohammad, Decision Support Systems, Volume 53, Issue 4, November 2012, Pages 730–741.
Paper (pdf)    BibTeX

The 2011 NRC technical report is available here: Sentiment Analysis of Mail and Books.

Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes. Colin Cherry, Saif Mohammad, and Berry de Bruijn. Biomedical Informatics Insights, 5 (Suppl. 1), 147-154, January 2012.
Paper (pdf)    BibTeX

Paper

#Emotional Tweets, Saif Mohammad, In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*Sem), June 2012, Montreal, Canada.
Paper (pdf)    BibTeX

Data

NRC Hashtag Emotion Lexicon. The Hashtag Emotion Corpus (aka Twitter Emotion Corpus, or TEC) used to create the lexicon.

Papers

Portable Features for Classifying Emotional Text, Saif Mohammad, In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2012, Montreal, Canada.
Paper (pdf)    BibTeX

Getting Emotional About News. Alistair Kennedy, Anna Kazantseva, Saif Mohammad, Terry Copeck, Diana Inkpen, Stan Szpakowicz. Proceedings of the Text Analysis Conference (TAC-2011), November 2011, Gaithersburg, MD.
Paper (pdf)    BibTeX

Tracking Sentiment in Mail: How Genders Differ on Emotional Axes, Saif Mohammad and Tony Yang, In Proceedings of the ACL 2011 Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation 

Data

Collections of love letters, hate mail, and suicide notes.
A mapping of directory names in the Enron email corpus to email ids and to gender.

Papers

From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales, Saif Mohammad, In Proceedings of the ACL 2011 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation

Associations of Words with Emotion, Polarity, and Colour: Crowdsourcing a Lexicon, Saif Mohammad and Peter Turney, Technical Report, National Research Council Canada, Ottawa, Canada.
Paper (pdf)    BibTeX

Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon, Saif Mohammad and Peter Turney, In Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, June 2010, Los Angeles, California.
Paper (pdf)    BibTeX    Presentation

Invited Talk

From Once Upon a Time to Happily Ever After: Tracking Emotions in Books and Mail.
         July 2011: Amazon, Social Media group and Digital Books group, Seattle, WA.
         June 2011: Social Media - "Big Data" Analysis Workshop, Defence R&D Canada, Ottawa, Canada.

 

Sentiment Analysis - Valence (positive, negative, neutral)

 

Data

Several word-emotion association lexicons (such as the NRC Emotion Lexicon), word-sentiment lexicons (such as the NRC Hashtag Sentiment Lexicon), and word-colour association lexicons are available here. For the NRC-Canada sentiment analysis system, go here.

Paper

Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2017), 2017, Vancouver, Canada.
Paper (pdf)    BibTeX       Data

Journal Paper

Stance and Sentiment in Tweets. Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. Special Section of the ACM Transactions on Internet Technology on Argumentation in Social Media, 2017, 17(3).
Paper (pdf)    BibTeX       Data and Visualization

Paper

Detecting Stance in Tweets And Analyzing its Interaction with Sentiment. Parinaz Sobhani, Saif M. Mohammad, and Svetlana Kiritchenko. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)   BibTeX     Presentation    Data and Visualization

Book Chapter

Challenges in Sentiment Analysis. Saif M. Mohammad, A Practical Guide to Sentiment Analysis, Springer, 2016.
Pre-print version (pdf)    BibTeX

Papers

Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. June 2016. San Diego, CA.
Paper (pdf)   BibTeX    Presentation     Data   

Sentiment Composition of Words with Opposing Polarities. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. June 2016. San Diego, CA.
Paper (pdf)   BibTeX    Poster     Data: Opposing Polarity Sentiment Lexicon    Interactive Visualization

SemEval-2016 Task 6: Detecting Stance in Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. In Proceedings of the International Workshop on Semantic Evaluation (SemEval ’16). June 2016. San Diego, California.
Paper (pdf)    BibTeX    Presentation    Task Website

SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases. Svetlana Kiritchenko, Saif M. Mohammad, and Mohammad Salameh. In Proceedings of the International Workshop on Semantic Evaluation (SemEval ’16). June 2016. San Diego, California.
Paper (pdf)    BibTeX    Presentation    Task Website

A Practical Guide to Sentiment Annotation: Challenges and Solutions. Saif M. Mohammad, In Proceedings of the NAACL 2016 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2016, San Diego, California.
Paper (pdf)   BibTeX    Presentation    

The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition. Svetlana Kiritchenko and Saif M. Mohammad, In Proceedings of the NAACL 2016 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2016, San Diego, California.
Paper (pdf)   BibTeX    Presentation     Data and Visualization

Sentiment Lexicons for Arabic Social Media. Saif M. Mohammad, Mohammad Salameh, and Svetlana Kiritchenko. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož, Slovenia.
Paper (pdf)    BibTeX    Presentation       Data: Arabic Sentiment Lexicons

A Dataset for Detecting Stance in Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož, Slovenia.
Paper (pdf)    BibTeX    Presentation    Data: Stance Dataset    Interactive Visualization

Happy Accident: A Sentiment Lexicon of Opposing Polarities Phrases. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož, Slovenia.
Paper (pdf)    BibTeX    Poster    Data: Opposing Polarity Sentiment Lexicon    Interactive Visualization

Journal Papers

How Translation Alters Sentiment. Saif M. Mohammad, Mohammad Salameh, and Svetlana Kiritchenko, Journal of Artificial Intelligence Research, 2016, Volume 55, pages 95-130.
Paper (pdf)    BibTeX     Data: Arabic Sentiment Lexicons

Developing a Successful SemEval Task in Sentiment Analysis of Twitter and Other Social Media Texts. Preslav Nakov, Sara Rosenthal, Svetlana Kiritchenko, Saif M. Mohammad, Zornitsa Kozareva, Alan Ritter, Veselin Stoyanov, and Xiaodan Zhu. Language Resources and Evaluation. March 2016, Volume 50, Issue 1, pages 35-65.
Paper (pdf)    Preprint Version    BibTeX

Professional Community Involvement

I am organizing these shared task competitions under the aegis of SemEval-2016 (see webpage for schedule):

Detecting Stance in Tweets (new task). Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry.

Determining sentiment intensity of English and Arabic phrases. Svetlana Kiritchenko, Saif M Mohammad, and Mohammad Salameh. This is an expansion of the SemEval-2015 Task 10 subtask E - Determining strength of association of Twitter terms with positive sentiment (or, degree of prior polarity).

Paper

SemEval-2015 Task 10: Sentiment Analysis in Twitter. Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif M Mohammad, Alan Ritter, and Veselin Stoyanov. In Proceedings of the ninth international workshop on Semantic Evaluation Exercises (SemEval-2015), June 2015, Denver, Colorado.
Paper (pdf)   BibTeX

Sentiment After Translation: A Case-Study on Arabic Social Media Posts. Mohammad Salameh, Saif M Mohammad and Svetlana Kiritchenko, In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2015), June 2015, Denver, Colorado.
Paper (pdf)   BibTeX   Data: Arabic Sentiment Lexicons

Data

Arabic BBN blog posts and Syrian tweets translated manually and automatically into English and annotated for sentiment. The original Arabic text is also annotated for sentiment.

BBN blog posts: A subset of 1200 Arabic (Levantine dialect) sentences chosen from the BBN Arabic-Dialect/English Parallel Text. The sentences are extracted from social media posts and are provided with their English translations. We manually annotated this subset and its translations (both manual and automatic) for sentiment (positive, negative, or neutral).

Syrian tweets: A dataset of 2000 tweets originating from Syria (a country where Levantine dialectal Arabic is commonly spoken). These tweets were collected in May 2014 by polling the Twitter API. This dataset is not provided with a manual English translation. We manually annotated these tweets and their translations (both manual and automatic) for sentiment (positive, negative, or neutral).

Tutorial

Sentiment Analysis of Social Media Texts. Saif M. Mohammad and Xiaodan Zhu. Tutorial at the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, Doha, Qatar.
Presentation   Video    Proposal

Journal paper

Sentiment Analysis of Short Informal Texts. Svetlana Kiritchenko, Xiaodan Zhu and Saif Mohammad. Journal of Artificial Intelligence Research, volume 50, pages 723-762, August 2014.
Paper (pdf)    BibTeX

Data

Among other things, the paper above describes how we created a sentiment lexicon by crowdsourcing. This is the first manually created lexicon with real-valued sentiment scores. It was created using the MaxDiff technique (also known as best-worst scaling). The data was also used in SemEval-2015 Task 10 (Sentiment Analysis in Twitter), subtask E - Determining strength of association of Twitter terms with positive sentiment (or, degree of prior polarity). The task description, trial data, test data, and other details are available here.
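To illustrate how MaxDiff (best-worst scaling) annotations can yield real-valued scores, here is a minimal sketch of the simple counting procedure: an item's score is the number of times it was chosen as best minus the number of times it was chosen as worst, divided by the number of times it was shown. This is one common scoring method, assumed here for illustration; the exact procedure used for the lexicon may differ. The function name `bws_scores`, the annotation layout, and the example annotations are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def bws_scores(annotations: Iterable[tuple[Sequence[str], str, str]]) -> dict[str, float]:
    """Turn best-worst annotations into real-valued scores by simple counting.

    Each annotation is (items_shown, best_item, worst_item). An item's score is
    (times chosen best - times chosen worst) / times shown, so it lies in [-1, 1].
    """
    shown = Counter()
    best = Counter()
    worst = Counter()
    for items, best_item, worst_item in annotations:
        shown.update(items)       # every item in the tuple was seen once more
        best[best_item] += 1      # chosen as most positive
        worst[worst_item] += 1    # chosen as least positive
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}


# Hypothetical 4-tuple annotations, invented for illustration.
example = [
    (("happy", "sad", "table", "joyful"), "joyful", "sad"),
    (("happy", "sad", "table", "joyful"), "happy", "sad"),
    (("happy", "angry", "chair", "joyful"), "joyful", "angry"),
]
scores = bws_scores(example)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}\t{score:+.2f}")
```

Counting keeps every score on the same bounded scale regardless of how often an item appears, which is one reason it is a popular default for best-worst data.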

Papers

Sentiment, Emotion, Purpose, and Style in Electoral Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Xiaodan Zhu, and Joel Martin. Information Processing and Management, Volume 51, Issue 4, July 2015, Pages 480–499.
Paper (pdf)    BibTeX

NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews, Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif M. Mohammad. In Proceedings of the eighth international workshop on Semantic Evaluation Exercises (SemEval-2014), August 2014, Dublin, Ireland.
Paper (pdf)    BibTeX     Poster

Official Rankings: Our team (NRC-Canada) ranked first in three of the six subtasks. About 30 teams participated.

NRC-Canada-2014: Recent Improvements in Sentiment Analysis of Tweets, Xiaodan Zhu, Svetlana Kiritchenko, and Saif M. Mohammad. In Proceedings of the eighth international workshop on Semantic Evaluation Exercises (SemEval-2014), August 2014, Dublin, Ireland.
Paper (pdf)    BibTeX

Official Rankings: Our team (NRC-Canada) ranked first in five of the ten subtask-domain combinations. About 40 teams participated.

An Empirical Study on the Effect of Negation Words on Sentiment. Xiaodan Zhu, Hongyu Guo, Saif Mohammad and Svetlana Kiritchenko. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, June 2014, Baltimore, MD.
Paper (pdf)    BibTeX

Semantic Role Labeling of Emotions in Tweets. Saif M. Mohammad, Xiaodan Zhu, and Joel Martin, In Proceedings of the ACL 2014 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2014, Baltimore, MD.
Paper (pdf)    BibTeX

NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets, Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu, In Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013), June 2013, Atlanta, USA.
Paper (pdf)    BibTeX    System Description and Downloads     Poster     Slides

Official Rankings: Our team (NRC-Canada) ranked first in detecting sentiment of tweets (task 2B - tweets), first in detecting sentiment of SMS messages (task 2B - SMS), first in detecting sentiment of terms within a tweet (task 2A - tweets), and second in detecting sentiment of terms within an SMS message (task 2A - SMS). About 44 teams participated.

Data

Below are the two automatically created sentiment lexicons we used to generate our submissions to SemEval-2013 Task 2. If you use them, please cite this paper.

a. NRC Hashtag Sentiment Lexicon (version 0.1) is a list of words with associations to positive and negative sentiments. The lexicon is distributed in three files: unigrams-pmilexicon.txt (54,129 terms), bigrams-pmilexicon.txt (316,531 terms), and pairs-pmilexicon.txt (480,010 terms). Each line in the three files has the format:

term<tab>sentimentScore<tab>numPositive<tab>numNegative
where:
term is the target word or phrase.
In unigrams-pmilexicon.txt, term is a unigram (single word).
In bigrams-pmilexicon.txt, term is a bigram (two-word sequence). A bigram has the form: "string string". The bigram was seen at least once in the source tweets from which the lexicon was created.
In pairs-pmilexicon.txt, term is a unigram--unigram pair, unigram--bigram pair, bigram--unigram pair, or a bigram--bigram pair. The pairs were generated from a large set of source tweets. Tweets were examined one at a time, and all possible unigram and bigram combinations within the tweet were chosen. Pairs with certain punctuation marks, @ symbols, and some function words were removed.

sentimentScore is a real number. A positive score indicates positive sentiment. A negative score indicates negative sentiment. The absolute value is the degree of association with the sentiment.
numPositive is the number of times the term co-occurred with a positive marker such as a positive emoticon or a positive hashtag.
numNegative is the number of times the term co-occurred with a negative marker such as a negative emoticon or a negative hashtag.
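Given the four-column, tab-separated format described above, a file such as unigrams-pmilexicon.txt could be loaded with a short reader like the one below. This is a sketch under assumptions: it expects exactly four tab-separated fields per line and silently skips any line that does not match. The names `LexiconEntry`, `load_lexicon`, and `polarity` are hypothetical, and the sample lines are invented, not real lexicon values.

```python
import io
from typing import NamedTuple, TextIO


class LexiconEntry(NamedTuple):
    term: str          # unigram, bigram, or pair
    score: float       # sentimentScore: sign = polarity, magnitude = strength
    num_positive: int  # co-occurrences with positive markers
    num_negative: int  # co-occurrences with negative markers


def load_lexicon(stream: TextIO) -> dict[str, LexiconEntry]:
    """Parse term<TAB>sentimentScore<TAB>numPositive<TAB>numNegative lines."""
    lexicon = {}
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        term, score, pos, neg = fields
        lexicon[term] = LexiconEntry(term, float(score), int(pos), int(neg))
    return lexicon


def polarity(entry: LexiconEntry) -> str:
    """Map the sign of sentimentScore to a coarse label."""
    if entry.score > 0:
        return "positive"
    if entry.score < 0:
        return "negative"
    return "neutral"


# Invented sample lines in the documented format.
sample = io.StringIO("good\t1.25\t120\t20\nbad\t-1.5\t15\t110\nnot bad\t0.4\t30\t22\n")
lex = load_lexicon(sample)
for term, entry in lex.items():
    print(term, entry.score, polarity(entry))
```

For a real file, pass `open("unigrams-pmilexicon.txt", encoding="utf-8")` in place of the `StringIO` sample; the UTF-8 encoding is an assumption.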

The hashtag lexicon was created from a collection of tweets that had a positive or a negative word hashtag such as #good, #excellent, #bad, and #terrible. Version 0.1 was created from 775,310 tweets posted between April and December 2012 using a list of 78 positive and negative word hashtags. A list of these hashtags is shown in sentimenthashtags.txt.
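The text above does not spell out how sentimentScore is derived from numPositive and numNegative. The file names (*-pmilexicon.txt) point to pointwise mutual information (PMI). The sketch below assumes score(w) = PMI(w, positive) - PMI(w, negative), the formulation described for the NRC-Canada SemEval-2013 system. The add-k smoothing, the meaning of the class totals, and the function name `pmi_sentiment_score` are assumptions of this sketch, and any frequency cut-offs applied in version 0.1 are not reproduced.

```python
import math


def pmi_sentiment_score(num_pos: int, num_neg: int,
                        total_pos: int, total_neg: int,
                        k: float = 0.5) -> float:
    """Approximate score(w) = PMI(w, positive) - PMI(w, negative).

    Since PMI(w, c) = log2(freq(w, c) * N / (freq(w) * freq(c))), the freq(w)
    and N terms cancel in the difference, leaving
        log2((freq(w, pos) * freq(neg)) / (freq(w, neg) * freq(pos))).
    num_pos / num_neg: term co-occurrence counts (like numPositive / numNegative).
    total_pos / total_neg: total counts for the positive / negative class (assumed).
    k: add-k smoothing (an assumption) so one-sided terms get a finite score;
       with k=0 a zero num_neg raises ZeroDivisionError.
    """
    return math.log2(((num_pos + k) * total_neg) / ((num_neg + k) * total_pos))


# Invented counts, for illustration only.
print(pmi_sentiment_score(120, 20, 500_000, 400_000))
print(pmi_sentiment_score(8, 2, 100, 100, k=0))  # 2.0, i.e. log2(4)
```

A positive result means the term leans toward the positive class and a negative result toward the negative class, matching the sign convention given for sentimentScore.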

b. Sentiment140 Lexicon (version 0.1) is also a list of words with associations to positive and negative sentiments. It has the same format as the NRC Hashtag Sentiment Lexicon. However, it was created from the sentiment140 corpus of 1.6 million tweets, and emoticons were used as positive and negative labels (instead of hashtagged words).

 

Paper

Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus, Saif Mohammad, Bonnie Dorr, and Cody Dunne, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), August 2009, Singapore.
Paper (pdf)   BibTeX     Presentation 

Data

Access the Macquarie Semantic Orientation Lexicon (MSOL) here. It is described in the EMNLP-09 paper listed above. The paper describes a few different MSOL variants; the one available here for download is MSOL(ASL and GI).

 

Personality Detection
 

Journal Paper

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad and Svetlana Kiritchenko. Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Paper

Using Nuances of Emotion to Identify Personality, Saif M. Mohammad and Svetlana Kiritchenko, In Proceedings of the ICWSM Workshop on Computational Personality Recognition, July 2013, Boston, USA.
Paper (pdf)    BibTeX

 
 

Capturing Word-Colour Associations
 

Papers

Colourful Language: Measuring Word-Colour Associations, Saif Mohammad, In Proceedings of the ACL 2011 Workshop on Cognitive Modeling and Computational Linguistics (CMCL), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation

Even the Abstract have Colour: Consensus in Word-Colour Associations, Saif Mohammad, In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, June 2011, Portland, OR.
Paper (pdf)    BibTeX     Poster  

Data

The NRC Word-Colour Association Lexicon (a.k.a. NRC Color Lexicon) has human annotations of colours associated with more than 24,200 word senses (about 14,200 word types). It is available here.

Visualization

An interactive visualization of the NRC Color Lexicon, called Lexichrome, is available here.


 

Computing Semantic Distance and Distributional Similarity
 

Papers

Measuring Semantic Distance using Distributional Profiles of Concepts, Saif Mohammad and Graeme Hirst. arXiv preprint.
Paper (pdf)

Estimating semantic distance using soft semantic constraints in knowledge-source–corpus hybrid models, Yuval Marton, Saif Mohammad, and Philip Resnik, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), August 2009, Singapore.
Paper (pdf)    Presentation 

Measuring Semantic Distance using Distributional Profiles of Concepts, Saif Mohammad, Ph.D. thesis, University of Toronto, January 2008, Toronto, Canada.
Paper (pdf)    Presentation

Cross-lingual distributional profiles of concepts for measuring semantic distance, Saif Mohammad, Iryna Gurevych, Graeme Hirst, and Torsten Zesch, In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL-2007), June 2007, Prague, Czech Republic.
Paper (ps)    Paper (pdf)    Presentation 

Distributional Measures as Proxies for Semantic Distance: A Survey, Saif Mohammad and Graeme Hirst.
Paper - Dec 2007 version (pdf) (Note: This is an updated version of the Jan 2006 paper below.)   

Distributional Measures as Proxies for Semantic Relatedness, Saif Mohammad and Graeme Hirst.
Paper - Jan 2006 version (pdf)

Distributional measures of concept-distance: A task-oriented evaluation, Saif Mohammad and Graeme Hirst, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2006), July 2006, Sydney, Australia.
Paper (ps)    Paper (pdf)    Presentation 

 

Computing Lexical Contrast
 

Data

Datasets described in Computing Lexical Contrast, Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney, Computational Linguistics, 39 (3), 555-590, 2013.

1. List of about 3.5 million antonym pairs identified from contrasting adjacent thesaurus categories.
2. List of about 3.2 million antonym pairs identified using affix patterns and the thesaurus structure.
3. Total set of 6.3 million antonym pairs obtained by combining 1 and 2, and removing duplicates.
4. Set of 1269 closest-to-opposite questions created for WordNet opposites: adjectives, adverbs, nouns, verbs
5. Set of 162 closest-to-opposite questions from GRE preparatory website 1: development set.
6. Set of 790 closest-to-opposite questions from GRE preparatory website 2: test set.
7. Questionnaires for determining information about kinds of opposites: adjectives, adverbs, nouns, verbs
8. Responses to crowdsourced questionnaires: adjectives, adverbs, nouns, verbs
9. Set of 209 adjacent categories in the Macquarie Thesaurus that were manually determined to be contrasting.
10. Set of 1358 WordNet opposites used to test the co-occurrence and the distributional hypotheses.
11. Set of 1358 WordNet synonyms used to test the co-occurrence and the distributional hypotheses.
12. Set of 1358 WordNet random word pairs used to test the co-occurrence and the distributional hypotheses.
13. Set of 15 affix rules that tend to generate opposites.
14. TURN dataset: 136 pairs of words (89 opposites and 47 synonyms) from various Web sites for learners of English as a second language (first described in Turney, 2008).
15. LZQZ dataset: 80 pairs of synonyms and 80 pairs of opposites from the Webster’s Collegiate Thesaurus (first described in Lin et al., 2003).
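To make the idea behind items 2 and 13 concrete (antonym pairs identified with affix patterns), here is a minimal sketch that pairs a word X with an affixed form when both occur in a vocabulary. The rules shown are a hypothetical handful chosen for illustration; they are NOT the 15 published rules in item 13. The published method also draws on the thesaurus structure, which this sketch omits. The names `PREFIXES`, `SUFFIX_SWAPS`, and `affix_opposites` are illustrative.

```python
# Hypothetical affix rules for illustration only (not the published list of 15).
PREFIXES = ("un", "in", "dis", "non")   # X -> prefix + X
SUFFIX_SWAPS = (("ful", "less"),)       # Xful -> Xless


def affix_opposites(vocabulary: set[str]) -> set[tuple[str, str]]:
    """Return (word, candidate opposite) pairs where both words occur in the vocabulary."""
    pairs = set()
    for word in vocabulary:
        for prefix in PREFIXES:
            if prefix + word in vocabulary:
                pairs.add((word, prefix + word))
        for old, new in SUFFIX_SWAPS:
            if word.endswith(old):
                candidate = word[: -len(old)] + new
                if candidate in vocabulary:
                    pairs.add((word, candidate))
    return pairs


vocab = {"happy", "unhappy", "honest", "dishonest", "hopeful", "hopeless", "table"}
print(sorted(affix_opposites(vocab)))
# [('happy', 'unhappy'), ('honest', 'dishonest'), ('hopeful', 'hopeless')]
```

Requiring both forms to appear in the vocabulary filters out many spurious candidates, but affix rules alone still overgenerate, which is why further evidence such as thesaurus structure is useful.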

Journal Paper

Computing Lexical Contrast, Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney, Computational Linguistics, 39 (3), 555-590, 2013.
Paper (pdf)   BibTeX

Papers

Computing Word-Pair Antonymy, Saif Mohammad, Bonnie Dorr, and Graeme Hirst, In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-2008), October 2008, Waikiki, Hawaii.
Abstract    Paper (pdf)    Presentation 

Towards Antonymy-Aware Natural Language Applications, Saif Mohammad, Bonnie Dorr, and Graeme Hirst. Proceedings of the Symposium on Semantic Knowledge Discovery, Organization and Use (SKDOU-2008), November 2008, New York, NY.
Paper (pdf)    Poster

 

Word Sense Disambiguation and Word Sense Dominance
 

Papers

Distributional profiles of concepts for Unsupervised Word Sense Disambiguation, Saif Mohammad, Graeme Hirst, and Philip Resnik, In Proceedings of the Fourth International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SemEval-07), June 2007, Prague, Czech Republic.
Abstract    Paper (ps)    Paper (pdf)    Poster

Determining Word Sense Dominance Using a Thesaurus, Saif Mohammad and Graeme Hirst, In Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics (EACL-2006), April 2006, Trento, Italy.
Abstract    Paper (ps)    Paper (pdf)    Presentation 

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation, Saif Mohammad and Ted Pedersen, In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-2004), May 2004, Boston, MA.
Paper (ps)    Paper (pdf)    Presentation

Complementarity of Lexical and Simple Syntactic Features: The SyntaLex Approach to Senseval-3, Saif Mohammad and Ted Pedersen, In Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SensEval-3), July 2004, Barcelona, Spain.
Paper (ps)    Paper (pdf)    Presentation

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation, Saif Mohammad, Master's thesis, University of Minnesota, August 2003, Minnesota.
Paper (ps)    Paper (pdf)    Presentation

Guaranteed Pre-Tagging for the Brill Tagger, Saif Mohammad and Ted Pedersen, In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), February 2003, Mexico City.
Paper (ps)    Paper (pdf)

 

Text Summarization
 

Journal Paper

Generating Extractive Summaries of Scientific Paradigms, Vahed Qazvinian, Dragomir R. Radev, Saif M. Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, Taesun Moon. Journal of Artificial Intelligence Research (JAIR), 46, pages 165-201, 2013.
Paper (pdf)   BibTeX

Papers

Generating Surveys of Scientific Paradigms, Saif Mohammad, Bonnie Dorr, Melissa Egan, Ahmed Hassan, Pradeep Muthukrishan, Vahed Qazvinian, Dragomir Radev, and David Zajic, In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT-2009), May 2009, Boulder, Colorado.
Paper (pdf)    Presentation 

Multiple alternative sentence compressions and word-pair antonymy for automatic text summarization and recognizing textual entailment, Saif Mohammad, Bonnie Dorr, Melissa Egan, Jimmy Lin, and David Zajic. Proceedings of the Text Analysis Conference (TAC-2008), November 2008, Gaithersburg, MD.
Paper (pdf)    Poster

 

Multi-Document Coreference Resolution
 

Paper

Cross-Document Coreference Resolution: A Key Technology for Learning by Reading, James Mayfield, Bonnie Dorr, Jason Eisner, Tim Finin, Saif Mohammad, Douglas Oard, Ralph Weischedel, David Yarowsky, and others. March 2009. Proceedings of the AAAI Spring Symposium on Learning by Reading and Learning to Read (AAAI-09), Menlo Park, CA.
Paper (pdf)

 

Recognizing Textual Entailment
 

Journal Paper

Experiments with Three Approaches to Recognizing Lexical Entailment. Peter D. Turney, Saif M. Mohammad, Natural Language Engineering, Volume 21, Issue 3, May 2015.
Paper (pdf)    BibTeX

Paper

Multiple alternative sentence compressions and word-pair antonymy for automatic text summarization and recognizing textual entailment, Saif Mohammad, Bonnie Dorr, Melissa Egan, Jimmy Lin, and David Zajic. Proceedings of the Text Analysis Conference (TAC-2008), November 2008, Gaithersburg, MD.
Paper (pdf)    Poster



Relational Similarity
 

Paper

SemEval-2012 Task 2: Measuring Degrees of Relational Similarity, David Jurgens, Saif Mohammad, Peter Turney and Keith Holyoak, In Proceedings of SemEval-2012: Semantic Evaluation Exercises, June 2012, Montreal, Canada.
Paper (pdf)    BibTeX

Data

Data we created for SemEval-2012: Semantic Evaluation Exercises -- Task 2: Measuring Degrees of Relational Similarity is available here.


Metaphor
 

Paper

Metaphor as a Medium for Emotion: An Empirical Study, Saif M. Mohammad, Ekaterina Shutova, and Peter Turney. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)   BibTeX     Data and Interactive Visualization

Data

The data annotated as part of this project can be downloaded by clicking here.

 

Last Updated: July 2015