Advisor: Dr. Ted Pedersen
|Master's Thesis: Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation.|
|Broad Area: Word Sense Disambiguation -- to identify the intended sense of a word, given its context.|
|Publications | Defense Slides | Software & Data|
Most words in any natural language have more than one possible meaning (or sense). Word sense disambiguation is the process of identifying which of these possible senses is intended, based on the context in which a word occurs. Humans are cognitively and linguistically adept at this task. For example, given the sentence "Harry cast a bewitching spell", we immediately understand "spell" to mean a charm or incantation, and not "to read out letters" or "a period of time". We can do this using our considerable world knowledge and a fairly limited amount of surrounding context.
However, automatic approaches to sense disambiguation do not have access to our world knowledge, and must take a different approach. The dominant approach at present is to rely on supervised learning, where a human expert provides examples of correctly disambiguated words, and a machine learning algorithm is used to induce a model from these examples. A key issue in such approaches is determining how to represent the context in which the word occurs to the learning algorithm. Pedersen (2001) shows that lexical features (word bigrams in particular) are excellent sources of disambiguation information for a machine learning algorithm. However, there is a large body of previous work in supervised word sense disambiguation that suggests that syntactic features such as part of speech tags and parse structures are also reliable indicators of senses (e.g., McRoy (1992), Ng and Lee (1996)).
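To make the idea of "representing the context" concrete, here is a minimal sketch of how lexical features (word bigrams) and syntactic features (part of speech tags near the target word) might be placed in a single feature dictionary. The function name `context_features`, the feature-name strings, the `window` parameter and the Penn Treebank style tags are all assumptions made for illustration; they are not the representation used in the thesis.

```python
def context_features(tokens, pos_tags, target_index, window=1):
    """Build a binary feature dict for the target word (hypothetical scheme)."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must be the same length")
    features = {}
    # Lexical features: every pair of adjacent words in the context.
    for left, right in zip(tokens, tokens[1:]):
        features["bigram=" + left.lower() + "_" + right.lower()] = 1
    # Syntactic features: POS tags at fixed offsets around the target word.
    for offset in range(-window, window + 1):
        position = target_index + offset
        if 0 <= position < len(tokens):
            features["pos[%+d]=%s" % (offset, pos_tags[position])] = 1
    return features


# Illustrative input: the example sentence, with hand-assigned tags.
tokens = ["Harry", "cast", "a", "bewitching", "spell"]
pos_tags = ["NNP", "VBD", "DT", "JJ", "NN"]
features = context_features(tokens, pos_tags, target_index=4)
```

A dictionary like this can be handed to most supervised learners after vectorization; keeping the lexical and syntactic features under distinct name prefixes makes it easy to train on either group alone or on both together.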
This thesis presents a detailed study of the impact of syntactic features in combination with lexical features. We carry out an extensive empirical evaluation using most of the sense-tagged text currently available in the research community. This includes the Senseval-1, Senseval-2, "line", "hard", "serve" and "interest" data. We find that there is complementary behavior between lexical and syntactic features, and identify several syntactic features that are particularly useful in combination with lexical features. We also introduce a methodology based on comparing the optimal and actual performance of feature sets in order to determine which features are particularly suited to being used in combination, and show that this method leads to improved disambiguation results.
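The comparison of optimal and actual performance can be sketched as follows. This summary does not define either quantity, so the sketch makes two assumptions: "optimal" is read as an upper bound in which an instance counts as correct if at least one of the two classifiers is right, and "actual" is read as the accuracy of a simple combination that sums the two classifiers' sense probabilities. The function names and the four toy instances are invented purely for illustration and are not results from the thesis.

```python
def accuracy(predictions, gold):
    """Fraction of instances whose predicted sense matches the gold sense."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def optimal_accuracy(pred_a, pred_b, gold):
    """Assumed upper bound: correct if either classifier is correct."""
    hits = sum(a == g or b == g for a, b, g in zip(pred_a, pred_b, gold))
    return hits / len(gold)


def best_sense(distribution):
    """Most probable sense in a {sense: probability} dict."""
    return max(distribution, key=distribution.get)


def combine_by_summed_probability(probs_a, probs_b):
    """Assumed combination rule: pick the sense with the largest summed probability."""
    combined = []
    for dist_a, dist_b in zip(probs_a, probs_b):
        senses = sorted(set(dist_a) | set(dist_b))
        combined.append(
            max(senses, key=lambda s: dist_a.get(s, 0.0) + dist_b.get(s, 0.0))
        )
    return combined


# Toy data (invented): gold senses and per-instance sense probabilities.
gold = ["charm", "period", "charm", "letters"]
lexical_probs = [
    {"charm": 0.7, "period": 0.3},
    {"charm": 0.6, "period": 0.4},
    {"charm": 0.4, "letters": 0.6},
    {"letters": 0.8, "charm": 0.2},
]
syntactic_probs = [
    {"charm": 0.4, "period": 0.6},
    {"charm": 0.1, "period": 0.9},
    {"charm": 0.45, "letters": 0.55},
    {"letters": 0.1, "charm": 0.9},
]

lexical_pred = [best_sense(d) for d in lexical_probs]
syntactic_pred = [best_sense(d) for d in syntactic_probs]
combined_pred = combine_by_summed_probability(lexical_probs, syntactic_probs)

actual = accuracy(combined_pred, gold)
optimal = optimal_accuracy(lexical_pred, syntactic_pred, gold)
```

Under these assumptions, a large gap between `optimal` and the better single classifier indicates that the two feature sets make different errors and are worth combining, while the gap between `optimal` and `actual` shows how much of that potential the chosen combination rule realizes.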
Finally, in the course of part of speech tagging this data, we identified a limitation in the widely used Brill Tagger (1994) that has been corrected via a mechanism known as "Guaranteed Pre-Tagging" (Mohammad and Pedersen, 2003).
Last updated: February 2005