Washington and Lee University

Honors Thesis -- Elizabeth Davis

Lexical Disambiguation in Machine Translation with Latent Semantic Analysis


This paper examines a possible solution to the problem of disambiguating polysemous nouns in machine translation. Latent Semantic Analysis (LSA), a statistical method of finding and representing word sense, is used to distinguish among the meanings of ambiguous words according to the given context. A collection of training texts is sorted according to polysemous word and meaning. A word-by-text matrix is created from this data and transformed by the LSA method, producing a vector for each text that defines it in terms of the (non-polysemous) words that appear in it. These representations of textual meanings are compared to the context of an ambiguous word to determine the most similar meaning. The viability of this LSA model is compared with that of a simple Bayesian probability model.
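
The pipeline summarized above (sense-labeled training texts, a word-by-text matrix, an LSA reduction, comparison against the ambiguous word's context, and a Bayesian baseline) can be illustrated with a small sketch. The abstract does not specify the corpus, term weighting, number of retained dimensions, similarity measure, or the exact Bayesian model, so everything below is an assumption rather than the thesis's method: the target word `bank`, the toy training texts, the stopword list, raw counts, k = 2 latent dimensions, cosine similarity against each sense's centroid, and multinomial naive Bayes with add-one smoothing standing in for the "simple Bayesian probability model". All names (`TRAINING`, `lsa_disambiguate`, `bayes_disambiguate`, and so on) are hypothetical.

```python
import math
from collections import Counter

import numpy as np

# ASSUMPTION: hypothetical target word, stopword list, and toy training texts;
# the thesis's actual corpus and preprocessing are not described in the abstract.
TARGET = "bank"
STOPWORDS = {"the", "and", "on", "near", "where", "past", "at", "a"}
TRAINING = [  # (sense label, training text)
    ("river", "the river water flowed past the muddy bank"),
    ("river", "fish swim near the shore where the river bank bends"),
    ("finance", "the loan officer at the bank approved the mortgage and deposit"),
    ("finance", "interest on the bank deposit and the loan rose"),
]


def tokenize(text):
    """Lowercase and split; drop stopwords and the polysemous word itself."""
    return [w for w in text.lower().split() if w not in STOPWORDS and w != TARGET]


def word_by_text_matrix(texts):
    """Raw-count matrix A: one row per vocabulary word, one column per text."""
    vocab = sorted({w for t in texts for w in tokenize(t)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for w in tokenize(text):
            A[index[w], j] += 1
    return A, index


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)


def lsa_disambiguate(context, k=2):
    """Return the sense whose centroid is most similar to the context in LSA space."""
    labels = [sense for sense, _ in TRAINING]
    A, index = word_by_text_matrix([text for _, text in TRAINING])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]  # keep the k strongest latent dimensions (ASSUMPTION: k=2)
    text_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # row j equals U_k.T @ A[:, j]
    q = np.zeros(A.shape[0])
    for w in tokenize(context):
        if w in index:  # words unseen in training are ignored
            q[index[w]] += 1
    q_vec = U_k.T @ q  # project the context into the same k-dim space
    scores = {}
    for sense in sorted(set(labels)):
        rows = [j for j, label in enumerate(labels) if label == sense]
        scores[sense] = cosine(q_vec, text_vecs[rows].mean(axis=0))
    return max(scores, key=scores.get)


def bayes_disambiguate(context):
    """ASSUMED baseline: multinomial naive Bayes with add-one smoothing."""
    priors = Counter(sense for sense, _ in TRAINING)
    counts = {}
    for sense, text in TRAINING:
        counts.setdefault(sense, Counter()).update(tokenize(text))
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, -math.inf
    for sense in sorted(counts):
        total = sum(counts[sense].values())
        score = math.log(priors[sense] / len(TRAINING))
        for w in tokenize(context):
            if w in vocab:
                score += math.log((counts[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best


if __name__ == "__main__":
    for ctx in ["the bank raised the loan rate on my deposit",
                "kids fish from the muddy bank"]:
        print(ctx, "->", lsa_disambiguate(ctx), bayes_disambiguate(ctx))
```

On this toy data both models label the first context `finance` and the second `river`. Comparing the context against a per-sense centroid is only one reading of "the most similar meaning"; comparing it against the single nearest training text would be an equally plausible design.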

Faculty Advisor: Simon Levy