Elizabeth Davis '06
Lexical Disambiguation in Machine Translation with Latent Semantic Analysis

This paper examines a possible solution to the problem of disambiguating polysemous nouns in machine translation. Latent Semantic Analysis (LSA), a statistical method of finding and representing word sense, is used to distinguish among the meanings of an ambiguous word according to the given context. A collection of training texts is sorted by polysemous word and by the meaning in which that word is used. A word-by-text matrix is created from this data and transformed by the LSA method, producing a vector for each text that characterizes it in terms of the (non-polysemous) words that appear in it. These representations of textual meaning are compared with the context of an ambiguous word to determine the most similar meaning. The viability of this LSA model is compared with that of a simple Bayesian probability model.
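
The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy training texts, the stop-word list, the choice of k = 2 dimensions, the use of cosine similarity, and the function names train and disambiguate are all assumptions made for illustration. The sketch builds a word-by-text count matrix that excludes the polysemous word, reduces it with a truncated singular value decomposition, projects a new context into the reduced space, and returns the meaning of the most similar training text.

```python
import numpy as np

# Hypothetical stop-word list; the paper does not specify one.
STOP = {"the", "was", "near", "and", "a", "at", "to", "of", "in", "by", "with", "on"}

def tokens(text, target):
    """Lowercase, split on whitespace, drop stop words and the polysemous word."""
    return [w for w in text.lower().split() if w not in STOP and w != target]

def train(labeled_texts, target, k):
    """Build a word-by-text matrix and reduce it to k dimensions with SVD."""
    docs = [tokens(text, target) for _, text in labeled_texts]
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))      # rows: words, columns: texts
    for j, d in enumerate(docs):
        for w in d:
            A[index[w], j] += 1
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                             # word space, k dimensions
    doc_vecs = Vt[:k].T * s[:k]                # one k-dimensional vector per text
    senses = [label for label, _ in labeled_texts]
    return index, U_k, doc_vecs, senses

def disambiguate(context, target, model):
    """Return the meaning of the training text most similar to the context."""
    index, U_k, doc_vecs, senses = model
    q = np.zeros(len(index))
    for w in tokens(context, target):
        if w in index:                         # unseen words are ignored
            q[index[w]] += 1
    q_vec = q @ U_k                            # fold the context into LSA space
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    sims = doc_vecs @ q_vec / np.where(denom == 0, 1.0, denom)  # cosine
    return senses[int(np.argmax(sims))]

# Toy training data (invented for this sketch), sorted by meaning of "bank".
TRAINING = [
    ("river", "the river bank was muddy near the water and fish"),
    ("river", "fish swim in the water by the river bank"),
    ("money", "the bank approved a loan and the money was deposited"),
    ("money", "money deposited at the bank to repay a loan with interest"),
]

model = train(TRAINING, target="bank", k=2)
print(disambiguate("fish in the water near the bank", "bank", model))
print(disambiguate("the bank charged interest on the loan", "bank", model))
```

In this toy setting the two meanings share no content words, so each meaning occupies its own axis of the reduced space. Real training texts overlap, and the number of retained dimensions would need to be tuned.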

Faculty Advisor: Simon Levy