by John Lee and Stephanie Seneff @ Spoken Language Systems, MIT CSAIL
Interspeech – ICSLP (Pittsburgh) 17-21 September

Taken from Interspeech website:

Session Wed3A3O: Technologies for Specific Populations: Learners and Challenged
It’s a poster.
A computer conversational system can potentially help a foreign-language student improve his/her fluency through practice dialogues. One of its potential roles could be to correct ungrammatical sentences. This paper describes our research on a sentence-level, generation-based approach to grammar correction: first, a word lattice of candidate corrections is generated from an ill-formed input. A traditional n-gram language model is used to produce a small set of N-best candidates, which are then reranked by parsing using a stochastic context-free grammar. We evaluate this approach in a flight domain with simulated ill-formed sentences. We discuss its potential applications in a few related tasks.

Notes: They take a few error categories relevant to Japanese speakers conversing in English (articles and prepositions, noun number, and verb aspect, mode and tense) and use them for their experiments and analysis. They do not use data from real second-language learners for this paper.

First, they reduce the supposedly erroneous sentence (in my case it would be incorrect MT output) to its canonical form: articles, prepositions, and auxiliaries are stripped off, and nouns and verbs are reduced to their citation form. All alternative inflections of those nouns and verbs are then inserted into the lattice, and insertions of articles, prepositions, and auxiliaries are allowed at every position. Second, an n-gram LM scores the paths in the lattice to produce a small N-best list, which is then reranked by parsing with a stochastic CFG. In their experiments, they treat the transcript as the gold standard, and they find that their method correctly reconstructs the transcript 88.7% of the time.
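To make the lattice idea concrete, here is a minimal Python sketch of the first pass as I read it: canonicalize the input, build a lattice of inflections and optional function-word insertions, and pick the best path with a bigram LM. Everything specific in it is an assumption for illustration and not taken from the paper: the tiny word lists, the toy training sentences, the add-k smoothing, and the names `canonicalize`, `BigramLM`, `best_path`, and `correct`. It also allows at most one insertion per position, returns only the single best path instead of an N-best list, and leaves out the stochastic CFG reranking entirely.

```python
import math
from collections import Counter

# Hypothetical toy word lists; a real system would use much larger ones.
ARTICLES = {"a", "an", "the"}
PREPS = {"to", "from", "on", "in", "at"}
AUX = {"is", "are", "do", "does", "will"}
FUNCTION_WORDS = ARTICLES | PREPS | AUX

# Hypothetical citation form -> inflections table for nouns and verbs.
INFLECTIONS = {
    "flight": ["flight", "flights"],
    "want": ["want", "wants", "wanted", "wanting"],
    "leave": ["leave", "leaves", "left", "leaving"],
    "show": ["show", "shows", "showed", "showing"],
}
LEMMA = {form: lemma for lemma, forms in INFLECTIONS.items() for form in forms}


def canonicalize(sentence):
    """Strip function words and reduce nouns/verbs to their citation form."""
    words = sentence.lower().split()
    return [LEMMA.get(w, w) for w in words if w not in FUNCTION_WORDS]


class BigramLM:
    """Bigram language model with add-k smoothing (an assumed choice)."""

    def __init__(self, sentences, k=0.01):
        self.k = k
        self.bigrams = Counter()
        self.contexts = Counter()
        vocab = set()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            vocab.update(tokens)
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[prev, word] += 1
                self.contexts[prev] += 1
        self.vocab_size = len(vocab)

    def logprob(self, prev, word):
        num = self.bigrams[prev, word] + self.k
        den = self.contexts[prev] + self.k * self.vocab_size
        return math.log(num / den)


def extend(beams, options, lm):
    """Advance every partial path through one lattice slot.

    beams maps the last emitted word to (log probability, words so far).
    An option of None means "emit nothing" (no insertion at this position).
    """
    new = {}
    for prev, (score, words) in beams.items():
        for opt in options:
            if opt is None:
                key, s, w = prev, score, words
            else:
                key, s, w = opt, score + lm.logprob(prev, opt), words + [opt]
            if key not in new or s > new[key][0]:
                new[key] = (s, w)
    return new


def best_path(canonical, lm):
    """Viterbi search over the lattice; exact for a bigram model."""
    insertion = [None] + sorted(FUNCTION_WORDS)
    beams = {"<s>": (0.0, [])}
    for lemma in canonical:
        beams = extend(beams, insertion, lm)  # optional function word
        beams = extend(beams, INFLECTIONS.get(lemma, [lemma]), lm)
    beams = extend(beams, insertion, lm)
    beams = extend(beams, ["</s>"], lm)
    return beams["</s>"][1][:-1]  # drop the end-of-sentence marker


# Toy "flight domain" training data, invented for this sketch.
TRAINING = [
    "i want a flight to boston",
    "i want to leave on monday",
    "show me the flights from denver",
]
LM = BigramLM(TRAINING)


def correct(sentence, lm=LM):
    return " ".join(best_path(canonicalize(sentence), lm))


print(correct("i want flight boston"))
print(correct("show me flight from denver"))
```

With this toy data, the first call restores the missing article and preposition ("i want a flight to boston") and the second fixes the noun number and article ("show me the flights from denver"). The small smoothing constant matters here: with add-one smoothing on such a tiny corpus, the model would tend to prefer the shorter path without insertions.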
What’s nice about this approach is that it doesn’t need any human corrections. In a way, my thesis research can be seen as a great source of data to train systems similar to this one: a nice side effect of my research is that we obtain MT output annotated with human corrections. In this setting, one can use correction-annotated data to build systems that recover from ill-formed MT output and automatically generate correct translations for it.