by John Lee and Stephanie Seneff @ Spoken Language Systems, MIT CSAIL
Interspeech – ICSLP (Pittsburgh) 17-21 September

Taken from Interspeech website:

Session Wed3A3O: Technologies for Specific Populations: Learners and Challenged
it’s a poster
A computer conversational system can potentially help a foreign-language student improve his/her fluency through practice dialogues. One of its potential roles could be to correct ungrammatical sentences. This paper describes our research on a sentence-level, generation-based approach to grammar correction: first, a word lattice of candidate corrections is generated from an ill-formed input. A traditional n-gram language model is used to produce a small set of N-best candidates, which are then reranked by parsing using a stochastic context-free grammar. We evaluate this approach in a flight domain with simulated ill-formed sentences. We discuss its potential applications in a few related tasks.

Notes: They take a couple of error categories relevant to Japanese speakers conversing in English (articles and prepositions, noun number, verb aspect, mode and tense) and use them for their experiments/analysis. They do not use data from real second-language learners for this paper.

First they reduce the supposedly erroneous sentence (in my case it would be incorrect MT output) to its canonical form, where articles, preps, and auxiliaries are stripped off, and nouns and verbs are reduced to their citation form. All their alternative inflections are inserted into the lattice; insertions of articles, preps and aux. are allowed at every position. Second, an n-gram and a stochastic CFG are used as LMs to score all the paths in the lattice. In their experiments, they treat the transcript as a gold-standard and they find that their method can correctly reconstruct the transcript 88.7% of the time.
What’s nice about this approach is that it doesn’t need any human corrections. In a way, my thesis research can be seen as a great source of data to train systems similar to this one. A nice side-effect of my research is that we obtain MT output annotated with human corrections. so in this setting, one can use correction annotated data in order to build systems that can recover from ill-formed MT output and generate correct translations for such output automatically.


by Alon Lavie, Donna Gates, Noah Coccaro and Lori Levin (1996). ECAI Workshop on Dialogue Processing in Spoken Language Systems.

Abstract: JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe how multi-level segmentation of single utterance turns improves translation quality and facilitates accurate translation in our system. We define the basic dialogue units that are handled by our system, and discuss the cues and methods employed by the system in segmenting the input utterance into such units. Utterance segmentation in our system is performed in a multi-level incremental fashion, partly prior and partly during analysis by the parser. The segmentation relies on a combination of acoustic, lexical, semantic and statistical knowledge sources, which are described in detail in the paper. We also discuss how our system is designed to disambiguate among alterantive possible input segmentations.

My Notes: Split input into semantic dialog units (~= speech act), namely semantically coherent pieces of information that can be translated independently.

by Davide Turcato, Fred Popowich, Paul McFetridge, Devlan Nicholson, Janine Toole. NAACL-ANLP 2000 Workshop on Embedded machine translation systems – Volume 5. Seattle, Washington. pp 38-45

Abstract: We describe an approach to Machine Translation of transcribed speech, as found in closed captions. We discuss how the colloquial nature and input format peculiarities of closed captions are dealt with in a pre-processing pipeline that prepares the input for effective processing by a core MT system. In particular, we describe components for proper name recognition and input segmentation. We evaluate the contribution of such modules to the system performance. The described methods have been implemented on an MT system for translating English closed captions to Spanish and Portuguese.

My Notes: Instead of splitting long sentences into shorter, more translation-friendly sentences, in closed captions, the sentences are often arbitrarily split for practical reasons, which makes parsing and Noun Entity recognition much harder.

During pre-processing, MT input undergoes the following steps: text normalization, tokenization, POS tagging, Proper name recognition and segmentation.


Segmentation breaks a line into one or more segments, which are passed separately to subsequent modules (Ejerhed, 1996) (Beeferman et al., 1997). In translation, segmentation is applied to split a line into a sequence of translationally self-contained units (Lavie et al., 1996).

In our system, the translation unitswe identify are syntactic units, motivated by crosslinguistic considerations. Each unit is a constituent that dan be translated independently. Its translation is insensitive to the context in which the unit occurs, and the order of the units is preserved by translation. One motivation for segmenting is that processing is faster: syntactic ambiguity is reduced, and backtracking from a module to a previous one does not involve re-processing an entire line, but only the segment that failed. A second motivation is robustness: a failure in one segment does not involve a failure in the entire line, and error-recovery can be limited only to a segment. Further motivations are provided by the colloquial nature of closed captions.