Naomi Yamashita and Toru Ishida (Kyoto). Computer Supported Cooperative Work (CSCW 2006).
Abstract: Even though multilingual communities that use machine translation to overcome language barriers are increasing, we still lack a complete understanding of how machine translation affects communication. In this study, eight pairs from three different language communities–China, Korea, and Japan–worked on referential tasks in their shared second language (English) and in their native languages using a machine translation embedded chat system. Drawing upon prior research, we predicted differences in conversational efficiency and content, and in the shortening of referring expressions over trials. Quantitative results combined with interview data show that lexical entrainment was disrupted in machine translation-mediated communication because echoing is disrupted by asymmetries in machine translations. In addition, the process of shortening referring expressions is also disrupted because the translations do not translate the same terms consistently throughout the conversation. To support natural referring behavior in machine translation-mediated communication, we need to resolve asymmetries and inconsistencies caused by machine translations.

Experimental task: ordering figures through a chat interface, either in a shared third language (English) or in one's own language plus MT.
The process of agreeing on a perspective on a referent is known as lexical entrainment [4, 11].

Although machine translation liberates members from language barriers, it also poses hurdles for establishing mutual understanding. As one might expect, translation errors are the main source of inaccuracies that complicate mutual understanding [25]. Climent found that typographical errors are also a big source of translation errors that hinder mutual understanding [7]. Yamashita discovered that members tend to misunderstand translated messages and proposed a method to automatically detect misunderstandings [30].

In machine translation-mediated communication, shortened referring expressions are not necessarily translated correctly; even when referring expressions overlap considerably, machine translation may generate something totally different based on very small changes. Because abbreviation is problematic for machine translation, we expect that participants will identify a figure using identical referring expressions throughout the conversation.

… translations between two different languages are not transitive: translation from language A to B and back to A does not yield the original expression. The intransitive nature of machine translations results from its development process; translation from language A to B is built independently of translation from language B to A. In such conversations, the addressee cannot echo the speaker’s expression as a way of accepting it, illustrating that they are referring to the same thing.
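To make the asymmetry concrete, here is a toy sketch. The two dictionaries are invented for illustration, and real MT systems are not word-by-word lookups. The point carries over, though: because each direction is built independently, an English -> Japanese -> English round trip need not return the speaker's expression, so the addressee's "echo" no longer looks like an echo.

```python
# Toy, hand-made tables (hypothetical): each direction is built independently,
# which is the situation the paper describes for real MT systems.
EN_TO_JA = {"kimono": "着物", "dancer": "踊り子"}
JA_TO_EN = {"着物": "kimono", "踊り子": "dancing girl"}


def translate(text: str, table: dict) -> str:
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


def round_trip(text: str) -> str:
    """Translate English -> Japanese -> English with the two independent tables."""
    return translate(translate(text, EN_TO_JA), JA_TO_EN)


for phrase in ["kimono", "dancer"]:
    back = round_trip(phrase)
    verdict = "can be echoed" if back == phrase else "cannot be echoed"
    print(f"{phrase!r} comes back as {back!r}: {verdict}")
```

In this toy, "kimono" survives the round trip while "dancer" comes back as "dancing girl". That matches the later observation that distinctive terms with few translation alternatives feel safe to use.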

We also found that in their second trial, speakers using machine translation preferred to narrow expressions rather than simplify them. …We infer that “narrowing” is observed more frequently in machine translation-mediated communication because distinctive terms such as “kimono” have few alternatives in translation, and thus, participants feel safe using them to match the figures.

Moreover, participants avoided focusing on the incomprehensible part of messages to discover what was wrong. Since translations are not transitive, it appears that they cannot efficiently solve the problem. Speakers have little choice but to offer more information and proceed with the task.

Consistent with quantitative results, speakers tended to describe the figures more frequently in machine translation than in English.

It seems that participants can minimize mutual effort in collaboration by offering more and more information until their partner confirms understanding.

Since such an unwieldy conversational style would not be useful in general conversation, there is a need to support natural referential behavior in machine translation-mediated communication. For example, support that creates correspondences among references (or keywords) between the two languages may help. Also, support that creates correspondences among referring expressions before and after shortening may help.
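One way to read the two suggestions is as a small chat-side helper, sketched below. This is not from the paper: the class, its methods, the word-subset rule for linking a shortened expression to its antecedent, and the deliberately inconsistent stand-in MT are all hypothetical. The helper pins one translation per referring expression so the same term is rendered identically throughout the conversation. It also links a shortened expression back to the longer expression it came from.

```python
import itertools
from typing import Callable, Dict, Optional


def make_inconsistent_mt() -> Callable[[str], str]:
    """Stand-in MT engine that phrases its output differently on every call."""
    counter = itertools.count()
    return lambda text: f"{text} <variant {next(counter)}>"


class ReferenceGlossary:
    """Hypothetical helper: stable translations plus links to pre-shortening forms."""

    def __init__(self, mt: Callable[[str], str]) -> None:
        self.mt = mt
        self.pinned: Dict[str, str] = {}  # source expression -> fixed translation

    def translate(self, expr: str) -> str:
        """Translate a referring expression once, then reuse that translation."""
        if expr not in self.pinned:
            self.pinned[expr] = self.mt(expr)
        return self.pinned[expr]

    def antecedent(self, short: str) -> Optional[str]:
        """Earliest pinned expression containing every word of a shortened one."""
        words = set(short.split())
        for earlier in self.pinned:  # dicts preserve insertion order
            if earlier != short and words <= set(earlier.split()):
                return earlier
        return None


mt = make_inconsistent_mt()
glossary = ReferenceGlossary(mt)
first = glossary.translate("person wearing a kimono")
again = glossary.translate("person wearing a kimono")
print(first == again)                 # the glossary keeps the term stable
print(glossary.antecedent("kimono"))  # links the shortened form to its origin
```

Pinning trades away any later improvement in the engine's output for consistency. Given the findings above, that seems the right trade for referring expressions specifically.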


Doug Beeferman, Adam Berger, John Lafferty (1997). Proceedings of the Second Conference on Empirical Methods in Natural Language Processing.

Abstract: This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large corpus of annotated data. We also propose a new probabilistically motivated error metric for use by the natural language processing and information retrieval communities, intended to supersede precision and recall for appraising segmentation algorithms. Qualitative assessment of our algorithm as well as evaluation using this new metric demonstrates the effectiveness of our approach in two very different domains, Wall Street Journal articles and the TDT Corpus, a collection of newswire articles and broadcast news transcripts.
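The notes don't spell out the proposed error metric. As far as I know, it is the one later widely used as P_k. You slide a probe of width k over the text and count how often the reference and the hypothesis disagree on whether the probe's two ends lie in the same segment. The paper's general form weights positions by a distance distribution. The sketch below fixes the distance at k. The label encoding and the default k (half the mean reference segment length) are conventions assumed here, not taken from the abstract.

```python
from typing import Optional, Sequence


def p_k(reference: Sequence[int], hypothesis: Sequence[int], k: Optional[int] = None) -> float:
    """Fixed-window segmentation error (P_k).

    Each sequence gives a segment label per unit (e.g. sentence), such as
    [0, 0, 1, 1, 1]; units with the same label belong to the same segment.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two units")
    if k is None:
        # Common convention: half the mean reference segment length.
        k = max(1, round(n / len(set(reference)) / 2))
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    probes = range(n - k)
    disagreements = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in probes
    )
    return disagreements / len(probes)


reference = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(p_k(reference, reference))     # perfect segmentation scores 0.0
print(p_k(reference, [0] * 9, k=2))  # no boundaries found: 4 of 7 probes disagree
```

Unlike precision and recall over exact boundary positions, a near-miss boundary only costs the few probes that straddle it. That partial credit is the motivation the abstract alludes to.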

My Notes: Partitioning happens at the document level, not the sentence level; it is used to segment large collections of text (IR).

Splitting long sentences into fluent and coherent shorter sentences is much harder to do automatically, since it would require some sort of language generation module, which could turn sentential fragments into sentences. Has anybody looked at this problem?
An aside: I love the terms "lexical myopia" and "shortsightedness" for describing low-order n-gram models.

by Alon Lavie, Donna Gates, Noah Coccaro and Lori Levin (1996). ECAI Workshop on Dialogue Processing in Spoken Language Systems.

Abstract: JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe how multi-level segmentation of single utterance turns improves translation quality and facilitates accurate translation in our system. We define the basic dialogue units that are handled by our system, and discuss the cues and methods employed by the system in segmenting the input utterance into such units. Utterance segmentation in our system is performed in a multi-level incremental fashion, partly prior to and partly during analysis by the parser. The segmentation relies on a combination of acoustic, lexical, semantic and statistical knowledge sources, which are described in detail in the paper. We also discuss how our system is designed to disambiguate among alternative possible input segmentations.

My Notes: Split input into semantic dialog units (~= speech act), namely semantically coherent pieces of information that can be translated independently.

Mellebeek, Bart; Owczarzak, Karolina; Van Genabith, Josef & Way, Andy. (2006). AMTA, Boston, MA.

Original paper on TransBooster project is: B. Mellebeek, A. Khasin, J. Van Genabith, A. Way. 2005. TransBooster: Boosting the Performance of Wide-Coverage Machine Translation Systems. In Proceedings of the 10th Annual Conference of the European Association for Machine Translation. pp. 189-197, Budapest, Hungary.

Abstract: In this paper, we present a novel approach to combine the outputs of multiple MT engines into a consensus translation. In contrast to previous Multi-Engine Machine Translation (MEMT) techniques, we do not rely on word alignments of output hypotheses, but prepare the input sentence for multi-engine processing. We do this by using a recursive decomposition algorithm that produces simple chunks as input to the MT engines. A consensus translation is produced by combining the best chunk translations, selected through majority voting, a trigram language model score and a confidence score assigned to each MT engine. We report statistically significant relative improvements of up to 9% BLEU score in experiments (English->Spanish) carried out on an 800-sentence test set extracted from the Penn-II Treebank.

Summary: They describe an algorithm for splitting input sentences into syntactically meaningful chunks (according to a parser/human) and simplifying the arguments of a pivot (head of the chunk) to facilitate the machine translation process of the simplified chunks in (dynamically simplified) context.

My Notes: This work shows that splitting long input sentences into shorter ones can actually improve MT output in terms of BLEU. A game with a purpose that has humans do this splitting therefore becomes less relevant.

In contrast to previous MEMT approaches, the technique we present does not rely on word alignments of target language sentences, but is based on a recursive chunking algorithm that produces simple constituents as input to the MT engines. The outputs of these syntactically meaningful chunks are compared to each other and the highest ranked translations are used to compose the output sentence. Our approach, therefore, prepares the input sentence for multi-engine processing on the input side. It draws its strength from the simple fact that short input strings result in better translations than longer ones.

The decomposition into chunks, the tracking of the output chunks in target and the final composition of the output are based on the TransBooster architecture presented in (Mellebeek et al., 2005) [EAMT, Budapest].

Our approach presupposes the existence of some sort of syntactic analysis of the input sentence. In a first step, the input sentence is decomposed into a number of syntactically meaningful chunks as in (1).
(1) $[\mathrm{ARG}_1]\,[\mathrm{ADJ}_1] \ldots [\mathrm{ARG}_L]\,[\mathrm{ADJ}_l]\ \mathrm{pivot}\ [\mathrm{ARG}_{L+1}]\,[\mathrm{ADJ}_{l+1}] \ldots [\mathrm{ARG}_{L+R}]\,[\mathrm{ADJ}_{l+r}]$
where pivot = the nucleus of the sentence, ARG = argument, ADJ = adjunct, $l, r$ = number of ADJs to the left/right of the pivot, and $L, R$ = number of ARGs to the left/right of the pivot.
In order to determine the pivot, we compute the head of the local tree by adapting the head-lexicalised grammar annotation scheme of (Magerman, 1995). In certain cases, we derive a ‘complex pivot’ consisting of the head terminal together with some of its neighbours, e.g. phrasal verbs or strings of auxiliaries. The procedure used for argument/adjunct identification is an adapted version of Hockenmaier’s algorithm for CCG (Hockenmaier, 2003).

In a next step, we replace the arguments by similar but simpler strings, which we call ‘Substitution Variables’. The purpose of Substitution Variables is: (i) to help to reduce the complexity of the original arguments, which often leads to an improved translation of the pivot; (ii) to help keep track of the location of the translation of the arguments in target.
In choosing an optimal Substitution Variable for a constituent, there is a trade-off between accuracy and retrievability. ‘Static’ or previously defined Substitution Variables (e.g. ‘cars’ to replace the NP ‘fast and confidential deals’ as explained in section 3.5) are easy to track in target, since their translation by a specific MT engine is known in advance, but they might distort the translation of the pivot because of syntactic/semantic differences with the original constituent. ‘Dynamic’ Substitution Variables, which comprise the real heads of the constituent (e.g. ‘deals’ to replace the NP ‘fast and confidential deals’ as outlined in section 3.5), guarantee maximum similarity but are more difficult to track in target.
Our algorithm employs Dynamic Substitution Variables first and automatically backs off to Static Substitution Variables if problems occur. By replacing the arguments by their Substitution Variables and leaving out the adjuncts in (1), we obtain the skeleton in (2):

(2) $[\mathrm{VARG}_1] \ldots [\mathrm{VARG}_L]\ \mathrm{pivot}\ [\mathrm{VARG}_{L+1}] \ldots [\mathrm{VARG}_{L+R}]$
where $\mathrm{VARG}_i$ is the simpler string substituting $\mathrm{ARG}_i$.
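Here is a minimal sketch of going from (1) to (2), assuming the parse and the argument/adjunct split are already given. `last_word_head` is a crude, hypothetical stand-in for the Magerman-style head rules. It happens to work for right-headed English NPs such as ‘fast and confidential deals’. The example sentence is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Decomposition:
    """Chunks of decomposition (1) around the pivot."""
    left_args: List[str]
    pivot: str
    right_args: List[str]
    adjuncts: List[str] = field(default_factory=list)  # left out of the skeleton


def last_word_head(arg: str) -> str:
    """Crude stand-in for head finding: take the last word of the constituent."""
    return arg.split()[-1]


def skeleton(d: Decomposition, substitute: Callable[[str], str]) -> str:
    """Skeleton (2): each ARG becomes its Substitution Variable, ADJs are dropped."""
    left = [substitute(a) for a in d.left_args]
    right = [substitute(a) for a in d.right_args]
    return " ".join(left + [d.pivot] + right)


sentence = Decomposition(
    left_args=["the ethics committee"],
    pivot="approved",
    right_args=["fast and confidential deals"],
    adjuncts=["last week"],
)
print(skeleton(sentence, last_word_head))      # Dynamic SVs: real heads
print(skeleton(sentence, lambda arg: "cars"))  # Static SV: fixed placeholder
```

The first call yields "committee approved deals" and the second "cars approved cars". These are the two ends of the accuracy/retrievability trade-off described above.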
By matching the previously established translations of the Substitution Variables $\mathrm{VARG}_i$ ($1 \le i \le L + R$) in the translation of the skeleton in (2), we are able to (i) extract the translation of the pivot and (ii) track the location of the translated arguments in target. The result of this second step on the worked example is shown in (6). Adjuncts are located in target by using a similar strategy in which adjunct Substitution Variables are added to the skeleton in (2).
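The matching step and the Dynamic-to-Static back-off can be sketched as follows. Several assumptions apply here. The arguments are assumed to keep their order in target, which holds for the English->Spanish toy below but not in general. Each SV is "translated on its own" by the same engine. The toy MT table is invented.

```python
from typing import Callable, List, Optional, Sequence


def extract_pivot(translated: str, left: Sequence[str], right: Sequence[str]) -> Optional[str]:
    """Find the SV translations in order; the pivot is what lies between the
    last left-hand SV and the first right-hand SV. None if any SV is missing."""
    spans = []
    pos = 0
    for sv in list(left) + list(right):
        start = translated.find(sv, pos)
        if start < 0:
            return None
        spans.append((start, start + len(sv)))
        pos = start + len(sv)
    begin = spans[len(left) - 1][1] if left else 0
    end = spans[len(left)][0] if right else len(translated)
    return translated[begin:end].strip()


def translate_pivot(mt: Callable[[str], str], pivot: str, args: List[str], n_left: int,
                    dynamic_sv: Callable[[str], str],
                    static_sv: Callable[[str], str]) -> Optional[str]:
    """Try Dynamic SVs first and back off to Static SVs if they cannot be located."""
    for make_sv in (dynamic_sv, static_sv):
        svs = [make_sv(a) for a in args]
        translated = mt(" ".join(svs[:n_left] + [pivot] + svs[n_left:]))
        targets = [mt(sv) for sv in svs]  # each SV translated on its own
        found = extract_pivot(translated, targets[:n_left], targets[n_left:])
        if found is not None:
            return found
    return None


TOY_MT = {  # hypothetical English->Spanish outputs of one engine
    "committee": "comité",
    "deals": "tratos",
    "committee approved deals": "el comité aprobó los acuerdos",
    "cars": "coches",
    "cars approved cars": "coches aprobaron coches",
}


def toy_mt(text: str) -> str:
    return TOY_MT.get(text, text)


result = translate_pivot(toy_mt, "approved", ["the committee", "fast and confidential deals"], 1,
                         dynamic_sv=lambda a: a.split()[-1], static_sv=lambda a: "cars")
print(result)
```

The Dynamic attempt fails to retrieve: "deals" is rendered "tratos" alone but "acuerdos" in context. The Static back-off succeeds, but yields "aprobaron", which is plural to agree with "cars", where the original subject would need singular "aprobó". That is the distortion risk the paper mentions.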

Since translating individual chunks out of context is likely to produce a deficient output or lead to boundary friction, we need to ensure that each chunk is translated in a simple context that mimics the original.
As in the case of the Substitution Variables, this context can be static (a previously established template, the translation of which is known in advance) or dynamic (a simpler version of the original context).
Our approach is based on the idea that by reducing the complexity of the original context, the analysis modules of the MT engines are more likely to produce a better translation of the input chunk $C_i$ than if it were left intact in the original sentence, which contains more syntactic and semantic ambiguities.
In other words, we try to improve on the translation $C_{ij}$ of chunk $C_i$ by MT engine $j$ through input simplification (cf. section 3.5 for more details).
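A sketch of the static-context variant, assuming the template is placed before the chunk. The template ‘I saw’, its translation and the toy MT table are invented for illustration. Because the template's translation is known in advance, it can be stripped off to recover the chunk's translation. If it no longer appears intact, the output is not trusted.

```python
from typing import Callable, Optional


def translate_in_static_context(
    mt: Callable[[str], str], chunk: str, template: str, template_translation: str
) -> Optional[str]:
    """Embed the chunk after a fixed template whose translation is known,
    then strip that known translation from the engine's output."""
    translated = mt(f"{template} {chunk}")
    if not translated.startswith(template_translation):
        return None  # context was altered; the caller needs another strategy
    return translated[len(template_translation):].strip()


TOY_MT = {"I saw": "vi", "I saw the red car": "vi el coche rojo"}


def toy_mt(text: str) -> str:
    return TOY_MT.get(text, text)  # unknown input passes through untranslated


print(translate_in_static_context(toy_mt, "the red car", "I saw", toy_mt("I saw")))
```

A dynamic context would instead use a simplified version of the chunk's real surroundings. That gives a closer match to the original at the cost of a translation that is not known in advance.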
After obtaining the translations of all input chunks by all MT engines ($C_{i1}, \ldots, C_{iN}$), all that remains to be done is to select the best output translation $C_i^{\mathrm{best}}$ for each chunk $C_i$ and derive the output by composing all $C_i^{\mathrm{best}}$. This is possible since we have kept track of the position of each $C_{ij}$ by the Substitution Variables.
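The abstract names three selection signals: majority voting, a trigram language model score, and per-engine confidence. How they are combined is not in these notes. The sketch below assumes voting first, with LM score plus the best proposing engine's confidence as an equally weighted tie-breaker. It also assumes the chunks are already in target order. Engine names, confidences and the flat LM are hypothetical.

```python
from collections import Counter
from typing import Callable, Mapping, Sequence


def select_best(
    candidates: Mapping[str, str],
    lm_score: Callable[[str], float],
    confidence: Mapping[str, float],
) -> str:
    """Pick the best chunk translation from {engine j: C_ij}: majority vote first,
    then break ties with LM score plus the highest confidence among proposers."""
    votes = Counter(candidates.values())
    top = max(votes.values())
    tied = sorted(t for t, n in votes.items() if n == top)  # sorted for determinism

    def tie_break(t: str) -> float:
        best_conf = max(confidence[e] for e, out in candidates.items() if out == t)
        return lm_score(t) + best_conf

    return max(tied, key=tie_break)


def compose(chunk_candidates: Sequence[Mapping[str, str]],
            lm_score: Callable[[str], float],
            confidence: Mapping[str, float]) -> str:
    """Join the winning chunk translations, assuming chunks are in target order."""
    return " ".join(select_best(c, lm_score, confidence) for c in chunk_candidates)


confidence = {"engineA": 0.5, "engineB": 0.8, "engineC": 0.6}


def flat_lm(text: str) -> float:
    return 0.0  # stand-in for a trigram LM score


chunks = [
    {"engineA": "el comité", "engineB": "el comité", "engineC": "la comisión"},
    {"engineA": "aprobaba", "engineB": "aprobó", "engineC": "ratificó"},
]
print(compose(chunks, flat_lm, confidence))
```

The first chunk is settled by two votes to one. The second is a three-way tie that falls to engineB's higher confidence.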

by Davide Turcato, Fred Popowich, Paul McFetridge, Devlan Nicholson, Janine Toole. NAACL-ANLP 2000 Workshop on Embedded machine translation systems – Volume 5. Seattle, Washington. pp. 38-45

Abstract: We describe an approach to Machine Translation of transcribed speech, as found in closed captions. We discuss how the colloquial nature and input format peculiarities of closed captions are dealt with in a pre-processing pipeline that prepares the input for effective processing by a core MT system. In particular, we describe components for proper name recognition and input segmentation. We evaluate the contribution of such modules to the system performance. The described methods have been implemented on an MT system for translating English closed captions to Spanish and Portuguese.

My Notes: Rather than long sentences being split into shorter, more translation-friendly sentences, closed-caption sentences are often split arbitrarily for practical reasons, which makes parsing and named entity recognition much harder.

During pre-processing, MT input undergoes the following steps: text normalization, tokenization, POS tagging, proper name recognition and segmentation.


Segmentation breaks a line into one or more segments, which are passed separately to subsequent modules (Ejerhed, 1996) (Beeferman et al., 1997). In translation, segmentation is applied to split a line into a sequence of translationally self-contained units (Lavie et al., 1996).

In our system, the translation units we identify are syntactic units, motivated by cross-linguistic considerations. Each unit is a constituent that can be translated independently. Its translation is insensitive to the context in which the unit occurs, and the order of the units is preserved by translation. One motivation for segmenting is that processing is faster: syntactic ambiguity is reduced, and backtracking from a module to a previous one does not involve re-processing an entire line, but only the segment that failed. A second motivation is robustness: a failure in one segment does not cause a failure of the entire line, and error recovery can be limited to that segment. Further motivations are provided by the colloquial nature of closed captions.
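The robustness argument can be sketched as follows. The translator and its tiny lexicon are hypothetical, and it is assumed to signal failure by raising `ValueError`. Each self-contained unit is translated on its own, order is preserved, and a failed unit yields `None` instead of sinking the whole caption line.

```python
from typing import Callable, List, Optional


def translate_segments(
    segments: List[str], translate_segment: Callable[[str], str]
) -> List[Optional[str]]:
    """Translate each self-contained unit independently, keeping the original order.
    A segment whose translation fails yields None instead of failing the line."""
    results: List[Optional[str]] = []
    for segment in segments:
        try:
            results.append(translate_segment(segment))
        except ValueError:  # assumed failure signal of the hypothetical translator
            results.append(None)
    return results


def toy_translator(segment: str) -> str:
    """Hypothetical translator that fails on anything outside its tiny lexicon."""
    lexicon = {"good morning": "buenos días", "thank you": "gracias"}
    if segment not in lexicon:
        raise ValueError(f"cannot parse {segment!r}")
    return lexicon[segment]


line = ["good morning", "uh you know", "thank you"]
print(translate_segments(line, toy_translator))
```

Only "uh you know" would need error recovery here; the other two segments are translated as if nothing had happened.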