Multi-Engine Machine Translation by Recursive Sentence Decomposition

August 28, 2006

Mellebeek, Bart; Owczarzak, Karolina; Van Genabith, Josef & Way, Andy. (2006). AMTA, Boston, MA.

Original paper on TransBooster project is: B. Mellebeek, A. Khasin, J. Van Genabith, A. Way. 2005. TransBooster: Boosting the Performance of Wide-Coverage Machine Translation Systems. In Proceedings of the 10th Annual Conference of the European Association for Machine Translation. pp. 189-197, Budapest, Hungary.

Abstract: In this paper, we present a novel approach to combine the outputs of multiple MT engines into a consensus translation. In contrast to previous Multi-Engine Machine Translation (MEMT) techniques, we do not rely on word alignments of output hypotheses, but prepare the input sentence for multi-engine processing. We do this by using a recursive decomposition algorithm that produces simple chunks as input to the MT engines. A consensus translation is produced by combining the best chunk translations, selected through majority voting, a trigram language model score and a confidence score assigned to each MT engine. We report statistically significant relative improvements of up to 9% BLEU score in experiments (English->Spanish) carried out on an 800-sentence test set extracted from the Penn-II Treebank.

Summary: They describe an algorithm for splitting input sentences into syntactically meaningful chunks (according to a parser or human annotation) and simplifying the arguments of a pivot (the head of a chunk) to facilitate machine translation of the simplified chunks in a (dynamically simplified) context.

My Notes: this work shows that splitting long input sentences into shorter ones can actually improve MT output in terms of BLEU. This makes a game with a purpose that asks humans to do the splitting less relevant.

Excerpts
In contrast to previous MEMT approaches, the technique we present does not rely on word alignments of target language sentences, but is based on a recursive chunking algorithm that produces simple constituents as input to the MT engines. The translations of these syntactically meaningful chunks are compared to each other and the highest-ranked translations are used to compose the output sentence. Our approach, therefore, prepares the input sentence for multi-engine processing on the input side. It draws its strength from the simple fact that short input strings result in better translations than longer ones.

The decomposition into chunks, the tracking of the output chunks in target and the final composition of the output are based on the TransBooster architecture presented in (Mellebeek et al., 2005) [EAMT, Budapest].

Our approach presupposes the existence of some sort of syntactic analysis of the input sentence. In a first step, the input sentence is decomposed into a number of syntactically meaningful chunks as in (1).
(1) [ARG_1] [ADJ_1] ... [ARG_L] [ADJ_l] pivot [ARG_L+1] [ADJ_l+1] ... [ARG_L+R] [ADJ_l+r]
where pivot = the nucleus of the sentence, ARG = argument, ADJ = adjunct, {l, r} = number of ADJs to the left/right of the pivot, and {L, R} = number of ARGs to the left/right of the pivot.
In order to determine the pivot, we compute the head of the local tree by adapting the head-lexicalised grammar annotation scheme of (Magerman, 1995). In certain cases, we derive a ‘complex pivot’ consisting of the head terminal together with some of its neighbours, e.g. phrasal verbs or strings of auxiliaries. The procedure used for argument/adjunct identification is an adapted version of Hockenmaier’s algorithm for CCG (Hockenmaier, 2003).
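To make the decomposition step concrete, here is a minimal Python sketch of splitting a local tree into a pivot, arguments and adjuncts as in (1). The Node class and the find_head / is_argument helpers are hypothetical stand-ins for the adapted Magerman head-finding rules and the Hockenmaier-style argument/adjunct test; this is not the authors' implementation.

# Minimal sketch of the decomposition in (1), not the authors' code.
# find_head() and is_argument() are placeholders for the Magerman-style
# head finder and the Hockenmaier-style argument/adjunct test.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                       # e.g. 'S', 'NP', 'VP', or a terminal word
    children: list = field(default_factory=list)

def find_head(node: Node) -> Node:
    """Placeholder for the adapted Magerman (1995) head-finding rules."""
    return node.children[0] if node.children else node

def is_argument(parent: Node, child: Node) -> bool:
    """Placeholder for the adapted Hockenmaier (2003) argument/adjunct test."""
    return child.label in {"NP", "SBAR"}          # assumption for illustration

def decompose(node: Node):
    """Split a local tree into (pivot, arguments, adjuncts) as in (1)."""
    pivot = find_head(node)
    args, adjs = [], []
    for child in node.children:
        if child is pivot:
            continue
        (args if is_argument(node, child) else adjs).append(child)
    return pivot, args, adjs

In the full system this would be applied recursively to complex arguments, so that each resulting chunk is short enough to translate well.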

In a next step, we replace the arguments by similar but simpler strings, which we call ‘Substitution Variables’. The purpose of Substitution Variables is: (i) to help to reduce the complexity of the original arguments, which often leads to an improved translation of the pivot; (ii) to help keep track of the location of the translation of the arguments in target.
In choosing an optimal Substitution Variable for a constituent, there exists a trade-off between accuracy and retrievability. ‘Static’ or previously defined Substitution Variables (e.g. ‘cars’ to replace the NP ‘fast and confidential deals’ as explained in section 3.5) are easy to track in target, since their translation by a specific MT engine is known in advance, but they might distort the translation of the pivot because of syntactic/semantic differences with the original constituent. ‘Dynamic’ Substitution Variables, which comprise the real heads of the constituent (e.g. ‘deals’ to replace the NP ‘fast and confidential deals’ as outlined in section 3.5), guarantee maximum similarity but are more difficult to track in target.
Our algorithm employs Dynamic Substitution Variables first and automatically backs off to Static Substitution Variables if problems occur. By replacing the arguments by their Substitution Variables and leaving out the adjuncts in (1), we obtain the skeleton in (2).

(2) [VARG_1] ... [VARG_L] pivot [VARG_L+1] ... [VARG_L+R]
where VARG_i is the simpler string substituting ARG_i.
By matching the previously established translations of the Substitution Variables VARG_i (1 <= i <= L + R) in the translation of the skeleton in (2), we are able to (i) extract the translation of the pivot and (ii) track the location of the translated arguments in target. The result of this second step on the worked example is shown in (6). Adjuncts are located in target by using a similar strategy in which adjunct Substitution Variables are added to the skeleton in (2).
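A rough sketch of this skeleton step, under the assumption that each Substitution Variable's translation is already known (static) or obtained by translating the variable in isolation (dynamic). The names build_skeleton, locate_in_target and the translate wrapper are illustrative, not the TransBooster code.

# Hedged sketch of the skeleton step in (2).
def build_skeleton(pivot, left_vars, right_vars):
    """Assemble the skeleton [VARG_1] ... [VARG_L] pivot [VARG_L+1] ... [VARG_L+R]."""
    return " ".join(left_vars + [pivot] + right_vars)

def locate_in_target(skeleton_translation, var_translations):
    """Find the span of each variable's translation in the translated skeleton.
    Whatever is left over corresponds to the translation of the pivot.
    Returns None if a (dynamic) variable cannot be found, signalling the
    back-off to a Static Substitution Variable."""
    spans = []
    for var_tr in var_translations:
        start = skeleton_translation.find(var_tr)
        if start == -1:
            return None
        spans.append((start, start + len(var_tr)))
    return spans

# Illustrative usage with a hypothetical `translate` wrapper around one engine:
# skeleton = build_skeleton("signed", ["the man"], ["deals"])
# target = locate_in_target(translate(skeleton),
#                           [translate("the man"), translate("deals")])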

Since translating individual chunks out of context is likely to produce a deficient output or lead to boundary friction, we need to ensure that each chunk is translated in a simple context that mimics the original.
As in the case of the Substitution Variables, this context can be static (a previously established template, the translation of which is known in advance) or dynamic (a simpler version of the original context).
Our approach is based on the idea that by reducing the complexity of the original context, the analysis modules of the MT engines are more likely to produce a better translation of the input chunk C_i than if it were left intact in the original sentence, which contains more syntactic and semantic ambiguities. In other words, we try to improve on the translation C_ij of chunk C_i by MT engine j through input simplification (cf. section 3.5 for more details).
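As an illustration only, a static context template and its known translation might be used as below; the template strings and the stripping logic are assumptions, and the real templates are engine- and category-specific.

# Hypothetical static context template; the real ones come from the paper's
# section 3.5 and differ per MT engine and constituent type.
STATIC_TEMPLATE = "The man sees {chunk}."
KNOWN_TEMPLATE_TRANSLATION = "El hombre ve {chunk}."   # known in advance per engine

def translate_in_context(chunk, translate):
    """Translate `chunk` embedded in a simple context and strip the context again."""
    target = translate(STATIC_TEMPLATE.format(chunk=chunk))
    prefix, suffix = KNOWN_TEMPLATE_TRANSLATION.split("{chunk}")
    return target.removeprefix(prefix.strip()).removesuffix(suffix.strip()).strip()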
After obtaining the translations of all input chunks from all MT engines (C_i1 – C_iN), all that remains to be done is to select the best output translation C_i_best for each chunk C_i and derive the output by composing all C_i_best. This is possible since we have kept track of the position of each C_ij via the Substitution Variables.
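The selection step from the abstract (majority voting, a trigram language model score, a per-engine confidence score) could look roughly like this; the confidence weights and the lm_logprob function are placeholders, not values or code from the paper.

# Sketch of selecting the best chunk translation and composing the output.
from collections import Counter

ENGINE_CONFIDENCE = {"engine_A": 1.0, "engine_B": 0.9, "engine_C": 0.8}  # assumed

def select_best(chunk_translations, lm_logprob):
    """chunk_translations maps engine name -> candidate translation string."""
    votes = Counter(chunk_translations.values())
    def score(engine):
        cand = chunk_translations[engine]
        return (votes[cand],                        # majority vote first
                lm_logprob(cand),                   # then trigram LM score
                ENGINE_CONFIDENCE.get(engine, 0.0)) # then engine confidence
    best_engine = max(chunk_translations, key=score)
    return chunk_translations[best_engine]

def compose(per_chunk_candidates, lm_logprob):
    """Concatenate the best translation of each chunk in source order."""
    return " ".join(select_best(c, lm_logprob) for c in per_chunk_candidates)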
