Nizar Habash (Columbia University)’s contribution to the AMTA Hybrid MT Panel.

The Intuition: StatMT and RuleMT have complementary advantages:
Syntactic structure produces better global target linguistic structure,
Statistical phrase-based translation is more robust locally.

The Resource Challenge
Parallel corpora as models of performance vs. Dictionaries/analyzers as models of competence
“More is better” is true for both approaches

Parallel corpora are domain/genre specific
Dictionaries and parsers can be domain/genre specific

Hybrids may need more data: Annotated resources.

by Steve McClure, Mary Flanagan (contractor). Aug. 2003

Abstract: The difficulty of measuring the quality of automatic language translation systems (known as machine translation [MT]) has been an obstacle to widespread adoption. With systematic benchmark testing, categorization of errors, and effective dictionary customization, MT technology can yield significant cost and time savings, as well as improved consistency in translations.

“The adoption of any new technology by mainstream organizations is driven in part by how well the technology ‘works.’ The key metric for MT is the quality of the resulting translation. Not only is this a somewhat subjective measure, but its definition changes in the context of each application and user,” says Steve McClure, a research vice president in IDC’s Software Research Group. “Quality must be measured in the context of whether the user achieved its objective, not by what percentage of the translation was correct. By applying a proven process individually with each of its enterprise customers, SYSTRAN is ensuring acceptable levels of MT quality.”

Online Full Article

My Notes: Systran also allows users to manipulate rules (Ford Motor). A nice example of giving power to the users by having them interact with and fix the translation rules themselves.

So now I can say something like this: Given that MT post-editing is not an easy task, using non-expert users of MT might sound like an unwise idea at first, but the GALE evaluation relied on non-expert (yet widely trained) users to post-edit MT output, and even Systran has opened up its system so that end users can modify, add to, and refine its lexicons and grammars.

Excerpts from article

SYSTRAN has also developed the SYSTRAN Review Manager (SRM), which helps the customer to manage the MT quality process by allowing them to change vocabulary and linguistic rules. This tool represents an important advance in MT, both technologically and philosophically. Users have never before had the power to modify linguistic rules through an intuitive, interactive process.

By opening up rule modification, SYSTRAN takes a risk, but one that will
almost certainly pay off. Engaging users in the process of improving MT is
the surest path to increased acceptance and understanding of the technology.

Machine Translation Output Is Not Easily Predictable

MT systems work with natural language – a data set that is infinitely
varying, ambiguous, and structurally complex. To translate adequately, an
MT system must encode knowledge of hundreds of syntactic patterns,
variations, and exceptions, as well as relationships among these patterns.
It must include ever-changing vocabulary and specific semantic knowledge
about the usage patterns of tens of thousands of words. It must accurately
identify the parts of speech and grammatical characteristics of words
which may, in different contexts, be nouns, verbs, or adjectives, each
having many possible translations. Translation also requires a vast store
of knowledge about the world, the intent of the communication, and the
subject matter.
A human translator prioritizes and selectively applies linguistic rules
based on this knowledge. MT software, unless explicitly coded for each
possibility, cannot. Thus, MT will never attain the overall quality of
human translation.

The primary advantages of MT over human translation are speed, cost, and
consistency. An MT system gets a great deal more
translation done than is possible manually, and MT can deliver
translations instantly for time-sensitive content. When a term is entered
in an MT dictionary, it will translate it the same way every time, unlike
human translators who may choose different translations at different
times.
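
To make the consistency point concrete for myself, here is a minimal Python sketch of a user term dictionary applied deterministically to text. This is my own toy illustration, not how SYSTRAN or any real MT system implements its dictionaries: the `USER_DICTIONARY` entries, the `apply_user_dictionary` function, and the assumption that dictionary keys are stored in lowercase are all hypothetical.

```python
import re

# Hypothetical user dictionary: source term -> fixed target translation.
# Entries are illustrative only; keys are assumed to be lowercase.
USER_DICTIONARY = {
    "brake pad": "plaquette de frein",
    "torque wrench": "clé dynamométrique",
}


def apply_user_dictionary(text, dictionary):
    """Replace every dictionary term in `text` with its fixed translation.

    Longer terms are tried first so that a multi-word entry wins over
    any shorter entry it happens to contain.
    """
    if not dictionary:
        return text
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    # Matching is case-insensitive, so lowercase the match before lookup.
    return pattern.sub(lambda m: dictionary[m.group(1).lower()], text)


if __name__ == "__main__":
    sample = "Replace the brake pad. Check the Brake Pad again."
    print(apply_user_dictionary(sample, USER_DICTIONARY))
```

The same source term always maps to the same target string, regardless of where or how often it appears, which is the behavior the excerpt contrasts with human translators choosing different renderings at different times.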