Extracting and Using Translation Templates in an Example-Based Machine Translation System
Ethel Ong | Kathleen Go | Manimin Morga | Vince Nunez | Francis Veto
Discipline: Computer Technology
This paper presents a bidirectional English-Filipino machine translation system that extracts translation templates and chunks from a bilingual English-Filipino corpus and uses them to translate an input English document into Filipino and vice versa. The system extends the similarity and difference translation template learning algorithms of Cicekli and Guvenir (2003) by refining existing templates and deriving new templates from previously learned chunks. Chunk alignment, splitting algorithms, and chunk refinement are also introduced into the training process. With similarity templates and chunks correctly extracted during learning, translation achieved a word error rate ranging from 15%, for a test document whose sentences exactly match the training set, to 86%, for a test document that differs from the training corpus. Using difference templates alone, the word error rate ranged from 49% to 85%. Combining similarity and difference templates gave a word error rate ranging from 18%, when the test document contains sentence patterns matching the training set, to 85%, when the test document differs from the training corpus. Tests also showed that the highest-scoring translation among a set of candidate translations is consistently the best choice, as validated against automatic evaluation methods.
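The abstract reports results as word error rates but does not say how the metric was computed. The sketch below assumes the common definition: the word-level edit distance (substitutions, insertions, deletions) between the system output and a reference translation, divided by the number of words in the reference. The function name `word_error_rate` and the example sentences are illustrative and not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Assumed definition; the paper's exact tokenization and
    normalization are not stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one reference word is missing from the output.
    print(word_error_rate("the child ate an apple", "the child ate apple"))
```

Under this definition a score of 0 means the output matches the reference word for word, and the score can exceed 1 when the output is much longer than the reference.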