Leveraging Synthetic Monolingual Data for Fuzzy-Match Augmentation in Neural Machine Translation: A Preliminary Study
KeMT Workshop, 2024
A preliminary study combining LLM-generated synthetic data, back-translation, and Neural Fuzzy Repair for legal MT.
A preliminary study on English-to-French legal translation. The approach generates synthetic French monolingual data with an LLM, back-translates it into English to expand the training data, and then applies Neural Fuzzy Repair (NFR). Combining back-translation with fuzzy-match augmentation improves over both the baseline and fuzzy-match augmentation alone.
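The fuzzy-match augmentation step can be illustrated with a minimal Python sketch. This is not the authors' implementation. The separator token `@@@`, the 0.5 threshold, the token-level `difflib` similarity, the function names, and the toy sentence pairs are all assumptions chosen for illustration. The general idea, as NFR is commonly described, is that each source sentence is extended with the target side of its most similar translation-memory entry, and that memory can include back-translated synthetic pairs.

```python
from difflib import SequenceMatcher

SEP = "@@@"  # hypothetical separator token between source and fuzzy-match target


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for a real fuzzy-match metric."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return (target, score) of the most similar memory entry, or None.

    `memory` is a list of (source, target) pairs. Exact matches are skipped
    so that a training sentence is never augmented with its own translation.
    """
    best = None
    best_score = threshold
    for tm_source, tm_target in memory:
        if tm_source == source:
            continue
        score = similarity(source, tm_source)
        if score >= best_score:
            best, best_score = (tm_target, score), score
    return best


def augment(source, memory, threshold=0.5):
    """Append the fuzzy-match target to the source, or return it unchanged."""
    match = best_fuzzy_match(source, memory, threshold)
    if match is None:
        return source
    target, _score = match
    return f"{source} {SEP} {target}"


# Toy memory (made-up sentences): one original pair plus one pair standing in
# for back-translated synthetic data, stored as (English, French).
memory = [
    ("the contract shall be terminated by the seller",
     "le contrat est résilié par le vendeur"),
    ("the court dismissed the appeal",
     "la cour a rejeté l'appel"),
]

print(augment("the contract shall be terminated by the buyer", memory))
```

In this sketch, a sentence with no sufficiently similar memory entry is passed through unchanged, so the translation model would see a mix of augmented and plain inputs.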
Research theme: Machine translation
Citation
BibTeX citation:
@inproceedings{moerman2024,
author = {Moerman, Thomas and Tezcan, Arda},
title = {Leveraging {Synthetic} {Monolingual} {Data} for {Fuzzy-Match}
{Augmentation} in {Neural} {Machine} {Translation:} {A}
{Preliminary} {Study}},
booktitle = {Proceedings of the First International Workshop on
Knowledge-Enhanced Machine Translation (KeMT)},
date = {2024-06-27},
langid = {en}
}
For attribution, please cite this work as:
Moerman, Thomas, and Arda Tezcan. 2024. “Leveraging Synthetic
Monolingual Data for Fuzzy-Match Augmentation in Neural Machine
Translation: A Preliminary Study.” Proceedings of the First
International Workshop on Knowledge-Enhanced Machine Translation
(KeMT), accepted, June 27.