Leveraging Synthetic Monolingual Data for Fuzzy-Match Augmentation in Neural Machine Translation: A Preliminary Study

Thomas Moerman; Arda Tezcan

KeMT Workshop, 2024

Machine Translation

A preliminary study combining LLM-generated synthetic data, back-translation, and Neural Fuzzy Repair for legal MT.

Published

June 27, 2024

A preliminary study on English-to-French legal translation. An LLM generates synthetic French monolingual data, which is back-translated into English to expand the training data; Neural Fuzzy Repair, a fuzzy-match augmentation method, is then applied. Combining back-translation with fuzzy-match augmentation improves over both the baseline and fuzzy-match augmentation alone.
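To make the fuzzy-match augmentation step more concrete, here is a minimal sketch, not the authors' implementation. It assumes a translation memory stored as (English, French) pairs, which could include back-translated synthetic pairs. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity metric, and it assumes a separator token `@@@` and a threshold of 0.5; the names `similarity`, `best_fuzzy_match`, and `augment` are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed separator between the source sentence and the retrieved target.
SEP = "@@@"


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for a fuzzy-match metric."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return (score, tm_source, tm_target) for the closest entry, or None."""
    best = None
    for tm_source, tm_target in memory:
        if tm_source == source:
            continue  # skip exact self-matches when augmenting training data
        score = similarity(source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


def augment(source, memory, threshold=0.5):
    """Append the target side of the best fuzzy match to the source, if any."""
    match = best_fuzzy_match(source, memory, threshold)
    if match is None:
        return source  # no match above the threshold: leave the input unchanged
    return f"{source} {SEP} {match[2]}"


# Illustrative translation memory (invented example sentences).
memory = [
    ("The contract shall be governed by French law .",
     "Le contrat est régi par le droit français ."),
    ("The court dismissed the appeal .",
     "La cour a rejeté l'appel ."),
]

print(augment("The contract shall be governed by Belgian law .", memory))
```

In this sketch the augmented string would replace the plain source sentence on the input side of the NMT system, at both training and inference time. Sentences without a sufficiently similar match pass through unchanged.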

Research theme: Machine translation

Citation

BibTeX citation:

@inproceedings{moerman2024,
  author = {Moerman, Thomas and Tezcan, Arda},
  title = {Leveraging {Synthetic} {Monolingual} {Data} for {Fuzzy-Match}
    {Augmentation} in {Neural} {Machine} {Translation:} {A}
    {Preliminary} {Study}},
  booktitle = {Proceedings of the First International Workshop on
    Knowledge-Enhanced Machine Translation (KeMT)},
  date = {2024-06-27},
  langid = {en}
}

For attribution, please cite this work as:

Moerman, Thomas, and Arda Tezcan. 2024. “Leveraging Synthetic Monolingual Data for Fuzzy-Match Augmentation in Neural Machine Translation: A Preliminary Study.” Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation (KeMT), accepted, June 27.