Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation

PSLT Workshop, 2025

Machine Translation
Adds FastText topic filtering before fuzzy-match augmentation for scientific MT, beating smaller LLMs.

Authors

Thomas Moerman

Tom Vanallemeersch

Sara Szoc

Arda Tezcan

Published

June 24, 2025

This paper addresses English-to-French translation of scientific literature in three domains: neuroscience, climatology, and mobility. It adds topic filtering with FastText classifiers to select in-domain data before fuzzy-match augmentation. The best NMT configuration outperforms smaller LLMs, although the results depend on the evaluation metric.
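The two-step idea of filtering by topic and then augmenting with fuzzy matches can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: `toy_classifier` is a hypothetical keyword-based stand-in for a trained FastText classifier, similarity is computed with Python's `difflib` rather than whatever metric the paper uses, and the ` @@@ ` separator, the thresholds, and the choice to append the French side of the match to the source sentence are one common form of fuzzy-match augmentation, not details confirmed by this page.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (English source, French target)

# Hypothetical keyword lists; a real setup would use a trained topic classifier.
TOPIC_KEYWORDS = {
    "neuroscience": {"neuron", "neurons", "cortex", "synaptic"},
    "climatology": {"climate", "sea", "warming", "emissions"},
}


def toy_classifier(text: str) -> Tuple[str, float]:
    """Stand-in for a FastText classifier: returns (label, confidence)."""
    tokens = re.findall(r"\w+", text.lower())
    hits = {topic: sum(tok in words for tok in tokens)
            for topic, words in TOPIC_KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return ("other", 0.0)
    best = max(hits, key=hits.get)
    return (best, hits[best] / total)


def filter_by_topic(pairs: List[Pair],
                    classify: Callable[[str], Tuple[str, float]],
                    topic: str,
                    min_prob: float = 0.5) -> List[Pair]:
    """Keep pairs whose source side is labelled `topic` with enough confidence."""
    kept = []
    for src, tgt in pairs:
        label, prob = classify(src)
        if label == topic and prob >= min_prob:
            kept.append((src, tgt))
    return kept


def best_fuzzy_match(source: str, memory: List[Pair],
                     threshold: float = 0.5) -> Optional[Pair]:
    """Return the most similar memory pair at or above `threshold`, else None."""
    best, best_score = None, threshold
    for src, tgt in memory:
        if src == source:
            continue  # an exact match is not a fuzzy match
        score = SequenceMatcher(None, source.lower().split(),
                                src.lower().split()).ratio()
        if score >= best_score:
            best, best_score = (src, tgt), score
    return best


def augment(source: str, memory: List[Pair],
            sep: str = " @@@ ", threshold: float = 0.5) -> str:
    """Append the target side of the best fuzzy match to the source sentence."""
    match = best_fuzzy_match(source, memory, threshold)
    return source if match is None else source + sep + match[1]


memory = [
    ("Neurons in the visual cortex respond to edges.",
     "Les neurones du cortex visuel répondent aux contours."),
    ("Sea level rise threatens coastal cities.",
     "L'élévation du niveau de la mer menace les villes côtières."),
    ("The recipe needs two eggs.", "La recette nécessite deux œufs."),
]

# Step 1: keep only in-domain pairs. Step 2: augment a new source sentence.
in_domain = filter_by_topic(memory, toy_classifier, "neuroscience")
print(augment("Neurons in the auditory cortex respond to tones.", in_domain))
```

In this sketch the filtering step shrinks the translation memory before any fuzzy matching happens, so out-of-domain sentences can never be retrieved as matches. Swapping `toy_classifier` for a real model only requires a callable that returns a label and a confidence for a sentence.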

Research theme: Machine translation

Citation

BibTeX citation:
@inproceedings{moerman2025,
  author = {Moerman, Thomas and Vanallemeersch, Tom and Szoc, Sara and
    Tezcan, Arda},
  title = {Tailoring {Machine} {Translation} for {Scientific}
    {Literature} Through {Topic} {Filtering} and {Fuzzy} {Match}
    {Augmentation}},
  booktitle = {Proceedings of the Eleventh Workshop on Patent and
    Scientific Literature Translation (PSLT)},
  date = {2025-06-24},
  langid = {en}
}
For attribution, please cite this work as:
Moerman, Thomas, Tom Vanallemeersch, Sara Szoc, and Arda Tezcan. 2025. “Tailoring Machine Translation for Scientific Literature Through Topic Filtering and Fuzzy Match Augmentation.” Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT), accepted, June 24.