# Thomas Moerman - NLP and AI Research

> PhD researcher in natural language processing (NLP) and artificial intelligence (AI) at Ghent University (LT3). Develops retrieval-augmented and synthetic-data methods, with applications including machine translation, educational NLP, and fairness in language models.

## Publications

- [ShAnEL-2: A Multilingual Benchmarking Dataset for Short-Answer Language Learning Exercises](https://thomasmoerman.dev/publications/2026-lrec-shanel.html): LREC 2026. A multilingual dataset of 1,185 learner responses with teacher corrections, plus a Gemma 3 benchmark.
- [Multilingual Communication in the Asylum Context: Evaluating LLM-Based Machine Translation with Fuzzy Match Augmentation and Adaptive NMT across Resource Conditions under Low-Data Constraints](https://thomasmoerman.dev/publications/2026-eamt.html): EAMT 2026 (accepted, best-paper shortlist). Fuzzy-match augmentation works with a 358-sentence translation memory across 14 languages.
- [Fuzzy Semantic Retrieval Strategies for Automated Short-Answer Grading with Large Language Models in Language Learning](https://thomasmoerman.dev/publications/2026-clinj-grading.html): Computational Linguistics in the Netherlands Journal, vol. 15, 2026. Adapts translation-memory fuzzy matching to automated grading, with a trade-off between accuracy and recall that varies with shot count.
- [Advancing Fuzzy Match Augmentation for Domain-Specific Machine Translation: An Empirical Study on Large Language Models and Neural Machine Translation](https://thomasmoerman.dev/publications/2025-jair.html): Journal of Artificial Intelligence Research, 2025. Unifies fuzzy-match augmentation for NMT and LLMs and shows specialized models can match or beat much larger ones.
- [Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation](https://thomasmoerman.dev/publications/2025-pslt.html): PSLT Workshop, 2025. Adds FastText topic filtering before fuzzy-match augmentation for scientific MT, beating smaller LLMs.
- [Retrieval-augmented Generation for Automated Written Corrective Feedback: From Dataset to Human Evaluation](https://thomasmoerman.dev/publications/2025-clin35-feedback.html): CLIN 35, 2025. Retrieval-augmented generation for automated written corrective feedback in second language acquisition.
- [Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data](https://thomasmoerman.dev/publications/2024-pbml.html): Prague Bulletin of Mathematical Linguistics, 2024. Combines back-translation with Neural Fuzzy Repair across three language directions and beats several LLMs.
- [Leveraging Synthetic Monolingual Data for Fuzzy-Match Augmentation in Neural Machine Translation: A Preliminary Study](https://thomasmoerman.dev/publications/2024-kemt.html): KeMT Workshop, 2024. A preliminary study combining LLM-generated synthetic data, back-translation, and Neural Fuzzy Repair for legal MT.
- [Evaluating Large-Scale Construction Grammars on the Tasks of Semantic Frame Extraction and Semantic Role Labeling](https://thomasmoerman.dev/publications/2024-constructions.html): Constructions, vol. 16, 2024. Evaluates construction grammars on frame extraction and role labeling, comparing three parsing heuristics.

## Optional

- [Google Scholar](https://scholar.google.be/citations?user=_c_5EIoAAAAJ)
- [LinkedIn](https://www.linkedin.com/in/thomas-andreas-moerman/)