The Automatic Determination of Translation Equivalents in Lexicography: What Works and What Doesn’t?

Warning

This publication belongs to the Faculty of Informatics, not the Faculty of Economics and Administration. The official publication page is on the muni.cz website.
Authors

DENISOVÁ Michaela; DE SCHRYVER Gilles-Maurice; RYCHLÝ Pavel

Year of publication 2024
Type Article in proceedings
Conference Proceedings of the XXI EURALEX International Congress
MU Faculty or unit

Faculty of Informatics

Keywords Translation equivalent determination; Cross-lingual embedding models; Evaluation
Description Cross-lingual embedding models act as facilitators of lexical knowledge transfer and offer many advantages, notably their applicability to low-resource and non-standard language pairs, making them a valuable tool for retrieving translation equivalents in lexicography. Despite their potential, these models have primarily been developed with a focus on Natural Language Processing (NLP), leading to significant issues, including flawed training and evaluation data, as well as inadequate evaluation metrics and procedures. In this paper, we introduce cross-lingual embedding models for lexicography, addressing the challenges and limitations inherent in the current NLP-focused research. We demonstrate the problematic aspects across three baseline cross-lingual embedding models and three language pairs and outline possible solutions. We show the importance of high-quality data, arguing that it plays a more vital role than algorithmic optimisation in enhancing the effectiveness of these models.
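For readers unfamiliar with the general technique, the following is a minimal sketch of how translation candidates are commonly retrieved from a shared cross-lingual embedding space and scored against a gold dictionary. It is not the authors' implementation: the toy vectors, the gold entries, and the function names (cosine, translation_candidates, precision_at_k) are all hypothetical, and the record above does not state which retrieval method or metric the paper's three baseline models actually use.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def translation_candidates(word, src_vecs, tgt_vecs, k=1):
    """Return the k target words closest to the source word by cosine similarity."""
    query = src_vecs[word]
    ranked = sorted(tgt_vecs, key=lambda t: cosine(query, tgt_vecs[t]), reverse=True)
    return ranked[:k]

def precision_at_k(gold, src_vecs, tgt_vecs, k=1):
    """Share of source words with at least one accepted equivalent in the top k."""
    hits = sum(
        1 for src, accepted in gold.items()
        if set(translation_candidates(src, src_vecs, tgt_vecs, k)) & accepted
    )
    return hits / len(gold)

# Hypothetical toy data: two-dimensional vectors assumed to be already aligned.
src_vecs = {"dog": [1.0, 0.1], "house": [0.1, 1.0]}
tgt_vecs = {"pes": [0.9, 0.2], "dům": [0.2, 0.9], "kočka": [0.7, 0.6]}
gold = {"dog": {"pes"}, "house": {"dům"}}

print(translation_candidates("dog", src_vecs, tgt_vecs, k=2))
print(precision_at_k(gold, src_vecs, tgt_vecs, k=1))
```

The score depends entirely on what the gold dictionary accepts as an equivalent, which illustrates why the description singles out flawed evaluation data as a problem: an incomplete or noisy gold entry changes the result without any change to the model.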