Self-Training Language Models in Arithmetic Reasoning

Authors

KADLČÍK Marek, ŠTEFÁNIK Michal, SOTOLÁŘ Ondřej, MARTINEK Vlastimil

Year of publication: 2024
Type: Other conference presentations
Faculty / MU workplace: Faculty of Informatics

Description

Recent works show the impressive effectiveness of agent frameworks in solving problems with language models. In this work, we apply two key features of such frameworks, interaction with tools and goal-oriented training, to improve models' arithmetic reasoning. First, we curate and transform existing datasets to create Calc-X, a standardized collection of over 300,000 problems with step-by-step solutions. We use Calc-X to train models, called Calcformers, that interact with a calculator during inference. Calcformers achieve twice the accuracy of standard baselines. Finally, we further optimize Calcformers via self-training, using both preference optimization and a supervised loss, with the training signal obtained by checking the model's predicted results. We find that self-training can achieve substantial improvements on out-of-domain problems and that the traditional supervised loss is a strong baseline for preference optimization. Our results show that preference optimization converges faster and is not prone to forgetting pre-trained abilities.
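The description does not spell out how calculator interaction works at inference time. The following is a minimal sketch of one way such a loop could look, assuming a hypothetical `generate(text)` interface that continues the text until it either emits a closing tool-call tag or finishes the solution; the `<gadget>`/`<output>` markup here is illustrative and the paper's actual format may differ.

```python
import re

# Illustrative tag format for tool calls; the markup Calc-X actually
# uses may differ.
CALL_RE = re.compile(r"<gadget>([^<]*)</gadget>\s*$")

def calculator(expression: str) -> str:
    """Stand-in for the external calculator tool: evaluates a plain
    arithmetic expression such as '12 * (3 + 4)'."""
    # eval() with an empty namespace is enough for a sketch; a real
    # system would use a dedicated expression parser.
    return str(eval(expression, {"__builtins__": {}}, {}))

def solve(generate, question: str, max_calls: int = 10) -> str:
    """Interleave model generation with calculator calls.

    `generate(text)` is an assumed interface: it continues `text` and
    stops either right after a closing </gadget> tag or at the end of
    the solution.
    """
    text = question
    for _ in range(max_calls):
        text = generate(text)
        call = CALL_RE.search(text)
        if call is None:                       # no pending tool call: done
            return text
        result = calculator(call.group(1))
        text += f"<output>{result}</output>"   # feed the result back in
    return text
```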
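Similarly, the self-training step, deriving a training signal by checking the model's predicted results, could be sketched as below. The `sample_solution` and `extract_result` helpers are hypothetical placeholders, and this is not the authors' exact pipeline: samples whose final result matches the gold answer become supervised targets, and correct/incorrect pairs become preference data.

```python
import random

def build_self_training_data(problems, sample_solution, extract_result, k=8):
    """Turn the model's own samples into training data via a result check.

    `sample_solution(question)` is an assumed interface that draws one
    step-by-step solution from the current model; `extract_result` pulls
    the final answer out of a solution string.
    """
    preference_pairs = []   # (question, chosen, rejected) for preference optimization
    sft_examples = []       # (question, solution) for the supervised loss
    for question, gold_result in problems:
        samples = [sample_solution(question) for _ in range(k)]
        correct = [s for s in samples if extract_result(s) == gold_result]
        wrong = [s for s in samples if extract_result(s) != gold_result]
        # Supervised baseline: train directly on self-generated correct solutions.
        sft_examples.extend((question, s) for s in correct)
        # Preference optimization: contrast a correct sample with an incorrect one.
        if correct and wrong:
            preference_pairs.append(
                (question, random.choice(correct), random.choice(wrong))
            )
    return preference_pairs, sft_examples
```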
Related projects:
