Competing Patterns in Language Engineering and Computer Typesetting

Authors

SOJKA Petr

Year of publication 2005
MU Faculty or unit

Faculty of Informatics

Description The goal of this dissertation is to explore models, methods and methodologies for machine learning of compact and effective storage of empirical data in the areas of language engineering and computer typesetting, with a focus on massive exception handling. The research concentrates on the pattern-driven approach. The methodology of so-called competing patterns, which can handle the exceptions that occur so widely in natural language data and in computer typesetting, is developed further. Competing patterns can store context-dependent information; they can be learnt from data, written by experts, or obtained by combining both. In the first part of the thesis, the theory of competing patterns is built: competing patterns are defined, and the cornerstones of a methodology based on stratified sampling, bootstrapping and problem modeling by competing patterns are described. Segmentation problems (hyphenation) and the disambiguation of tagged data in corpus linguistics serve as examples in developing a formal model of the competing patterns method. The second part consists of a series of seven published papers that describe problems addressed by the proposed methods: applications of competing patterns and related learning methods to hyphenation, to the hyphenation of compound words and, for example, to the segmentation of Thai texts.
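
To illustrate how competing patterns can encode both rules and their exceptions, the following is a minimal Python sketch. It assumes the Liang-style scheme familiar from TeX hyphenation, which this record does not spell out: digits inside a pattern mark inter-letter positions, odd levels permit a break, even levels inhibit one, and the highest level at each position wins. The function names and the small pattern set are illustrative assumptions, not taken from the dissertation.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter levels."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)  # level for the gap before letters[pos]
        else:
            pos += 1
    return letters, levels


def hyphenate(word, patterns):
    """Insert '-' wherever the winning (maximum) level at a gap is odd."""
    padded = "." + word + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)  # scores[i] is the gap before padded[i]
    for letters, levels in (parse_pattern(p) for p in patterns):
        start = padded.find(letters)
        while start != -1:
            # Patterns compete: a higher level overrides a lower one.
            for offset, level in enumerate(levels):
                scores[start + offset] = max(scores[start + offset], level)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(1, len(word)):
        if scores[i + 1] % 2 == 1:  # gap between word[i-1] and word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Illustrative pattern set (an assumption for this sketch).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]

if __name__ == "__main__":
    print(hyphenate("hyphenation", PATTERNS))  # hy-phen-ation
```

In this sketch the pattern 1na on its own would permit a break before the n, but he2n inhibits it at the same position, and hen5at then permits the break one letter later. That layering of general rules, exceptions, and exceptions to exceptions is the kind of context-dependent information the description refers to. A real system would also enforce minimum fragment lengths at the word edges, which the sketch omits.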