Punctuation Detection with Full Syntactic Parsing

Authors

Miloš Jakubíček, Aleš Horák

Year of publication 2010
Type Article in Periodical
Magazine / Source Research in Computing Science, Special issue: Natural Language Processing and its Applications
MU Faculty or unit

Faculty of Informatics

Web http://www.cicling.org/2010/Vol46.pdf
Field Informatics
Keywords punctuation; grammar checking; parsing; syntactic analysis
Description In many languages, including Czech, the correct placement of punctuation characters is governed by complex guidelines. Although these guidelines draw on morphological, syntactic and semantic information, state-of-the-art systems for punctuation detection and correction are limited to simple rule-based backbones. In this paper we present a syntax-based approach that uses the Czech parser synt. The parser builds a chart structure for the sentence using an adapted chart parsing technique, and synt can then process this chart to provide several kinds of output. The implemented punctuation detection technique relies on one of these outputs: the automatic and unambiguous extraction of optimal syntactic structures from the sentence (noun phrases, verb phrases, clauses, relative clauses or inserted clauses). These structures carry the syntactic information related to where punctuation is expected. We also present experiments showing that this method covers most of the syntactic phenomena needed for punctuation detection or correction.
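As a rough illustration of how extracted syntactic structures can be turned into expected punctuation positions, here is a minimal sketch. It assumes the parser output has already been reduced to labelled token spans. The `Span` type, the labels `relative_clause` and `inserted_clause`, the helper names, and the rule "a comma before and after each such span unless it touches the sentence boundary" are all hypothetical simplifications for illustration; they are not synt's output format or the rules used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A syntactic structure over token indices [start, end). Hypothetical format."""
    kind: str
    start: int
    end: int


# Hypothetical labels whose boundaries are assumed to require a comma.
# Simplification: real Czech guidelines are far richer than this.
COMMA_BOUNDED = {"relative_clause", "inserted_clause"}


def expected_commas(n_tokens: int, spans: list[Span]) -> set[int]:
    """Return indices i meaning 'a comma is expected between token i-1 and token i'."""
    positions: set[int] = set()
    for span in spans:
        if span.kind not in COMMA_BOUNDED:
            continue  # e.g. noun or verb phrases do not trigger commas here
        if span.start > 0:
            positions.add(span.start)  # comma before a span that is not sentence-initial
        if span.end < n_tokens:
            positions.add(span.end)  # comma after a span that does not end the sentence
    return positions


def check(tokens: list[str], commas_present: set[int], spans: list[Span]):
    """Compare existing comma positions with the expected ones (missing, extra)."""
    expected = expected_commas(len(tokens), spans)
    missing = sorted(expected - commas_present)
    extra = sorted(commas_present - expected)
    return missing, extra


def render(tokens: list[str], positions: set[int]) -> str:
    """Join tokens, attaching a comma to the token preceding each expected position."""
    out: list[str] = []
    for i, tok in enumerate(tokens):
        if i in positions and out:
            out[-1] += ","
        out.append(tok)
    return " ".join(out)
```

For example, for the Czech sentence "Muž který přišel je můj otec" ("The man who came is my father"), a relative clause spanning tokens 1 to 2 gives `expected_commas(6, [Span("relative_clause", 1, 3)]) == {1, 3}`, and `render` produces "Muž, který přišel, je můj otec". Comparing expected positions against the commas already present is what turns detection into correction in this sketch.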