An efficient algorithm for building a distributional thesaurus
Authors | |
---|---|
Year of publication | 2007 |
Type | Article in Proceedings |
Conference | Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions |
Web | http://www.aclweb.org/anthology/P/P07/P07-2011 |
Field | Informatics |
Keywords | text corpus; distributional thesaurus |
Description | Gorman and Curran (2006) argue that thesaurus generation for corpora of a billion words or more is problematic because the full computation takes many days. We present an algorithm that completes the computation in under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at the time of writing) seven major world languages. The algorithm is implemented in the Sketch Engine. |