Gensim -- Statistical Semantics in Python

Řehůřek, Radim; Sojka, Petr

Authors	ŘEHŮŘEK Radim, SOJKA Petr
Year of publication	2011
Type	Appeared in Conference without Proceedings
MU Faculty or unit	Faculty of Informatics
Attached files	rehurek-sojka-scipy2011.pdf
Description	Gensim is a pure Python library that fights on two fronts: 1) digital document indexing and similarity search; and 2) fast, memory-efficient, scalable algorithms for Singular Value Decomposition and Latent Dirichlet Allocation. The connection between the two is unsupervised, semantic analysis of plain text in digital collections. Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis: also useful on their own, outside of the domain of Natural Language Processing.
Related projects:	Centrum komputační lingvistiky; The European Digital Mathematics Library
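
The description pairs document indexing and similarity search with SVD and LDA. As a rough illustration of the first half only, the sketch below indexes a few documents as bag-of-words vectors and ranks them by cosine similarity to a query. It uses only the Python standard library and is not Gensim's API or implementation: the function names (`vectorize`, `cosine`, `build_index`, `most_similar`) and the sample documents are hypothetical, and the latent-space projection (SVD or LDA) that the description mentions is deliberately left out.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn plain text into a sparse bag-of-words vector (token -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Hypothetical index: one sparse vector per document, in input order."""
    return [vectorize(doc) for doc in documents]


def most_similar(index, query, top_n=3):
    """Return (document id, score) pairs, best match first."""
    q = vectorize(query)
    scores = [(doc_id, cosine(q, vec)) for doc_id, vec in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Made-up sample collection, for illustration only.
documents = [
    "singular value decomposition of sparse matrices",
    "latent dirichlet allocation for topic modelling",
    "similarity search in digital libraries",
]
index = build_index(documents)
ranking = most_similar(index, "similarity search over digital document collections")
```

Here `ranking` lists the third document first, because it is the only one sharing tokens with the query. Exact token overlap is a weak signal, which is why a semantic approach would first map documents into a lower-dimensional topic space.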