Combining Metric Features in Large Collections

Authors

BATKO Michal, KOHOUTKOVÁ Petra, ZEZULA Pavel

Year of publication 2008
Type Article in Proceedings
Conference 1st International Workshop on Similarity Search and Applications (SISAP 2008)
MU Faculty or unit

Faculty of Informatics

Citation
Web http://www.sisap.org/
Field Informatics
Keywords similarity search; complex query; p2p network; approximation
Description Current information systems are required to process complex digital objects, which are typically characterized by multiple descriptors. Since the values of many descriptors belong to non-sortable domains, they are effectively comparable only by a sort of similarity. Moreover, scalability is very important in the current digital-explosion age. Therefore, we propose a distributed extension of the well-known threshold algorithm for the peer-to-peer paradigm. The technique allows answering similarity queries that combine multiple similarity measures, and due to its peer-to-peer nature it is highly scalable. We also explore approximate evaluation strategies, where some relevant results can be lost in favor of increasing the efficiency by an order of magnitude. To reveal the strengths and weaknesses of our approach, we have experimented with a database of 1.6 million images from Flickr, comparing the content of the images by five similarity measures from the MPEG-7 standard. To the best of our knowledge, the experience with such a huge real-life dataset is quite unique.
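
As an illustration of the threshold algorithm mentioned in the description, the following is a minimal sketch of its classic centralized form, not the distributed peer-to-peer extension the paper proposes. It assumes every similarity measure returns a distance (smaller is more similar), that all measures cover the same set of objects, and that the combined score is a weighted sum with non-negative weights. The function name `threshold_top_k`, its parameters, and the sample data are hypothetical.

```python
import heapq


def threshold_top_k(distances, k, weights=None):
    """Return the k objects with the smallest combined distance.

    distances: list of dicts, one per similarity measure, each mapping
               object id -> distance to the query (assumed: same keys in all).
    weights:   non-negative weights of the measures (assumed weighted sum,
               which keeps the aggregation monotone).
    """
    m = len(distances)
    weights = weights or [1.0] * m

    def aggregate(values):
        return sum(w * v for w, v in zip(weights, values))

    # One ranking per measure, ordered by increasing distance (sorted access).
    rankings = [sorted(d.items(), key=lambda kv: kv[1]) for d in distances]
    n = len(rankings[0]) if rankings else 0

    combined = {}  # object id -> combined distance of every object seen so far
    for depth in range(n):
        last_seen = []
        for ranking in rankings:
            obj, dist = ranking[depth]
            last_seen.append(dist)
            if obj not in combined:
                # Random access: fetch the object's distance in every measure.
                combined[obj] = aggregate([d[obj] for d in distances])

        # No unseen object can have a combined distance below this threshold.
        threshold = aggregate(last_seen)
        best = heapq.nsmallest(k, combined.items(), key=lambda kv: kv[1])
        if len(best) >= k and best[-1][1] <= threshold:
            return best  # the current top-k can no longer be beaten

    return heapq.nsmallest(k, combined.items(), key=lambda kv: kv[1])


# Hypothetical distances of four images to a query under two measures.
color = {"a": 0.1, "b": 0.4, "c": 0.9, "d": 0.5}
shape = {"a": 0.3, "b": 0.2, "c": 0.8, "d": 0.7}
top2 = threshold_top_k([color, shape], k=2)
```

In this sample the loop stops after reading two entries from each ranking, because the two best combined distances seen so far are already no larger than the threshold, so the remaining entries need not be scanned.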