Metric hull as similarity-aware operator for representing unstructured data
Authors | |
---|---|
Year of publication | 2021 |
Type | Article in a scholarly journal |
Journal / Source | Pattern Recognition Letters |
MU Faculty / Unit | |
Citation | |
www | https://www.sciencedirect.com/science/article/pii/S0167865521001914 |
DOI | http://dx.doi.org/10.1016/j.patrec.2021.05.011 |
Keywords | Similarity operators; Metric space; Data aggregation |
Description | Similarity searching has become widely used in online services that process unstructured and complex data, e.g., Google Images. Metric spaces are often applied to model and organize such data by their mutual similarity. As top-k queries provide only a local view of the data, a data analyst must pose multiple requests to observe the entire dataset. Thus, group-by operators for metric data have been proposed. These operators identify groups by respecting a given similarity constraint and produce a set of objects per group. The analyst can then browse these sets directly, which is tedious, but representative members may provide better insight. In this paper, we focus on concise representations of metric datasets. We propose a novel concept of a metric hull, which encompasses a given set by selecting a few of its objects. Testing whether an object belongs to the set then becomes much faster. We verify this concept on synthetic Euclidean data and on real-life image and text datasets, and show its effectiveness and scalability. Metric hulls provide much faster and more compact representations than the commonly used ball representations. |
Related projects: