MetricKnn API
Fast Similarity Search using the Metric Space Approach
MetricKnn provides a set of pre-defined distances.
#include <mknn_predefined_distance.hpp>
Static Public Member Functions
static DistanceParams | L1 ()
Creates an object for Manhattan or Taxi-cab distance.
static DistanceParams | L2 ()
Creates an object for Euclidean distance.
static DistanceParams | Lmax ()
Creates an object for L-max distance.
static DistanceParams | Lp (double order)
Creates an object for Minkowski distance.
static DistanceParams | Hamming ()
Creates an object for Hamming distance.
static DistanceParams | Chi2 ()
Creates an object for Chi2 distance.
static DistanceParams | Hellinger ()
Creates an object for Hellinger distance.
static DistanceParams | CosineSimilarity (bool normalize_vectors)
Creates an object for Cosine Similarity.
static DistanceParams | CosineDistance (bool normalize_vectors)
Creates an object for Cosine Distance.
static DistanceParams | EMD (long long matrix_rows, long long matrix_cols, double *cost_matrix, bool normalize_vectors)
Creates an object for Earth Mover's Distance.
static DistanceParams | DPF (double order, long long num_dims_discard, double pct_discard, double threshold_discard)
Creates an object for Dynamic Partial Function distance.
static DistanceParams | MultiDistance (const std::vector< Distance > &subdistances, bool free_subdistances_on_release, const std::vector< double > &normalization_values, const std::vector< double > &ponderation_values, bool with_auto_config, Dataset &auto_config_dataset, double auto_normalize_alpha, bool auto_ponderation_maxrho, bool auto_ponderation_maxtau)
Defines a multi-distance, which is a weighted combination of distances.
Help functions
static void | helpListDistances ()
Lists to standard output all pre-defined distances.
static void | helpPrintDistance (std::string id_dist)
Prints to standard output the help for a distance.
static bool | testDistanceId (std::string id_dist)
Tests whether the given string references a valid pre-defined distance.
MetricKnn provides a set of pre-defined distances.
The generic way for instantiating a predefined distance is to use the method Distance::newPredefined, which requires the ID and parameters of the distance.
The complete list of predefined distances can be printed by calling PredefDistance::helpListDistances. The parameters supported by each distance can be printed by calling PredefDistance::helpPrintDistance.
This class contains some functions to ease the instantiation of some predefined distances.
static DistanceParams Chi2 ()
Creates an object for Chi2 distance.
The distance between two n-dimensional vectors is defined as:
\[ \chi^2(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^n \frac{ (x_i - \bar{m}_i )^2 }{ \bar{m}_i } \]
where \( \bar{m}_i=\frac{x_i+y_i}{2} \) .
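The formula above can be sketched in standalone C++ as follows. This is an illustration only, not the MetricKnn implementation; the function name `chi2_distance` and the choice to skip dimensions where \( \bar{m}_i = 0 \) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of the Chi2 formula; not part of the MetricKnn API.
double chi2_distance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double m = (x[i] + y[i]) / 2.0;  // mean of the two coordinates
        if (m == 0.0) continue;                // assumption: skip dimensions with zero mean
        const double d = x[i] - m;
        sum += d * d / m;
    }
    return sum;
}
```

Because \( (x_i - \bar{m}_i)^2 = (y_i - \bar{m}_i)^2 \), the result does not depend on the order of the two arguments.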
static DistanceParams CosineDistance (bool normalize_vectors)
Creates an object for Cosine Distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{CosineDistance}(\vec{x},\vec{y}) = \sqrt{ 2 ( 1 - \cos(\vec{x},\vec{y})) } \]
where \( \cos(\vec{x},\vec{y}) \) is the cosine similarity between vectors \( \vec{x} \) and \( \vec{y} \) as defined in CosineSimilarity.
The nearest neighbors obtained with this distance are identical to the objects with the highest cosine similarity (if vectors are normalized). Therefore, this distance can be used to accelerate a search based on cosine similarity.
normalize_vectors | Whether to normalize vectors to Euclidean norm 1 prior to each computation. |
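A minimal standalone sketch of the relation between the two functions is shown below. For unit-norm vectors, \( \|\vec{x}-\vec{y}\|^2 = 2 - 2\cos(\vec{x},\vec{y}) \), so the cosine distance equals the Euclidean distance between the normalized vectors. The function names are hypothetical, the code always normalizes (as if normalize_vectors were true), and it assumes non-zero vectors; it is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch; assumes both vectors have non-zero norm.
double cosine_similarity(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double dot = 0.0, nx = 0.0, ny = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        dot += x[i] * y[i];
        nx += x[i] * x[i];
        ny += y[i] * y[i];
    }
    return dot / (std::sqrt(nx) * std::sqrt(ny));
}

double cosine_distance(const std::vector<double>& x, const std::vector<double>& y) {
    // Clamp at zero to guard against tiny negative values from rounding.
    const double v = std::max(0.0, 2.0 * (1.0 - cosine_similarity(x, y)));
    return std::sqrt(v);
}
```

A higher similarity always gives a lower distance, which is why ranking by ascending cosine distance matches ranking by descending cosine similarity.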
static DistanceParams CosineSimilarity (bool normalize_vectors)
Creates an object for Cosine Similarity.
The similarity between two n-dimensional vectors is defined as:
\[ \textrm{cos}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \frac { \sum_{i=1}^n x_i \cdot y_i } {\sqrt{ \sum_{i=1}^{n} {x_i}^2 } \cdot \sqrt{ \sum_{i=1}^{n} {y_i}^2 } } \]
normalize_vectors | If true, computes the Euclidean norm of each vector. If false, the vectors are assumed to be already normalized, so the value \( \sqrt{ \sum_{i=1}^{n} {x_i}^2 } \cdot \sqrt{ \sum_{i=1}^{n} {y_i}^2 } \) is taken to be 1. |
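static DistanceParams DPF (double order, long long num_dims_discard, double pct_discard, double threshold_discard)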
Creates an object for Dynamic Partial Function distance.
See definition http://dx.doi.org/10.1109/ICIP.2002.1040021 .
The distance between two n-dimensional vectors is defined as:
\[ \textrm{DPF}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \left( {\sum_{i \in \Delta_m} |x_i-y_i|^p } \right)^{\frac{1}{p}} \]
where \( \Delta_m \) is the subset of the \( m \) smallest values of \( |x_i-y_i| \).
order | the order \( p \) of the distance, with \( p > 0 \). |
num_dims_discard | fixed number of dimensions to discard, with 0 < num_dims_discard < num_dimensions. |
pct_discard | fixed number of dimensions to discard, given as a fraction of num_dimensions with 0 < pct_discard < 1, so that num_dims_discard = round(pct_discard * num_dimensions). |
threshold_discard | discards all dimensions whose difference is higher than threshold_discard. This produces a variable number of discarded dimensions. |
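The fixed-count variant of the definition above can be sketched as follows. This is a hedged illustration that only covers the num_dims_discard case, keeping the \( m = n - \) num_dims_discard smallest differences; the function name `dpf_distance` is hypothetical and the code is not taken from MetricKnn.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch of DPF with a fixed number of discarded dimensions.
double dpf_distance(const std::vector<double>& x, const std::vector<double>& y,
                    double p, std::size_t num_dims_discard) {
    assert(x.size() == y.size());
    assert(p > 0.0);
    assert(num_dims_discard < x.size());
    std::vector<double> diffs(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        diffs[i] = std::fabs(x[i] - y[i]);
    }
    // Keep only the m smallest differences (the set Delta_m in the formula).
    const std::size_t m = x.size() - num_dims_discard;
    std::partial_sort(diffs.begin(),
                      diffs.begin() + static_cast<std::ptrdiff_t>(m),
                      diffs.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        sum += std::pow(diffs[i], p);
    }
    return std::pow(sum, 1.0 / p);
}
```

With num_dims_discard set to 0 the sketch reduces to the Minkowski distance of the same order.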
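static DistanceParams EMD (long long matrix_rows, long long matrix_cols, double *cost_matrix, bool normalize_vectors)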
Creates an object for Earth Mover's Distance.
This function uses OpenCV's implementation, see http://docs.opencv.org/modules/imgproc/doc/histograms.html#emd .
matrix_rows | |
matrix_cols | |
cost_matrix | an array of length matrix_rows * matrix_cols with the cost for each pair of dimensions. |
normalize_vectors | normalizes both vectors so that each sums to 1 before computing the distance. |
static DistanceParams Hamming ()
Creates an object for Hamming distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Hamming}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^n \bar{p}_i \]
where \( \bar{p}_i= \left\{ \begin{array}{ll} 0 & x_i = y_i\\ 1 & x_i \neq y_i\\ \end{array} \right. \) .
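As a small illustration of the definition above, the following standalone function counts the positions at which two vectors differ. The name `hamming_distance` and the use of integer vectors are assumptions for the example, not part of the MetricKnn API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch: number of coordinates where x and y differ.
std::size_t hamming_distance(const std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] != y[i]) ++count;
    }
    return count;
}
```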
static DistanceParams Hellinger ()
Creates an object for Hellinger distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Hellinger}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sqrt { \frac { \sum_{i=1}^n ( \sqrt{x_i} - \sqrt{y_i} )^2} { 2 } } \]
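The formula above translates directly into the following sketch. It assumes non-negative coordinates (for example histograms), since the square roots are otherwise undefined; the function name `hellinger_distance` is hypothetical and the code is not the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch; assumes all coordinates are non-negative.
double hellinger_distance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = std::sqrt(x[i]) - std::sqrt(y[i]);
        sum += d * d;
    }
    return std::sqrt(sum / 2.0);
}
```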
static void helpPrintDistance (std::string id_dist)
Prints to standard output the help for a distance.
id_dist | the unique identifier of a pre-defined distance. |
static DistanceParams L1 ()
Creates an object for Manhattan or Taxi-cab distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{L1}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^{n} |x_i - y_i| \]
This distance satisfies the metric properties, therefore it can be used by Metric Indexes to obtain exact nearest neighbors.
static DistanceParams L2 ()
Creates an object for Euclidean distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{L2}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 } \]
This distance satisfies the metric properties, therefore it can be used by Metric Indexes to obtain exact nearest neighbors.
static DistanceParams Lmax ()
Creates an object for L-max distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Lmax}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \max_{i \in \{1,...,n\}} |x_i - y_i| \]
This distance satisfies the metric properties, therefore it can be used by Metric Indexes to obtain exact nearest neighbors.
static DistanceParams Lp (double order)
Creates an object for Minkowski distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Lp}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \left( {\sum_{i=1}^n |x_i-y_i|^p } \right)^{\frac{1}{p}} \]
order | the order \( p \) of the distance, with \( p > 0 \). |
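The following standalone sketch shows how the Minkowski formula covers the other L-distances on this page: order 1 gives L1, order 2 gives L2, and large orders approach Lmax. The function names are hypothetical and the code is an illustration, not the MetricKnn implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch of the Lp (Minkowski) formula for p > 0.
double minkowski_distance(const std::vector<double>& x, const std::vector<double>& y,
                          double p) {
    assert(x.size() == y.size());
    assert(p > 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += std::pow(std::fabs(x[i] - y[i]), p);
    }
    return std::pow(sum, 1.0 / p);
}

// Illustrative sketch of the Lmax formula: largest coordinate difference.
double lmax_distance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double best = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        best = std::max(best, std::fabs(x[i] - y[i]));
    }
    return best;
}
```

For the vectors (0, 0) and (3, 4), order 1 yields 7, order 2 yields 5, and Lmax yields 4.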
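static DistanceParams MultiDistance (const std::vector< Distance > &subdistances, bool free_subdistances_on_release, const std::vector< double > &normalization_values, const std::vector< double > &ponderation_values, bool with_auto_config, Dataset &auto_config_dataset, double auto_normalize_alpha, bool auto_ponderation_maxrho, bool auto_ponderation_maxtau)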
Defines a multi-distance, which is a weighted combination of distances.
subdistances | the distances to combine. |
free_subdistances_on_release | whether to release the subdistances together with this distance. |
normalization_values | the value by which each distance is divided. |
ponderation_values | the weight applied to each distance. |
with_auto_config | whether to run algorithms that automatically determine normalization or ponderation values. |
auto_config_dataset | the data to be used by the automatic configuration algorithms. |
auto_normalize_alpha | the value to be used by the alpha-normalization. |
auto_ponderation_maxrho | whether to run the automatic ponderation according to the max rho criterion. |
auto_ponderation_maxtau | whether to run the automatic ponderation according to the max tau criterion. |
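One plausible reading of the normalization and ponderation parameters is a weighted sum in which each subdistance is first divided by its normalization value and then multiplied by its weight. The sketch below illustrates that reading only; the exact combination used by MetricKnn is an assumption here, and `DistanceFn`, `multi_distance`, `l1` and `lmax` are hypothetical names that do not belong to the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using DistanceFn = std::function<double(const Vec&, const Vec&)>;

// Two sample subdistances used only for this illustration.
double l1(const Vec& x, const Vec& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += std::fabs(x[i] - y[i]);
    return s;
}

double lmax(const Vec& x, const Vec& y) {
    double m = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) m = std::max(m, std::fabs(x[i] - y[i]));
    return m;
}

// Assumed combination: sum over i of weight_i * d_i(x, y) / normalization_i.
double multi_distance(const std::vector<DistanceFn>& subdistances,
                      const std::vector<double>& normalization_values,
                      const std::vector<double>& ponderation_values,
                      const Vec& x, const Vec& y) {
    assert(subdistances.size() == normalization_values.size());
    assert(subdistances.size() == ponderation_values.size());
    double total = 0.0;
    for (std::size_t i = 0; i < subdistances.size(); ++i) {
        total += ponderation_values[i] * subdistances[i](x, y) / normalization_values[i];
    }
    return total;
}
```

Dividing by a normalization value brings subdistances with different ranges onto a comparable scale before the weights are applied.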
static bool testDistanceId (std::string id_dist)
Tests whether the given string references a valid pre-defined distance.
id_dist | the unique identifier of a pre-defined distance. |
Returns true if id_dist corresponds to a pre-defined distance, and false otherwise.