MetricKnn API
Fast Similarity Search using the Metric Space Approach
MetricKnn provides a set of pre-defined distances.
#include "../metricknn_c.h"
Functions | |
MknnDistanceParams * | mknn_predefDistance_L1 () |
Creates an object for Manhattan or Taxi-cab distance. | |
MknnDistanceParams * | mknn_predefDistance_L2 () |
Creates an object for Euclidean distance. | |
MknnDistanceParams * | mknn_predefDistance_L2squared () |
Creates an object for squared Euclidean distance. | |
MknnDistanceParams * | mknn_predefDistance_Lmax () |
Creates an object for L-max distance. | |
MknnDistanceParams * | mknn_predefDistance_Lp (double order) |
Creates an object for Minkowski distance. | |
MknnDistanceParams * | mknn_predefDistance_Hamming () |
Creates an object for Hamming distance. | |
MknnDistanceParams * | mknn_predefDistance_Chi2 () |
Creates an object for Chi2 distance. | |
MknnDistanceParams * | mknn_predefDistance_Hellinger () |
Creates an object for Hellinger distance. | |
MknnDistanceParams * | mknn_predefDistance_CosineSimilarity (bool normalize_vectors) |
Creates an object for Cosine Similarity. | |
MknnDistanceParams * | mknn_predefDistance_CosineDistance (bool normalize_vectors) |
Creates an object for Cosine Distance. | |
MknnDistanceParams * | mknn_predefDistance_EMD (int64_t matrix_rows, int64_t matrix_cols, double *cost_matrix, bool normalize_vectors) |
Creates an object for Earth Mover's Distance. | |
MknnDistanceParams * | mknn_predefDistance_DPF (double order, int64_t num_dims_discard, double pct_discard, double threshold_discard) |
Creates an object for Dynamic Partial Function distance. | |
MknnDistanceParams * | mknn_predefDistance_MultiDistance (int64_t num_subdistances, MknnDistance **subdistances, bool free_subdistances_on_release, double *normalization_values, double *ponderation_values, bool with_auto_config, MknnDataset *auto_config_dataset, double auto_normalize_alpha, bool auto_ponderation_maxrho, bool auto_ponderation_maxtau) |
Defines a multi-distance, which is a weighted combination of distances. | |
Help functions | |
void | mknn_predefDistance_helpListDistances () |
Lists to standard output all pre-defined distances. | |
void | mknn_predefDistance_helpPrintDistance (const char *id_dist) |
Prints to standard output the help for a distance. | |
bool | mknn_predefDistance_testDistanceId (const char *id_dist) |
Tests whether the given string references a valid pre-defined distance. | |
MetricKnn provides a set of pre-defined distances.
The generic way to instantiate a predefined distance is to call the function mknn_distance_newPredefined, which requires the ID and the parameters of the distance.
The complete list of predefined distances can be printed by calling mknn_predefDistance_helpListDistances. The parameters supported by each distance can be printed by calling mknn_predefDistance_helpPrintDistance.
This file contains convenience functions that ease the instantiation of some predefined distances.
MknnDistanceParams* mknn_predefDistance_Chi2 | ( | ) |
Creates an object for Chi2 distance.
The distance between two n-dimensional vectors is defined as:
\[ \chi^2(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^n \frac{ (x_i - \bar{m}_i )^2 }{ \bar{m}_i } \]
where \( \bar{m}_i=\frac{x_i+y_i}{2} \) .
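The formula above can be sketched in plain C as follows. This is an illustrative, standalone function, not MetricKnn's implementation: the name `chi2_distance` is hypothetical, and skipping dimensions where \( \bar{m}_i = 0 \) is an assumption of this sketch, since the documentation does not say how that case is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Chi2 distance as defined above, with m_i = (x_i + y_i) / 2.
 * Hypothetical helper for illustration only.
 * Assumption: dimensions with m_i == 0 are skipped to avoid dividing by zero. */
double chi2_distance(const double *x, const double *y, size_t n) {
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double m = (x[i] + y[i]) / 2.0;
        if (m == 0.0)
            continue; /* both values are zero (for non-negative input) */
        double d = x[i] - m;
        sum += d * d / m;
    }
    return sum;
}
```

For x = (1, 3) and y = (3, 1), both means are 2 and each term contributes 0.5.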
MknnDistanceParams* mknn_predefDistance_CosineDistance | ( | bool | normalize_vectors | ) |
Creates an object for Cosine Distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{CosineDistance}(\vec{x},\vec{y}) = \sqrt{ 2 ( 1 - \cos(\vec{x},\vec{y})) } \]
where \( \cos(\vec{x},\vec{y}) \) is the cosine similarity between vectors \( \vec{x} \) and \( \vec{y} \) as defined in mknn_predefDistance_CosineSimilarity.
The nearest neighbors obtained with this distance are identical to the farthest neighbors (that is, those with the highest values) obtained with cosine similarity, provided the vectors are normalized. Therefore, this distance can be used to accelerate searches based on cosine similarity.
normalize_vectors | Whether the cosine similarity must normalize vectors to Euclidean norm 1 prior to each computation. |
MknnDistanceParams* mknn_predefDistance_CosineSimilarity | ( | bool | normalize_vectors | ) |
Creates an object for Cosine Similarity.
The similarity between two n-dimensional vectors is defined as:
\[ \textrm{cos}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \frac { \sum_{i=1}^n x_i \cdot y_i } {\sqrt{ \sum_{i=1}^{n} {x_i}^2 } \cdot \sqrt{ \sum_{i=1}^{n} {y_i}^2 } } \]
normalize_vectors | Whether to compute the Euclidean norm of each vector. If set to false, the vectors are assumed to be already normalized, so the value \( \sqrt{ \sum_{i=1}^{n} {x_i}^2 } \cdot \sqrt{ \sum_{i=1}^{n} {y_i}^2 } \) is equal to 1. |
MknnDistanceParams* mknn_predefDistance_DPF | ( | double | order, |
int64_t | num_dims_discard, | ||
double | pct_discard, | ||
double | threshold_discard | ||
) |
Creates an object for Dynamic Partial Function distance.
See definition http://dx.doi.org/10.1109/ICIP.2002.1040021 .
The distance between two n-dimensional vectors is defined as:
\[ \textrm{DPF}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \left( {\sum_{i \in \Delta_m} |x_i-y_i|^p } \right)^{\frac{1}{p}} \]
where \( \Delta_m \) is the subset of the \( m \) smallest values of \( |x_i-y_i| \).
order | the order \( p \) of the distance, with \( p > 0 \). |
num_dims_discard | fixed number of dimensions to discard, with 0 < num_dims_discard < num_dimensions . |
pct_discard | number of dimensions to discard, given as a fraction of num_dimensions, with 0 < pct_discard < 1 ; num_dims_discard = round(pct_discard * num_dimensions) . |
threshold_discard | discards every dimension whose difference is higher than threshold_discard , which yields a variable number of discarded dimensions. |
MknnDistanceParams* mknn_predefDistance_EMD | ( | int64_t | matrix_rows, |
int64_t | matrix_cols, | ||
double * | cost_matrix, | ||
bool | normalize_vectors | ||
) |
Creates an object for Earth Mover's Distance.
This function uses OpenCV's implementation, see http://docs.opencv.org/modules/imgproc/doc/histograms.html#emd .
matrix_rows | number of rows of cost_matrix. |
matrix_cols | number of columns of cost_matrix. |
cost_matrix | an array of length matrix_rows * matrix_cols with the cost for each pair of dimensions. |
normalize_vectors | normalizes both vectors to sum 1 before computing the distance. |
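As an illustration of the cost_matrix parameter, the sketch below builds a flat array of length matrix_rows * matrix_cols in which the cost between bins i and j is |i - j|. Both the row-major layout and the choice of ground distance are assumptions of this example, and `make_bin_cost_matrix` is a hypothetical helper, not part of MetricKnn.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Builds a rows*cols cost array where cost(i, j) = |i - j|.
 * Assumption: row-major layout, i.e. entry (i, j) is at index i * cols + j.
 * Returns NULL on allocation failure; the caller frees the result. */
double *make_bin_cost_matrix(int64_t rows, int64_t cols) {
    assert(rows > 0 && cols > 0);
    double *cost = malloc((size_t)(rows * cols) * sizeof *cost);
    if (cost == NULL)
        return NULL;
    for (int64_t i = 0; i < rows; i++)
        for (int64_t j = 0; j < cols; j++)
            cost[i * cols + j] = (double)(i > j ? i - j : j - i);
    return cost;
}
```

The resulting array could then be passed as the third argument of mknn_predefDistance_EMD. The documentation does not say whether the library copies the array, so it is safest to keep it allocated until the distance is released.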
MknnDistanceParams* mknn_predefDistance_Hamming | ( | ) |
Creates an object for Hamming distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Hamming}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^n \bar{p}_i \]
where \( \bar{p}_i= \left\{ \begin{array}{ll} 0 & x_i = y_i\\ 1 & x_i \neq y_i\\ \end{array} \right. \) .
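The definition above amounts to counting the positions where the two vectors differ. A standalone sketch (the name `hamming_distance` and the integer return type are choices of this example, not the library's):

```c
#include <assert.h>
#include <stddef.h>

/* Counts the dimensions i where x_i != y_i (hypothetical helper). */
size_t hamming_distance(const double *x, const double *y, size_t n) {
    assert(x != NULL && y != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            count++;
    return count;
}
```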
MknnDistanceParams* mknn_predefDistance_Hellinger | ( | ) |
Creates an object for Hellinger distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Hellinger}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sqrt { \frac { \sum_{i=1}^n ( \sqrt{x_i} - \sqrt{y_i} )^2} { 2 } } \]
void mknn_predefDistance_helpPrintDistance | ( | const char * | id_dist | ) |
Prints to standard output the help for a distance.
id_dist | the unique identifier of a pre-defined distance. |
MknnDistanceParams* mknn_predefDistance_L1 | ( | ) |
Creates an object for Manhattan or Taxi-cab distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{L1}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^{n} |x_i - y_i| \]
This distance satisfies the metric properties, therefore it can be used by Metric Indexes to obtain exact nearest neighbors.
MknnDistanceParams* mknn_predefDistance_L2 | ( | ) |
Creates an object for Euclidean distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{L2}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 } \]
This distance satisfies the metric properties, therefore it can be used by Metric Indexes to obtain exact nearest neighbors.
MknnDistanceParams* mknn_predefDistance_L2squared | ( | ) |
Creates an object for squared Euclidean distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{L2squared}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \sum_{i=1}^{n} (x_i - y_i)^2 \]
MknnDistanceParams* mknn_predefDistance_Lmax | ( | ) |
Creates an object for L-max distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Lmax}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \max_{i \in \{1,...,n\}} |x_i - y_i| \]
This distance satisfies the metric properties, therefore it can be used by Metric Indexes to obtain exact nearest neighbors.
MknnDistanceParams* mknn_predefDistance_Lp | ( | double | order | ) |
Creates an object for Minkowski distance.
The distance between two n-dimensional vectors is defined as:
\[ \textrm{Lp}(\{x_1,...,x_n\},\{y_1,...,y_n\}) = \left( {\sum_{i=1}^n |x_i-y_i|^p } \right)^{\frac{1}{p}} \]
order | the order \( p \) of the distance, with \( p > 0 \). |
MknnDistanceParams* mknn_predefDistance_MultiDistance | ( | int64_t | num_subdistances, |
MknnDistance ** | subdistances, | ||
bool | free_subdistances_on_release, | ||
double * | normalization_values, | ||
double * | ponderation_values, | ||
bool | with_auto_config, | ||
MknnDataset * | auto_config_dataset, | ||
double | auto_normalize_alpha, | ||
bool | auto_ponderation_maxrho, | ||
bool | auto_ponderation_maxtau | ||
) |
Defines a multi-distance, which is a weighted combination of distances.
num_subdistances | number of subdistances to combine. |
subdistances | the distances to combine. |
free_subdistances_on_release | whether to release the subdistances together with this distance. |
normalization_values | the value by which each distance is divided. |
ponderation_values | the weight applied to each distance. |
with_auto_config | run algorithms to automatically compute normalization or ponderation values. |
auto_config_dataset | the data to be used by the algorithms. |
auto_normalize_alpha | the value to be used by the alpha-normalization. |
auto_ponderation_maxrho | run the automatic ponderation according to the max rho criterion. |
auto_ponderation_maxtau | run the automatic ponderation according to the max tau criterion. |
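The roles of normalization_values and ponderation_values can be illustrated with a small sketch. It assumes the combination is a weighted sum, \( \sum_i w_i \cdot d_i / n_i \); the documentation only says "weighted combination", so the exact formula used by MetricKnn may differ. `combine_distances` is a hypothetical helper that operates on already computed subdistance values.

```c
#include <assert.h>
#include <stddef.h>

/* Combines k subdistance values d[i]: each is divided by normalization[i]
 * and weighted by ponderation[i].
 * Assumption: the combination is a plain weighted sum. */
double combine_distances(const double *d, const double *normalization,
                         const double *ponderation, size_t k) {
    double sum = 0.0;
    for (size_t i = 0; i < k; i++) {
        assert(normalization[i] > 0.0);
        sum += ponderation[i] * d[i] / normalization[i];
    }
    return sum;
}
```

Dividing by a normalization value brings subdistances with different ranges to a comparable scale before the weights are applied.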
bool mknn_predefDistance_testDistanceId | ( | const char * | id_dist | ) |
Tests whether the given string references a valid pre-defined distance.
id_dist | the unique identifier of a pre-defined distance. |
Returns true if id_dist corresponds to a pre-defined distance, and false otherwise.