MetricKnn API
Fast Similarity Search using the Metric Space Approach
There are different methods for creating or loading datasets.
#include "../metricknn_c.h"
Typedefs
Pointers to functions used to define custom datasets
typedef int64_t (*mknn_function_dataset_getNumObjects)(void *data_pointer)
Function parameter for mknn_datasetLoader_Custom.
typedef void *(*mknn_function_dataset_getObject)(void *data_pointer, int64_t pos)
Function parameter for mknn_datasetLoader_Custom.
typedef void (*mknn_function_dataset_pushObject)(void *data_pointer, void *object)
Function parameter for mknn_datasetLoader_Custom.
typedef void (*mknn_function_dataset_releaseDataPointer)(void *data_pointer)
Function parameter for mknn_datasetLoader_Custom.
typedef void (*mknn_function_dataset_releaseObject)(void *object_pointer)
Function parameter for mknn_dataset_pushObject. Releases the storage of the object.
Functions
MknnDataset *mknn_datasetLoader_Custom(void *data_pointer, mknn_function_dataset_getNumObjects func_getNumObjects, mknn_function_dataset_getObject func_getObject, mknn_function_dataset_pushObject func_pushObject, mknn_function_dataset_releaseDataPointer func_releaseDataPointer, MknnDomain *domain, bool free_domain_on_dataset_release)
Creates a new custom dataset.
MknnDataset *mknn_datasetLoader_PointerArray(void **object_array, int64_t num_objects, MknnDomain *domain, bool free_each_object_on_dataset_release, bool free_object_array_on_dataset_release, bool free_domain_on_dataset_release)
Creates a new dataset from an array of objects.
MknnDataset *mknn_datasetLoader_PointerCompactVectors(void *vectors_header, bool free_vectors_header_on_dataset_release, int64_t num_vectors, int64_t vector_dimensions, MknnDatatype vector_dimension_datatype)
Creates a new dataset from a data array.
MknnDataset *mknn_datasetLoader_PointerCompactVectors_alt(void *vectors_header, bool free_vectors_header_on_dataset_release, int64_t num_vectors, MknnDomain *domain, bool free_domain_on_dataset_release)
Creates a new dataset from a data array.
MknnDataset *mknn_datasetLoader_ParseVectorFile(const char *filename, MknnDatatype datatype)
Creates a new dataset by reading a text file with vectors.
MknnDataset *mknn_datasetLoader_ParseStringsFile(const char *filename)
Creates a new dataset by reading a text file with strings.
MknnDataset *mknn_datasetLoader_Concatenate(int64_t num_subdatasets, MknnDataset **subdatasets, bool free_subdatasets_on_dataset_release)
Creates a new dataset which is the concatenation of one or more datasets.
MknnDataset *mknn_datasetLoader_SubsetSegment(MknnDataset *superdataset, int64_t position_start, int64_t length, bool free_superdataset_on_release)
Creates a new dataset which is a subset of a bigger dataset.
MknnDataset *mknn_datasetLoader_SubsetPositions(MknnDataset *superdataset, int64_t *positions, int64_t num_positions, bool free_superdataset_on_release)
Creates a new dataset which is a subset of a bigger dataset.
MknnDataset *mknn_datasetLoader_SubsetRandomSample(MknnDataset *superdataset, double sample_size_or_fraction, bool free_superdataset_on_release)
Creates a new dataset as a random sample of superdataset.
MknnDataset *mknn_datasetLoader_UniformRandomVectors(int64_t num_objects, int64_t dimension, double dimension_minValueIncluded, double dimension_maxValueNotIncluded, MknnDatatype vectors_dimension_datatype)
Creates a new dataset with random vectors of the given datatype.
MknnDataset *mknn_datasetLoader_MultiObject(int64_t num_subdatasets, MknnDataset **subdatasets, bool free_subdatasets_on_dataset_release)
Creates a new dataset where each object is a multi-object.
MknnDataset *mknn_datasetLoader_Empty(MknnDomain *domain, bool free_domain_on_dataset_release)
Creates a new empty dataset that can dynamically grow as new objects are added.
MknnDataset *mknn_datasetLoader_reorderRandomPermutation(MknnDataset *superdataset, bool free_superdataset_on_release)
Creates a new dataset as a random permutation of superdataset.
MknnDataset *mknn_datasetLoader_reorderNearestNeighbor(MknnDataset *superdataset, MknnDistance *distance, int64_t start_position, bool free_superdataset_on_release)
Creates a new dataset which is a permutation of superdataset: the first object is the one at start_position, and each following object is the nearest neighbor of the previous one among the objects not yet placed.
typedef int64_t (*mknn_function_dataset_getNumObjects)(void *data_pointer)
Function parameter for mknn_datasetLoader_Custom.
Returns the current size of the dataset.
Parameters:
  data_pointer: the object where the data is located.
Returns: the number of objects currently stored in data_pointer.
typedef void *(*mknn_function_dataset_getObject)(void *data_pointer, int64_t pos)
Function parameter for mknn_datasetLoader_Custom.
Returns an object in the dataset.
Parameters:
  data_pointer: the object where the data is located.
  pos: the position of the desired object, between 0 and num_objects - 1.
Returns: the object at position pos in data_pointer.
typedef void (*mknn_function_dataset_pushObject)(void *data_pointer, void *object)
Function parameter for mknn_datasetLoader_Custom.
Adds an object to a dynamic dataset.
Parameters:
  data_pointer: the object where the data is located.
  object: the new object to add at the end of data_pointer.
typedef void (*mknn_function_dataset_releaseDataPointer)(void *data_pointer)
Function parameter for mknn_datasetLoader_Custom.
Releases the storage of the dataset.
Parameters:
  data_pointer: the object to release.
typedef void (*mknn_function_dataset_releaseObject)(void *object_pointer)
Function parameter for mknn_dataset_pushObject. Releases the storage of the object.
Parameters:
  object_pointer: the object to release.
MknnDataset *mknn_datasetLoader_Concatenate(int64_t num_subdatasets,
    MknnDataset **subdatasets,
    bool free_subdatasets_on_dataset_release)
Creates a new dataset which is the concatenation of one or more datasets.
Parameters:
  num_subdatasets: the number of datasets to concatenate.
  subdatasets: the array of datasets.
  free_subdatasets_on_dataset_release: if true, every subdatasets[i] is released (by calling mknn_dataset_release) during mknn_dataset_release of the new dataset.
MknnDataset *mknn_datasetLoader_Custom(void *data_pointer,
    mknn_function_dataset_getNumObjects func_getNumObjects,
    mknn_function_dataset_getObject func_getObject,
    mknn_function_dataset_pushObject func_pushObject,
    mknn_function_dataset_releaseDataPointer func_releaseDataPointer,
    MknnDomain *domain,
    bool free_domain_on_dataset_release)
Creates a new custom dataset.
The data is stored in data_pointer, and func_getObject is invoked to retrieve each object.
Parameters:
  data_pointer: a pointer to an object holding the data.
  func_getNumObjects: the function to invoke for retrieving the current size of the dataset stored in data_pointer.
  func_getObject: the function to invoke for retrieving the object at any position between 0 and num_objects - 1.
  func_pushObject: the function to invoke for adding an object to the dataset, or NULL if the dataset is static.
  func_releaseDataPointer: the function invoked by mknn_dataset_release to release data_pointer, or NULL if no release is needed.
  domain: the domain of all the objects in the dataset.
  free_domain_on_dataset_release: if true, domain is released (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
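To illustrate the callback contract, here is a minimal sketch of a data_pointer backed by a growable array of object pointers, with one function per callback typedef. The struct ObjectList and the objlist_* functions are hypothetical. The typedefs are re-declared locally so the block compiles on its own (in real code they come from metricknn_c.h), and the call to mknn_datasetLoader_Custom appears only in a comment because it needs an existing MknnDomain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the callback typedefs documented above;
 * in real code they come from metricknn_c.h. */
typedef int64_t (*mknn_function_dataset_getNumObjects)(void *data_pointer);
typedef void *(*mknn_function_dataset_getObject)(void *data_pointer, int64_t pos);
typedef void (*mknn_function_dataset_pushObject)(void *data_pointer, void *object);
typedef void (*mknn_function_dataset_releaseDataPointer)(void *data_pointer);

/* Hypothetical data_pointer: a growable array of object pointers. */
typedef struct {
    void **objects;
    int64_t size;
    int64_t capacity;
} ObjectList;

static ObjectList *objlist_new(void) {
    return calloc(1, sizeof(ObjectList)); /* empty list, no storage yet */
}

static int64_t objlist_getNumObjects(void *data_pointer) {
    return ((ObjectList *) data_pointer)->size;
}

static void *objlist_getObject(void *data_pointer, int64_t pos) {
    ObjectList *list = data_pointer;
    assert(pos >= 0 && pos < list->size); /* valid range: 0 .. num_objects - 1 */
    return list->objects[pos];
}

static void objlist_pushObject(void *data_pointer, void *object) {
    ObjectList *list = data_pointer;
    if (list->size == list->capacity) {
        /* Double the capacity (starting at 8) when the array is full. */
        int64_t new_capacity = list->capacity > 0 ? 2 * list->capacity : 8;
        void **grown = realloc(list->objects, (size_t) new_capacity * sizeof(void *));
        if (grown == NULL)
            abort(); /* no recovery from allocation failure in this sketch */
        list->objects = grown;
        list->capacity = new_capacity;
    }
    list->objects[list->size++] = object;
}

static void objlist_releaseDataPointer(void *data_pointer) {
    ObjectList *list = data_pointer;
    free(list->objects); /* frees the array only; the objects belong to the caller */
    free(list);
}

/* Possible call, assuming `domain` is an existing MknnDomain *:
 * MknnDataset *dataset = mknn_datasetLoader_Custom(objlist_new(),
 *     objlist_getNumObjects, objlist_getObject, objlist_pushObject,
 *     objlist_releaseDataPointer, domain, false);
 */
```

Passing NULL instead of objlist_pushObject would make the dataset static, as described for func_pushObject.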
MknnDataset *mknn_datasetLoader_Empty(MknnDomain *domain,
    bool free_domain_on_dataset_release)
Creates a new empty dataset that can dynamically grow as new objects are added.
New objects are added with mknn_dataset_pushObject.
Parameters:
  domain: the domain of all the objects to be added to the dataset.
  free_domain_on_dataset_release: if true, domain is released (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
MknnDataset *mknn_datasetLoader_MultiObject(int64_t num_subdatasets,
    MknnDataset **subdatasets,
    bool free_subdatasets_on_dataset_release)
Creates a new dataset where each object is a multi-object.
Each multi-object is created by combining one object from each subdataset. All subdatasets must contain the same number of objects.
Parameters:
  num_subdatasets: the number of subdatasets to combine, i.e., the size of each multi-object to be created.
  subdatasets: the datasets from which the multi-objects are created.
  free_subdatasets_on_dataset_release: if true, every subdatasets[i] is released (by calling mknn_dataset_release) during mknn_dataset_release of the new dataset.
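The combination rule can be sketched as follows, assuming that multi-object i is built from the object at position i of every subdataset. Here each subdataset is modeled as a plain array of object pointers and build_multiobject is a hypothetical helper; the real library works on MknnDataset values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of MetricKnn): subdataset k is an array of
 * sizes[k] object pointers. Fills multi_object[k] with the object at position
 * `pos` of subdataset k. Returns 0 on success, or -1 if the subdatasets do not
 * all contain the same number of objects. */
static int build_multiobject(int64_t num_subdatasets, void ***subdatasets,
                             const int64_t *sizes, int64_t pos, void **multi_object) {
    assert(num_subdatasets > 0);
    for (int64_t k = 1; k < num_subdatasets; ++k) {
        if (sizes[k] != sizes[0])
            return -1; /* all subdatasets must have the same size */
    }
    assert(pos >= 0 && pos < sizes[0]);
    for (int64_t k = 0; k < num_subdatasets; ++k)
        multi_object[k] = subdatasets[k][pos]; /* one object per subdataset */
    return 0;
}
```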
MknnDataset *mknn_datasetLoader_ParseStringsFile(const char *filename)
Creates a new dataset by reading a text file with strings.
The format is one string per line.
Parameters:
  filename: the file to read.
MknnDataset *mknn_datasetLoader_ParseVectorFile(const char *filename,
    MknnDatatype datatype)
Creates a new dataset by reading a text file with vectors.
The format is one vector per line, with dimensions separated by tabs.
Parameters:
  filename: the file to read.
  datatype: the datatype of the objects.
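The file format above (one vector per line, dimensions separated by tabs) can be read with a per-line parser like the sketch below. It is an illustration only: parse_vector_line is hypothetical, values are parsed as double and conversion to the requested MknnDatatype is omitted, and rejecting malformed fields is an assumption about error handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MetricKnn): parses one tab-separated line
 * into `out`, writing at most `max_dims` values. Returns the number of values
 * parsed, or -1 if a field is not a number. */
static int64_t parse_vector_line(const char *line, double *out, int64_t max_dims) {
    assert(line != NULL && out != NULL);
    int64_t n = 0;
    const char *p = line;
    while (*p != '\0' && *p != '\n' && *p != '\r' && n < max_dims) {
        char *end;
        double value = strtod(p, &end);
        if (end == p)
            return -1; /* the field does not start with a number */
        out[n++] = value;
        p = end;
        if (*p == '\t')
            ++p; /* move past the separator */
        else if (*p != '\0' && *p != '\n' && *p != '\r')
            return -1; /* unexpected character after a number */
    }
    return n;
}
```

Counting the values on the first line would give the vector dimension for the whole file.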
MknnDataset *mknn_datasetLoader_PointerArray(void **object_array,
    int64_t num_objects,
    MknnDomain *domain,
    bool free_each_object_on_dataset_release,
    bool free_object_array_on_dataset_release,
    bool free_domain_on_dataset_release)
Creates a new dataset from an array of objects.
The objects are read from object_array in the following order: object_array[0], object_array[1], ..., object_array[num_objects - 1].
Parameters:
  object_array: pointer to an array of objects.
  num_objects: number of objects to read from the array.
  domain: the domain of the objects in the array.
  free_each_object_on_dataset_release: if true, each object_array[i] is released during mknn_dataset_release.
  free_object_array_on_dataset_release: if true, object_array is released during mknn_dataset_release.
  free_domain_on_dataset_release: if true, domain is released (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
MknnDataset *mknn_datasetLoader_PointerCompactVectors(void *vectors_header,
    bool free_vectors_header_on_dataset_release,
    int64_t num_vectors,
    int64_t vector_dimensions,
    MknnDatatype vector_dimension_datatype)
Creates a new dataset from a data array.
The objects are read from vectors_header in the following order: vectors_header, vectors_header + vector_size, ..., vectors_header + (num_vectors - 1) * vector_size.
The value of vector_size is determined by mknn_domain_vector_getVectorLengthInBytes.
Parameters:
  vectors_header: pointer to the beginning of the set of vectors.
  free_vectors_header_on_dataset_release: if true, vectors_header is released during mknn_dataset_release.
  num_vectors: number of vectors to read from vectors_header.
  vector_dimensions: number of dimensions of each vector.
  vector_dimension_datatype: datatype of the vector values.
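The read order above is plain byte-offset arithmetic. The sketch assumes that vector_size equals the number of dimensions times the size of one value (for example vector_dimensions * sizeof(float) for 32-bit floats), which is what a compact layout suggests; the library itself obtains it from mknn_domain_vector_getVectorLengthInBytes. The helper compact_vector_at is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of MetricKnn): returns the start of vector i in
 * a compact block where every vector occupies vector_size bytes. */
static void *compact_vector_at(void *vectors_header, int64_t num_vectors,
                               size_t vector_size, int64_t i) {
    assert(i >= 0 && i < num_vectors);
    /* Arithmetic on void * is not standard C, so step through a byte pointer. */
    return (unsigned char *) vectors_header + (size_t) i * vector_size;
}
```

For a row-major float[3][4] array, vector 2 starts at byte offset 2 * 4 * sizeof(float), which is exactly the address of its third row.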
MknnDataset *mknn_datasetLoader_PointerCompactVectors_alt(void *vectors_header,
    bool free_vectors_header_on_dataset_release,
    int64_t num_vectors,
    MknnDomain *domain,
    bool free_domain_on_dataset_release)
Creates a new dataset from a data array.
The objects are read from vectors_header in the following order: vectors_header, vectors_header + vector_size, ..., vectors_header + (num_vectors - 1) * vector_size.
The value of vector_size is determined by mknn_domain_vector_getVectorLengthInBytes.
Parameters:
  vectors_header: pointer to the beginning of the set of vectors.
  free_vectors_header_on_dataset_release: if true, vectors_header is released during mknn_dataset_release.
  num_vectors: number of vectors to read from vectors_header.
  domain: the domain of the vectors in vectors_header.
  free_domain_on_dataset_release: if true, domain is released (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
MknnDataset *mknn_datasetLoader_reorderNearestNeighbor(MknnDataset *superdataset,
    MknnDistance *distance,
    int64_t start_position,
    bool free_superdataset_on_release)
Creates a new dataset which is a permutation of superdataset: the first object is the one at start_position, and each following object is the nearest neighbor of the previous one among the objects not yet placed.
Parameters:
  superdataset: the dataset to reorder.
  distance: the distance used to find the nearest neighbors.
  start_position: the position of the seed object; -1 selects the object closest to zero (valid only for vectors).
  free_superdataset_on_release: if true, superdataset is released when the new dataset is released.
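One way to read this reordering is as a greedy nearest-neighbor chain. The sketch below follows that reading under several assumptions: vectors are a row-major array of doubles, squared Euclidean distance stands in for the MknnDistance argument, ties go to the lowest position, and "closest to zero" means smallest Euclidean norm. The helpers nn_chain_order and sq_dist are hypothetical, and the loop is O(n^2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Squared Euclidean distance between two vectors of `dim` doubles. */
static double sq_dist(const double *a, const double *b, int64_t dim) {
    double s = 0.0;
    for (int64_t d = 0; d < dim; ++d) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/* Hypothetical helper (not part of MetricKnn): writes into `order` a permutation
 * of 0..n-1 starting at `start` (or, if start == -1, at the vector with the
 * smallest norm), then repeatedly moving to the nearest unvisited vector.
 * Returns 0, or -1 on allocation failure. */
static int nn_chain_order(const double *vectors, int64_t n, int64_t dim,
                          int64_t start, int64_t *order) {
    assert(n > 0 && start >= -1 && start < n);
    bool *visited = calloc((size_t) n, sizeof(bool));
    if (visited == NULL)
        return -1;
    if (start == -1) {
        double best = -1.0;
        for (int64_t i = 0; i < n; ++i) {
            double norm = 0.0;
            for (int64_t d = 0; d < dim; ++d)
                norm += vectors[i * dim + d] * vectors[i * dim + d];
            if (best < 0.0 || norm < best) {
                best = norm;
                start = i;
            }
        }
    }
    int64_t current = start;
    for (int64_t k = 0; k < n; ++k) {
        order[k] = current;
        visited[current] = true;
        int64_t next = -1;
        double best = 0.0;
        for (int64_t j = 0; j < n; ++j) {
            if (visited[j])
                continue;
            double dist = sq_dist(vectors + current * dim, vectors + j * dim, dim);
            if (next == -1 || dist < best) {
                next = j;
                best = dist;
            }
        }
        current = next; /* becomes -1 after the last object; the loop then ends */
    }
    free(visited);
    return 0;
}
```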
MknnDataset *mknn_datasetLoader_reorderRandomPermutation(MknnDataset *superdataset,
    bool free_superdataset_on_release)
Creates a new dataset as a random permutation of superdataset.
Parameters:
  superdataset: the dataset to reorder.
  free_superdataset_on_release: if true, superdataset is released when the new dataset is released.
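A random permutation of positions is usually produced with the Fisher-Yates shuffle. The sketch below shows that technique on an array of positions; it does not claim this is how MetricKnn implements the function. random_permutation is hypothetical, and rand() is used only to keep the example self-contained (it is not suitable for statistical work).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MetricKnn): fills perm with a shuffled
 * permutation of 0..n-1 using Fisher-Yates; position k of the new dataset
 * would hold superdataset object perm[k]. Seed with srand() first. */
static void random_permutation(int64_t *perm, int64_t n) {
    assert(n >= 0);
    for (int64_t i = 0; i < n; ++i)
        perm[i] = i;
    for (int64_t i = n - 1; i > 0; --i) {
        /* j is in [0, i] because rand() / (RAND_MAX + 1.0) is in [0, 1). */
        int64_t j = (int64_t) ((double) rand() / ((double) RAND_MAX + 1.0) * (double) (i + 1));
        int64_t tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }
}
```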
MknnDataset *mknn_datasetLoader_SubsetPositions(MknnDataset *superdataset,
    int64_t *positions,
    int64_t num_positions,
    bool free_superdataset_on_release)
Creates a new dataset which is a subset of a bigger dataset.
Parameters:
  superdataset: the dataset from which objects are extracted.
  positions: the positions of the objects to extract from superdataset. The positions are copied to an internal array.
  num_positions: the length of the array positions. It must be greater than zero.
  free_superdataset_on_release: if true, superdataset is released when the new dataset is released.
MknnDataset *mknn_datasetLoader_SubsetRandomSample(MknnDataset *superdataset,
    double sample_size_or_fraction,
    bool free_superdataset_on_release)
Creates a new dataset as a random sample of superdataset.
Parameters:
  superdataset: the dataset to sample from.
  free_superdataset_on_release: if true, superdataset is released when the new dataset is released.
  sample_size_or_fraction: if >= 1, the number of objects to sample from superdataset; if between 0 and 1 (exclusive), the fraction of the objects of superdataset to sample.
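The dual meaning of sample_size_or_fraction can be resolved into an object count as sketched below. Truncating an absolute count, rounding a fraction to the nearest integer, and clamping the result to [1, superdataset size] are all assumptions of this sketch; the documentation does not say how the library rounds or what happens when the count exceeds the dataset size. sample_size is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of MetricKnn): number of objects in the sample.
 * Values >= 1 are an absolute count (truncated); values in (0, 1) are a
 * fraction of superdataset_size (rounded to nearest). The result is clamped to
 * [1, superdataset_size], which is an assumption. */
static int64_t sample_size(int64_t superdataset_size, double sample_size_or_fraction) {
    assert(superdataset_size > 0 && sample_size_or_fraction > 0.0);
    int64_t n;
    if (sample_size_or_fraction >= 1.0)
        n = (int64_t) sample_size_or_fraction;
    else
        n = (int64_t) (sample_size_or_fraction * (double) superdataset_size + 0.5);
    if (n < 1)
        n = 1;
    if (n > superdataset_size)
        n = superdataset_size;
    return n;
}
```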
MknnDataset *mknn_datasetLoader_SubsetSegment(MknnDataset *superdataset,
    int64_t position_start,
    int64_t length,
    bool free_superdataset_on_release)
Creates a new dataset which is a subset of a bigger dataset.
Parameters:
  superdataset: the dataset from which objects are extracted.
  position_start: the position of the first object to extract.
  length: the number of consecutive objects to extract, starting from position_start. It must be greater than zero.
  free_superdataset_on_release: if true, superdataset is released when the new dataset is released.
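A segment subset maps its position i to superdataset position position_start + i. The sketch makes that mapping explicit and assumes the segment must lie entirely inside the superdataset; segment_to_super is a hypothetical helper that signals invalid input with -1 rather than reproducing the library's actual error handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of MetricKnn): maps position i of a segment
 * subset to the corresponding superdataset position. Returns -1 if the segment
 * does not fit inside the superdataset or if i is outside [0, length). */
static int64_t segment_to_super(int64_t super_size, int64_t position_start,
                                int64_t length, int64_t i) {
    if (length <= 0 || position_start < 0 || position_start + length > super_size)
        return -1; /* invalid segment */
    if (i < 0 || i >= length)
        return -1; /* position outside the subset */
    return position_start + i;
}
```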
MknnDataset *mknn_datasetLoader_UniformRandomVectors(int64_t num_objects,
    int64_t dimension,
    double dimension_minValueIncluded,
    double dimension_maxValueNotIncluded,
    MknnDatatype vectors_dimension_datatype)
Creates a new dataset with random vectors of the given datatype.
Each dimension is bounded in [dimension_minValueIncluded, dimension_maxValueNotIncluded).
Parameters:
  num_objects: the desired size of the dataset.
  dimension: the number of dimensions to generate.
  dimension_minValueIncluded: the minimum value of each dimension (included).
  dimension_maxValueNotIncluded: the maximum value of each dimension (not included).
  vectors_dimension_datatype: the datatype of the generated vectors.
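The documented bounds include dimension_minValueIncluded and exclude dimension_maxValueNotIncluded. A minimal sketch of drawing values in such a half-open range follows, assuming double output in a compact row-major array and rand() as the random source; conversion to other datatypes is omitted, and uniform_random_vectors is a hypothetical helper rather than the library's generator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MetricKnn): fills out[num_objects * dimension]
 * with values drawn uniformly from [min_included, max_not_included). */
static void uniform_random_vectors(double *out, int64_t num_objects, int64_t dimension,
                                   double min_included, double max_not_included) {
    assert(min_included < max_not_included);
    for (int64_t k = 0; k < num_objects * dimension; ++k) {
        /* u lies in [0, 1) because rand() never exceeds RAND_MAX. */
        double u = (double) rand() / ((double) RAND_MAX + 1.0);
        double value = min_included + u * (max_not_included - min_included);
        /* Guard against floating-point rounding reaching the excluded bound. */
        out[k] = value < max_not_included ? value : min_included;
    }
}
```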