MetricKnn API
Fast Similarity Search using the Metric Space Approach
mknn_dataset_loader.h File Reference

There are different methods for creating or loading datasets.

#include "../metricknn_c.h"


Typedefs

Pointer to functions used to define custom datasets
typedef int64_t(* mknn_function_dataset_getNumObjects) (void *data_pointer)
 Function parameter for mknn_datasetLoader_Custom.

typedef void *(* mknn_function_dataset_getObject) (void *data_pointer, int64_t pos)
 Function parameter for mknn_datasetLoader_Custom.

typedef void(* mknn_function_dataset_pushObject) (void *data_pointer, void *object)
 Function parameter for mknn_datasetLoader_Custom.

typedef void(* mknn_function_dataset_releaseDataPointer) (void *data_pointer)
 Function parameter for mknn_datasetLoader_Custom.

typedef void(* mknn_function_dataset_releaseObject) (void *object_pointer)
 Function parameter for mknn_dataset_pushObject. Releases the storage of the object.
 

Functions

MknnDataset* mknn_datasetLoader_Custom (void *data_pointer, mknn_function_dataset_getNumObjects func_getNumObjects, mknn_function_dataset_getObject func_getObject, mknn_function_dataset_pushObject func_pushObject, mknn_function_dataset_releaseDataPointer func_releaseDataPointer, MknnDomain *domain, bool free_domain_on_dataset_release)
 Creates a new custom dataset.

MknnDataset* mknn_datasetLoader_PointerArray (void **object_array, int64_t num_objects, MknnDomain *domain, bool free_each_object_on_dataset_release, bool free_object_array_on_dataset_release, bool free_domain_on_dataset_release)
 Creates a new dataset from an array of objects.

MknnDataset* mknn_datasetLoader_PointerCompactVectors (void *vectors_header, bool free_vectors_header_on_dataset_release, int64_t num_vectors, int64_t vector_dimensions, MknnDatatype vector_dimension_datatype)
 Creates a new dataset from a data array.

MknnDataset* mknn_datasetLoader_PointerCompactVectors_alt (void *vectors_header, bool free_vectors_header_on_dataset_release, int64_t num_vectors, MknnDomain *domain, bool free_domain_on_dataset_release)
 Creates a new dataset from a data array.

MknnDataset* mknn_datasetLoader_ParseVectorFile (const char *filename, MknnDatatype datatype)
 Creates a new dataset by reading a text file with vectors.

MknnDataset* mknn_datasetLoader_ParseStringsFile (const char *filename)
 Creates a new dataset by reading a text file with strings.

MknnDataset* mknn_datasetLoader_Concatenate (int64_t num_subdatasets, MknnDataset **subdatasets, bool free_subdatasets_on_dataset_release)
 Creates a new dataset which is the concatenation of one or more datasets.

MknnDataset* mknn_datasetLoader_SubsetSegment (MknnDataset *superdataset, int64_t position_start, int64_t length, bool free_superdataset_on_release)
 Creates a new dataset which is a subset of a bigger dataset.

MknnDataset* mknn_datasetLoader_SubsetPositions (MknnDataset *superdataset, int64_t *positions, int64_t num_positions, bool free_superdataset_on_release)
 Creates a new dataset which is a subset of a bigger dataset.

MknnDataset* mknn_datasetLoader_SubsetRandomSample (MknnDataset *superdataset, double sample_size_or_fraction, bool free_superdataset_on_release)
 Creates a new dataset as a random sample of superdataset.

MknnDataset* mknn_datasetLoader_UniformRandomVectors (int64_t num_objects, int64_t dimension, double dimension_minValueIncluded, double dimension_maxValueNotIncluded, MknnDatatype vectors_dimension_datatype)
 Creates a new dataset with random vectors of the given datatype.

MknnDataset* mknn_datasetLoader_MultiObject (int64_t num_subdatasets, MknnDataset **subdatasets, bool free_subdatasets_on_dataset_release)
 Creates a new dataset where each object is a multi-object.

MknnDataset* mknn_datasetLoader_Empty (MknnDomain *domain, bool free_domain_on_dataset_release)
 Creates a new empty dataset that can dynamically grow as new objects are added.

MknnDataset* mknn_datasetLoader_reorderRandomPermutation (MknnDataset *superdataset, bool free_superdataset_on_release)
 Creates a new dataset as a random permutation of superdataset.

MknnDataset* mknn_datasetLoader_reorderNearestNeighbor (MknnDataset *superdataset, MknnDistance *distance, int64_t start_position, bool free_superdataset_on_release)
 Creates a new dataset which is a permutation of superdataset, where the first position is given by start_position and the following positions are the consecutive nearest neighbors of the previous one.
 

Detailed Description

There are different methods for creating or loading datasets.

Typedef Documentation

typedef int64_t(* mknn_function_dataset_getNumObjects) (void *data_pointer)

Function parameter for mknn_datasetLoader_Custom.

Returns the current size of the dataset.

Parameters
    data_pointer  the object where the data is located.
Returns
    the number of objects stored in data_pointer.

typedef void*(* mknn_function_dataset_getObject) (void *data_pointer, int64_t pos)

Function parameter for mknn_datasetLoader_Custom.

Returns an object in the dataset.

Parameters
    data_pointer  the object where the data is located.
    pos  the position of the desired object, between 0 and num_objects - 1.
Returns
    the object stored at position pos in data_pointer.

typedef void(* mknn_function_dataset_pushObject) (void *data_pointer, void *object)

Function parameter for mknn_datasetLoader_Custom.

Adds an object to a dynamic dataset.

Parameters
    data_pointer  the object where the data is located.
    object  the new object to add at the end of data_pointer.

typedef void(* mknn_function_dataset_releaseDataPointer) (void *data_pointer)

Function parameter for mknn_datasetLoader_Custom.

Releases the storage of the dataset.

Parameters
    data_pointer  the object to release.

typedef void(* mknn_function_dataset_releaseObject) (void *object_pointer)

Function parameter for mknn_dataset_pushObject.

Releases the storage of the object.

Parameters
    object_pointer  the object to release.

Function Documentation

MknnDataset* mknn_datasetLoader_Concatenate ( int64_t  num_subdatasets,
MknnDataset **  subdatasets,
bool  free_subdatasets_on_dataset_release 
)

Creates a new dataset which is the concatenation of one or more datasets.

Parameters
    num_subdatasets  the number of datasets to concatenate.
    subdatasets  the array of datasets.
    free_subdatasets_on_dataset_release  flag to release every subdatasets[i] (by calling mknn_dataset_release) during mknn_dataset_release of the new dataset.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_Custom ( void *  data_pointer,
mknn_function_dataset_getNumObjects  func_getNumObjects,
mknn_function_dataset_getObject  func_getObject,
mknn_function_dataset_pushObject  func_pushObject,
mknn_function_dataset_releaseDataPointer  func_releaseDataPointer,
MknnDomain *  domain,
bool  free_domain_on_dataset_release 
)

Creates a new custom dataset.

The data is stored behind the pointer data_pointer, and func_getObject is invoked to retrieve each object.

Parameters
    data_pointer  a pointer to an object with data.
    func_getNumObjects  the function to invoke for retrieving the current size of the dataset stored in data_pointer.
    func_getObject  the function to invoke for retrieving the object at any position between 0 and num_objects - 1.
    func_pushObject  the function to invoke for adding an object to the dataset, or NULL if the dataset is static.
    func_releaseDataPointer  the function to be invoked by mknn_dataset_release for releasing data_pointer (or NULL if it is not needed).
    domain  the domain for all the objects in the dataset.
    free_domain_on_dataset_release  flag to release domain (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
Returns
    a new dataset (it must be released with mknn_dataset_release).
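
The four callbacks can be sketched as follows, assuming a hypothetical growable array of pointers (PtrStore) as the object behind data_pointer. The typedefs are copied from this page so that the block compiles without the library; in real code they come from mknn_dataset_loader.h. PtrStore and the store_* names are illustrative and are not part of MetricKnn.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Callback signatures as documented on this page (normally provided by the header). */
typedef int64_t (*mknn_function_dataset_getNumObjects)(void *data_pointer);
typedef void *(*mknn_function_dataset_getObject)(void *data_pointer, int64_t pos);
typedef void (*mknn_function_dataset_pushObject)(void *data_pointer, void *object);
typedef void (*mknn_function_dataset_releaseDataPointer)(void *data_pointer);

/* Hypothetical backing store: a growable array of object pointers. */
typedef struct {
    void **items;
    int64_t size;
    int64_t capacity;
} PtrStore;

static PtrStore *store_new(void) {
    PtrStore *s = calloc(1, sizeof *s);
    assert(s != NULL);
    return s;
}

/* Matches mknn_function_dataset_getNumObjects. */
static int64_t store_getNumObjects(void *data_pointer) {
    return ((PtrStore *) data_pointer)->size;
}

/* Matches mknn_function_dataset_getObject. */
static void *store_getObject(void *data_pointer, int64_t pos) {
    PtrStore *s = data_pointer;
    assert(pos >= 0 && pos < s->size);
    return s->items[pos];
}

/* Matches mknn_function_dataset_pushObject: appends at the end, growing if needed. */
static void store_pushObject(void *data_pointer, void *object) {
    PtrStore *s = data_pointer;
    if (s->size == s->capacity) {
        int64_t new_capacity = s->capacity == 0 ? 8 : 2 * s->capacity;
        void **grown = realloc(s->items, (size_t) new_capacity * sizeof *grown);
        assert(grown != NULL);
        s->items = grown;
        s->capacity = new_capacity;
    }
    s->items[s->size++] = object;
}

/* Matches mknn_function_dataset_releaseDataPointer.
 * Assumption: the store does not own the objects, only the pointer array. */
static void store_release(void *data_pointer) {
    PtrStore *s = data_pointer;
    free(s->items);
    free(s);
}

/* With the library available, the store would be handed over along these lines:
 *   MknnDataset *dataset = mknn_datasetLoader_Custom(store, store_getNumObjects,
 *       store_getObject, store_pushObject, store_release, domain, true);
 */
```

Passing NULL instead of store_pushObject would declare the dataset static, as described for func_pushObject above.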
MknnDataset* mknn_datasetLoader_Empty ( MknnDomain *  domain,
bool  free_domain_on_dataset_release 
)

Creates a new empty dataset that can dynamically grow as new objects are added.

New objects are added with mknn_dataset_pushObject.

Parameters
    domain  the domain for all the objects to be added to the dataset.
    free_domain_on_dataset_release  flag to release domain (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
Returns
    a new empty dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_MultiObject ( int64_t  num_subdatasets,
MknnDataset **  subdatasets,
bool  free_subdatasets_on_dataset_release 
)

Creates a new dataset where each object is a multi-object.

Each multi-object is created by combining one object of each subdataset. All datasets must contain the same number of objects.

Parameters
    num_subdatasets  number of subdatasets to combine, i.e., the size of each multi-object to be created.
    subdatasets  the datasets from which multi-objects will be created.
    free_subdatasets_on_dataset_release  flag to release every subdatasets[i] (by calling mknn_dataset_release) during mknn_dataset_release of the new dataset.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_ParseStringsFile ( const char *  filename)

Creates a new dataset by reading a text file with strings.

The format is one string per line.

Parameters
    filename  the name of the file to read.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_ParseVectorFile ( const char *  filename,
MknnDatatype  datatype 
)

Creates a new dataset by reading a text file with vectors.

The format is one vector per line, with dimensions separated by tabs.

Parameters
    filename  the name of the file to read.
    datatype  the datatype of the objects.
Returns
    a new dataset (it must be released with mknn_dataset_release).
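
To illustrate the file layout described above, here is a minimal sketch that writes a file with one vector per line and tab-separated dimensions. The function name write_vector_file is hypothetical, and formatting values with "%g" is an assumption; this page does not state which numeric notations the parser accepts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes num_vectors rows of `dimensions` values each: one vector per line,
 * values separated by a tab. values holds the vectors back to back.
 * Returns 0 on success, -1 on I/O failure. */
static int write_vector_file(const char *filename, const double *values,
                             int64_t num_vectors, int64_t dimensions) {
    assert(num_vectors > 0 && dimensions > 0);
    FILE *out = fopen(filename, "w");
    if (out == NULL)
        return -1;
    for (int64_t i = 0; i < num_vectors; ++i) {
        for (int64_t j = 0; j < dimensions; ++j) {
            if (j > 0)
                fputc('\t', out);
            /* Assumption: "%g" output is acceptable to the reader of the file. */
            fprintf(out, "%g", values[i * dimensions + j]);
        }
        fputc('\n', out);
    }
    return fclose(out) == 0 ? 0 : -1;
}
```

A file produced this way could then be passed as filename to mknn_datasetLoader_ParseVectorFile together with the matching datatype.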
MknnDataset* mknn_datasetLoader_PointerArray ( void **  object_array,
int64_t  num_objects,
MknnDomain *  domain,
bool  free_each_object_on_dataset_release,
bool  free_object_array_on_dataset_release,
bool  free_domain_on_dataset_release 
)

Creates a new dataset from an array of objects.

The objects are read from object_array in the following order:

  • first object: object_array[0]
  • second object: object_array[1]
  • ...
  • last object: object_array[num_objects - 1]
Parameters
    object_array  pointer to an array of objects.
    num_objects  number of objects to read from the array.
    domain  the domain for the objects in the array.
    free_each_object_on_dataset_release  flag to release each object_array[i] during mknn_dataset_release.
    free_object_array_on_dataset_release  flag to release object_array during mknn_dataset_release.
    free_domain_on_dataset_release  flag to release domain (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_PointerCompactVectors ( void *  vectors_header,
bool  free_vectors_header_on_dataset_release,
int64_t  num_vectors,
int64_t  vector_dimensions,
MknnDatatype  vector_dimension_datatype 
)

Creates a new dataset from a data array.

The objects are read from vectors_header in the following order:

  • first object: vectors_header
  • second object: vectors_header + vector_size
  • ...
  • last object: vectors_header + (num_vectors - 1) * vector_size.

The value of vector_size is determined by mknn_domain_vector_getVectorLengthInBytes.

Parameters
    vectors_header  pointer to the header of the set of vectors.
    free_vectors_header_on_dataset_release  flag to release vectors_header during mknn_dataset_release.
    num_vectors  number of objects to read from vectors_header.
    vector_dimensions  number of dimensions of the vectors.
    vector_dimension_datatype  datatype for the vector values.
Returns
    a new dataset (it must be released with mknn_dataset_release).
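
The addressing scheme above can be sketched as plain pointer arithmetic. The helper compact_vector_at is hypothetical, and the example assumes that vector_size equals the number of dimensions times the size of one value; in the library the value comes from mknn_domain_vector_getVectorLengthInBytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the address of the vector at position pos inside a compact block,
 * i.e. vectors_header + pos * vector_size, with vector_size given in bytes. */
static void *compact_vector_at(void *vectors_header, int64_t num_vectors,
                               size_t vector_size, int64_t pos) {
    assert(pos >= 0 && pos < num_vectors);
    /* Arithmetic is done on char * because vector_size counts bytes. */
    return (char *) vectors_header + (size_t) pos * vector_size;
}
```

For a block declared as float block[3 * 4], the call compact_vector_at(block, 3, 4 * sizeof(float), 1) yields the address of block[4], the first value of the second vector.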
MknnDataset* mknn_datasetLoader_PointerCompactVectors_alt ( void *  vectors_header,
bool  free_vectors_header_on_dataset_release,
int64_t  num_vectors,
MknnDomain *  domain,
bool  free_domain_on_dataset_release 
)

Creates a new dataset from a data array.

The objects are read from vectors_header in the following order:

  • first object: vectors_header
  • second object: vectors_header + vector_size
  • ...
  • last object: vectors_header + (num_vectors - 1) * vector_size.

The value of vector_size is determined by mknn_domain_vector_getVectorLengthInBytes.

Parameters
    vectors_header  pointer to the header of the set of vectors.
    free_vectors_header_on_dataset_release  flag to release vectors_header during mknn_dataset_release.
    num_vectors  number of objects to read from vectors_header.
    domain  the domain for the vectors in vectors_header.
    free_domain_on_dataset_release  flag to release domain (by calling mknn_domain_release) during mknn_dataset_release of the new dataset.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_reorderNearestNeighbor ( MknnDataset *  superdataset,
MknnDistance *  distance,
int64_t  start_position,
bool  free_superdataset_on_release 
)

Creates a new dataset which is a permutation of superdataset, where the first position is given by start_position and following are the consecutive nearest neighbors of the previous one.

Parameters
    superdataset  the dataset to reorder.
    distance  the distance used to find the nearest neighbors.
    start_position  the position of the seed object; -1 selects the object closest to zero (valid only for vectors).
Returns
    a new dataset (it must be released with mknn_dataset_release).
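
The ordering described above can be sketched as a greedy chain. This is an illustration on one-dimensional values with the absolute difference as distance, not the library's implementation. It assumes that each step picks the nearest object among those not yet placed and that ties go to the lowest position; the names nearest_neighbor_order and abs_diff are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static double abs_diff(double a, double b) {
    return a > b ? a - b : b - a;
}

/* Writes into order_out the positions of values in nearest-neighbor order.
 * start_position == -1 seeds the chain with the value closest to zero. */
static void nearest_neighbor_order(const double *values, int64_t n,
                                   int64_t start_position, int64_t *order_out) {
    assert(n > 0 && start_position >= -1 && start_position < n);
    bool *used = calloc((size_t) n, sizeof *used);
    assert(used != NULL);
    int64_t current = start_position;
    if (current == -1) {
        current = 0;
        for (int64_t i = 1; i < n; ++i)
            if (abs_diff(values[i], 0.0) < abs_diff(values[current], 0.0))
                current = i;
    }
    for (int64_t k = 0; k < n; ++k) {
        order_out[k] = current;
        used[current] = true;
        /* Find the nearest object that has not been placed yet. */
        int64_t best = -1;
        for (int64_t i = 0; i < n; ++i) {
            if (used[i])
                continue;
            if (best == -1 || abs_diff(values[i], values[current]) <
                              abs_diff(values[best], values[current]))
                best = i;
        }
        current = best; /* stays -1 only after the last object is placed */
    }
    free(used);
}
```

For the values {5.0, 1.0, 4.0, 9.0, 0.5} and start position 0, the chain visits positions 0, 2, 1, 4, 3. The sketch costs a quadratic number of distance evaluations.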
MknnDataset* mknn_datasetLoader_reorderRandomPermutation ( MknnDataset *  superdataset,
bool  free_superdataset_on_release 
)

Creates a new dataset as a random permutation of superdataset.

Parameters
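    superdataset  the dataset to permute.
    free_superdataset_on_release  flag to release superdataset (by calling mknn_dataset_release) during mknn_dataset_release of the new dataset.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_SubsetPositions ( MknnDataset *  superdataset,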
int64_t *  positions,
int64_t  num_positions,
bool  free_superdataset_on_release 
)

Creates a new dataset which is a subset of a bigger dataset.

Parameters
    superdataset  the dataset from which objects are extracted.
    positions  the positions of the objects to extract from superdataset. The positions are copied to an internal array.
    num_positions  length of the array positions. It must be greater than zero.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_SubsetRandomSample ( MknnDataset *  superdataset,
double  sample_size_or_fraction,
bool  free_superdataset_on_release 
)

Creates a new dataset as a random sample of superdataset.

Parameters
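    superdataset  the dataset to sample from.
    free_superdataset_on_release  flag to release superdataset (by calling mknn_dataset_release) during mknn_dataset_release of the new dataset.
    sample_size_or_fraction  if >= 1, the number of elements to sample from superdataset; if between 0 and 1 (exclusive), the fraction of superdataset elements to sample.
Returns
    a new dataset (it must be released with mknn_dataset_release).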
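
The two readings of sample_size_or_fraction can be sketched as a small helper. The function resolve_sample_size is hypothetical. Rounding a fractional result to the nearest integer, keeping at least one element, and capping at the size of superdataset are all assumptions, since this page does not specify them.

```c
#include <assert.h>
#include <stdint.h>

/* Translates sample_size_or_fraction into a number of elements. */
static int64_t resolve_sample_size(double sample_size_or_fraction,
                                   int64_t superdataset_size) {
    assert(sample_size_or_fraction > 0 && superdataset_size > 0);
    int64_t size;
    if (sample_size_or_fraction >= 1) {
        /* Values >= 1 are an absolute count, so 1.0 means one element. */
        size = (int64_t) sample_size_or_fraction;
    } else {
        /* Values in (0, 1) are a fraction. Assumption: round to nearest. */
        size = (int64_t) (sample_size_or_fraction * (double) superdataset_size + 0.5);
    }
    if (size < 1)
        size = 1;                  /* assumption: never an empty sample */
    if (size > superdataset_size)
        size = superdataset_size;  /* assumption: cap at the available objects */
    return size;
}
```

Note that under the rule stated above a value of 1.0 requests a single element, not the whole dataset.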
MknnDataset* mknn_datasetLoader_SubsetSegment ( MknnDataset *  superdataset,
int64_t  position_start,
int64_t  length,
bool  free_superdataset_on_release 
)

Creates a new dataset which is a subset of a bigger dataset.

Parameters
    superdataset  the dataset from which objects are extracted.
    position_start  position of the first object to extract.
    length  number of consecutive objects to extract, starting from position_start. It must be greater than zero.
Returns
    a new dataset (it must be released with mknn_dataset_release).

MknnDataset* mknn_datasetLoader_UniformRandomVectors ( int64_t  num_objects,
int64_t  dimension,
double  dimension_minValueIncluded,
double  dimension_maxValueNotIncluded,
MknnDatatype  vectors_dimension_datatype 
)

Creates a new dataset with random vectors of the given datatype.

Each dimension is bounded in [dimension_minValueIncluded, dimension_maxValueNotIncluded).

Parameters
    num_objects  desired size of the dataset.
    dimension  number of dimensions to generate.
    dimension_minValueIncluded  the minimum value for each dimension (included).
    dimension_maxValueNotIncluded  the maximum value for each dimension (not included).
    vectors_dimension_datatype  the datatype of the generated vectors.
Returns
    a new dataset (it must be released with mknn_dataset_release).
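
The half-open range can be illustrated with a short sketch that fills a compact block of doubles. It uses the standard rand() generator purely for illustration; the generator used by the library is not documented on this page, and the names uniform_in_range and fill_uniform_vectors are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a pseudo-random value in [min_included, max_not_included). */
static double uniform_in_range(double min_included, double max_not_included) {
    assert(min_included < max_not_included);
    /* rand() / (RAND_MAX + 1.0) lies in [0, 1): even RAND_MAX maps below 1. */
    double unit = (double) rand() / ((double) RAND_MAX + 1.0);
    return min_included + unit * (max_not_included - min_included);
}

/* Fills block with num_objects vectors of `dimension` values, stored back to back. */
static void fill_uniform_vectors(double *block, int64_t num_objects, int64_t dimension,
                                 double min_included, double max_not_included) {
    assert(num_objects > 0 && dimension > 0);
    for (int64_t i = 0; i < num_objects * dimension; ++i)
        block[i] = uniform_in_range(min_included, max_not_included);
}
```

Dividing by RAND_MAX + 1.0 rather than RAND_MAX is what keeps the upper bound excluded.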