Approximate similarity search with shapes (geometries) using Product Quantization

From REU@MU
Jump to: navigation, search

Title : Creating Approximate similarity search with shapes (geometries) using Product Quantization Mentor : Dr. Satish Puri

Approximate similarity search with shapes (geometries) using Product Quantization Given a dataset/database of shapes or geometries, similarity search for a given query shape returns the few closest shapes from the reference dataset/database. Example query is what are the lakes that are similar to Lake Michigan in shape and size. To do similarity search, the first step is to encode a shape as a high dimensional feature vector. Due to the curse of dimensionality, given the ever-increasing size of datasets, exact nearest neighbor-based similarity search requiring a scan of the entire dataset quickly becomes impractical. There are classical data structures like K-d Tree and similarity search trees which can be used for similarity search with high dimensional vectors. However, the performance of these data structures also degrades with the data dimensionality and the size of the reference dataset.

The REU project will evaluate locality sensitive hashing and product quantization techniques to solve approximate shape similarity search on large datasets from geospatial maps.

Student Background : C/C++ and Python programming.