Difference between revisions of "Approximate Nearest Neighbor Searches of Real World Geometry"

From REU@MU
Jump to: navigation, search
 
(One intermediate revision by the same user not shown)
Line 19: Line 19:
 
== Project Milestones ==
 
== Project Milestones ==
  
{| class="wikitable" style="margin:auto"
+
* Read / Write WKT.
|-
+
* Calculate Jaccard Distance and fill ratios.
! Week !! Description
+
* Create grid over global minimum bounding rectangle.
|-
+
* Create LSH hashes.
| Week 1 ||
+
* Custom Hash Map implementation to allow more concurrency.  
*Orientation Meeting.
+
* Query shapes while avoiding reporting the same neighbor multiple times.
*Meeting on Record Keeping.
+
*Meeting with Dr. Puri to discuss project idea.
+
*Started literature review.
+
|-
+
| Week 2 ||
+
*Responsible Conduct of Research Training.
+
*Meeting on Technical Writing.
+
*Continued literature review.
+
*Began assessing technical requirements of project.
+
| Week 3 ||
+
* Configure development environment with GEOS 3.10.3.
+
* Meeting on LaTeX / BibTeX.
+
* Developed code to parse data from UCR Star.
+
* Began development on remaining preprocessing code.
+
|}
+

Latest revision as of 22:31, 5 August 2022

Student Researcher: Matt Schwennesen

Facility Mentor: Dr. Satish Puri

Project Description

Nearest neighbor searches take a query data point and search the data space for the closest other data point in the space. This task is extremely information in all manner of data processing fields such an information retrieval, text mining and recommender systems. However, since the cost of exact nearest neighbor searches is computationally expensive, most of the recent work in this field has been done for approximate nearest neighbor searches.

Methods such as locality sensitive hashing have proven very useful when the data is text documents and recently graph based methods have been shown to be powerful and performant for vectors in high dimensional Euclidean space.

However, little to no work has been done on approximate nearest neighbor searches for polygons. There are many potential applications of this work, from trying to estimate physical parameters of say bodies of water by finding other bodies with more known features or potential applications in geo-locating images which have been stripped of metadata and passed through a feature extractor.

Unlike most data science problems, this problem is being investigated by efficient data structures rather then machine learning techniques.

Project Goal

Develop a system which can search real world geometric datasets, such as the ones available here, and find approximate nearest neighbors of those shapes in an accurate and performant way. The system will be written in C++ using multithreading.

Project Milestones

  • Read / Write WKT.
  • Calculate Jaccard Distance and fill ratios.
  • Create grid over global minimum bounding rectangle.
  • Create LSH hashes.
  • Custom Hash Map implementation to allow more concurrency.
  • Query shapes while avoiding reporting the same neighbor multiple times.