Difference between revisions of "User:Grberlstein"

From REU@MU
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]

Revision as of 00:26, 18 June 2017

Griffin Berlstein

Nominally a person.

Readings

Background

Algorithmic Ethics

Clustering and Data Science

Project Log For Summer 2017

Week One (5/30 - 6/2)

Day 1 (5/30)

  • Attended REU orientation
  • Obtained ID card and computer access
  • Met with Dr. Guha and discussed broad ideas surrounding the project

Day 2 (5/31)

  • Attended Library orientation
  • Finished reading Ethics of Algorithms by Thijs Slot. This was the last of the pre-REU reading.
  • Started reviewing the basics of Python
  • Given crime data sets to review by Dr. Guha

Day 3 (6/1)

  • Attended a meeting on proper research practices by Dr. Factor
  • Set up direct deposit
  • Reviewed the basics of GitHub
  • Continued to review Python
  • Examined crime data and the various ways it was made publicly available

Day 4 (6/2)

  • Moved mentor meeting to Wednesday due to scheduling issue
  • Started reading background information provided by Dr. Guha
  • Set up Jupyter notebook and the various dependent libraries
  • Created rough implementation of K-means clustering on random data
  • Obtained card access to Dr. Guha's lab
  • Posted rough, pre-discussion milestones
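
A rough K-means implementation on random data, as mentioned for this day, might look like the sketch below. It is not the code from the log: the function name `kmeans`, the choice of 200 uniform random 2-D points, and `k=3` are all assumptions made for illustration, and it uses plain Lloyd's iterations with a random start.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k data points chosen at random (no K-means++ seeding here).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical test data: 200 uniform random points in the unit square.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
centroids, labels = kmeans(points, k=3)
```

Uniform random points have no real cluster structure, so the result only shows that the loop converges and partitions the plane; it says nothing about clustering quality.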

Week Two (6/5 - 6/9)

Day 1 (6/5)

  • Refined K-means implementation with the K-means++ seeding described in the Data Science Lab article
  • Tested the algorithm on random Gaussian distributions, rather than random points
  • Experimented with visual plotting of the algorithm using Seaborn and Matplotlib
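
The K-means++ seeding step can be sketched as follows. This is a minimal illustration of the general technique (each new seed is sampled with probability proportional to its squared distance from the nearest existing seed), not the code from the log or from the article. The names `sq_dist` and `kmeans_pp_seeds`, and the three Gaussian blobs used as test data, are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_pp_seeds(points, k, rng):
    """Pick k starting centroids with K-means++ (D^2-weighted) sampling."""
    # The first seed is chosen uniformly at random.
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Squared distance from each point to its closest seed so far.
        weights = [min(sq_dist(p, s) for s in seeds) for p in points]
        # Sample the next seed with probability proportional to that distance.
        # (Assumes the points are not all identical, so some weight is > 0.)
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


# Hypothetical test data: three Gaussian blobs with unit standard deviation.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
blobs = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in true_centers
    for _ in range(100)
]
seeds = kmeans_pp_seeds(blobs, k=3, rng=rng)
```

On well-separated Gaussian blobs, this weighting makes it likely, though not guaranteed, that each seed lands in a different blob, which is why Gaussian data is a more informative test than uniform random points.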

Day 2 (6/6)

  • Attended RCR training
  • Finished reading the relevant sections of Algorithms for Clustering Data
  • Experimented with Scikit-learn's implementation of K-means

Day 3 (6/7)

  • Met with Dr. Guha and discussed the immediate future
  • Set the goal to produce an interactive crime map by next Wednesday
  • Gathered data from website and began sorting

Day 4 (6/8)

  • Created a script to aggregate the data from multiple spreadsheets into a single usable file
  • Looked into potential libraries needed to create the interactive map
  • Ran into issues with the format of the location data
  • Converted the addresses in the data into latitude/longitude coordinates
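
The aggregation step above could be sketched like this, assuming the spreadsheets were exported as CSV files. The log does not say what format or columns the data used, so the function name `merge_csv_files`, the file names, and the sample columns and rows below are all made-up placeholders.

```python
import csv
import tempfile
from pathlib import Path


def merge_csv_files(input_paths, output_path):
    """Concatenate CSV files into one, keeping the union of their columns."""
    rows, fieldnames = [], []
    for path in input_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Collect column names in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(output_path, "w", newline="") as handle:
        # Columns missing from a row are written as empty strings.
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo with two throwaway files standing in for the exported spreadsheets.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "jan.csv").write_text(
        "date,offense,address\n2017-01-03,Theft,100 Main St\n")
    (tmp_dir / "feb.csv").write_text(
        "date,offense,address,district\n2017-02-11,Burglary,200 Oak Ave,3\n")
    inputs = sorted(tmp_dir.glob("*.csv"))
    merged_count = merge_csv_files(inputs, tmp_dir / "all.csv")
    merged_text = (tmp_dir / "all.csv").read_text()
```

Taking the union of columns, rather than requiring identical headers, is a design choice that tolerates spreadsheets whose layout drifts over time.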

Day 5 (6/9)

  • Found a publicly available shape file of the city
  • Set up the necessary scripts to display the file
  • Ran into an issue with the points not being in the same coordinate system as the shape file

Week Three (6/12 - 6/16)

Day 1 (6/12)

Day 2 (6/13)

Day 3 (6/14)

  • Finished website framework
  • Uploaded initial map version
  • Started on the second version of the map

Day 4 (6/15)

  • Split the data into multiple sets
  • Used K-means to cluster the data in a variety of ways
  • Wrote a Python script to run K-means multiple times and output results to be fed into D3
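
A script of this kind, running K-means several times and emitting results that D3 can load, might be structured as in the sketch below. The log does not describe the actual output format, so the JSON layout (one record per point with `x`, `y`, and a `clusters` map keyed by `k2`, `k4`, and so on), the function names, and the random test points are assumptions.

```python
import json
import random


def kmeans_labels(points, k, rng, iterations=50):
    """Plain Lloyd's iterations from a random start; returns one label per point."""
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                                        + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def cluster_runs_to_json(points, k_values, seed=0):
    """Run K-means once per k and bundle all labelings into one JSON string."""
    records = [{"x": x, "y": y, "clusters": {}} for x, y in points]
    for k in k_values:
        labels = kmeans_labels(points, k, random.Random(seed))
        for record, label in zip(records, labels):
            record["clusters"][f"k{k}"] = label
    return json.dumps(records)


# Hypothetical input: 50 random points standing in for geocoded incidents.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(50)]
payload = cluster_runs_to_json(points, k_values=[2, 4, 6])
```

Storing every run on the same record lets the front end switch between clusterings by changing which key it reads, without reloading the data.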

Day 5 (6/16)

  • Put modified data into D3 setup for the new map
  • Tweaked basic settings
  • Added the ability to display different variations of K-means on the map