Revision as of 20:31, 26 June 2017

Griffin Berlstein

Griffin is an undergraduate majoring in Mathematics and Computer Science at Vassar College in Poughkeepsie, New York.

Readings

Background

Algorithmic Ethics

Clustering and Data Science

Project Log For Summer 2017

Week One (5/30 - 6/2)

Day 1 (5/30)

Attended REU orientation
Obtained ID card and computer access
Met with Dr. Guha and discussed broad ideas surrounding the project

Day 2 (5/31)

Attended Library orientation
Finished reading Ethics of Algorithms by Thijs Slot. This was the last of the pre-REU reading.
Started reviewing the basics of Python
Given crime data sets to review by Dr. Guha

Day 3 (6/1)

Attended a meeting on proper research practices by Dr. Factor
Set up direct deposit
Reviewed the basics of GitHub
Continued to review Python
Examined crime data and the various ways it was made publicly available

Day 4 (6/2)

Moved mentor meeting to Wednesday due to a scheduling issue
Started reading background information provided by Dr. Guha
Set up Jupyter notebook and the various dependent libraries
Created rough implementation of K-means clustering on random data
Obtained card access to Dr. Guha's lab
Posted rough, pre-discussion milestones
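
For reference, a rough K-means on random 2-D data can be sketched with the standard library alone. This is only an illustrative version of Lloyd's algorithm, not the implementation written for the project; the function name `kmeans`, its parameters, and the choice of random initial centroids are assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialisation: pick k distinct input points at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


# Example run on uniformly random points in the unit square.
rng = random.Random(1)
sample = [(rng.random(), rng.random()) for _ in range(200)]
centroids, labels = kmeans(sample, k=3)
```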

Week Two (6/5 - 6/9)

Day 1 (6/5)

Refined K-means implementation with the K-means++ seeding described in the Data Science Lab article
Tested the algorithm on random Gaussian distributions, rather than random points
Experimented with visual plotting of the algorithm using Seaborn and Matplotlib
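
The K-means++ seeding mentioned above chooses each new starting centroid with probability proportional to its squared distance from the nearest centroid already chosen. A minimal sketch, assuming 2-D tuples as input; the name `kmeans_pp_seeds` and its parameters are hypothetical and not taken from the project code.

```python
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k starting centroids using k-means++ (squared-distance) sampling."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [
            min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids)
            for p in points
        ]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            # Sample the next centroid with probability proportional to d2.
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

The returned seeds can then replace the random initial centroids of an ordinary K-means loop.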

Day 2 (6/6)

Attended RCR training
Finished reading the relevant sections of Algorithms for Clustering Data
Experimented with Scikit-learn's implementation of K-means

Day 3 (6/7)

Met with Dr. Guha and discussed the immediate future
Set the goal to produce an interactive crime map by next Wednesday
Gathered data from website and began sorting

Day 4 (6/8)

Created a script to aggregate the data from multiple spreadsheets into a single usable file
Looked into potential libraries needed to create the interactive map
Ran into issues with the format of the location data
Converted the addresses in the data into latitude/longitude coordinates
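
The aggregation step could look roughly like the sketch below. It assumes the spreadsheets were exported as CSV files with a header row, which the log does not state; the function name `merge_csv_files` and the added `source_file` column are hypothetical.

```python
import csv
import glob
import os


def merge_csv_files(pattern, out_path):
    """Concatenate all CSV files matching `pattern` into one file.

    Returns the number of data rows written. `out_path` should not
    itself match `pattern`.
    """
    fieldnames, rows = [], []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Collect the union of column names, preserving first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row["source_file"] = os.path.basename(path)  # provenance
                rows.append(row)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames + ["source_file"],
            restval="",             # blank for columns a file lacks
            extrasaction="ignore",  # skip stray cells beyond the header
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Recording the source file for each row makes it easier to trace formatting problems back to the spreadsheet they came from.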

Day 5 (6/9)

Found a publicly available shapefile of the city
Set up the necessary scripts to display the file
Ran into an issue with the points not being in the same coordinate system as the shape file

Week Three (6/12 - 6/16)

Day 1 (6/12)

Fixed point plotting to align with shapefile
Added choropleth coloring by neighborhood
Started reading The ethics of algorithms: Mapping the debate

Day 2 (6/13)

Finished The ethics of algorithms: Mapping the debate
Started reading Weapons of Math Destruction
Started implementation of website from GitHub
Established the needed dependencies to run a local instance of Jekyll

Day 3 (6/14)

Finished website framework
Uploaded initial map version
Started on the second version of the map

Day 4 (6/15)

Split the data into multiple sets
Used K-Means to cluster each set in a variety of ways
Wrote a Python script to run K-Means multiple times and output results to be fed into D3

Day 5 (6/16)

Put modified data into D3 setup for the new map
Tweaked basic settings
Added ability to display different variations of K-Means on the map

Week Four (6/19 - 6/23)

Day 1 (6/19)

Tweaked the map visuals
Added a convex hull to display the cluster borders
Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame
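
The log indicates the map itself was built with D3, so the hull there was presumably computed in the browser. As an illustration of the underlying computation only, here is a Python sketch of Andrew's monotone chain algorithm, which returns the boundary of a cluster's points; the function name `convex_hull` is an assumption.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Computing the hull from the current cluster assignment each time, rather than storing it with the data, keeps the borders consistent when the displayed clustering changes.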

Day 2 (6/20)

Added more visual tweaks to the map
Added a grid to the display and fixed inaccurate axis labels
Evaluated the relevancy and accuracy of the different clusters produced

Day 3 (6/21)

Compared Milwaukee crime reports against produced clusters to gauge accuracy
Read Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science
Met with Dr. Guha and discussed the next step in the project

Day 4 (6/22)

Started programming a (mostly) vectorized implementation of K-Means to later modify
Continued reading Weapons of Math Destruction
Had the weekly working lunch and began early outlines of the mini-presentations
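
A (mostly) vectorized K-means of the kind mentioned above might look like the following NumPy sketch. It is an assumption about the general approach, not the project's code: the distance computation uses broadcasting, while the centroid update still loops over clusters. The name `kmeans_vectorized` and its parameters are hypothetical.

```python
import numpy as np


def kmeans_vectorized(X, init_centroids, max_iter=100):
    """Lloyd's algorithm with distances computed via broadcasting."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(init_centroids, dtype=float)  # makes a copy
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # (n, 1, d) - (1, k, d) -> (n, k, d); summing over d gives
        # an (n, k) matrix of squared Euclidean distances.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        # This per-cluster loop is the part that is not vectorized.
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members):  # an empty cluster keeps its old centroid
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Passing the starting centroids in explicitly makes it straightforward to rerun the algorithm from identical starting points under different settings.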

Day 5 (6/23)

Fixed vectorized implementation of K-Means
Tested implementation on random datasets and compared with the results of Scikit-learn's implementation
Implemented a geodesic distance metric using the Haversine great-circle distance formula
Modified my implementation of K-Means to use the new distance metric and built a test framework to compare clustering under the geodesic and Euclidean distances from the same set of starting points
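
The Haversine formula gives the great-circle distance between two latitude/longitude pairs on a sphere. A minimal sketch follows; the function name `haversine_km` and the mean Earth radius constant are assumptions rather than values from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One caveat when swapping this into K-means: the arithmetic mean of latitude/longitude pairs is only an approximation of a spherical centroid, although over the extent of a single city the difference is typically small.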


Difference between revisions of "User:Grberlstein"


Week Five (6/26 - 6/30)
