https://reu.cs.mu.edu/api.php?action=feedcontributions&user=Grberlstein&feedformat=atomREU@MU - User contributions [en]2024-03-29T02:05:22ZUser contributionsMediaWiki 1.23.13https://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-08-07T16:27:54Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
<br />
Griffin is an incoming junior majoring in Mathematics and Computer Science at Vassar College in Poughkeepsie, New York.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
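The K-means++ seeding step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (<code>squared_distance</code>, <code>kmeans_pp_seeds</code>) and the representation of points as tuples are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers with K-means++ style seeding (hypothetical helper).

    The first center is chosen uniformly at random. Each later center is
    sampled with probability proportional to the squared distance from a
    point to its nearest already-chosen center.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its closest current center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    # Two loose groups of points; the seeds tend to land in different groups.
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    print(kmeans_pp_seeds(data, 2, random.Random(0)))
```

Because far-away points receive larger weights, the seeds tend to be spread across the data, which usually gives the main K-means loop a better starting point than uniformly random seeds.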
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shapefile of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shapefile<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===<br />
*Split the data into multiple sets<br />
*Used K-Means to cluster the data in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/16)'''===<br />
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevance and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
*Had the weekly working lunch and began early outlines of the mini-presentations<br />
<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of Scikit-learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and built a test framework to compare clustering with geodesic distance and Euclidean distance from the same set of starting points.<br />
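As a rough illustration of the geodesic metric described above, the Haversine great-circle distance and a distance-agnostic assignment step might look like the sketch below. The names <code>haversine_km</code>, <code>euclidean_deg</code>, and <code>nearest_center</code>, the mean Earth radius of 6371 km, and the sample coordinates are assumptions for the example, not details taken from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlam = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def euclidean_deg(p, q):
    """Plain Euclidean distance on raw (lat, lon) degrees, for comparison."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def nearest_center(point, centers, distance):
    """Index of the closest center under the supplied distance function."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


if __name__ == "__main__":
    # Illustrative coordinates only, roughly in the Milwaukee area.
    centers = [(43.04, -87.91), (43.10, -87.95)]
    point = (43.06, -87.92)
    print(nearest_center(point, centers, haversine_km))
    print(nearest_center(point, centers, euclidean_deg))
```

Passing the distance function as a parameter lets the same assignment step run with either metric from identical starting centers, which is the kind of side-by-side comparison the log describes.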
<br />
=='''Week Five (6/26 - 6/30)'''==<br />
*Tested side-by-side visualizations for geodesic vs Euclidean clusterings<br />
*Experimented with multiple methods of visualizations<br />
*Finished reading ''Weapons of Math Destruction''<br />
*Gave mini-presentation<br />
*Ran into difficulties with the public datasets on the Milwaukee website<br />
<br />
=='''Week Six (7/3 - 7/7)'''==<br />
*Got better datasets and resolved issues with publicly available census data<br />
*Overlaid demographic information on the maps<br />
*Generated a demographic breakdown for each cluster and compared geodesic vs Euclidean<br />
*Merged functionality from different versions of the map<br />
<br />
=='''Weeks Seven to Nine (7/10 - 7/28)'''==<br />
*Failed to do logs consistently<br />
*Finalized demographic overlay<br />
*Implemented multiple versions of a potential bias index<br />
*Ran experiments on the data to get trends about the potential bias index<br />
*Expanded data set to include all available years' worth of data<br />
*Geocoded all of the new data<br />
*Expanded map functionality to include potential bias index and cluster similarity<br />
*Added interactive graphs for demographics and potential bias<br />
*Moved potential bias calculations to Python to allow for faster web access<br />
*Read lots of papers for the literature review<br />
*Wrote a rough draft of the literature review<br />
*Created the poster for the poster session<br />
<br />
=='''Week Ten (7/31 - 8/4)'''==<br />
*Gave poster presentation<br />
*Gave REU project presentation<br />
*Reconfigured the maps to work with the other half of the data set<br />
*Created more graphics for the paper<br />
*Wrote a (very) rough draft of the discussion section<br />
*Read a few more sources for the paper<br />
*Minor tweaks to the map's color palette<br />
*Departed for home</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-07-26T18:51:15Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
<br />
Griffin is an incoming junior majoring in Mathematics and Computer Science at Vassar College in Poughkeepsie, New York.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Given crime data sets to review by Dr. Guha<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publically available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publically available shape file of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shape file<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/14)'''===<br />
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/15)'''===<br />
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevancy and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
*Had the weekly working lunch and began early outlines of the mini-presentations<br />
<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of Sci-Kit Learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and build a test framework to compare clustering with the geodesic distance and Euclidean distance from the same set of starting points.<br />
<br />
=='''Week Five (6/26 - 6/30)'''==<br />
*Tested side-by-side visualizations for geodesic vs Euclidean clusterings<br />
*Experimented with multiple methods of visualizations<br />
*Finished reading ''Weapons of Math Destruction''<br />
*Gave mini-presentation<br />
*Ran into difficulties with the public datasets on the Milwaukee website<br />
<br />
=='''Week Six (7/3 - 7/7)'''==<br />
*Got better datasets and resolved issues with publically available census data<br />
*Overlayed demographic information on the maps<br />
*Generated a demographic breakdown for each cluster and compared geodesic vs euclidean<br />
*Merged functionality from different versions of the map<br />
<br />
=='''Weeks Seven to Nine (7/10 - 7/28)'''==<br />
*Failed to do logs consistently<br />
*Finalized demographic overlay<br />
*Implemented multiple version of a potential bias index<br />
*Ran experiments on the data to get trends about the potential bias index<br />
*Expanded data set to include all available years worth of data<br />
*Geocoded all of the new data<br />
*Expanded map functionality to include potential bias index and cluster similarity<br />
*Added interactive graphs for demographics and potential bias<br />
*Moved potential bias calculations to Python to allow for faster web access<br />
*Read lots of papers for the literature review<br />
*Wrote a rough draft of the literature review<br />
*Created the poster for the poster session</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-07-06T14:16:31Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
<br />
Griffin is an incoming junior majoring in Mathematics and Computer Science at Vassar College in Poughkeepsie, New York.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Given crime data sets to review by Dr. Guha<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publically available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publically available shape file of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shape file<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/14)'''===<br />
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/15)'''===<br />
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevancy and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
*Had the weekly working lunch and began early outlines of the mini-presentations<br />
<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of Sci-Kit Learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and build a test framework to compare clustering with the geodesic distance and Euclidean distance from the same set of starting points.<br />
<br />
=='''Week Five (6/26 - 6/30)'''==<br />
*Tested side-by-side visualizations for geodesic vs euclidean clusterings<br />
*Experimented with multiple methods of visualizations<br />
*Finished reading ''Weapons of Math Destruction''<br />
*Gave mini-presentation<br />
*Ran into difficulties with the public datasets on the Milwaukee website<br />
<br />
=='''Week Six (7/3 - 7/7)'''==<br />
*Got better datasets and resolved issues with publically available census data<br />
*Overlayed demographic information on the maps<br />
*Generated a demographic breakdown for each cluster and compared geodesic vs euclidean<br />
*Merged functionality from different versions of the map</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-26T20:32:14Z<p>Grberlstein: /* Griffin Berlstein */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
<br />
Griffin is an incoming junior majoring in Mathematics and Computer Science at Vassar College in Poughkeepsie, New York.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Given crime data sets to review by Dr. Guha<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publically available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publically available shape file of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shape file<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/14)'''===<br />
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/15)'''===<br />
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevancy and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
*Had the weekly working lunch and began early outlines of the mini-presentations<br />
<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of scikit-learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and built a test framework to compare clustering with the geodesic distance and Euclidean distance from the same set of starting points.<br />
<br />
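For reference, the Haversine great-circle distance mentioned above can be sketched as follows. This is a minimal standalone version, not the project's vectorized code. The function name and the mean Earth radius of 6371 km are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

One caveat when swapping this metric into K-Means: the arithmetic mean of the coordinates is the minimizer for squared Euclidean distance, not for great-circle distance, so the centroid update is only approximate. Over an area the size of a single city the difference is likely to be small.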
=='''Week Five (6/26 - 6/30)'''==</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-26T20:31:15Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
<br />
Griffin is an undergraduate majoring in Mathematics and Computer Science at Vassar College in Poughkeepsie, New York.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
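The K-means++ seeding step can be sketched roughly as below. This is an illustrative plain-Python version, not the code from the linked article or from the project, and the function names are hypothetical. Each new center is drawn with probability proportional to its squared distance from the nearest center already chosen.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k starting centers from points using K-means++ weighting."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

Points far from every existing center are the most likely to be picked next, which tends to spread the starting centers across the data instead of letting several land in the same blob.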
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shapefile of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shapefile<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/16)'''===
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevancy and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
*Had the weekly working lunch and began early outlines of the mini-presentations<br />
<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of scikit-learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and built a test framework to compare clustering with the geodesic distance and Euclidean distance from the same set of starting points.<br />
<br />
=='''Week Five (6/26 - 6/30)'''==</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-26T14:48:56Z<p>Grberlstein: /* Day 4 (6/22) */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shapefile of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shapefile<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/16)'''===
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevancy and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
*Had the weekly working lunch and began early outlines of the mini-presentations<br />
<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of scikit-learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and built a test framework to compare clustering with the geodesic distance and Euclidean distance from the same set of starting points.<br />
<br />
=='''Week Five (6/26 - 6/30)'''==</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-26T14:48:02Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
*[http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shapefile of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shapefile<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/16)'''===
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map<br />
<br />
=='''Week Four (6/19 - 6/23)'''==<br />
==='''Day 1 (6/19)'''===<br />
*Tweaked the map visuals<br />
*Added a convex hull to display the cluster borders<br />
*Made the convex hull creation dynamic and attached to the data, rather than precomputed in the data frame<br />
==='''Day 2 (6/20)'''===<br />
*Added more visual tweaks to the map<br />
*Added a grid to the display and fixed inaccurate axis labels<br />
*Evaluated the relevancy and accuracy of the different clusters produced<br />
==='''Day 3 (6/21)'''===<br />
*Compared Milwaukee crime reports against produced clusters to gauge accuracy<br />
*Read [http://online.liebertpub.com/doi/abs/10.1089/big.2016.0050 Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science]<br />
*Met with Dr. Guha and discussed the next step in the project<br />
==='''Day 4 (6/22)'''===<br />
*Started programming a (mostly) vectorized implementation of K-Means to later modify<br />
*Continued reading ''Weapons of Math Destruction''<br />
==='''Day 5 (6/23)'''===<br />
*Fixed vectorized implementation of K-Means<br />
*Tested implementation on random datasets and compared with the results of scikit-learn's implementation<br />
*Implemented a geodesic distance metric using the Haversine great circle distance formula<br />
*Modified my implementation of K-Means to use the new distance metric and built a test framework to compare clustering with the geodesic distance and Euclidean distance from the same set of starting points.<br />
<br />
=='''Week Five (6/26 - 6/30)'''==</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-18T00:26:36Z<p>Grberlstein: /* Clustering and Data Science */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shapefile of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shapefile<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/16)'''===
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-18T00:26:21Z<p>Grberlstein: /* Day 4 (6/14) */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shapefile of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shapefile<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===
*Split the data into multiple sets<br />
*Used K-Means to sort in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and output results to be fed into D3<br />
<br />
==='''Day 5 (6/16)'''===
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-18T00:22:36Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
*[http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shape file of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shape file<br />
<br />
=='''Week Three (6/12 - 6/16)'''==<br />
==='''Day 1 (6/12)'''===<br />
*Fixed point plotting to align with shapefile<br />
*Added choropleth coloring by neighborhood<br />
*Started reading [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
==='''Day 2 (6/13)'''===<br />
*Finished [http://journals.sagepub.com/doi/full/10.1177/2053951716679679 The ethics of algorithms: Mapping the debate]<br />
*Started reading ''Weapons of Math Destruction''<br />
*Started implementation of website from GitHub<br />
*Established the needed dependencies to run a local instance of Jekyll<br />
==='''Day 3 (6/14)'''===<br />
*Finished website framework<br />
*Uploaded initial map version<br />
*Started on the second version of the map<br />
==='''Day 4 (6/15)'''===<br />
*Split the data into multiple sets<br />
*Used K-Means to cluster the data in a variety of ways<br />
*Wrote a Python script to run K-Means multiple times and aggregate the results for display<br />
==='''Day 5 (6/16)'''===<br />
*Put modified data into D3 setup for the new map<br />
*Tweaked basic settings<br />
*Added ability to display different variations of K-Means on the map</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-11T20:51:52Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means<br />
<br />
==='''Day 3 (6/7)'''===<br />
*Met with Dr. Guha and discussed the immediate future<br />
*Set the goal to produce an interactive crime map by next Wednesday<br />
*Gathered data from website and began sorting<br />
<br />
==='''Day 4 (6/8)'''===<br />
*Created a script to aggregate the data from multiple spreadsheets into a single usable file<br />
*Looked into potential libraries needed to create the interactive map<br />
*Ran into issues with the format of the data location<br />
*Converted the addresses in the data into latitude/longitude coordinates<br />
<br />
==='''Day 5 (6/9)'''===<br />
*Found a publicly available shape file of the city<br />
*Set up the necessary scripts to display the file<br />
*Ran into an issue with the points not being in the same coordinate system as the shape file<br />
<br />
=='''Week Three (6/12 - 6/16)'''==</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-06T22:34:19Z<p>Grberlstein: /* Day 1 (6/5) */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Tested the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-06T22:33:59Z<p>Grberlstein: /* Day 1 (6/5) */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Started testing the algorithm on random Gaussian distributions, rather than random points<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-06T22:33:14Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
= Readings =<br />
== Background ==<br />
=== Algorithmic Ethics ===<br />
*[http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms]<br />
*[https://link.springer.com/article/10.1007/s10676-010-9233-7 Is There an Ethics of Algorithms?]<br />
*[http://journals.sagepub.com/doi/abs/10.1177/0162243915606523 Toward an Ethics of Algorithms]<br />
*[https://arxiv.org/pdf/1704.01347.pdf Quantifying Search Bias]<br />
*[https://pdfs.semanticscholar.org/e092/65ed8eee4c7b35e3ebe53b5d75492b4628a2.pdf Understanding and Designing around Users' Interaction with Hidden Algorithms in Sociotechnical Systems]<br />
=== Clustering and Data Science ===<br />
*[http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*[https://datasciencelab.wordpress.com/tag/k-means/ K-means Clustering in Python]<br />
<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access<br />
*Met with Dr. Guha and discussed broad ideas surrounding the project<br />
==='''Day 2 (5/31)'''===<br />
*Attended Library orientation<br />
*Finished reading [http://essay.utwente.nl/70934/1/Slot_MA_BMS.pdf Ethics of Algorithms] by Thijs Slot. This was the last of the pre-REU reading.<br />
*Started reviewing the basics of Python<br />
*Received crime data sets from Dr. Guha to review<br />
==='''Day 3 (6/1)'''===<br />
*Attended a meeting on proper research practices by Dr. Factor<br />
*Set up direct deposit<br />
*Reviewed the basics of GitHub<br />
*Continued to review Python<br />
*Examined crime data and the various ways it was made publicly available<br />
==='''Day 4 (6/2)'''===<br />
*Moved mentor meeting to Wednesday due to scheduling issue<br />
*Started reading background information provided by Dr. Guha<br />
*Set up Jupyter notebook and the various dependent libraries<br />
*Created rough implementation of K-means clustering on random data<br />
*Obtained card access to Dr. Guha's lab<br />
*Posted rough, pre-discussion milestones<br />
=='''Week Two (6/5 - 6/9)'''==<br />
==='''Day 1 (6/5)'''===<br />
*Refined K-means implementation with the K-means++ seeding described in the [https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/ Data Science Lab] article<br />
*Experimented with visual plotting of the algorithm using Seaborn and Matplotlib<br />
==='''Day 2 (6/6)'''===<br />
*Attended RCR training<br />
*Finished reading the relevant sections of [http://homepages.inf.ed.ac.uk/rbf/BOOKS/JAIN/Clustering_Jain_Dubes.pdf Algorithms for Clustering Data]<br />
*Experimented with Scikit-learn's implementation of K-means</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-06T22:00:22Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
=='''Week One (5/30 - 6/2)'''==<br />
==='''Day 1 (5/30/17)'''===<br />
*Attended REU orientation<br />
*Obtained ID card and computer access</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-06T21:31:05Z<p>Grberlstein: /* Log For Summer 2017 */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
<br />
= Project Log For Summer 2017 =<br />
=='''Week One (5/30 - 6/2)'''==</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-06-06T21:30:20Z<p>Grberlstein: /* Log For Summer 2017 */</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
<br />
= Log For Summer 2017 =<br />
'''Week One (5/30 - 6/3)'''</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/Analyzing_the_Ethical_Consequences_of_Popular_Clustering_AlgorithmsAnalyzing the Ethical Consequences of Popular Clustering Algorithms2017-06-02T16:21:50Z<p>Grberlstein: /* Goals */</p>
<hr />
<div>'''Mentor:''' [http://www.marquette.edu/mscs/facstaff-guha.shtml Dr. Shion Guha]<br />
<br />
'''Researchers:''' [[User:Grberlstein|Griffin Berlstein]], Justin Miller<br />
<br />
==Background==<br />
Data-driven algorithms are almost ubiquitous in modern social computing, where they make inferences regarding the behavior of users in order to better predict their needs and probable future actions. The common narrative is that these algorithms are neutral, i.e., that they operate on and analyze data without any regard for what the data contains. In some situations this may be the case; however, there is an inherent danger in assuming neutrality on the part of these algorithms, as their behavior with border cases demands the use of assumptions. When data points evade simple classification, algorithms can either leave them as outliers or place them into a category that they might not properly fit into. This means that depending on the initial conditions of algorithms, data points will end up in false isolation or false aggregation, and while this can seem like a natural hazard of data classification, it becomes problematic when the data points are individuals, rather than numbers. Anyone using these algorithms will believe the narrative of impartiality and will make decisions based on an analysis that may not properly represent the people being analyzed.<br />
<br />
==Goals==<br />
In this exploratory project, we will examine how popular clustering algorithms can introduce bias to a data set by testing clustering algorithms on crime and census data to see where bias is introduced. Once this is done we will test the algorithms to determine how their initial configurations can be modified to ameliorate the bias introduced.</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/Analyzing_the_Ethical_Consequences_of_Popular_Clustering_AlgorithmsAnalyzing the Ethical Consequences of Popular Clustering Algorithms2017-06-02T16:20:45Z<p>Grberlstein: Created page with "'''Mentor:''' [http://www.marquette.edu/mscs/facstaff-guha.shtml Dr. Shion Guha] '''Researchers:''' Griffin Berlstein, Justin Miller ==Background== Data..."</p>
<hr />
<div>'''Mentor:''' [http://www.marquette.edu/mscs/facstaff-guha.shtml Dr. Shion Guha]<br />
<br />
'''Researchers:''' [[User:Grberlstein|Griffin Berlstein]], Justin Miller<br />
<br />
==Background==<br />
Data-driven algorithms are almost ubiquitous in modern social computing, where they make inferences regarding the behavior of users in order to better predict their needs and probable future actions. The common narrative is that these algorithms are neutral, i.e., that they operate on and analyze data without any regard for what the data contains. In some situations this may be the case; however, there is an inherent danger in assuming neutrality on the part of these algorithms, as their behavior with border cases demands the use of assumptions. When data points evade simple classification, algorithms can either leave them as outliers or place them into a category that they might not properly fit into. This means that depending on the initial conditions of algorithms, data points will end up in false isolation or false aggregation, and while this can seem like a natural hazard of data classification, it becomes problematic when the data points are individuals, rather than numbers. Anyone using these algorithms will believe the narrative of impartiality and will make decisions based on an analysis that may not properly represent the people being analyzed.<br />
<br />
==Goals==<br />
In this exploratory project, we will examine how popular clustering algorithms can introduce bias to a data set by testing clustering algorithms on crime and census data to see where bias is introduced. Once this is done we will examine the algorithms to determine how their initial configurations can be modified to ameliorate the bias introduced.</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/Summer_2017_ProjectsSummer 2017 Projects2017-06-02T15:58:22Z<p>Grberlstein: </p>
<hr />
<div>Current projects:<br />
<br />
* [[Cyber Security of Social Robots and the Internet of Things]]<br />
* [[Upgrading Embedded Xinu for the Multi-Core Raspberry Pi 3]]<br />
* [[LSTMs for Energy Forecasting]]<br />
* [[Generating probabilistic models for forecasting using DNN]]<br />
* [[Stock Prediction using Social Media Analysis]]<br />
* [[Analyzing the Ethical Consequences of Popular Clustering Algorithms]]</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-05-30T18:45:04Z<p>Grberlstein: </p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
Nominally a person.<br />
<br />
<br />
== Log For Summer 2017 ==<br />
'''Week One (5/30 - 6/3)'''</div>Grberlsteinhttps://reu.cs.mu.edu/index.php/User:GrberlsteinUser:Grberlstein2017-05-30T18:42:30Z<p>Grberlstein: Created page with "== '''Griffin Berlstein''' == == Log For Summer 2017 =="</p>
<hr />
<div>== '''Griffin Berlstein''' ==<br />
<br />
== Log For Summer 2017 ==</div>Grberlstein