Difference between revisions of "User:Crepaci"

From REU@MU
(Work Log)
 
(20 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
== About Me ==
 
== About Me ==
  
I'm Charlie Repaci, a senior at Simmons University studying Data Science, with a special interest in Biochemistry and Sociology. This summer I am working with Dr. Shion Guha and doctoral candidate Devansh Saxena on Developing Ethical Algorithms for Placement Stability in the Foster Care System.
+
I'm [https://www.linkedin.com/in/charlie-repaci-07b723179/ Charlie Repaci], a senior at [https://www.simmons.edu Simmons University] studying [https://www.simmons.edu/undergraduate/academics/majors-minors/data-science-and-analytics Data Science], with a special interest in Biochemistry and Sociology. This summer I am working with [https://www.shionguha.net/ Dr. Shion Guha] and doctoral candidate [https://www.saxena.io/ Devansh Saxena] on [[Developing Ethical Algorithms for Placement Stability in the Foster Care System]].
  
 
== Work Log ==
 
== Work Log ==
  
'''Week 1'''
+
===Week 1===
 
June 1 to June 7
 
June 1 to June 7
 
# Orientation
 
# Orientation
Line 12: Line 12:
 
#* Review of REU calendar and expectations
 
#* Review of REU calendar and expectations
 
# Data Science Bootcamp (talk by Dr. Madiraju)
 
# Data Science Bootcamp (talk by Dr. Madiraju)
#* Introduction and basics of data analysis with Python (Anaconda and Jupyter Notebook)
+
#* Introduction and basics of data analysis with Python (Anaconda and Jupyter Notebook; pandas, numpy, matplotlib, seaborn, scipy)
 
#** Read in data
 
#** Read in data
 
#** Pre-processing
 
#** Pre-processing
 
#** Modeling
 
#** Modeling
 
#** Data visualization
 
#** Data visualization
#** Packages:
 
#*** pandas
 
#*** numpy
 
#*** matplotlib and matplotlib.pyplot
 
#*** seaborn
 
#*** scipy
 
 
# Good Research Practices (talk by Dr. Brylow)
 
# Good Research Practices (talk by Dr. Brylow)
 
# Literature Review
 
# Literature Review
Line 29: Line 23:
 
#* [https://dl.acm.org/doi/abs/10.1145/3323994.3369888 Child Welfare System: Interaction of Policy, Practice, and Algorithms]
 
#* [https://dl.acm.org/doi/abs/10.1145/3323994.3369888 Child Welfare System: Interaction of Policy, Practice, and Algorithms]
 
#* (Supplementary) [https://dl.acm.org/doi/abs/10.1145/3290605.3300497 Risk vs. Restriction: The Tension between Providing a Sense of Normalcy and Keeping Foster Teens Safe Online]
 
#* (Supplementary) [https://dl.acm.org/doi/abs/10.1145/3290605.3300497 Risk vs. Restriction: The Tension between Providing a Sense of Normalcy and Keeping Foster Teens Safe Online]
 +
 +
===Week 2===
 +
June 8 to June 14
 +
# Responsible Conduct of Research Training (talk by Dr. Brylow)
 +
#* Ethical treatment of data
 +
#* Authorship, credit, plagiarism
 +
#* Human participants
 +
#* Intellectual property
 +
#* Conflicts of interest and professional standards
 +
# Worked on getting CITI certification for all three RCR sessions
 +
# Technical Writing Workshop (talk by Dr. Brylow and Dr. Madiraju)
 +
#* What the sections of a technical paper are
 +
#* What the publication process is like
 +
#* General tips
 +
# Literature review
 +
#* [http://dx.doi.org/10.2139/ssrn.2245322 Governing Algorithms: A Provocation Piece]
 +
#* [https://academiccommons.columbia.edu/doi/10.7916/D8ZK5TW2 Algorithmic Accountability Reporting: On the Investigation of Black Boxes]
 +
#* [http://www.tandfonline.com/doi/full/10.1080/1369118X.2016.1154087#abstract Thinking Critically About and Researching Algorithms]
 +
#* (Supplementary) [http://culturedigitally.org/wp-content/uploads/2016/07/Gillespie-2016-Algorithm-Digital-Keywords-Peters-ed.pdf Algorithm in Digital Keywords: a Vocabulary of Information, Society, and Culture]
 +
# Meeting with mentors
 +
#* GitHub repository created for the project
 +
#* Questions and discussion of the literature reviewed
 +
#* Planned work for the next two weeks
 +
 +
===Week 3===
 +
June 15 to June 21
 +
# Set up Wiki entries for myself and my project
 +
# Meeting with all REU interns to discuss our projects so far and any problems we have run into
 +
# Literature review
 +
#* [https://dl.acm.org/doi/abs/10.1145/3290605.3300760 Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions]
 +
#* [https://journals.sagepub.com/doi/abs/10.1177/2053951717738104 Algorithms as culture: Some tactics for the ethnography of algorithmic systems]
 +
#* (Supplementary) [https://journals.sagepub.com/doi/full/10.1177/2053951717751552 Algorithms as fetish: Faith and possibility in algorithmic work]
 +
# Meeting with Devansh to discuss the first dataset
 +
#* Understand variables and components -- reading the documentation that comes with datasets
 +
#* Importing the dataset into R and common problems
 +
 +
===Week 4===
 +
June 22 to June 28
 +
 +
# Importing data
 +
#* Need SAS to run a program included with the data files that creates .sas7bdat and .sas7bcat files from the .dat files that were given to us
 +
#* Some files are not importing correctly, for as yet unknown reasons
 +
# Student Check-In (by Dr. Brylow and Dr. Madiraju)
 +
# Research Presentation (talk by guest lecturer Dr. Walter Bialkowski)
 +
#* Blood donation and potential risk of lower bone density to donors due to prolonged and repeated exposure to the anticoagulant citrate added to the blood (and then returned to the donor) during the donation process
 +
#* Study 1: Data analysis of Scandinavian blood donor data
 +
#** Concluded that there was no association between blood donation and the number of bone fractures donors had later in life
 +
#** Limitations included differences in blood donation policy, process, and popularity of varying donation types between Scandinavia and the United States
 +
#* Study 2: Longitudinal study of blood donors
 +
#** Concluded that current guidelines were enough to protect adult male donors between the ages of 20 and 65
 +
#** Limited in that the conclusion cannot be extrapolated to women or to men outside that age range
 +
# Presentation on work done in weeks 3 and 4 to mentors (readings, problems with data)
 +
 +
===Week 5===
 +
June 29 to July 5
 +
 +
# Research Presentations (talk by Dr. Brylow)
 +
#* Format and sections
 +
#** Very similar to the paper itself: introduction, background (know your audience), description of your work, results, conclusion
 +
#** Figure out 2-3 points you want your audience to take away from the talk and center your material on those
 +
#* General tips
 +
#** 1 slide per minute rule of thumb
 +
#** People sometimes bring extra slides in anticipation of questions
 +
#** Avoid distractions and keep it simple: avoid full sentences, use (and cite) diagrams, use simple color schemes (light text on a dark background reads well from afar)
 +
# Presented work done so far to other mentors and students
 +
#* You can find my slide deck with presentation notes [https://docs.google.com/presentation/d/1CUA1-6la1RL5SZk0vk4eeMWW7DA-WO-SVQSlE5SfKbQ/edit?usp=sharing here]
 +
# Began data exploration of the Phase I dataset (from project 107: Factors that Influence the Decision Not to Substantiate a CPS Referral)
 +
 +
===Week 6===
 +
July 6 to July 12
 +
 +
# Data Ethics Lecture (talk by Dr. Michael Zimmer)
 +
#* Empiricist epistemology and criticisms
 +
#** The idea that big data captures everything, so there is no need for theory or models, no need to worry about biased values, and no need to consult domain-specific experts
 +
#** Hidden biases in both the collection and analysis stages present considerable risks
 +
#* Rich, identifiable data from multiple sources on the same person (ex: different apps on phone)
 +
#* Questionable consent
 +
#** Clicking through the Terms of Use without reading it
 +
#** "Public" data used without deidentification efforts
 +
#* Reproducibility vs deidentification
 +
# More exploratory work and visualizations to highlight factors that would be interesting to look at from a technical perspective and their implications for any predictive systems
 +
# Making Research Posters (talk by Dr. Brylow)
 +
#* Usually posters display work that isn't so far along as to be published yet
 +
#* Stylistic tips to improve readability but also conserve space and convey the topic effectively from across the room
 +
#* Talk to each person for less than three minutes as this is only a peek into your work
 +
# Meeting with Mentors
 +
#* Focus on risk assessment in Phase I of the 107 data set. How is it measured? What is "normal" risk? Was it generated using a human or an algorithm? How well does it perform?
 +
#* Goals for the next two weeks
 +
#** Make a small multiple for all the factors in the set against risk assessment and write a few lines of analysis for each
 +
#** Start attempting to answer the questions posed above
 +
# Literature review
 +
#* [https://dgergle.soc.northwestern.edu//resources/pn3458-diazA.pdf Addressing Age-Related Bias in Sentiment Analysis]
 +
#* [https://www.researchgate.net/publication/257560404_Bias_in_algorithmic_filtering_and_personalization Bias in algorithmic filtering and personalization]
 +
#* Towards a Feminist HCI Methodology: Social Science, Feminism, and HCI by Shaowen Bardzell and Jeffrey Bardzell
 +
 +
===Week 7===
 +
July 13 to July 19
 +
 +
# Phase I deeper variable exploration / Start algorithm audit
 +
#* Spreadsheet created for ease of sorting by source and type
 +
#* Determined and highlighted various measures of risk and their source (human vs modeled)
 +
#* Continued to read up on Phase I methods of collection
 +
#* Continued to create small multiples of the 664 variables
 +
# Literature Review
 +
#* [https://doi.org/10.1145/3274357 The misgendering machines: Trans/HCI implications of automatic gender recognition]
 +
#* [https://reallifemag.com/counting-the-countless/ Counting the Countless: Why data science is a profound threat for queer people]
 +
#* [https://ainowinstitute.org/discriminatingsystems.pdf Discriminating systems: Gender, race and power in AI]
 +
#* [https://doi.org/10.1145/3274424 Safe spaces and safe places: Unpacking technology-mediated experiences of safety and harm with transgender people]
 +
#* [https://doi.org/10.1145/3290607.3311750 Queer(ing) HCI: Moving forward in theory and practice]
 +
 +
===Week 8===
 +
July 20 to July 26
 +
 +
# WARM (Washington Assessment of Risk Matrix) factors vs Risk Tag graphs
 +
# Graduate Schools - Discussion with Dr. Brylow
 +
#* Types of programs and funding sources
 +
#* Important parts of the application
 +
#* How to write a personal statement
 +
#* Selecting your school
 +
 +
===Week 9===
 +
July 27 to August 2
 +
 +
===Week 10===
 +
August 3 to August 9

Latest revision as of 21:40, 1 August 2020

About Me

I'm Charlie Repaci, a senior at Simmons University studying Data Science, with a special interest in Biochemistry and Sociology. This summer I am working with Dr. Shion Guha and doctoral candidate Devansh Saxena on Developing Ethical Algorithms for Placement Stability in the Foster Care System.

Work Log

Week 1

June 1 to June 7

  1. Orientation
    • Introduction to other mentors, mentees, and REU heads Dr. Praveen Madiraju and Dr. Dennis Brylow
    • Review of REU calendar and expectations
  2. Data Science Bootcamp (talk by Dr. Madiraju)
    • Introduction and basics of data analysis with Python (Anaconda and Jupyter Notebook; pandas, numpy, matplotlib, seaborn, scipy)
      • Read in data
      • Pre-processing
      • Modeling
      • Data visualization
  3. Good Research Practices (talk by Dr. Brylow)
  4. Literature Review

Week 2

June 8 to June 14

  1. Responsible Conduct of Research Training (talk by Dr. Brylow)
    • Ethical treatment of data
    • Authorship, credit, plagiarism
    • Human participants
    • Intellectual property
    • Conflicts of interest and professional standards
  2. Worked on getting CITI certification for all three RCR sessions
  3. Technical Writing Workshop (talk by Dr. Brylow and Dr. Madiraju)
    • What the sections of a technical paper are
    • What the publication process is like
    • General tips
  4. Literature review
  5. Meeting with mentors
    • GitHub repository created for the project
    • Questions and discussion of the literature reviewed
    • Planned work for the next two weeks

Week 3

June 15 to June 21

  1. Set up Wiki entries for myself and my project
  2. Meeting with all REU interns to discuss our projects so far and any problems we have run into
  3. Literature review
  4. Meeting with Devansh to discuss the first dataset
    • Understand variables and components -- reading the documentation that comes with datasets
    • Importing the dataset into R and common problems

Week 4

June 22 to June 28

  1. Importing data
    • Need SAS to run a program included with the data files that creates .sas7bdat and .sas7bcat files from the .dat files that were given to us
    • Some files are not importing correctly, for as yet unknown reasons
  2. Student Check-In (by Dr. Brylow and Dr. Madiraju)
  3. Research Presentation (talk by guest lecturer Dr. Walter Bialkowski)
    • Blood donation and potential risk of lower bone density to donors due to prolonged and repeated exposure to the anticoagulant citrate added to the blood (and then returned to the donor) during the donation process
    • Study 1: Data analysis of Scandinavian blood donor data
      • Concluded that there was no association between blood donation and the number of bone fractures donors had later in life
      • Limitations included differences in blood donation policy, process, and popularity of varying donation types between Scandinavia and the United States
    • Study 2: Longitudinal study of blood donors
      • Concluded that current guidelines were enough to protect adult male donors between the ages of 20 and 65
      • Limited in that the conclusion cannot be extrapolated to women or to men outside that age range
  4. Presentation on work done in weeks 3 and 4 to mentors (readings, problems with data)
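
For the import problems noted in item 1, one way to find out which converted files fail to load is to try each one and record the error instead of stopping at the first failure. The sketch below is only an illustration in Python, not the workflow actually used (the log mentions R). The helper names `read_sas_file` and `import_all`, the `latin-1` encoding, and the example file paths are all assumptions; `pandas.read_sas` is the only library call used.

```python
from pathlib import Path


def read_sas_file(path):
    """Default reader: load one .sas7bdat file with pandas.read_sas."""
    # pandas is imported lazily so import_all can also be used with other readers.
    import pandas as pd

    # The encoding is an assumption; the real files may need a different one.
    return pd.read_sas(path, format="sas7bdat", encoding="latin-1")


def import_all(paths, reader=read_sas_file):
    """Try to read every file and return (loaded, failed).

    loaded maps each file stem to whatever the reader returned;
    failed maps each file stem to a short description of the error.
    """
    loaded, failed = {}, {}
    for path in paths:
        name = Path(path).stem
        try:
            loaded[name] = reader(path)
        except Exception as exc:  # keep going so every failure is reported
            failed[name] = f"{type(exc).__name__}: {exc}"
    return loaded, failed
```

A call such as `loaded, failed = import_all(Path("data").glob("*.sas7bdat"))` (with a hypothetical `data` folder) would then give a list of the problem files and the error each one raised, which is a starting point for tracking down the unknown causes.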

Week 5

June 29 to July 5

  1. Research Presentations (talk by Dr. Brylow)
    • Format and sections
      • Very similar to the paper itself: introduction, background (know your audience), description of your work, results, conclusion
      • Figure out 2-3 points you want your audience to take away from the talk and center your material on those
    • General tips
      • 1 slide per minute rule of thumb
      • People sometimes bring extra slides in anticipation of questions
      • Avoid distractions and keep it simple: avoid full sentences, use (and cite) diagrams, use simple color schemes (light text on a dark background reads well from afar)
  2. Presented work done so far to other mentors and students
    • You can find my slide deck with presentation notes here
  3. Began data exploration of the Phase I dataset (from project 107: Factors that Influence the Decision Not to Substantiate a CPS Referral)

Week 6

July 6 to July 12

  1. Data Ethics Lecture (talk by Dr. Michael Zimmer)
    • Empiricist epistemology and criticisms
      • The idea that big data captures everything, so there is no need for theory or models, no need to worry about biased values, and no need to consult domain-specific experts
      • Hidden biases in both the collection and analysis stages present considerable risks
    • Rich, identifiable data from multiple sources on the same person (ex: different apps on phone)
    • Questionable consent
      • Clicking through the Terms of Use without reading it
      • "Public" data used without deidentification efforts
    • Reproducibility vs deidentification
  2. More exploratory work and visualizations to highlight factors that would be interesting to look at from a technical perspective and their implications for any predictive systems
  3. Making Research Posters (talk by Dr. Brylow)
    • Usually posters display work that isn't so far along as to be published yet
    • Stylistic tips to improve readability but also conserve space and convey the topic effectively from across the room
    • Talk to each person for less than three minutes as this is only a peek into your work
  4. Meeting with Mentors
    • Focus on risk assessment in Phase I of the 107 data set. How is it measured? What is "normal" risk? Was it generated using a human or an algorithm? How well does it perform?
    • Goals for the next two weeks
      • Make a small multiple for all the factors in the set against risk assessment and write a few lines of analysis for each
      • Start attempting to answer the questions posed above
  5. Literature review
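
The small-multiple goal from the mentor meeting (one panel per factor, each plotted against the risk assessment) could be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function names, the choice of a bar chart of mean risk per factor level, and the assumption that the risk column is numeric are all hypothetical. It uses pandas `groupby` and matplotlib `subplots`.

```python
import math


def grid_shape(n_panels, ncols=4):
    """Return (nrows, ncols) for a grid that holds n_panels plots."""
    if n_panels < 1 or ncols < 1:
        raise ValueError("n_panels and ncols must be positive")
    ncols = min(ncols, n_panels)  # do not create more columns than panels
    nrows = math.ceil(n_panels / ncols)
    return nrows, ncols


def plot_small_multiples(df, factors, risk_col, ncols=4):
    """Draw one bar chart per factor showing mean risk for each factor level.

    df is a pandas DataFrame; factors is a list of column names;
    risk_col names a numeric risk column (an assumption about the data).
    """
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    nrows, ncols = grid_shape(len(factors), ncols)
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(3 * ncols, 2.5 * nrows), squeeze=False
    )
    flat = axes.ravel()
    for ax, factor in zip(flat, factors):
        df.groupby(factor)[risk_col].mean().plot.bar(ax=ax)
        ax.set_title(factor, fontsize=8)
        ax.set_xlabel("")
    for ax in flat[len(factors):]:
        ax.set_visible(False)  # hide unused panels in the last row
    fig.tight_layout()
    return fig
```

With several hundred variables, it is probably more readable to call `plot_small_multiples` on batches of factors (for example a few dozen per figure) than to draw every panel in one figure.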

Week 7

July 13 to July 19

  1. Phase I deeper variable exploration / Start algorithm audit
    • Spreadsheet created for ease of sorting by source and type
    • Determined and highlighted various measures of risk and their source (human vs modeled)
    • Continued to read up on Phase I methods of collection
    • Continued to create small multiples of the 664 variables
  2. Literature Review

Week 8

July 20 to July 26

  1. WARM (Washington Assessment of Risk Matrix) factors vs Risk Tag graphs
  2. Graduate Schools - Discussion with Dr. Brylow
    • Types of programs and funding sources
    • Important parts of the application
    • How to write a personal statement
    • Selecting your school

Week 9

July 27 to August 2

Week 10

August 3 to August 9