Difference between revisions of "User:Carolinearnold"

From REU@MU

Revision as of 16:57, 8 June 2023

Week 1: 5/30/23 - 6/2/23

Tuesday:

  • Attended REU orientation.
  • Toured Dr. Xu's lab and discussed his intentions for the project.
  • Downloaded Windows Terminal and Python 3.11 to the lab computer.
  • Installed OpenAI's command-line interface (CLI) to use in training a fine-tuned model.
  • Completed Unit 1 and the first half of Unit 2 of RCR/RECR training as mandated by the NSF.

Wednesday:

  • Attended Dr. Brylow's presentation on good research practices and the importance of keeping logs.
  • Completed Units 2 and 3 of RCR/RECR training.
  • Researched factors that influence perceived credibility in human and AI-generated communication.
  • Examined methodologies used by other researchers to evaluate the degree of trust in unknown authors.

Thursday:

  • Narrowed research to large language models (LLMs) and their capacity for social awareness.
  • Attempted to connect with Alex Fischmann, who completed a related project under Dr. Xu's mentorship.
  • Familiarized myself with PyCharm and began a tutorial on fine-tuning a model.
  • Examined deliverables produced by Fischmann during the course of her project.

Friday:

  • Explored case studies of fine-tuned LLMs and their applications.
  • Began troubleshooting the fine-tuning process using a guide as a reference.
  • Met with Dr. Xu to discuss a rough timeline of the project.
  • Drafted a summary of research goals according to our proposed timeline.

Week 2: 6/5/23 - 6/9/23

Monday:

  • Attended RCR training with Dr. Brylow.

Tuesday:

  • Installed PyCharm 2022.1.4 and required packages for web scraping.
  • Fixed warnings in Fischmann's web scraper.
  • Adapted the aforementioned script to GoFundMe's medical crowdfunding homepage.
  • Created a .csv file in which to store campaign data.
  • Retrieved the following data: title, URL, description, organizer(s), and launch date.
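The file-creation step above can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the column names in `FIELDNAMES`, the helper `write_campaigns`, and the sample campaign are all hypothetical, chosen only to mirror the five fields listed on Tuesday.

```python
import csv
import io

# Hypothetical column names mirroring the fields listed above.
FIELDNAMES = ["title", "url", "description", "organizers", "launch_date"]


def write_campaigns(handle, campaigns):
    """Write a header row followed by one row per campaign dict."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(campaigns)


# Made-up placeholder campaign, not real scraped data.
buffer = io.StringIO()
write_campaigns(buffer, [{
    "title": "Example campaign",
    "url": "https://example.com/campaign",
    "description": "Placeholder text",
    "organizers": "A. Organizer",
    "launch_date": "June 7 2023",
}])
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends for files it writes.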

Wednesday:

  • Attended Dr. Brylow's presentation on technical writing and effective research talks.
  • Modified my script to retrieve the following data: amount raised, goal, beneficiary, and number of donations.
  • Added code to store campaign data in the aforementioned .csv file.
    • Implemented try/except statements to record "NA" values in the event that data isn't found.
    • To store campaign descriptions in a .csv file, I had to remove their commas; the altered text will interfere with fine-tuning later in the project. Possible solutions include using a different file format (e.g., tab-separated .tsv) or replacing commas with punctuation unlikely to appear elsewhere in the description.
  • TODO: If a campaign was launched within the last week, the launch date retrieved by the web scraper is expressed relative to the current time. For example, a campaign published yesterday will return "[launched] one day ago" as opposed to "[launched] June 7 2023." Work is needed to express all dates in the latter form OR throw out campaigns launched within the past week.
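The three notes above might be handled along the following lines. This is a hedged sketch under several assumptions, not the scraper's actual code: the helper names (`safe_extract`, `roundtrip`, `normalize_launch_date`) are hypothetical, the relative-date pattern covers only "N day(s) ago" phrasing, and the exact strings the site returns have not been verified. One point that is certain: Python's standard `csv` module quotes any field containing a comma, so descriptions can keep their commas in a .csv file without switching formats.

```python
import csv
import io
import re
from datetime import date, timedelta


def safe_extract(extract, default="NA"):
    """Call a zero-argument extractor; return `default` if the data is missing."""
    try:
        value = extract()
    except (AttributeError, IndexError, KeyError, TypeError):
        return default
    return default if value is None else value


def roundtrip(row):
    """Write one row with csv.writer and read it back with csv.reader."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return next(csv.reader(io.StringIO(buffer.getvalue())))


# Number words the scraper might emit (an assumed, non-exhaustive list).
_WORD_NUMBERS = {"a": 1, "one": 1, "two": 2, "three": 3,
                 "four": 4, "five": 5, "six": 6, "seven": 7}
_RELATIVE = re.compile(r"(\w+)\s+days?\s+ago", re.IGNORECASE)


def normalize_launch_date(text, today):
    """Replace 'N day(s) ago' with 'Month D YYYY'; leave other text unchanged."""
    match = _RELATIVE.search(text)
    if match is None:
        return text
    count_text = match.group(1).lower()
    if count_text.isdigit():
        days = int(count_text)
    elif count_text in _WORD_NUMBERS:
        days = _WORD_NUMBERS[count_text]
    else:
        return text  # unrecognized count: do not guess
    launched = today - timedelta(days=days)
    absolute = f"{launched:%B} {launched.day} {launched.year}"
    return text[:match.start()] + absolute + text[match.end():]


# Illustrative inputs only; none of these values are scraped data.
missing_goal = safe_extract(lambda: {}["goal"])
kept = roundtrip(["Example title", "Needs surgery, rehab, and travel costs"])
fixed = normalize_launch_date("[launched] one day ago", date(2023, 6, 8))
```

Passing `today` explicitly keeps the conversion reproducible; in the scraper it would presumably be the date on which the page was fetched. Phrasings other than "N day(s) ago" (for example, hours) pass through unchanged and would still need to be handled or filtered out.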