Difference between revisions of "User:Carolinearnold"

From REU@MU

Revision as of 17:08, 8 June 2023

Week 1: 5/30/23 - 6/2/23

Tuesday:

  • Attended REU orientation.
  • Toured Dr. Xu's lab and discussed his intentions for the project.
  • Downloaded Windows Terminal and Python 3.11 to the lab computer.
  • Installed OpenAI's command-line interface (CLI) to use in training a fine-tuned model.
  • Completed Unit 1 and the first half of Unit 2 of RCR/RECR training as mandated by the NSF.

Wednesday:

  • Attended Dr. Brylow's presentation on good research practices and the importance of keeping logs.
  • Completed Units 2 and 3 of RCR/RECR training.
  • Researched factors that influence perceived credibility in human and AI-generated communication.
  • Examined methodologies used by other researchers to evaluate the degree of trust in unknown authors.

Thursday:

  • Narrowed research to large language models (LLMs) and their capacity for social awareness.
  • Attempted to connect with Alex Fischmann, who completed a related project under Dr. Xu's mentorship.
  • Familiarized myself with PyCharm and began this tutorial for fine-tuning a model.
  • Examined deliverables produced by Fischmann during the course of her project.

Friday:

  • Explored case studies of fine-tuned LLMs and their applications.
  • Began troubleshooting the fine-tuning process using this guide as a reference.
  • Met with Dr. Xu to discuss a rough timeline of the project.
  • Drafted a summary of research goals according to our proposed timeline.

Week 2: 6/5/23 - 6/9/23

Monday:

  • Attended RCR training with Dr. Brylow.

Tuesday:

  • Installed PyCharm 2022.1.4 and required packages for web scraping.
  • Fixed warnings in Fischmann's web scraper.
  • Adapted the aforementioned script to GoFundMe's medical crowdfunding homepage.
  • Created a .csv file in which to store campaign data.
  • Retrieved the following data: title, URL, description, organizer(s), and launch date.

Wednesday:

  • Attended Dr. Brylow's presentation on technical writing and effective research talks.
  • Modified my script to retrieve the following data: amount raised, goal, beneficiary, and number of donations.
  • Added code to store campaign data in the aforementioned .csv file.
    • Implemented try/except statements to record "NA" values in the event that data isn't found.
    • To store campaign descriptions in a .csv file, I had to remove their commas, and the altered text will interfere with fine-tuning later in the project. Possible solutions include using a different delimiter (e.g., a tab-separated .tsv file) or replacing commas with punctuation unlikely to appear elsewhere in the description.
  • Discovered the following bugs:
    • If a campaign was launched within the last week, the launch date retrieved by the web scraper is expressed relative to the current time. For example, a campaign published yesterday yields "[launched] one day ago" as opposed to "[launched] June 7 2023." Work is needed either to convert all dates to the latter form or to discard campaigns launched within the past week.
    • Long descriptions are hidden by a "Read more" button on GoFundMe. My script appends "Read more" to the visible description as opposed to retrieving hidden text. Code is needed to click the "Read more" button and retrieve the full description.
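
One further option for the comma problem noted above, sketched here under assumptions: Python's standard csv module quotes any field that contains a comma, so descriptions can be stored unmodified and read back intact. The column names in FIELDS and the helper names safe_get and write_campaigns are hypothetical and are not taken from the actual scraper. The sample campaign is made up for illustration.

```python
import csv
import io

# Hypothetical column names, based on the fields listed in this log.
FIELDS = ["title", "url", "description", "organizers", "launch_date"]


def safe_get(getter):
    """Call getter() and return its result, or "NA" if the lookup fails."""
    try:
        value = getter()
    except (AttributeError, KeyError, IndexError):
        return "NA"
    return value if value else "NA"


def write_campaigns(rows, handle):
    """Write campaign dicts as CSV; fields containing commas get quoted."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="NA",
                            quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(rows)


# Made-up scraped data with several fields missing.
raw = {"title": "Help with surgery", "description": "Bills, rent, and travel."}
row = {field: safe_get(lambda f=field: raw[f]) for field in FIELDS}

# An in-memory buffer stands in for the real .csv file.
buffer = io.StringIO()
write_campaigns([row], buffer)

# Reading the data back recovers the description with its commas intact.
buffer.seek(0)
restored = next(csv.DictReader(buffer))
```

When writing to a real file instead of a buffer, the csv documentation recommends opening it with newline="" so that line endings inside quoted fields are preserved.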
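
The relative-date bug could be handled along the following lines. This is a minimal sketch, not the scraper's actual code: the function name normalize_launch_date is hypothetical, and the pattern assumes GoFundMe phrases recent dates as "<number or number word> <minute|hour|day>(s) ago", which matches the "one day ago" example above but has not been checked against other wordings (for instance "yesterday" or "just now"). Any text the pattern does not recognize is returned unchanged.

```python
import re
from datetime import datetime, timedelta

# Number words that might appear in a relative date (assumed wording).
_NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3,
                 "four": 4, "five": 5, "six": 6, "seven": 7}

_RELATIVE = re.compile(r"(\d+|[a-z]+)\s+(minute|hour|day)s?\s+ago",
                       re.IGNORECASE)


def normalize_launch_date(text, scraped_at):
    """Convert a relative date such as "one day ago" to "June 7 2023".

    scraped_at is the datetime at which the page was scraped. Text that
    is not recognized as a relative date is returned unchanged.
    """
    match = _RELATIVE.search(text)
    if match is None:
        return text.strip()
    token, unit = match.group(1).lower(), match.group(2).lower()
    count = int(token) if token.isdigit() else _NUMBER_WORDS.get(token)
    if count is None:
        return text.strip()
    # timedelta accepts minutes=, hours=, and days= keyword arguments.
    launched = scraped_at - timedelta(**{unit + "s": count})
    return f"{launched.strftime('%B')} {launched.day} {launched.year}"


scrape_time = datetime(2023, 6, 8, 17, 8)
converted = normalize_launch_date("one day ago", scrape_time)
unchanged = normalize_launch_date("June 7 2023", scrape_time)
```

Recording the scrape time alongside each row would allow this conversion to be applied later, even if the data is processed days after it was collected.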