Difference between revisions of "User:Carolinearnold"

From REU@MU

Revision as of 22:00, 8 June 2023

Week 1: 5/30/23 - 6/2/23

Tuesday:

  • Attended REU orientation.
  • Toured Dr. Xu's lab and discussed his intentions for the project.
  • Downloaded Windows Terminal and Python 3.11 to the lab computer.
  • Installed OpenAI's command-line interface (CLI) to use in training a fine-tuned model.
  • Completed Unit 1 and the first half of Unit 2 of RCR/RECR training as mandated by the NSF.

Wednesday:

  • Attended Dr. Brylow's presentation on good research practices and the importance of keeping logs.
  • Completed Units 2 and 3 of RCR/RECR training.
  • Researched factors that influence perceived credibility in human and AI-generated communication.
  • Examined methodologies used by other researchers to evaluate the degree of trust in unknown authors.

Thursday:

  • Narrowed research to large language models (LLMs) and their capacity for social awareness.
  • Attempted to connect with Alex Fischmann, who completed a related project under Dr. Xu's mentorship.
  • Familiarized myself with PyCharm and began a tutorial on fine-tuning a model.
  • Examined deliverables produced by Fischmann during the course of her project.

Friday:

  • Explored case studies of fine-tuned LLMs and their applications.
  • Began troubleshooting the fine-tuning process using a guide as a reference.
  • Met with Dr. Xu to discuss a rough timeline of the project.
  • Drafted a summary of research goals according to our proposed timeline.

Week 2: 6/5/23 - 6/9/23

Monday:

  • Attended RCR training with Dr. Brylow.

Tuesday:

  • Installed PyCharm 2022.1.4 and required packages for web scraping.
  • Fixed warnings in Fischmann's web scraper.
  • Adapted the aforementioned script to GoFundMe's medical crowdfunding homepage.
  • Created a .csv file in which to store campaign data.
  • Retrieved the following data: title, URL, description, organizer(s), and launch date.

Wednesday:

  • Attended Dr. Brylow's presentation on technical writing and effective research talks.
  • Modified my script to retrieve the following data: amount raised, goal, beneficiary, and number of donations.
  • Added code to store campaign data in the aforementioned .csv file.
    • Implemented try/except statements to record "NA" values when data isn't found.
    • To store campaign descriptions in a .csv file, I had to remove their commas; the missing punctuation will interfere with fine-tuning later in the project. Possible solutions include using a different delimiter (e.g., a tab-separated .tsv file) or replacing commas with punctuation unlikely to appear elsewhere in the description.
  • Discovered the following bugs:
    • If a campaign was launched within the last week, the launch date retrieved by the web scraper is expressed relative to the current time. For example, a campaign published yesterday will return "[launched] one day ago" as opposed to "[launched] June 7 2023." Work is needed to express all dates in the latter form OR throw out campaigns launched within the week.
    • Long descriptions are hidden by a "Read more" button on GoFundMe. My script appends "Read more" to the visible description as opposed to retrieving hidden text. Code is needed to click the "Read more" button and retrieve the full description.
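
A possible alternative to stripping commas is to let Python's built-in csv module quote any field that contains one, so the description survives a round trip unchanged. The sketch below is only an illustration under assumptions: the field names, the `safe_get` helper, and the sample campaign are hypothetical and are not taken from the actual scraper. It also shows one way the "NA" fallback could be written with try/except.

```python
import csv
import io

# Hypothetical column names; the real scraper's columns may differ.
FIELDS = ["title", "url", "description", "organizer", "launch_date"]


def safe_get(extract, default="NA"):
    """Run extract() and return its value, or default if the lookup fails."""
    try:
        value = extract()
    except (AttributeError, KeyError, IndexError, TypeError):
        return default
    return value if value else default


def write_campaigns(rows, handle):
    """Write campaign dicts as CSV; fields containing commas are quoted."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    for row in rows:
        # Missing fields are recorded as "NA" instead of raising an error.
        writer.writerow({field: row.get(field, "NA") for field in FIELDS})


# Made-up sample data with commas in the title and description.
campaign = {
    "title": "Help Sam, please",
    "url": "https://example.com/f/sam",
    "description": "Sam is kind, brave, and needs surgery.",
    "organizer": "Alex",
}

buffer = io.StringIO(newline="")
write_campaigns([campaign], buffer)

# Read the data back to confirm the commas were preserved.
buffer.seek(0)
restored = list(csv.DictReader(buffer))
```

With quoting, the commas stay in the stored text, so no substitution would need to be undone before fine-tuning. In a real run, the in-memory buffer would be replaced by a file opened with `newline=""`.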

Thursday:

  • Fixed bugs in description and organizer retrieval.
  • Attempted to reformat launch dates. I will more than likely have to throw out campaigns without proper dates attached.
  • Replaced commas in campaign descriptions with semicolons. I should be able to revert to commas before fine-tuning my LLM. If I can't, semicolons are a decent substitute because they preserve tone and readability despite being grammatically incorrect.
  • Fixed formatting issues in output.csv.
  • Copied data from 504 campaigns to output.csv.
    • The number of donations is not yet included due to a bug.
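
The relative launch dates could, in principle, be converted to absolute ones by subtracting the stated number of days from the date of the scrape. The sketch below is a rough illustration, not the script's actual code: the function name is hypothetical, and it assumes the relative text always has the form "N day(s) ago" with N written as digits or as a word from "one" to "seven". Any other text returns None so the caller can keep the original value or discard the campaign.

```python
import re
from datetime import date, timedelta

# Assumed spellings of small counts; extend if other wording shows up.
_WORDS = {"a": 1, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7}

# Matches text such as "one day ago" or "3 days ago".
_RELATIVE = re.compile(r"(\d+|[a-z]+)\s+days?\s+ago", re.IGNORECASE)


def normalize_launch_date(text, scraped_on):
    """Convert 'N day(s) ago' into a 'June 7 2023'-style string.

    Returns None when the text is not a recognized relative date.
    """
    match = _RELATIVE.search(text)
    if match is None:
        return None
    raw = match.group(1).lower()
    days = int(raw) if raw.isdigit() else _WORDS.get(raw)
    if days is None:
        return None
    launched = scraped_on - timedelta(days=days)
    # Build the string by hand to avoid a zero-padded day ("June 07").
    return f"{launched:%B} {launched.day} {launched.year}"


example = normalize_launch_date("[launched] one day ago", date(2023, 6, 8))
```

This only works if the scrape date is recorded alongside each campaign, since a relative date is meaningless once that reference point is lost.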