Difference between revisions of "User:KonradJ"

From REU@MU
Latest revision as of 22:06, 21 July 2023

2023 Week 1: (May 30th - June 4th)

Tuesday (5/30)

  • Attended REU orientation
  • Met with peers and mentors

Wednesday (5/31)

  • Attended lecture led by Brylow about research papers and how to log progress properly
  • Met with mentor to talk about goals and milestones and shared research papers to read (will update tomorrow)

Thursday (6/1)

  • Most of my day was taken up by moving into my apartment, which limited how much I could work
  • Went over my first research paper with a surface-level reading
    • Assessment of Medical Reports Uncertainty through Topic Modeling and Machine Learning

Friday (6/2)

  • Read various research articles including the one from yesterday more thoroughly
    • Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity
    • Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data
  • Took notes later on what I had read

Sunday (6/4)

  • Took more notes on the research papers I've already read
  • Began a lengthy research paper
    • Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach
  • Prepared Log for this upcoming week


2023 Week 2: (June 5th - June 11th)

Monday (6/5)

  • Met with group for federally mandated training.

Tuesday (6/6)

  • Met with peers and discussed where everyone was in their process
  • Finished the lengthy research paper from the other day
  • Read the last of my research papers
    • Challenges and opportunities beyond structured data in analysis of electronic health records
  • Got up to the first lab of a Coursera course recommended by my mentor

Wednesday (6/7)

  • Completed 2 Coursera labs
  • Met with peers and discussed milestones before hearing a lecture on technical writing and presenting from Dr. Brylow
  • Met with my mentor and talked about goals for this week
    • Reviewing the relevant data in the form of CSV files


Thursday (6/8)

  • Got to the end of the first section of the Coursera course and began the associated lab
  • Began looking at relevant files for the summer project
    • CSV files that contain patient data that we will use to practice uncertainty quantification strategies

Friday (6/9)

  • Dug deeper into research papers I had read to find the most pertinent files in the MIMIC-III dataset
    • Seems like the best file for us would be D_ICD_DIAGNOSES.csv, which has ICD-9 codes for each diagnosis
  • Continued online natural language processing course through Coursera
    • Naive Bayes, Bayes Rule, Laplacian smoothing, and log likelihood
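
A minimal sketch of how the Naive Bayes pieces listed above fit together (Laplacian smoothing and log likelihood, scored with Bayes' rule in log space). The function names and the tiny labeled corpus are made up for illustration and are not from the course or the project data.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Return (log_prior, log_likelihood) for binary labels 1/0.

    Uses Laplacian (add-one) smoothing so unseen word/class pairs
    never produce a zero probability. Assumes both classes appear.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos_counts if label == 1 else neg_counts).update(tokens)

    vocab = set(pos_counts) | set(neg_counts)
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    v = len(vocab)

    # Log prior: log of the ratio of positive to negative documents.
    d_pos = sum(1 for y in labels if y == 1)
    d_neg = len(labels) - d_pos
    log_prior = math.log(d_pos / d_neg)

    # Log likelihood ratio per word, with add-one smoothing.
    log_likelihood = {}
    for word in vocab:
        p_word_pos = (pos_counts[word] + 1) / (n_pos + v)
        p_word_neg = (neg_counts[word] + 1) / (n_neg + v)
        log_likelihood[word] = math.log(p_word_pos / p_word_neg)
    return log_prior, log_likelihood

def score(tokens, log_prior, log_likelihood):
    """Sum of log prior and per-word log likelihoods; > 0 means class 1."""
    return log_prior + sum(log_likelihood.get(t, 0.0) for t in tokens)

# Hypothetical toy corpus: already tokenized, label 1 or 0.
docs = [["chest", "pain"], ["no", "pain"], ["severe", "chest", "pain"], ["no", "fever"]]
labels = [1, 0, 1, 0]
log_prior, log_likelihood = train_naive_bayes(docs, labels)

print(score(["chest", "pain"], log_prior, log_likelihood) > 0)
print(score(["no", "fever"], log_prior, log_likelihood) < 0)
```

Words that never appeared in training contribute 0 to the score, so they neither help nor hurt either class.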

Sunday (6/11)

  • Completed week 2 on Naive Bayes in Coursera
    • Got passing grade on week 2 quiz on material and completed all labs
  • Started Week 3 course on Vector Space Models
    • Completed a couple labs
  • Prepared log for upcoming week

2023 Week 3: (June 12th - June 16th)

Monday (6/12)

  • Completed Week 3 courses for the Coursera class I am taking
    • Vector Space Models
  • Dug more into the ICD-9 records

Tuesday (6/13)

  • Completed labs from week 1 and week 2 of the Coursera course
    • Logistic Regression and Naive Bayes

Wednesday (6/14)

  • Heard research talk from Dr. Praveen
  • Talked with peers about where they are in their projects
  • Continued with week 4 of Coursera lectures

Thursday (6/15)

  • Finished week 4 of Coursera lectures
    • Machine Translation and Document Search
  • Did week 3 lab for Coursera
    • Vector Space Models

Friday (6/16)

  • Began messing with preprocessing functions on some columns of the dataset (MIMIC-III, D_ICD_DIAGNOSES)
    • Removed punctuation, made lowercase, tokenized, then stemmed the columns
  • Continued to read up on relevant information about how to preprocess and word2vector strategies
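
The preprocessing steps above (remove punctuation, lowercase, tokenize, stem) can be sketched roughly as follows. This is a standard-library illustration only: `toy_stem` is a crude stand-in for a real stemmer such as a Porter stemmer, and the sample string is invented rather than taken from the dataset.

```python
import string

def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least 3 characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Remove punctuation, lowercase, split on whitespace, then stem."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    return [toy_stem(t) for t in tokens]

# Hypothetical example text.
print(preprocess("Unspecified fractures, closed."))
# → ['unspecifi', 'fractur', 'clos']
```

Note that deleting punctuation outright joins hyphenated words into a single token, which may or may not be what is wanted for clinical text.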

2023 Week 4: (June 19th - June 23rd)

Tuesday (6/20)

  • Re-read one of the more valuable research papers to get an idea of where to go from here after learning more about the relevant technologies and algorithms
  • Reviewed Coursera materials to better understand the steps in the process of natural language processing
  • Continued to tinker with preprocessing the data before going more in depth this week

Wednesday (6/21)

  • Listened to a great research lecture from Dr. Walt Bialkowski on using data science to predict food shortages at local pantries
  • Talked with peers about their projects after the lecture
  • Began creating some slides and reviewing materials for mini-presentation

Thursday (6/22)

  • Met with my mentor to talk about the mini-presentation coming up next week, adjustments to my preprocessing methods, and what to do next
  • Continued creating the mini-presentation and the script to go along with it

Friday (6/23)

2023 Week 5: (June 26th - June 30th)

Monday (6/26)

  • Reviewed papers and other materials to write out the script for my presentation
  • Created some slides

Tuesday (6/27)

  • Wrote the script for the talk
  • Finished presentation slides
  • Practiced until I felt comfortable with the material

Wednesday (6/28)

  • Gave presentation to my peers
  • Talked with them after about their projects and upcoming plans

Thursday (6/29)

  • Met with my mentor to establish a game plan for this week and the upcoming weeks
  • Watched some videos on approaches to the BERT model and picked out some articles

Friday (6/30)

  • Read thorough articles with implementation notes and code
  • Started attempting to implement it myself
  • Found a new implementation I liked, but I am confused about how to split up the data, or whether I should be doing that at all before applying the model
  • Talked with mentor about slowing down and working on just the BOW model, since I was thinking too big-picture right now

2023 Week 6: (July 3rd - July 7th)

Monday (7/3)

  • Reviewed two research papers for content and ideas on methodology
  • Worked to complete BOW models after slowing down and attempting to take it step by step
    • Had some problems with tokenization and stemming; attempting to iron those out
  • Word count has also been a problem, and I am trying to find out whether the issue is in the preprocessing or the model itself
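
One way to separate a preprocessing problem from a model problem is to build the bag-of-words counts by hand and check them against the token lists. The sketch below is only an illustration with invented tokens and helper names, not the notebook code from the project.

```python
from collections import Counter

def build_vocab(tokenized_docs):
    """Map each distinct token to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_vector(tokens, vocab):
    """Count how often each vocabulary token occurs in one document."""
    vector = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector

# Hypothetical, already-preprocessed documents.
docs = [["fractur", "of", "rib"], ["fractur", "of", "fractur"]]
vocab = build_vocab(docs)
vectors = [bow_vector(doc, vocab) for doc in docs]
print(vocab)    # → {'fractur': 0, 'of': 1, 'rib': 2}
print(vectors)  # → [[1, 1, 1], [2, 1, 0]]

# Sanity check: every token is in the vocabulary here, so each row
# must sum to the document's token count. A mismatch would point at
# the counting step rather than the preprocessing.
print(all(sum(v) == len(d) for v, d in zip(vectors, docs)))  # → True
```

If the rows sum correctly but the counts still look wrong, the token lists themselves (tokenization or stemming) are the more likely culprit.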

Tuesday (7/4)

Wednesday (7/5)

  • Watched the rest of my peers' research presentations
  • Looked at some other tokenization methods to help with current preprocessing bug
  • Reviewed Coursera notes and videos to refresh on concepts

Thursday (7/6)

  • Reviewed more of the Coursera materials
  • Continued to look into different tokenizers

Friday (7/7)

  • Ran into some trouble with stemming, so looked into different methods
  • Still haven't been able to deal with a tokenization error that affects BOW
  • Continued review of Coursera materials

2023 Week 7: (July 10th - July 14th)

Monday (7/10)

  • Tinkered with preprocessed text, attempting to get it into vector form
  • Looked into alternatives to CountVectorizer()
  • Made a new notebook and tried different ways of preprocessing data when CountVectorizer() is involved

Tuesday (7/11)

  • Continued looking into vectorization methods that allow for stemming (hitting a bit of a roadblock)
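
One possible way around the stemming roadblock, assuming scikit-learn is the source of `CountVectorizer()`: its `analyzer` parameter accepts a callable that turns each raw document into a list of features, so stemming can happen inside that callable. In this sketch `toy_stem` and the two sample strings are invented; a real stemmer (for example NLTK's `PorterStemmer().stem`) could be swapped in.

```python
import string
from sklearn.feature_extraction.text import CountVectorizer

def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def stemming_analyzer(text):
    """Full pipeline for one raw document: clean, tokenize, stem."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return [toy_stem(t) for t in no_punct.lower().split()]

# With a callable analyzer, CountVectorizer skips its own lowercasing
# and tokenization and counts exactly what the callable returns.
vectorizer = CountVectorizer(analyzer=stemming_analyzer)
matrix = vectorizer.fit_transform(["Fractured ribs", "Fracturing rib"])

print(sorted(vectorizer.vocabulary_))  # → ['fractur', 'rib']
print(matrix.toarray().tolist())       # → [[1, 1], [1, 1]]
```

Because the callable replaces the built-in preprocessing, options such as `lowercase` and `stop_words` no longer apply and would need to be handled inside `stemming_analyzer`.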

Wednesday (7/12)

  • Learned about do's and don'ts of poster creation from Brylow
  • Talked to some peers about progress and what we want to do with our papers and posters
  • Looked into some poster templates and started preparing an outline for the poster content

Thursday (7/13)

  • Connected with a colleague of my mentor
    • Sent me some useful materials I will review tomorrow
  • Called him and talked about BERT and my project

Friday (7/14)

  • Reviewed resources from mentor's colleague on YouTube
    • Best BERT content I've found so far; it takes its time and breaks it down
  • Downloaded some notebook files from there and started tinkering with them in Jupyter