User:KonradJ
2023 Week 1: (May 30th - June 4th)
Tuesday (5/30)
- Attended REU orientation
- Met with peers and mentors
Wednesday (5/31)
- Attended a lecture led by Dr. Brylow about research papers and how to log progress properly
- Met with mentor to talk about goals and milestones and shared research papers to read (will update tomorrow)
Thursday (6/1)
- Sadly, most of my day was consumed by moving into my apartment, which limited my ability to work
- Went over my first research paper with a surface-level reading.
- Assessment of Medical Reports Uncertainty through Topic Modeling and Machine Learning
Friday (6/2)
- Read several research articles more thoroughly, including the one from yesterday
- Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity
- Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data
- Took notes on what I read, to write up later
Sunday (6/4)
- Took more notes on the research papers I've already read
- Began a lengthy research paper
- Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach
- Prepared Log for this upcoming week
2023 Week 2: (June 5th - June 11th)
Monday (6/5)
- Met with group for federally mandated training.
Tuesday (6/6)
- Met with peers and discussed where everyone was in their process
- Finished the lengthy research paper from the other day
- Read the last of my research papers
- Challenges and opportunities beyond structured data in analysis of electronic health records
- Got up to the first lab of a Coursera course recommended by my mentor
Wednesday (6/7)
- Completed 2 Coursera labs
- Met with peers and discussed milestones before hearing a lecture on technical writing and presenting from Dr. Brylow
- Met with my mentor and talked about goals for this week
- Reviewed the relevant data in the form of CSV files
Thursday (6/8)
- Got to the end of the first section of the Coursera course and began the associated lab
- Began looking at relevant files for the summer project
- CSV files that contain patient data that we will use to practice uncertainty quantification strategies
Friday (6/9)
- Dug deeper into research papers I had read to find which files in the MIMIC-III dataset were most pertinent
- Seems like the best file for us would be D_ICD_DIAGNOSES.csv, which has the ICD-9 code for each diagnosis
- Continued online natural language processing course through Coursera
- Naive Bayes, Bayes' rule, Laplacian smoothing, and log likelihood
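To make the terms above concrete, here is a minimal sketch of a two-class Naive Bayes scorer with Laplacian (add-one) smoothing and log likelihoods. The tiny corpus and its labels are made up for illustration and are not taken from MIMIC-III or from the course labs; the function names are my own.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, label); 1 = uncertain wording, 0 = certain wording.
train = [
    (["possible", "pneumonia"], 1),
    (["likely", "fracture"], 1),
    (["confirmed", "pneumonia"], 0),
    (["confirmed", "fracture"], 0),
]

def train_naive_bayes(examples):
    """Return (log_prior, loglikelihood) for a two-class Naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}  # word counts per class
    docs = Counter()                       # document counts per class
    for tokens, label in examples:
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    v = len(vocab)
    n = {c: sum(counts[c].values()) for c in (0, 1)}
    # Log prior: log of the ratio of class-1 documents to class-0 documents.
    log_prior = math.log(docs[1] / docs[0])
    loglikelihood = {}
    for w in vocab:
        # Laplacian smoothing: add 1 to each count and V to each denominator,
        # so a word unseen in one class never gets probability zero.
        p1 = (counts[1][w] + 1) / (n[1] + v)
        p0 = (counts[0][w] + 1) / (n[0] + v)
        loglikelihood[w] = math.log(p1 / p0)
    return log_prior, loglikelihood

def score(tokens, log_prior, loglikelihood):
    """Sum of log prior and per-word log likelihoods; > 0 favors class 1."""
    # Words outside the training vocabulary contribute nothing.
    return log_prior + sum(loglikelihood.get(w, 0.0) for w in tokens)

log_prior, loglikelihood = train_naive_bayes(train)
```

A positive score leans toward class 1 and a negative score toward class 0. Working in log space turns the product of word probabilities into a sum, which avoids numerical underflow on long texts.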
Sunday (6/11)
- Completed week 2 on Naive Bayes in Coursera
- Got a passing grade on the week 2 quiz and completed all labs
- Started Week 3 course on Vector Space Models
- Completed a couple of labs
- Prepared log for upcoming week
2023 Week 3: (June 12th - June 16th)
Monday (6/12)
- Completed Week 3 courses for the Coursera class I am taking
- Vector Space Models
- Dug more into the ICD-9 records
Tuesday (6/13)
- Completed labs from week 1 and week 2 of the Coursera course
- Logistic Regression and Naive Bayes
Wednesday (6/14)
- Heard research talk from Dr. Praveen
- Talked with peers about where they are in their projects
- Continued with week 4 of Coursera lectures
Thursday (6/15)
- Finished week 4 of Coursera lectures
- Machine Translation and Document Search
- Did week 3 lab for Coursera
- Vector Space Models
Friday (6/16)
- Began experimenting with preprocessing functions on some columns of the dataset (MIMIC-III, D_ICD_DIAGNOSES)
- Removed punctuation, made lowercase, tokenized, then stemmed the columns
- Continued to read up on relevant information about preprocessing and word2vec strategies
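The four preprocessing steps listed above can be sketched as follows. This is an illustration only, not the notebook code: the `naive_stem` helper is a hypothetical stand-in for a real stemmer (such as a Porter stemmer), and the sample string is made up rather than copied from D_ICD_DIAGNOSES.

```python
import string

# Hypothetical stand-in for a real stemmer; it only strips a few common suffixes.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Remove punctuation, lowercase, tokenize on whitespace, then stem."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    return [naive_stem(t) for t in tokens]

result = preprocess("Unspecified fractures, closed (healing)")
print(result)  # ['unspecifi', 'fractur', 'clos', 'heal']
```

Applied to a pandas column, the same function could be mapped over each row. Note that deleting punctuation outright fuses hyphenated words together, so replacing punctuation with a space may be the better choice for some text.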
2023 Week 3: (June 19th - June 23rd)
Tuesday (6/20)
- Re-read one of the more valuable research papers to get an idea of where to go from here after learning more about the relevant technologies and algorithms
- Reviewed Coursera materials to better understand the steps in the process of natural language processing
- Continued to tinker with preprocessing the data before going more in depth this week
Wednesday (6/21)
- Listened to a great research lecture from Dr. Walt Bialkowski on using data science to predict food shortages at local pantries
- Talked with peers about their projects after the lecture
- Began creating some slides and reviewing materials for my mini-presentation
Thursday (6/22)
- Met with my mentor to talk about the mini-presentation coming up next week, as well as adjustments to my preprocessing methods and what to do next
- Continued with creation of the mini-presentation and script to go along with it
Friday (6/23)
2023 Week 4: (June 26th - June 30th)
Monday (6/26)
- Reviewed papers and other materials to write out the script for the presentation
- Created some slides
Tuesday (6/27)
- Wrote the script for the talk
- Finished presentation slides
- Practiced until I felt comfortable with the material
Wednesday (6/28)
- Gave presentation to my peers
- Talked with them after about their projects and upcoming plans
Thursday (6/29)
- Met with my mentor to establish a game plan for this week and the upcoming weeks
- Watched some videos on approaches to the BERT model and picked out some articles
Friday (6/30)
- Read through articles with implementation notes and code
- Started attempting to implement it myself
- Found a new implementation I liked, but was unsure how to split up the data, or whether I should even be doing that before applying the model
- Talked with my mentor about slowing down and working on just the bag-of-words (BOW) model; I am thinking too big-picture right now
2023 Week 5: (July 3rd - July 7th)
Monday (7/3)
- Reviewed two research papers for content and ideas on methodology
- Worked to complete BOW models after slowing down and attempting to take it step by step
- Had some problems with tokenization and stemming; attempting to iron those out
- Word counts have also been a problem; trying to find out whether the issue is in the preprocessing or the model itself
Tuesday (7/4)
Wednesday (7/5)
- Watched the rest of my peers' research presentations
- Looked at some other tokenization methods to help with current preprocessing bug
- Reviewed Coursera notes and videos to refresh on concepts
Thursday (7/6)
- Reviewed more of the Coursera materials
- Continued to look into different tokenizers
Friday (7/7)
- Ran into some trouble with stemming, so looked into different methods.
- Still haven't been able to resolve a tokenization error that affects the BOW model
- Continued review of Coursera materials
2023 Week 6: (July 10th - July 14th)
Monday (7/10)
- Tinkered with the preprocessed text, attempting to get it into vector form
- Looked into alternatives to CountVectorizer()
- Made a new notebook and tried different ways of preprocessing data when CountVectorizer() is involved
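One possible alternative to CountVectorizer(), sketched here with only the standard library, is to build the bag-of-words vectors by hand from text that is already tokenized. This is an assumption about what such an alternative could look like, not the approach taken in the notebook; the function names and the two example documents are hypothetical.

```python
from collections import Counter

def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}

def to_count_vector(tokens, vocab):
    """Turn one tokenized document into a list of per-word counts."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # ignore tokens outside the vocabulary
            vec[vocab[tok]] = count
    return vec

# Hypothetical pre-tokenized documents (already lowercased and split).
docs = [
    ["closed", "fracture", "of", "rib"],
    ["open", "fracture", "of", "rib", "rib"],
]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
print(vocab)    # {'closed': 0, 'fracture': 1, 'of': 2, 'open': 3, 'rib': 4}
print(vectors)  # [[1, 1, 1, 0, 1], [0, 1, 1, 1, 2]]
```

Counting by hand keeps the tokens exactly as the preprocessing produced them. By default, CountVectorizer lowercases its input and applies its own token pattern, which keeps only tokens of two or more alphanumeric characters, so its counts can differ from counts computed on pre-tokenized text.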
Tuesday (7/11)
Wednesday (7/12)
Thursday (7/13)
Friday (7/14)