Difference between revisions of "User:KonradJ"
From REU@MU
Latest revision as of 22:06, 21 July 2023
2023 Week 1: (May 30th - June 4th)
Tuesday (5/30)
- Attended REU orientation
- Met with peers and mentors
Wednesday (5/31)
- Attended lecture led by Brylow about research papers and how to log progress properly
- Met with mentor to talk about goals and milestones and shared research papers to read (will update tomorrow)
Thursday (6/1)
- Sadly, most of my day was consumed with moving into my apartment, which limited my ability to work
- Went over my first research paper with a surface level reading.
- Assessment of Medical Reports Uncertainty through Topic Modeling and Machine Learning
Friday (6/2)
- Read various research articles including the one from yesterday more thoroughly
- Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity
- Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data
- Took notes on what I wrote later
Sunday (6/4)
- Took more notes on the research papers I've already read
- Began a lengthy research paper
- Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach
- Prepared Log for this upcoming week
2023 Week 2: (June 5th - June 11th)
Monday (6/5)
- Met with group for federally mandated training.
Tuesday (6/6)
- Met with peers and discussed where everyone was at in their process
- Finished the lengthy research paper from the other day
- Read the last of my research papers
- Challenges and opportunities beyond structured data in analysis of electronic health records
- Got up to the first lab of a Coursera course recommended by my mentor
Wednesday (6/7)
- Completed 2 Coursera labs
- Met with peers and discussed milestones before hearing a lecture on technical writing and presenting from Dr. Brylow
- Met with my mentor and talked about goals for this week
- Reviewed the relevant data in the form of CSV files
Thursday (6/8)
- Got to the end of the first section of the Coursera course and began the associated lab
- Began looking at relevant files for the summer project
- CSV files that contain patient data that we will use to practice uncertainty quantification strategies
Friday (6/9)
- Dug deeper into research papers I had read to find the most pertinent files in the MIMIC-III dataset
- Seems like the best file for us would be D_ICD_DIAGNOSES.csv, which has the ICD-9 code for each diagnosis
- Continued online natural language processing course through Coursera
- Naive Bayes, Bayes Rule, Laplacian smoothing, and log likelihood
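As a reminder of how these pieces fit together, here is a minimal sketch of a two-class Naive Bayes classifier with Laplacian (add-one) smoothing and log likelihoods. The function names and the toy documents are made up for illustration and are not from the course or from MIMIC-III; the sketch also assumes both classes appear at least once in the training labels.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: list of 0/1 class labels."""
    counts = {0: Counter(), 1: Counter()}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)

    vocab = set(counts[0]) | set(counts[1])
    v = len(vocab)
    n_pos = sum(counts[1].values())  # total word count in class 1
    n_neg = sum(counts[0].values())  # total word count in class 0

    # Log prior: log ratio of class-1 documents to class-0 documents.
    d_pos = sum(1 for y in labels if y == 1)
    d_neg = len(labels) - d_pos
    log_prior = math.log(d_pos / d_neg)

    # Laplacian smoothing adds 1 to every count so unseen words never give
    # a zero probability; the log likelihood is the log of the ratio.
    log_likelihood = {}
    for w in vocab:
        p_w_pos = (counts[1][w] + 1) / (n_pos + v)
        p_w_neg = (counts[0][w] + 1) / (n_neg + v)
        log_likelihood[w] = math.log(p_w_pos / p_w_neg)
    return log_prior, log_likelihood

def predict(tokens, log_prior, log_likelihood):
    # Words outside the vocabulary contribute nothing to the score.
    score = log_prior + sum(log_likelihood.get(t, 0.0) for t in tokens)
    return 1 if score > 0 else 0

# Made-up toy data, for illustration only.
toy_docs = [["fever", "cough"], ["fever", "pain"],
            ["normal", "exam"], ["normal", "scan"]]
toy_labels = [1, 1, 0, 0]
log_prior, log_likelihood = train_naive_bayes(toy_docs, toy_labels)
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow on longer texts.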
Sunday (6/11)
- Completed week 2 on Naive Bayes in Coursera
- Got passing grade on week 2 quiz on material and completed all labs
- Started Week 3 course on Vector Space Models
- Completed a couple labs
- Prepared log for upcoming week
2023 Week 3: (June 12th - June 16th)
Monday (6/12)
- Completed Week 3 courses for the Coursera class I am taking
- Vector Space Models
- Dug more into the ICD-9 records
Tuesday (6/13)
- Completed labs from week 1 and week 2 of Coursera course
- Logistic Regression and Naive Bayes
Wednesday (6/14)
- Heard research talk from Dr. Praveen
- Talked with peers about where they are in their projects
- Continued with week 4 of Coursera lectures
Thursday (6/15)
- Finished week 4 of Coursera lectures
- Machine Translation and Document Search
- Did week 3 lab for Coursera
- Vector Space Models
Friday (6/16)
- Began messing with preprocessing functions with some columns of the dataset (MIMIC-III, D_ICD_DIAGNOSES)
- Removed punctuation, made lowercase, tokenized, then stemmed the columns
- Continued to read up on relevant information about how to preprocess and word2vector strategies
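The preprocessing steps from the Friday (6/16) entry (remove punctuation, lowercase, tokenize, stem) can be sketched roughly as below. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, whitespace splitting stands in for a proper tokenizer, and the sample string is invented rather than taken from D_ICD_DIAGNOSES.

```python
import string

# Hypothetical toy stemmer: strips a few common English suffixes.
# A real project would likely use an established stemmer instead.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(token):
    for suffix in SUFFIXES:
        # Only strip when at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    # 1. Remove punctuation.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # 2. Make lowercase.
    lowered = no_punct.lower()
    # 3. Tokenize (here: split on whitespace).
    tokens = lowered.split()
    # 4. Stem each token.
    return [toy_stem(t) for t in tokens]

# Invented sample text, for illustration only.
sample_tokens = preprocess("Fractures of the ribs, closed.")
```

The same function could be applied to each value of a text column before building any vector representation.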
2023 Week 4: (June 19th - June 23rd)
Tuesday (6/20)
- Re-read one of the more valuable research papers to get an idea of where to go from here after learning more about the relevant technologies and algorithms
- Reviewed Coursera materials to better understand the steps in the process of natural language processing
- Continued to tinker with the data with preprocessing before I get more in depth this week
Wednesday (6/21)
- Listened to a great research lecture from Dr. Walt Bialkowski on using data science to predict food shortages at local pantries
- Talked with peers about their projects after the lecture
- Began creating some slides and reviewing materials for mini-presentation
Thursday (6/22)
- Met with my mentor to talk about mini-presentation coming up next week as well as adjustments to my pre-processing methods and what to do next
- Continued with creation of the mini-presentation and script to go along with it
Friday (6/23)
2023 Week 5: (June 26th - June 30th)
Monday (6/26)
- Reviewed papers and other materials for writing script out for presentation
- Created some slides
Tuesday (6/27)
- Wrote the script for talk
- Finished presentation slides
- Practiced until I felt comfortable with the material
Wednesday (6/28)
- Gave presentation to my peers
- Talked with them after about their projects and upcoming plans
Thursday (6/29)
- Met with my mentor to establish game plan for this week and the upcoming weeks
- Watched some videos on approaches to the BERT model and picked out some articles
Friday (6/30)
- Read through articles with implementation notes and code
- Started to attempt implementing it myself
- Found a new implementation I liked, but was unsure how to split up the data, or whether I should even be doing that before applying the model
- Talked with mentor about slowing down and just working on the BOW model; I was thinking too big picture right now
2023 Week 6: (July 3rd - July 7th)
Monday (7/3)
- Reviewed two research papers for content and ideas on methodology
- Worked to complete BOW models after slowing down and attempting to take it step by step
- Had some problems with tokenization and stemming; attempting to iron those out
- Word count has also been a problem; trying to find out whether it's in the preprocessing or the model itself
Tuesday (7/4)
Wednesday (7/5)
- Watched the rest of my peers' research presentations
- Looked at some other tokenization methods to help with current preprocessing bug
- Reviewed Coursera notes and videos to refresh on concepts
Thursday (7/6)
- Reviewed more of the Coursera materials
- Continued to look into different tokenizers
Friday (7/7)
- Ran into some trouble with stemming, so looked into different methods.
- Still haven't been able to deal with a tokenization error that affects BOW
- Continued review of Coursera materials
2023 Week 7: (July 10th - July 14th)
Monday (7/10)
- Tinkered with preprocessed text attempting to get it into vector form
- Looked into alternatives to CountVectorizer()
- Made a new notebook and tried different ways of preprocessing data when CountVectorizer() is involved
Tuesday (7/11)
- Continued looking into vectorization methods that allow for stemming (hitting a bit of a roadblock)
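One way to combine stemming with a bag-of-words count is to stem inside the tokenizer that the vectorizer calls. The sketch below is a plain-Python stand-in for that idea and not the notebook code from this log: `toy_stem`, `stemming_tokenizer`, and `bag_of_words` are hypothetical names, and the two documents are invented. scikit-learn's CountVectorizer also accepts a callable through its `tokenizer` parameter, so a stemming tokenizer like this one could be tried there as well.

```python
from collections import Counter

def toy_stem(token):
    # Hypothetical suffix stripper standing in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def stemming_tokenizer(text):
    # Lowercase, split on whitespace, then stem each token.
    return [toy_stem(t) for t in text.lower().split()]

def bag_of_words(documents, tokenizer):
    tokenized = [tokenizer(doc) for doc in documents]
    # Sorted vocabulary gives every stem a fixed column index.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocabulary, vectors

# Invented documents, for illustration only.
docs = ["closed fracture of ribs", "open fracture of rib"]
vocabulary, vectors = bag_of_words(docs, stemming_tokenizer)
```

Because "ribs" and "rib" reduce to the same stem, they share one column, which is the effect stemming is meant to have on the word counts.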
Wednesday (7/12)
- Learned about do's and don'ts of poster creation from Brylow
- Talked to some peers about progress and what we want to do with our papers and posters
- Looked into some poster templates and started preparing an outline for the poster content
Thursday (7/13)
- Connected with a colleague of my mentor
- Sent me some useful materials I will review tomorrow
- Called him and talked about BERT and my project
Friday (7/14)
- Reviewed resources from mentor's colleague on YouTube
- Best BERT content I've found so far; it takes its time and breaks it down
- Downloaded some notebook files from there and started tinkering with them in Jupyter