Difference between revisions of "User:KonradJ"
From REU@MU
Latest revision as of 22:06, 21 July 2023
2023 Week 1: (May 30th - June 4th)
Tuesday (5/30)
- Attended REU orientation
- Met with peers and mentors
Wednesday (5/31)
- Attended lecture led by Brylow about research papers and how to log progress properly
- Met with mentor to talk about goals and milestones and shared research papers to read (will update tomorrow)
Thursday (6/1)
- Sadly, most of my day was consumed by moving into my apartment, which limited my ability to work
- Went over my first research paper with a surface-level reading.
- Assessment of Medical Reports Uncertainty through Topic Modeling and Machine Learning
Friday (6/2)
- Read various research articles including the one from yesterday more thoroughly
- Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity
- Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data
- Took notes on what I wrote later
Sunday (6/4)
- Took more notes on the research papers I've already read
- Began a lengthy research paper
- Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach
- Prepared Log for this upcoming week
2023 Week 2: (June 5th - June 11th)
Monday (6/5)
- Met with group for federally mandated training.
Tuesday (6/6)
- Met with peers and discussed where everyone was in their process
- Finished the lengthy research paper from the other day
- Read the last of my research papers
- Challenges and opportunities beyond structured data in analysis of electronic health records
- Got up to the first lab of a Coursera course recommended by my mentor
Wednesday (6/7)
- Completed 2 Coursera labs
- Met with peers and discussed milestones before hearing a lecture on technical writing and presenting from Dr. Brylow
- Met with my mentor and talked about goals for this week
- Reviewed the relevant data in the form of CSV files
Thursday (6/8)
- Got to the end of the first section of Coursera and began the associated lab
- Began looking at relevant files for the summer project
- CSV files that contain patient data that we will use to practice uncertainty quantification strategies
Friday (6/9)
- Dug deeper into research papers I had read to find the most pertinent files in the MIMIC-III dataset
- Seems like the best dataset for us would be D_ICD_DIAGNOSES.csv, which has ICD-9 codes for each diagnosis
- Continued online natural language processing course through Coursera
- Naive Bayes, Bayes Rule, Laplacian smoothing, and log likelihood
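The Naive Bayes topics listed above (log prior, Laplacian smoothing, log likelihood) can be sketched in a few lines. This is a minimal illustration, not the course's or the project's code: the function names, the two-class labels (1 = "uncertain", 0 = "certain"), and the toy token lists are all made up for the example.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Train a two-class Naive Bayes model on tokenized docs (labels are 1 or 0).

    Assumes both classes appear at least once in `labels`.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos_counts if label == 1 else neg_counts).update(tokens)

    vocab = set(pos_counts) | set(neg_counts)
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    v = len(vocab)

    # Log prior: log ratio of the number of documents in each class.
    d_pos = sum(1 for y in labels if y == 1)
    d_neg = len(labels) - d_pos
    log_prior = math.log(d_pos / d_neg)

    # Laplacian (add-one) smoothing keeps unseen words from giving zero probability.
    log_likelihood = {}
    for word in vocab:
        p_pos = (pos_counts[word] + 1) / (n_pos + v)
        p_neg = (neg_counts[word] + 1) / (n_neg + v)
        log_likelihood[word] = math.log(p_pos / p_neg)
    return log_prior, log_likelihood

def predict_score(tokens, log_prior, log_likelihood):
    """Positive score favors class 1, negative favors class 0; unknown words add 0."""
    return log_prior + sum(log_likelihood.get(t, 0.0) for t in tokens)

# Hypothetical toy data, not taken from MIMIC-III.
docs = [["possible", "pneumonia"], ["suspected", "fracture"],
        ["confirmed", "pneumonia"], ["confirmed", "fracture"]]
labels = [1, 1, 0, 0]
log_prior, log_likelihood = train_naive_bayes(docs, labels)
uncertain_score = predict_score(["possible", "fracture"], log_prior, log_likelihood)
certain_score = predict_score(["confirmed", "pneumonia"], log_prior, log_likelihood)
```

Working in log space turns the product of word probabilities into a sum, which avoids numerical underflow on longer texts.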
Sunday (6/11)
- Completed week 2 on Naive Bayes in Coursera
- Got passing grade on week 2 quiz on material and completed all labs
- Started Week 3 course on Vector Space Models
- Completed a couple labs
- Prepared log for upcoming week
2023 Week 3: (June 12th - June 16th)
Monday (6/12)
- Completed Week 3 courses for the Coursera class I am taking
- Vector Space Models
- Dug more into the ICD-9 records
Tuesday (6/13)
- Completed labs from week 1 and week 2 of Coursera course
- Logistic Regression and Naive Bayes
Wednesday (6/14)
- Heard research talk from Dr. Praveen
- Talked with peers about where they are in their projects
- Continued with week 4 of Coursera lectures
Thursday (6/15)
- Finished week 4 of Coursera lectures
- Machine Translation and Document Search
- Did week 3 lab for Coursera
- Vector Space Models
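A core operation in vector space models is comparing two vectors by the angle between them. The sketch below shows cosine similarity on plain Python lists; the function name and the example vectors are hypothetical and only illustrate the idea.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical word-count vectors for three short texts.
same_direction = cosine_similarity([1, 2, 0], [2, 4, 0])   # parallel vectors
orthogonal = cosine_similarity([1, 0, 0], [0, 3, 0])       # no shared words
```

Because the score depends only on direction, a long text and a short text with the same word proportions come out as highly similar.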
Friday (6/16)
- Began messing with preprocessing functions with some columns of the dataset (MIMIC-III, D_ICD_DIAGNOSES)
- Removed punctuation, made lowercase, tokenized, then stemmed the columns
- Continued to read up on relevant information about how to preprocess and word2vec strategies
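The preprocessing steps listed above (remove punctuation, lowercase, tokenize, stem) could look roughly like the sketch below. It is an assumption-laden illustration: `toy_stem` is a made-up suffix stripper standing in for a real stemmer such as a Porter stemmer, and the sample string is invented rather than taken from D_ICD_DIAGNOSES.

```python
import string

# Toy suffix rules; a real stemmer handles many more cases.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Remove punctuation, lowercase, split on whitespace, then stem each token."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    return [toy_stem(t) for t in tokens]

# Hypothetical diagnosis-style text.
tokens = preprocess("Fractures, unspecified; healing.")
# tokens == ["fractur", "unspecifi", "heal"]
```

Note that deleting punctuation outright also merges hyphenated terms (for example "ICD-9" becomes "icd9"), so replacing punctuation with spaces may be preferable depending on the column.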
2023 Week 4: (June 19th - June 23rd)
Tuesday (6/20)
- Re-read one of the more valuable research papers to get an idea of where to go from here after learning more about the relevant technologies and algorithms
- Reviewed Coursera materials to better understand the steps in the process of natural language processing
- Continued to tinker with the data with preprocessing before I get more in depth this week
Wednesday (6/21)
- Listened to a great research lecture from Dr. Walt Bialkowski on using data science to predict food shortages at local pantries
- Talked with peers about their projects after the lecture
- Began creating some slides and reviewing materials for mini-presentation
Thursday (6/22)
- Met with my mentor to talk about the mini-presentation coming up next week, as well as adjustments to my pre-processing methods and what to do next
- Continued with creation of the mini-presentation and script to go along with it
Friday (6/23)
2023 Week 5: (June 26th - June 30th)
Monday (6/26)
- Reviewed papers and other materials for writing out the presentation script
- Created some slides
Tuesday (6/27)
- Wrote the script for the talk
- Finished presentation slides
- Practiced until I felt comfortable with the material
Wednesday (6/28)
- Gave presentation to my peers
- Talked with them after about their projects and upcoming plans
Thursday (6/29)
- Met with my mentor to establish a game plan for this week and the upcoming weeks
- Watched some videos on approaches to the BERT model and picked out some articles
Friday (6/30)
- Read through articles with implementation notes and code
- Started attempting to implement it myself
- Found a new implementation I liked, but I am unsure how to split up the data, or whether I should even be doing that before I apply the model
- Talked with mentor about slowing down and just working on a BOW (bag-of-words) model; I was thinking too big-picture
2023 Week 6: (July 3rd - July 7th)
Monday (7/3)
- Reviewed two research papers for content and ideas on methodology
- Worked to complete BOW models after slowing down and attempting to take it step by step
- Had some problems with tokenization and stemming; attempting to iron those out
- Word counts have also been a problem; trying to find out whether the issue is in the preprocessing or the model itself
Tuesday (7/4)
Wednesday (7/5)
- Watched the rest of my peers' research presentations
- Looked at some other tokenization methods to help with current preprocessing bug
- Reviewed Coursera notes and videos to refresh on concepts
Thursday (7/6)
- Reviewed more of the Coursera materials
- Continued to look into different tokenizers
Friday (7/7)
- Ran into some trouble with stemming so looked into different methods.
- Still haven't been able to resolve a tokenization error that affects the BOW model
- Continued review of Coursera materials
2023 Week 7: (July 10th - July 14th)
Monday (7/10)
- Tinkered with preprocessed text attempting to get it into vector form
- Looked into alternatives to CountVectorizer()
- Made a new notebook and tried different ways of preprocessing data when CountVectorizer() is involved
Tuesday (7/11)
- Continued looking into vectorization methods that allow for stemming (hitting a bit of a roadblock)
Wednesday (7/12)
- Learned about do's and don'ts of poster creation from Dr. Brylow
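One way around the stemming roadblock is to make the tokenizer itself do the stemming, so the counts are built on stems. scikit-learn's CountVectorizer accepts a `tokenizer` callable for this purpose; the sketch below shows the same idea using only the standard library so the mechanics are visible. The helper names, the trailing-"s" rule, and the two sample strings are hypothetical.

```python
from collections import Counter

def stem_tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, strip a trailing 's'."""
    tokens = text.lower().split()
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def bag_of_words(docs, tokenizer):
    """Return (sorted vocabulary, count matrix) with one row per document."""
    tokenized = [tokenizer(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for token, count in Counter(toks).items():
            row[index[token]] = count
        matrix.append(row)
    return vocab, matrix

# Made-up example texts; "wounds" and "wound" share one column after stemming.
docs = ["open wounds of head", "open wound of neck"]
vocab, matrix = bag_of_words(docs, stem_tokenize)
# vocab  == ["head", "neck", "of", "open", "wound"]
# matrix == [[1, 0, 1, 1, 1], [0, 1, 1, 1, 1]]
```

Passing the tokenizer as an argument keeps the counting logic unchanged when the stemmer is swapped for a better one.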
- Talked to some peers about progress and what we want to do with our papers and posters
- Looked into some poster templates and started preparing an outline for the poster content
Thursday (7/13)
- Connected with a colleague of my mentor
- He sent me some useful materials that I will review tomorrow
- Called him and talked about BERT and my project
Friday (7/14)
- Reviewed resources from mentor's colleague on YouTube
- Best BERT content I've found so far; it takes its time and breaks the model down
- Downloaded some notebook files from there and started tinkering with them in Jupyter