Diego A Pérez Morales
From REU@MU
Weekly Log
Week 1
- Attended orientation
- Attended data science workshop
- Attended good research practices talk
- Started looking for research papers having to do with clustering
- Started learning about what clustering is and what types of methods exist
- Started looking up information about K-Means clustering
- Had a meeting with Prof. Zimmer to brainstorm initial ideas and establish what the research is going to be about
- Set personal goals and milestones for the summer
Week 2
- Attended the professional development meetings for the week
- Met with Prof. Zimmer to discuss the milestones we should tackle going forward.
- Established recurring weekly meetings with Prof. Zimmer.
- Shared research data and other material through a shared folder.
- Started preprocessing the shared .txt data in Python.
- Created three Python functions to preprocess some sample text from the shared data.
- Coded my first K-Means experiment on the sample text data and produced my first graphs of the clustering (see the sketch after this list).
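A minimal sketch of what this kind of preprocessing and K-Means experiment could look like in Python with scikit-learn. The file name, cleaning steps, and cluster count are illustrative assumptions, not the exact code used.

# Preprocess sample text from a shared .txt file and cluster it with K-Means
import re
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def load_lines(path):
    # One sample document per line of the shared .txt data
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def clean_text(text):
    # Lowercase and strip everything except letters and whitespace
    return re.sub(r"[^a-z\s]", " ", text.lower())

docs = [clean_text(d) for d in load_lines("shared_data.txt")]   # hypothetical file name
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

km = KMeans(n_clusters=3, random_state=0).fit(X)                # assumed cluster count

# Project the TF-IDF vectors to 2-D so the clusters can be plotted
coords = PCA(n_components=2).fit_transform(X.toarray())
plt.scatter(coords[:, 0], coords[:, 1], c=km.labels_)
plt.title("K-Means clusters of the sample text")
plt.show()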
Week 3
- Finished the RCR requirements.
- Met with Prof. Zimmer to show him my progress so far and set some goals for the week.
- Commented and organized most of the code I've written so far to make it presentable.
- Met with past students who have worked on this research before and exchanged ideas.
- Made a script to group multiple text files into one text document, which can later be fed to the previous algorithm.
- Added a feature to lemmatize the text of the file being fed to the algorithm (see the sketch after this list).
- Made some minor changes to the preprocessing of the text as a whole.
- Shifted the focus of the research to be more of a thematic approach rather than looking for certain parts of a text.
- Started looking for supervised algorithms we can compare against the unsupervised one we have.
- Started preparing a research presentation of what I've done so far.
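A sketch of the lemmatization feature and the file-grouping script from this week. NLTK's WordNetLemmatizer and the file names are assumptions; the actual script may differ.

# Lemmatize text and merge several text files into one document for the clustering pipeline
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    # Lowercase, split on whitespace, lemmatize each token, and rejoin into one string
    return " ".join(lemmatizer.lemmatize(tok) for tok in text.lower().split())

def merge_text_files(paths, out_path):
    # Group multiple text files into a single text document
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                out.write(lemmatize_text(f.read()) + "\n")

merge_text_files(["file1.txt", "file2.txt"], "grouped.txt")   # hypothetical file names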
Week 4
- Attended the professional development meeting/presentation for the week
- Prepared a sample data file to test supervised ML algorithms
- Set up a K-Neighbors classifier to run on the sample data.
- Set up an SVM (SVC) classifier to run on the sample data (see the sketch after this list).
- Got preliminary results with these supervised algorithms using the sample data.
- Kept preparing the mini presentation to show what I've done so far.
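A sketch of how these two supervised algorithms could be set up on the sample data with scikit-learn. The CSV layout with text and label columns, the train/test split, and the hyperparameters are assumptions for illustration.

# Train K-Nearest Neighbors and SVM (SVC) baselines on labeled sample text
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

df = pd.read_csv("sample_labeled_data.csv")                     # hypothetical file name
X = TfidfVectorizer(stop_words="english").fit_transform(df["text"])
y = df["label"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("KNeighbors", KNeighborsClassifier(n_neighbors=5)),
                    ("SVC", SVC(kernel="linear"))]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))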
Week 5
- Finished my mini presentation
- Met a total of 4 times with Professor Zimmer to discuss the presentation and other topics regarding our research
- Presented my mini presentation in front of the group
- Developed a Python script to convert docx documents into a unified text file containing all of their contents (see the sketch after this list)
- Did some more cleaning of the newly unified data, specifically removing single characters and empty strings
- Did some reading on the supervised ML algorithms I've used so far, specifically SVC and KNeighbors
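A sketch of the docx-to-text conversion script, using the python-docx package. The folder and output names are assumptions.

# Merge the text of many .docx files into one unified .txt file
from pathlib import Path
from docx import Document   # python-docx package

def docx_to_text(path):
    # Join all paragraph text from a single document
    return "\n".join(p.text for p in Document(str(path)).paragraphs)

def merge_docx_folder(folder, out_path):
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(folder).glob("*.docx")):
            out.write(docx_to_text(path) + "\n")

merge_docx_folder("docs/", "unified_corpus.txt")   # hypothetical locations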
Week 6
- Attended the research poster presentation
- Received data representing the individual parts of each manually labeled document
- Reviewed the data I received in order to work with it
- Made a Python script to process the 800+ docx documents and group them into a text file
- Made a Python script to process only the docx documents related to a certain section of the manually labeled data
- Made a Python script that organizes the manually labeled data into a CSV file, which can later be used to run the supervised algorithms (see the sketch after this list)
- Met with Prof. Zimmer to update him on what I've been working on and talk a bit about the data he sent me
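A sketch of how the manually labeled documents could be organized into a CSV file for the supervised algorithms. The label names, column layout, and file locations are assumptions, not the actual labeling scheme.

# Pair each document's text with its manual label and write the result to a CSV file
import csv
from pathlib import Path
from docx import Document   # python-docx package

labels = {"program_a.docx": "curriculum", "program_b.docx": "assessment"}   # hypothetical labels

def docx_to_text(path):
    return "\n".join(p.text for p in Document(str(path)).paragraphs)

with open("labeled_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])
    for name, label in labels.items():
        writer.writerow([docx_to_text(Path("docs") / name), label])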
Week 7
- Attended industry panel
- Met with Prof. Zimmer to clarify some details about the data he gave me and what we want to do with it
- Made a Python script to gather all of the data to be processed into one file
- Made a Python script to further process the data Prof. Zimmer gave me, which corresponds to different parts of the data science programs
- Ran some tests with this processed data on the unsupervised clustering algorithm we have so far
- Prepared the data files to be run on the supervised algorithms
Week 8
- Ran some initial tests, using the data from the different parts of the documents, on the supervised algorithms
- Met with Prof. Zimmer to discuss some results from the preprocessing and what we are going to do toward the end of this REU
- Ran the data for the different parts through both types of algorithms and got some results
- Started writing down the results from these test runs for future reference
- Started setting up the layout for the research poster
- Attended the weekly meeting with the rest of the REU participants to see how everyone was doing with their work
- Met with Prof. Zimmer to show him some of the code I wrote to get the results and my initial thoughts for the poster
Week 9
- Finished the research poster
- Met with Prof. Zimmer to show him the poster and go over the results we got from the data
- Started working on a template for the research paper
- Did some research on the SVC and KNeighbors algorithms in order to discuss them in the paper and poster
- Met with the rest of the REU participants to clear up any lingering questions about the end of the REU
- Finished the final formal presentation
Week 10
- Finished writing the research paper
- Gave the formal presentation for the research
- Participated in the poster session and talked about my work with other students
- Attended the formal presentations of the other students
- Made arrangements to prepare all of the final hand-ins for the REU