Diego A Pérez Morales
From REU@MU
Weekly Log
Week 1
- Attended orientation
- Attended data science workshop
- Attended good research practices talk
- Started looking for research papers on clustering
- Started learning what clustering is and what types of methods exist
- Started looking up information about K-Means clustering
- Met with Prof. Zimmer to brainstorm initial ideas and establish what the research is going to be about
- Set personal goals and milestones for the summer
Week 2
- Attended the professional development meetings for the week
- Met with Prof. Zimmer to discuss the milestones we should tackle going forward.
- Established recurring weekly meetings with Prof. Zimmer.
- Shared research data and other material through a shared folder.
- Used the shared .txt data to start preprocessing the data in Python.
- Wrote three Python functions to preprocess some sample text from the shared data.
- Coded my first K-Means experiment on the sample text data and produced my first plots of the clusters.
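The K-Means experiment itself is not reproduced in this log. As a rough, dependency-free sketch of the idea only (not the code used in the project), the function below alternates the two K-Means steps on small numeric vectors. The name `kmeans`, the fixed iteration count, and seeding the centroids with the first `k` points are all assumptions made for illustration; real text data would first need to be turned into numeric vectors.

```python
def kmeans(points, k, iterations=10):
    """Cluster `points` (equal-length numeric tuples) into k groups."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = []
        for p in points:
            sq_dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels.append(sq_dists.index(min(sq_dists)))
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
labels, centroids = kmeans(points, k=2)
```

On this toy input the points near the origin share one label and the points near (10, 10) share the other.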
Week 3
- Finished the RCR requirements.
- Met with Prof. Zimmer to show him my progress so far and set some goals for the week.
- Commented and organized most of the code I've made up until now to make it look presentable.
- Met with past students who previously worked on this research and exchanged ideas.
- Made a script to group multiple text files into one text document, which can then be fed to the algorithm I wrote earlier.
- Added a feature to lemmatize the text of the file being fed to the algorithm.
- Made some minor changes to the preprocessing of the text as a whole.
- Shifted the focus of the research toward a thematic approach rather than looking for certain parts of a text.
- Started looking for supervised algorithms we can compare against the unsupervised one we have.
- Started preparing a research presentation of what I've done so far.
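The file-grouping script mentioned this week is not shown in the log. A minimal sketch of that kind of step, assuming the inputs are UTF-8 `.txt` files in a single folder, could look like the following. The function name `merge_text_files`, its parameters, and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def merge_text_files(input_dir, output_path, pattern="*.txt"):
    """Concatenate every file matching `pattern` in `input_dir` into one file."""
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    # Sort by name so the merged order is reproducible; skip the output file itself.
    paths = sorted(
        p for p in input_dir.glob(pattern) if p.resolve() != output_path.resolve()
    )
    with output_path.open("w", encoding="utf-8") as out:
        for path in paths:
            # One document per line block, with surrounding whitespace trimmed.
            out.write(path.read_text(encoding="utf-8").strip())
            out.write("\n")
    return len(paths)


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.txt").write_text("first document", encoding="utf-8")
    (tmp / "b.txt").write_text("second document\n", encoding="utf-8")
    merged_count = merge_text_files(tmp, tmp / "merged.out")
    merged_text = (tmp / "merged.out").read_text(encoding="utf-8")
```

Lemmatization would happen after this step, on the merged text, and is not part of the sketch.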
Week 4
- Attended the professional development meeting/presentation for the week
- Prepared a sample data file to test supervised ML algorithms
- Prepared a K-Nearest Neighbors (KNeighbors) algorithm to run the sample data on.
- Prepared an SVM (SVC) algorithm to run the sample data on.
- Got preliminary results with these supervised algorithms using the sample data.
- Kept preparing the mini presentation to show what I've done so far.
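The log does not record which implementation of K-Nearest Neighbors was used or how it was configured. To illustrate the idea behind the classifier only, here is a small standard-library sketch: a query point receives the majority label among its `k` closest labeled points. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions; `math.dist` requires Python 3.8 or newer.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy labeled data: class "a" near the origin, class "b" near (5, 5).
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(train_points, train_labels, (0.5, 0.5))
near_five = knn_predict(train_points, train_labels, (5.5, 5.5))
```

Unlike K-Means, this approach needs labeled examples up front, which is why a labeled sample data file is required before running it.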
Week 5
- Finished my mini presentation
- Met a total of 4 times with Professor Zimmer to discuss the presentation and other topics regarding our research
- Presented my mini presentation in front of the group
- Developed a Python script to convert .docx documents into a unified text file containing all of the contents of the .docx files
- Did some more cleaning on the newly unified data, specifically for the removal of single characters and empty strings
- Read up on some of the supervised ML algorithms I've used so far, specifically SVC and KNeighbors
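The cleaning step for single characters and empty strings is described above but not shown. A minimal sketch, assuming the text has already been split into a list of tokens, might be the following. The function name `drop_short_tokens` and the `min_length` parameter are hypothetical.

```python
def drop_short_tokens(tokens, min_length=2):
    """Remove empty strings and tokens shorter than `min_length` characters."""
    cleaned = []
    for token in tokens:
        token = token.strip()  # whitespace-only tokens become empty strings
        if len(token) >= min_length:
            cleaned.append(token)
    return cleaned


# Toy token list containing empty strings and single characters.
sample = ["a", "", "data", " ", "x", "ml", " text "]
cleaned_sample = drop_short_tokens(sample)
```

One trade-off of this rule is that it also drops legitimate one-letter words such as "a" and "I", which is usually acceptable when the goal is thematic clustering.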