Diego A Pérez Morales

Weekly Log

Attended orientation
Attended data science workshop
Attended good research practices talk
Started looking for research papers on clustering
Started learning what clustering is and what types of methods exist
Started looking up information about K-Means clustering
Had a meeting with Prof. Zimmer to brainstorm initial ideas for the research and establish what the research is going to be about
Set personal goals and milestones for the summer

Attended the professional development meetings for the week
Met with Prof. Zimmer to discuss the milestones we should tackle going forward.
Established recurring weekly meetings with Prof. Zimmer.
Shared research data and other material through a shared folder.
Started preprocessing the shared txt data in Python.
Managed to create three functions in Python to preprocess some sample text from the shared data.
Coded my first experiment with the KMeans model on the sample text data and managed to get my first graphs of the clustering.
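
For reference, a minimal sketch of what this week's preprocessing and first KMeans experiment could look like. The three helper functions, the sample sentences, the TF-IDF features, and the choice of two clusters are assumptions made for illustration; they are not the actual project code or data.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def lowercase(text):
    """Hypothetical step 1: normalize case."""
    return text.lower()


def strip_punctuation(text):
    """Hypothetical step 2: replace anything that is not a letter, digit, or space."""
    return re.sub(r"[^a-z0-9\s]", " ", text)


def collapse_whitespace(text):
    """Hypothetical step 3: squeeze runs of whitespace into single spaces."""
    return " ".join(text.split())


def preprocess(text):
    return collapse_whitespace(strip_punctuation(lowercase(text)))


# Made-up sample text standing in for the shared data.
sample_docs = [
    "Students learn statistics and probability.",
    "The course covers probability, statistics, and inference!",
    "Applicants must submit transcripts and letters.",
    "Submit your application letters and transcripts by the deadline.",
]

cleaned = [preprocess(doc) for doc in sample_docs]

# Turn each document into a TF-IDF vector, then cluster the vectors.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(cleaned)

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(features)

for doc, label in zip(sample_docs, labels):
    print(label, doc)
```

The cluster numbers themselves are arbitrary; only which documents end up grouped together is meaningful.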

Finished the RCR requirements.
Met with Prof. Zimmer to show him my progress so far and set some goals for the week.
Commented and organized most of the code I've written so far to make it presentable.
Met with past students who have worked on this research before and exchanged ideas.
Made a script to group multiple text files into one text document, which can later be used on the previous algorithm I made.
Added a feature to lemmatize the text of the file being fed to the algorithm.
Made some minor changes to the preprocessing of the text as a whole.
Shifted the focus of the research toward a thematic approach rather than looking for specific parts of a text.
Started looking for supervised algorithms we can compare against the unsupervised one we have.
Started preparing a research presentation of what I've done so far.
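
A rough sketch of the kind of script that groups multiple text files into one document, as mentioned in this week's entries. The function name, the `*.txt` pattern, and the one-file-after-another layout are assumptions for illustration, not the actual script.

```python
from pathlib import Path


def merge_text_files(input_dir, output_file, pattern="*.txt"):
    """Concatenate every matching file in input_dir into output_file.

    Files are processed in sorted order so the result is reproducible.
    Returns the number of files that were read.
    """
    input_dir = Path(input_dir)
    output_file = Path(output_file)

    # Skip the output file itself in case it lives in the same folder.
    paths = sorted(
        p for p in input_dir.glob(pattern)
        if p.resolve() != output_file.resolve()
    )

    with output_file.open("w", encoding="utf-8") as out:
        for path in paths:
            text = path.read_text(encoding="utf-8").strip()
            if text:  # ignore files that are empty after stripping
                out.write(text + "\n")

    return len(paths)
```

Calling `merge_text_files("data", "data/merged.txt")` would write the combined text to `data/merged.txt`; both paths here are placeholders.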

Finished my mini presentation
Met a total of 4 times with Professor Zimmer to discuss the presentation and other topics regarding our research
Presented my mini presentation in front of the group
Developed a Python script to convert docx documents into a unified text file containing all of the contents of the docx files
Did some more cleaning on the newly unified data, specifically removing single characters and empty strings
Read up on some of the supervised ML algorithms I've used so far, specifically SVC and KNeighbors
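
The cleaning step mentioned above (removing single characters and empty strings) might look something like this. The function name and the sample tokens are hypothetical; the real code may work on a different data structure.

```python
def drop_short_tokens(tokens, min_length=2):
    """Remove empty strings and tokens shorter than min_length characters."""
    cleaned = []
    for token in tokens:
        token = token.strip()  # whitespace-only tokens become empty
        if len(token) >= min_length:
            cleaned.append(token)
    return cleaned


# Made-up tokens for illustration.
tokens = ["data", "a", "", "science", " ", "x", "ml"]
print(drop_short_tokens(tokens))  # ['data', 'science', 'ml']
```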

Attended the research poster presentation
Received data representing the individual parts of each manually labeled document
Reviewed the data I received in order to work with it
Made a Python script to process the 800+ docx documents and group them into a text file
Made a Python script to process only the docx documents related to a certain section of the manually labeled data
Made a Python script that categorizes the manually labeled data into a csv file, which can later be run through the supervised algorithms
Met with Prof. Zimmer to update him on what I've been working on and talk a bit about the data he sent me
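
One possible way to pull the text out of a folder of docx files and group it into a single text file, using only the standard library. This is a sketch under assumptions: it relies on a docx file being a zip archive whose main body lives in `word/document.xml`, it ignores headers and footers, and the function names are made up. The actual scripts may well have used a dedicated library instead.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of one docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)


def docx_folder_to_text(input_dir, output_file):
    """Write the text of every docx file in input_dir to output_file."""
    paths = sorted(Path(input_dir).glob("*.docx"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(docx_to_text(path) + "\n")
    return len(paths)
```

Restricting the run to one section of the labeled data would then only require changing which paths are passed in.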

Attended industry panel
Met with Prof. Zimmer to clarify some details about the data he gave me and what we want to do with it
Made a Python script to gather all of the data to be processed into one file
Made a Python script to further process the data Prof. Zimmer gave me, which corresponds to different parts of the data science programs
Ran some tests using this processed data on the unsupervised clustering algorithm we have so far
Prepared the data files to be run on the supervised algorithms

Ran some initial tests, using the data from the different parts of the documents, on the supervised algorithms
Met with Prof. Zimmer to discuss some results from the preprocessing and what we are going to do as we approach the end of this REU
Was able to run the data of the different parts with both types of algorithms and get some results
Started writing down the results I got from my tests running the data for future reference
Started setting up the layout for the research poster
Attended the weekly meeting with the rest of the REU participants to see how everyone was doing with their work
Met with Prof. Zimmer to show him some of the code I wrote to get some results and my initial thoughts for the poster
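
To illustrate the supervised side of this week's tests, here is a small sketch that trains SVC and KNeighbors classifiers on labeled text. The texts, the label names, the TF-IDF features, and the hyperparameters are all invented for the example; they do not reflect the real labeled data or the settings used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up labeled snippets standing in for the manually labeled parts.
train_texts = [
    "students complete courses in statistics and machine learning",
    "the curriculum includes databases and programming courses",
    "required courses cover probability and linear algebra",
    "applicants must submit transcripts and test scores",
    "admission requires two recommendation letters",
    "the application deadline for admission is in january",
]
train_labels = [
    "curriculum", "curriculum", "curriculum",
    "admissions", "admissions", "admissions",
]
test_texts = [
    "elective courses in programming",
    "submit letters with your application",
]

classifiers = {
    "SVC": SVC(kernel="linear"),
    "KNeighbors": KNeighborsClassifier(n_neighbors=3),
}

predictions = {}
for name, classifier in classifiers.items():
    # Each pipeline vectorizes the text, then fits the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    predictions[name] = list(pipeline.predict(test_texts))

for name, predicted in predictions.items():
    print(name, predicted)
```

Running both classifiers through the same pipeline keeps the comparison fair, since they see identical features.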

Finished the research poster
Met with Prof. Zimmer to show him the poster and go over the results we got from the data
Started working on a template for the research paper
Did some research on the SVC and KNeighbors algorithms in order to talk about them in the paper and on the poster
Met with the rest of the REU participants to clear up any lingering questions about the end of the REU
Finished the final formal presentation