Diego A Pérez Morales
- Attended orientation
- Attended data science workshop
- Attended good research practices talk
- Started looking for research papers on clustering
- Started learning what clustering is and what types of methods exist
- Started looking up information about K-Means clustering
- Had a meeting with Prof. Zimmer to brainstorm initial ideas for the research and establish what the research is going to be about
- Set personal goals and milestones for the summer
- Attended the professional development meetings for the week
- Met with Prof. Zimmer to discuss the milestones we should tackle going forward.
- Established recurring weekly meetings with Prof. Zimmer.
- Shared research data and other material through a shared folder.
- Started preprocessing the shared .txt data in Python.
- Created three Python functions to preprocess some sample text from the shared data.
- Coded my first experiment with the K-Means model on the sample text data and got my first graphs of the clustering.
- Finished the RCR requirements.
- Met with Prof. Zimmer to show him my progress so far and set some goals for the week.
- Commented and organized most of the code I've written so far to make it presentable.
- Met with past students who have worked on this research before and exchanged ideas.
- Made a script to group multiple text files into one text document, which can later be used on the previous algorithm I made.
- Added a feature to lemmatize the text of the file being fed to the algorithm.
- Made some minor changes to the preprocessing of the text as a whole.
- Shifted the focus of the research to be more of a thematic approach rather than looking for certain parts of a text.
- Started looking for supervised algorithms we can compare against the unsupervised one we have.
- Started preparing a research presentation of what I've done so far.
- Attended the professional development meeting/presentation for the week
- Prepared a sample data file to test supervised ML algorithms
- Prepared a K-Nearest Neighbors (KNeighbors) algorithm to run the sample data on.
- Prepared an SVM (SVC) algorithm to run the sample data on.
- Got preliminary results with these supervised algorithms using the sample data.
- Kept preparing the mini presentation to show what I've done so far.
- Finished my mini presentation
- Met a total of 4 times with Professor Zimmer to discuss the presentation and other topics regarding our research
- Presented my mini presentation in front of the group
- Developed a Python script to convert docx documents into a single text file containing the contents of all the docx files
- Did some more cleaning on the newly unified data, specifically removing single characters and empty strings
- Read up on some of the supervised ML algorithms I've used so far, specifically SVC and KNeighbors
- Attended the research poster presentation
- Received data representing the individual parts of each manually labeled document
- Reviewed the data I received in order to work with it
- Made a Python script to process the 800+ docx documents and group them into a text file
- Made a Python script to process only the docx documents related to a certain section of the manually labeled data
- Made a Python script that categorizes the manually labeled data into a csv file, which can later be used to run the supervised algorithms
- Met with Prof. Zimmer to update him on what I've been working on and talk a bit about the data he sent me
- Attended industry panel
- Met with Prof. Zimmer to clarify some details about the data he gave me and what we want to do with it
- Made a Python script to gather all of the data to process into one file
- Made a Python script to further process the data Prof. Zimmer gave me, which corresponds to different parts of the data science programs
- Ran some tests using this processed data on the unsupervised clustering algorithm we have so far
- Prepared the data files to be run on the supervised algorithms
- Ran some initial tests, using the data from the different parts of the documents, on the supervised algorithms
- Met with Prof. Zimmer to discuss some results from the preprocessing and what we are going to do toward the end of this REU
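
For reference, the text-cleaning steps logged above (grouping several text files into one document, then removing single characters and empty strings) could be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the actual project code: the function names `clean_tokens` and `merge_text_files` are hypothetical, and the tokenization rule (lowercased runs of ASCII letters) is an assumption. Lemmatization and the docx conversion are left out because they need third-party packages.

```python
import re
from pathlib import Path


def clean_tokens(text):
    """Lowercase the text, split it into word tokens, and drop
    single-character tokens. Empty strings never survive the split."""
    # Assumption: a token is a run of ASCII letters; the real
    # preprocessing may tokenize differently.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if len(tok) > 1]


def merge_text_files(paths, out_path):
    """Clean each input file and write all of them to one text file,
    one cleaned source document per line."""
    lines = []
    for path in paths:
        raw = Path(path).read_text(encoding="utf-8")
        lines.append(" ".join(clean_tokens(raw)))
    Path(out_path).write_text("\n".join(lines), encoding="utf-8")
    return lines
```

Keeping one cleaned document per line means the merged file can be read back as a list of documents and handed to a vectorizer before clustering or classification.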