Diego A Pérez Morales

From REU@MU

Weekly Log

Week 1

  • Attended orientation
  • Attended data science workshop
  • Attended good research practices talk
  • Started looking for research papers on clustering
  • Started learning what clustering is and what types of methods exist
  • Started looking up information about K-Means clustering
  • Had a meeting with Prof. Zimmer to brainstorm initial ideas and establish what the research is going to be about
  • Set personal goals and milestones for the summer

Week 2

  • Attended the professional development meetings for the week
  • Met with Prof. Zimmer to discuss the milestones to tackle in the coming weeks.
  • Established recurring weekly meetings with Prof. Zimmer.
  • Shared research data and other material through a shared folder.
  • Started preprocessing the shared .txt data in Python.
  • Wrote three Python functions to preprocess sample text from the shared data.
  • Coded my first experiment with a K-Means model on the sample text data and produced my first clustering graphs.
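The preprocessing and clustering steps from this week can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual project code: the three helper names (`normalize`, `tokenize`, `remove_stopwords`), the tiny stop-word list, the plain bag-of-words vectors, and the hand-written k-means loop are all hypothetical choices made so the example runs with only the standard library. A library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def normalize(text):
    """Lowercase the text and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split normalized text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop very common words that carry little topical meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def preprocess(text):
    return remove_stopwords(tokenize(normalize(text)))


def vectorize(token_lists):
    """Turn token lists into bag-of-words count vectors over a shared vocabulary."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[w] for w in vocab])
    return vectors, vocab


def kmeans(vectors, k, iterations=20, seed=0):
    """Lloyd's k-means: assign each vector to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Example with made-up sentences standing in for the shared text data.
docs = [
    "Students learn statistics and probability.",
    "Probability and statistics for data analysis.",
    "Machine learning models and neural networks.",
    "Neural networks are machine learning models.",
]
vectors, vocab = vectorize([preprocess(d) for d in docs])
labels = kmeans(vectors, k=2)
```

On this toy input the two statistics sentences land in one cluster and the two machine-learning sentences in the other.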

Week 3

  • Finished the RCR requirements.
  • Met with Prof. Zimmer to show him my progress so far and set some goals for the week.
  • Commented and organized most of the code I've written so far to make it presentable.
  • Met with past students who had worked on this research before and exchanged ideas.
  • Made a script to combine multiple text files into one text document, which can later be fed to the clustering algorithm I made earlier.
  • Added a feature to lemmatize the text of the file being fed to the algorithm.
  • Made some minor changes to the overall text preprocessing.
  • Shifted the focus of the research toward a thematic approach rather than looking for specific parts of a text.
  • Started looking for supervised algorithms to compare against the unsupervised one we have.
  • Started preparing a research presentation of what I've done so far.
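Combining many text files into one document can be sketched with the standard library alone. This is a minimal version assuming the inputs are UTF-8 `.txt` files in one folder; the function name `combine_text_files`, the sorted read order, and the blank-line separator are illustrative choices, not the script from this week.

```python
import tempfile
from pathlib import Path


def combine_text_files(input_dir, output_path, pattern="*.txt"):
    """Concatenate every matching text file in input_dir into one output file.

    Files are read in sorted order so the result is reproducible, and each
    file's contents are separated by a blank line. Returns the number of files used.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    parts = []
    for path in sorted(input_dir.glob(pattern)):
        if path.resolve() == output_path.resolve():
            continue  # skip the output file if it already exists in the same folder
        parts.append(path.read_text(encoding="utf-8").strip())
    output_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(parts)


# Example using a temporary folder with two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_text("second file\n", encoding="utf-8")
    Path(tmp, "a.txt").write_text("first file\n", encoding="utf-8")
    out = Path(tmp, "combined.txt")
    count = combine_text_files(tmp, out)
    combined = out.read_text(encoding="utf-8")
```

Sorting the file list keeps the combined document stable between runs, which makes clustering experiments easier to compare.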

Week 4

  • Attended the professional development meeting/presentation for the week
  • Prepared a sample data file to test supervised ML algorithms
  • Set up a K-Nearest Neighbors (KNeighbors) classifier to run on the sample data.
  • Set up an SVM (SVC) classifier to run on the sample data.
  • Got preliminary results with these supervised algorithms using the sample data.
  • Kept preparing the mini presentation to show what I've done so far.
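The idea behind the K-Nearest Neighbors classifier can be sketched from scratch: label a new text by majority vote among the k most similar labeled examples. This is an illustration only; the word-count features, cosine similarity, the names `knn_predict` and `bag_of_words`, and the sample data are all assumptions. The names KNeighbors and SVC match scikit-learn's `KNeighborsClassifier` and `SVC` classes; an SVM is not sketched here.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very simple featurizer: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k most similar training examples.

    train is a list of (text, label) pairs.
    """
    q = bag_of_words(query)
    scored = sorted(
        ((cosine(q, bag_of_words(text)), label) for text, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sample data standing in for the real sample file.
train = [
    ("regression and hypothesis testing", "statistics"),
    ("probability distributions and sampling", "statistics"),
    ("sampling and hypothesis testing", "statistics"),
    ("neural networks and deep learning", "machine_learning"),
    ("decision trees and random forests", "machine_learning"),
    ("deep learning with neural networks", "machine_learning"),
]
prediction = knn_predict(train, "hypothesis testing with sampling")
```

Cosine similarity is a common choice for text because it ignores document length; Euclidean distance is the default in scikit-learn's `KNeighborsClassifier`.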

Week 5

  • Finished my mini presentation
  • Met with Prof. Zimmer four times to discuss the presentation and other topics related to our research
  • Gave my mini presentation to the group
  • Developed a Python script to convert .docx documents into a single text file containing the contents of all the .docx files
  • Further cleaned the newly unified data, specifically removing single characters and empty strings
  • Read up on the supervised ML algorithms I've used so far, specifically SVC and KNeighbors
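Converting .docx files to plain text and removing single characters and empty strings can be sketched without third-party packages, because a .docx file is a ZIP archive whose body text lives in `word/document.xml`. This is an assumed approach (the actual script may have used a package such as python-docx), and the names `docx_paragraphs`, `unify_docx`, `clean_tokens`, and the demo helper `make_minimal_docx` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph (<w:p>) in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(W_NS + "t")) for p in root.iter(W_NS + "p")]


def unify_docx(sources):
    """Join the non-empty paragraphs of several .docx files into one string."""
    lines = []
    for source in sources:
        lines.extend(p for p in docx_paragraphs(source) if p.strip())
    return "\n".join(lines)


def clean_tokens(text):
    """Split into words; str.split() drops empty strings, the length check drops single characters."""
    return [w for w in text.split() if len(w) > 1]


def make_minimal_docx(paragraph_texts):
    """Build a bare-bones .docx-like ZIP in memory, for demonstration only."""
    body = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in paragraph_texts)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


unified = unify_docx([
    make_minimal_docx(["Data science program", ""]),
    make_minimal_docx(["Course a b outcomes"]),
])
tokens = clean_tokens(unified)
```

Writing `unified` to disk with `Path(...).write_text(unified, encoding="utf-8")` would give the single combined text file.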

Week 6

  • Attended the research poster presentation
  • Received data representing the individual parts of each manually labeled document
  • Reviewed the data I received to prepare for working with it
  • Made a Python script to process the 800+ .docx documents and combine them into a single text file
  • Made a Python script to process only the .docx documents related to a given section of the manually labeled data
  • Made a Python script that organizes the manually labeled data by category into a CSV file, which can later be used with the supervised algorithms
  • Met with Prof. Zimmer to update him on what I've been working on and talk a bit about the data he sent me
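Filtering the labeled data to one section and writing it to a CSV for the supervised algorithms might look like the sketch below. The record layout (`document`, `section`, `text` keys), the two-column `text,label` output, and the function names are assumptions for illustration; the real labeled data may be structured differently.

```python
import csv
import io


def select_section(records, section):
    """Keep only the records that belong to one labeled section."""
    return [r for r in records if r["section"] == section]


def write_labeled_csv(records, stream):
    """Write one (text, label) row per record; the section name serves as the label.

    When writing to a real file, open it with newline="" as the csv docs recommend.
    """
    writer = csv.DictWriter(stream, fieldnames=["text", "label"])
    writer.writeheader()
    for r in records:
        writer.writerow({"text": r["text"], "label": r["section"]})


# Hypothetical labeled records.
records = [
    {"document": "program_01.docx", "section": "outcomes", "text": "Students will apply statistical methods."},
    {"document": "program_01.docx", "section": "courses", "text": "Intro to Machine Learning, Data Visualization"},
    {"document": "program_02.docx", "section": "outcomes", "text": "Graduates communicate results clearly."},
]
buffer = io.StringIO()
write_labeled_csv(records, buffer)
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
outcomes = select_section(records, "outcomes")
```

Using the `csv` module instead of joining strings by hand ensures text containing commas is quoted correctly.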

Week 7

  • Attended industry panel
  • Met with Prof. Zimmer to clarify some details about the data he provided and what we want to do with it
  • Made a Python script to gather all of the data to be processed into one file
  • Made a Python script to further process the data Prof. Zimmer gave me, which corresponds to different parts of the data science programs
  • Ran some tests with this processed data on our current unsupervised clustering algorithm
  • Prepared the data files to be run on the supervised algorithms
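When the documents already carry known section labels, one way to check how well unsupervised clusters line up with those labels is cluster purity. The log does not say which measure was used, so purity here is an assumed example, and the labels below are made up.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of items that match the majority true label of their cluster.

    For each cluster, take the count of its most common true label, sum those
    counts over all clusters, and divide by the number of items. A value of
    1.0 means every cluster contains a single true category.
    """
    if len(cluster_labels) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, Counter())[t] += 1
    majority_total = sum(counts.most_common(1)[0][1] for counts in clusters.values())
    return majority_total / len(true_labels)


# Hypothetical example: cluster ids from k-means versus known section labels.
score = purity([0, 0, 1, 1, 1], ["outcomes", "outcomes", "courses", "courses", "outcomes"])
```

Purity is easy to read but rises trivially as the number of clusters grows, so it is best compared at a fixed k.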

Week 8

  • Ran some initial tests of the supervised algorithms using the data from the different parts of the documents
  • Met with Prof. Zimmer to discuss some preprocessing results and what we plan to do toward the end of this REU
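Initial tests of a supervised classifier usually hold out part of the labeled data and score predictions on it. The sketch below shows a seeded holdout split and an accuracy function; the names `holdout_split` and `accuracy` and the 25% test fraction are assumptions, not the split used in these tests (scikit-learn's `train_test_split` and `accuracy_score` are the usual library equivalents).

```python
import random


def holdout_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items with a fixed seed and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Example: hold out a quarter of eight items, then score some made-up predictions.
train_items, test_items = holdout_split(list(range(8)))
score = accuracy(["a", "b", "a"], ["a", "b", "b"])
```

Fixing the seed keeps the split identical between runs, so results from different algorithms are compared on the same held-out data.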

Week 9

Week 10