Difference between revisions of "User:Abby.Martin"

From REU@MU

Latest revision as of 14:29, 7 August 2015

Project

Mentor: Dr. Richard Povinelli

I will be researching the application and accuracy of linear regression model trees. I aim to test the effectiveness of this method in assisting with electric load forecasting. I also plan on comparing this method of forecasting to the current method used by the GasDay lab.

Goals & Milestones

  • Test the influence of linear regression model trees on the accuracy of electrical use forecasting.
  • Determine if linear regression model trees are a better method of electric load forecasting than other methods.
  • Research linear regression model trees and electrical usage.
  • Continue research on linear regression model trees, electrical usage, WEKA and MATLAB. Start creating methods for forecasting electrical use using linear regression model trees.
  • Continue research on linear regression model trees and electrical usage. Also learn how to effectively use MATLAB and WEKA.
  • Test my linear regression model tree with real data and compare its effectiveness with that of other forecasting methods.
  • Create a model tree that is as good as or better than the current model being used for electrical forecasting.

Weekly Goals

Week One

  • Read and research papers that address the following topics:
    • Decision Trees
    • Machine Learning
    • Model Trees
    • Linear Regression Model Trees
    • Electric Load Forecasting and other methods that have been used

Week Two

  • Continue reading about linear regression model trees
  • Begin testing various datasets using the WEKA software
  • Begin reading some of the source code and documentation to better understand WEKA
  • Begin learning, using, and applying MATLAB

Week Three

  • Test real data using the WEKA software to create various model trees.
  • Read Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark A. Hall to better understand model trees and the WEKA software
  • Become comfortable with the MATLAB software

Week Four

  • Begin testing converted data using WEKA software
  • Create various model trees and compare results
  • Prepare mini presentation

Week Five

  • Give mini presentation
  • Continue testing and comparing model trees
  • Create other forecasting models for comparison

Week Six

  • Continue comparisons of models
  • Begin writing paper and creating poster

Week Seven

  • Continue research and comparisons
  • Continue work on paper and poster

Week Eight

  • Complete poster
  • Continue work on paper

Week Nine

  • Complete paper
  • Prepare for final talk

Week Ten

  • Finalize paper
  • Poster session
  • Formal talk

Weekly Log

Week One

  • Orientation activities and forms
  • Pre-REU Survey
  • Attended GasDay Camp
  • Met Dr. Povinelli and decided on research topic
  • Read papers on my topic to discover:
    • the definition and application of decision trees
    • difference between classification trees, regression trees, and model trees
    • machine learning and how trees split

Week Two

  • Met with Dr. Povinelli to further discuss concepts and goals
    • He explained:
      • the "greedy" approach
      • the various ways to determine the "best" variable and tree
    • He suggested reading about the M5P Model
  • Read about the M5P Model and learned:
    • Splits using a Standard Deviation Reduction Method
    • Uses a smoothing method for leaves
    • Article contained helpful pseudocode for understanding the process of creating a linear regression model tree
  • Began working with the WEKA software to create linear regression model trees
  • Read some of the source code from WEKA to understand the linear regression model tree creation process
  • Began working with and learning MATLAB
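
The standard deviation reduction (SDR) criterion mentioned above can be sketched as follows. This is a minimal illustration, not WEKA's implementation: it assumes a single numeric input, uses the population standard deviation, and the function names (`std_dev`, `sdr`, `best_split`) and the sample temperatures and loads are made up for the example.

```python
import math

def std_dev(values):
    """Population standard deviation of a list of numbers (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(targets, left, right):
    """SDR = sd(T) - sum over subsets of |T_i| / |T| * sd(T_i)."""
    n = len(targets)
    weighted = sum(len(part) / n * std_dev(part) for part in (left, right))
    return std_dev(targets) - weighted

def best_split(xs, ys):
    """Try each midpoint between neighboring distinct x values.

    Returns (threshold, sdr) for the split with the largest reduction,
    or (None, 0.0) if no split reduces the standard deviation.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        gain = sdr(ys, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Made-up temperatures and loads with an obvious break between 14 and 30.
threshold, gain = best_split([10, 12, 14, 30, 32, 34],
                             [5.0, 5.2, 5.1, 9.0, 9.3, 9.1])
print(threshold, round(gain, 3))
```

A tree builder would apply this greedily: pick the attribute and threshold with the largest SDR, split, and repeat on each side until a stopping rule (such as a minimum number of instances per leaf) is reached.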

Week Three

  • Monday, June 15th
    • Read Data Mining: Practical Machine Learning Tools and Techniques and gained a much fuller and more comprehensive understanding of data mining, model trees, and the WEKA software
  • Tuesday, June 16th
    • Attended required talk on Good Presentations, Good Technical Writing, and the Difference Between Them
    • Continued reading Data Mining: Practical Machine Learning Tools and Techniques
    • Organized research and created goals for upcoming weeks
  • Wednesday, June 17th
    • Tested data using the Weka software to compare linear regression model trees against standard linear regression
    • Found a plugin that will give MATLAB the capability of generating linear regression model trees
    • Began learning and practicing how to implement MATLAB
    • Learned about and implemented the M5PrimeLab plugin for MATLAB
  • Thursday, June 18th
    • Attended talk on Responsible Conduct of Research
    • Did the interactive movie project that was assigned to accompany RCR training
    • Successfully implemented the M5PrimeLab plugin to create linear regression model trees
  • Friday, June 19th
    • Continued with reading of Data Mining: Practical Machine Learning Tools and Techniques
    • Used Weka to compare the M5P model against standard linear regression and also to compare the cross-validation techniques against the percentage split technique.
    • Downloaded and began learning how to use the typesetting software LaTeX
    • Met with Dr. Povinelli to discuss progress

Week Four

  • Monday, June 22nd
    • Converted all MATLAB data files to arff format
    • Began work on mini presentation
    • Began testing data using Weka to create model trees
  • Tuesday, June 23rd
    • Created and tested 10-fold cross-validation tests with a minimum of 10 instances at each leaf for each data set
    • Created and tested 66% split with a minimum of 10 instances at each leaf for each data set
    • Created and tested an LR model with training set and test set
    • Evaluated these various models
    • Worked on mini presentation
  • Wednesday, June 24th
    • Created and evaluated several more models of linear regression model trees
    • Generated models for various load factors
    • Created presentation for seminar
  • Thursday, June 25th
    • Presented research thus far at seminar
    • Met for lunch and discussed progress and challenges thus far
    • Began researching and comprehending various methods of error analysis
    • Met with Dr. Povinelli to discuss research and how to give a presentation
    • Began developing and evaluating trees with a minimum of 100 instances at each leaf
  • Friday, June 26th
    • Began developing and evaluating trees with a minimum of 350 instances at each leaf
    • Created and evaluated a model tree of only temperature and load
    • Created the appropriate graphs to visualize the above tree
    • Changed the temperature and load tree to be unsmoothed and repeated the above graphing process
    • Compared the smoothed and unsmoothed trees
    • Normalized the real data so it can be presented
  • Saturday, June 27th
    • Worked on mini presentation
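
The two evaluation schemes used this week, 10-fold cross-validation and a 66% split, differ only in how rows are assigned to training and test sets. The sketch below shows that index bookkeeping under stated assumptions: rows are shuffled with a fixed seed, and the function names are hypothetical. It is not how WEKA implements either option.

```python
import random

def percentage_split(n_rows, train_fraction=0.66, seed=1):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold_splits(n_rows, k=10, seed=1):
    """Yield (train_indices, test_indices) once per fold.

    Every row lands in exactly one test fold, so each row is
    predicted once by a model that never saw it during training.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

train, test = percentage_split(100)
print(len(train), len(test))

for train, test in k_fold_splits(25, k=10):
    print(len(train), len(test))
```

Cross-validation uses every row for testing at the cost of training k models; the percentage split trains once but its error estimate depends on which rows happen to fall in the test set.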

Week Five

  • Monday, June 29th
    • Normalized a data set so results are appropriate for presenting
    • Wrote program to export actual and predicted values generated by Weka to an Excel file so appropriate statistics can be calculated
    • Wrote program to calculate the mean absolute percent error of the tests
    • Tested the data at various minimum numbers of instances and evaluated
    • Met with Dr. Povinelli to discuss progress
    • Continued work on mini presentation
  • Tuesday, June 30th
    • Prepared for seminar presentation and delivered presentation
    • Finished normalizing data
    • Calculated MAPE, RMSE, and total number of leaves for data with minimum number of instances of 350 and 500.
  • Wednesday, July 1st
    • Calculated MAPE, RMSE, and total number of leaves for data with minimum number of instances of 600, 750, 1000, 3500, and 5000.
    • Normalized remaining data sets and separated factors for individual testing.
    • Finalized mini presentation
  • Thursday, July 2nd
    • Mini Presentations
      • Informal (though serious) description of what we have been doing to receive feedback in preparation for the formal presentations in Week 10
    • Converted date to day of the week and day of the year with sine input
  • Friday, July 3rd
    • Added data points for weekend versus weekday, hourly, and previous day load data
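
The two error measures calculated this week, mean absolute percent error (MAPE) and root mean squared error (RMSE), can be sketched as below. The sample values are made up, and the sketch assumes that no actual load is zero, since MAPE divides by the actual value.

```python
import math

def mape(actual, predicted):
    """Mean absolute percent error, in percent. Assumes no actual value is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))

# Made-up actual and predicted loads.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(actual, predicted), 2), round(rmse(actual, predicted), 2))
```

MAPE is scale-free, which makes it usable on normalized data and comparable across data sets; RMSE penalizes large misses more heavily.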

Week Six

  • Monday, July 6th and Tuesday, July 7th
    • Read article on feed-forward neural networks and identified additional factors that contributed to reducing error
    • Analyzed dates with the highest error and found them to be holidays
    • Started creating methods to flag holidays and hopefully reduce error
    • Completed the RCR quizzes to finish training
  • Wednesday, July 8th
    • Analyzed largest errors in prediction to find trends in temperature and date
    • Read Philip Brierley's dissertation to get ideas on how to handle these anomalies
    • Added difference between previous day and current day as an input
  • Thursday, July 9th
    • Met for lunch and discussed progress and challenges thus far
    • Met with Dr. Povinelli to discuss progress thus far and how to write a research paper and create a poster
    • Implemented new method to handle holidays
  • Friday, July 10th
    • Worked on solutions to holiday problem and drastic change in temperature problem
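
The date-derived inputs described in Weeks Five and Six (day of the week, a sine transform of the day of the year, a weekend flag, a holiday flag, and a previous-day difference) might be built along these lines. This is a sketch only: the `HOLIDAYS` set, the 365.25-day period, and the function names are assumptions for illustration, and the log does not say which holidays were flagged or which quantity was differenced.

```python
import math
from datetime import date

# Hypothetical holiday list; the log does not record the actual one.
HOLIDAYS = {date(2015, 1, 1), date(2015, 7, 4), date(2015, 12, 25)}

def calendar_features(day):
    """Turn a calendar date into numeric model inputs."""
    day_of_year = day.timetuple().tm_yday
    return {
        "day_of_week": day.weekday(),            # Monday = 0, Sunday = 6
        "is_weekend": int(day.weekday() >= 5),
        "is_holiday": int(day in HOLIDAYS),
        # The sine keeps late December numerically close to early January.
        "sin_day_of_year": math.sin(2 * math.pi * day_of_year / 365.25),
    }

def previous_day_change(daily_values):
    """Return value[t] - value[t-1] for each day; None for the first day."""
    return [None] + [b - a for a, b in zip(daily_values, daily_values[1:])]

print(calendar_features(date(2015, 7, 4)))
print(previous_day_change([70, 75, 60]))
```

A single sine term maps two different days of the year to the same value, so a matching cosine term is often added alongside it; whether that was done here is not stated in the log.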

Week Seven

  • Monday, July 13th
    • Added inputs of heating degree days (HDD) at base temperatures of 65 and 55 and cooling degree days (CDD) at base temperatures of 65 and 75
    • Continued to address the holiday errors and change in temperature errors
    • Met with Dr. Povinelli; first outline approved
    • Worked on more detailed paper outline
  • Tuesday, July 14th
    • Finished detailed paper outline
    • Compared new tests with 300 minimum number of instances and 600 minimum number of instances
    • Began creating graphs for poster
  • Wednesday, July 15th
    • Continued creating graphs for poster, paper, and to help visualize possible steps to reduce error
  • Thursday, July 16th
    • Met for lunch and discussed progress and challenges thus far
    • Met with Dr. Povinelli to discuss detailed outline and model that uses previous hour data
    • Began work on poster
    • Began work on paper rough draft
  • Friday, July 17th
    • Worked on paper rough draft
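
The degree-day inputs added on Monday follow a simple definition: heating degree days measure how far the temperature falls below a base, and cooling degree days measure how far it rises above one. The sketch below assumes a daily mean temperature in the same units as the base temperatures; the function names and dictionary keys are hypothetical.

```python
def hdd(mean_temp, base):
    """Heating degree days: degrees below the base temperature, never negative."""
    return max(0.0, base - mean_temp)

def cdd(mean_temp, base):
    """Cooling degree days: degrees above the base temperature, never negative."""
    return max(0.0, mean_temp - base)

def degree_day_inputs(mean_temp):
    """The four inputs named in the log: HDD at 65 and 55, CDD at 65 and 75."""
    return {
        "HDD65": hdd(mean_temp, 65),
        "HDD55": hdd(mean_temp, 55),
        "CDD65": cdd(mean_temp, 65),
        "CDD75": cdd(mean_temp, 75),
    }

print(degree_day_inputs(50.0))
print(degree_day_inputs(80.0))
```

Using two bases on each side gives the model a piecewise-linear view of temperature, so it can respond differently to mild and extreme days.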

Week Eight

  • Tuesday, July 21st
    • Performed Student's t-tests to discover if the difference between the successive tests is statistically significant
    • Continued work on poster
  • Wednesday, July 22nd
    • Finished first draft of poster
    • Met with Dr. Povinelli to get feedback on poster
  • Thursday, July 23rd
    • Attended GasDay seminar
    • Met for lunch and discussed progress and challenges thus far
    • Met with Dr. Povinelli and the rest of the electricity team to discuss work to be done
    • Continued work on poster
  • Friday, July 24th
    • Finished the second draft of the poster
    • Continued work on paper
  • Saturday, July 25th
    • Finished Poster
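
One way to run the t-tests mentioned on Tuesday is a paired test on the errors two models make on the same observations. The log does not say which variant of the test was used, so the pairing is an assumption, as are the function name and the sample errors. The sketch computes only the t statistic and degrees of freedom; turning them into a p-value requires a t-distribution table or a statistics library.

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for two equal-length paired samples."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(variance / n), n - 1

# Made-up absolute errors from two successive models on the same four days.
t, dof = paired_t_statistic([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 6.0, 6.0])
print(round(t, 3), dof)
```

A large positive t here would suggest that the second model's errors are consistently smaller than the first's, rather than smaller by chance.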

Week Nine

  • Monday, July 27th
    • Met with Saber to discuss research and how he can implement it with current forecasting method
    • Researched how to integrate Weka into my own Java code and worked on developing my own method that creates and evaluates a model tree
    • Sent Dr. Povinelli finalized poster to be approved
  • Tuesday, July 28th
    • Met with Saber again to further delve into my code and what code is left to be written
    • Continued to research and create a method that integrates Weka
  • Wednesday, July 29th
    • Electronic version of poster due
    • Finished first rough draft of research paper
  • Thursday, July 30th
    • Attended GasDay Seminar
    • Met for lunch and discussed progress and challenges thus far
    • Began work on final presentation
    • Created abstract for formal presentation
  • Friday, July 31st
    • Continued to work on final presentation and editing final paper
  • Saturday, August 1st
    • Continued work on final presentation and worked on integrating Weka into own Java code

Week Ten

  • Monday, August 3rd
    • Continued working on final presentation
    • Continued editing research paper
    • Got my own Java program that calls Weka running and outputting predictions and statistics
  • Tuesday, August 4th
    • Practiced formal presentation
    • Poster Session
    • Worked on final paper
  • Wednesday, August 5th
    • First half of the formal presentations
    • Worked on final paper
  • Thursday, August 6th
    • Second half of the formal presentations
    • Continued work on paper
  • Friday, August 7th
    • Post REU Survey and Final Instructions
    • Research Papers Due