Difference between revisions of "User:Abby.Martin"

From REU@MU

Revision as of 19:21, 9 July 2015

Project

Mentor: Dr. Richard Povinelli

I will be researching the application and accuracy of linear regression model trees. I aim to test the effectiveness of this method in assisting with electric load forecasting. I also plan on comparing this method of forecasting to a multitude of other methods that have also been attempted.

Goals & Milestones

  • Test the influence of linear regression model trees on the accuracy of electrical use forecasting.
  • Determine whether linear regression model trees are a better method of electric load forecasting than other methods.
  • Research linear regression model trees and electrical usage.
  • Continue research on linear regression model trees, electrical usage, WEKA and MATLAB. Start creating methods for forecasting electrical use using linear regression model trees.
  • Continue research on linear regression model trees and electrical usage. Also learn how to effectively use MATLAB and WEKA.
  • Test my linear regression model tree with real data and compare its effectiveness with that of other forecasting methods.
  • Create a model tree that is as good as or better than the current model being used for electrical forecasting.

Weekly Goals

Week One

  • Read and research papers that address the following topics:
    • Decision Trees
    • Machine Learning
    • Model Trees
    • Linear Regression Model Trees
    • Electric Load Forecasting and other methods that have been used

Week Two

  • Continue reading about linear regression model trees
  • Begin testing various datasets using the WEKA software
  • Begin reading some of the source code and documentation to better understand WEKA
  • Begin learning, using, and applying MATLAB

Week Three

  • Test real data using the WEKA software to create various model trees.
  • Read Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark A. Hall to better understand model trees and the WEKA software
  • Become comfortable with the MATLAB software

Week Four

  • Begin testing converted data using WEKA software
  • Create various model trees and compare results
  • Prepare mini presentation

Week Five

  • Give mini presentation
  • Continue testing and comparing model trees
  • Create other forecasting models for comparison

Week Six

  • Continue comparisons of models
  • Begin writing paper and creating poster

Week Seven

  • Continue research and comparisons
  • Continue work on paper and poster

Week Eight

  • Complete poster
  • Continue work on paper

Week Nine

  • Complete paper
  • Prepare for final talk

Week Ten

  • Finalize paper
  • Poster session
  • Formal talk

Weekly Log

Week One

  • Orientation activities and forms
  • Pre-REU Survey
  • Attended GasDay Camp
  • Met Dr. Povinelli and decided on research topic
  • Read papers on my topic to discover:
    • the definition and application of decision trees
    • difference between classification trees, regression trees, and model trees
    • machine learning and how trees split

Week Two

  • Met with Dr. Povinelli to further discuss concepts and goals
    • Dr. Povinelli explained:
      • the "greedy" approach
      • the various ways to determine the "best" variable and tree
    • He suggested reading about the M5P Model
  • Read about the M5P Model and learned:
    • Splits using a Standard Deviation Reduction Method
    • Uses a smoothing method for leaves
    • Article contained helpful pseudocode for understanding the process of creating a linear regression model tree
  • Began working with the WEKA software to create linear regression model trees
  • Read some of the source code from WEKA to understand the linear regression model tree creation process
  • Began working with and learning MATLAB
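
The standard deviation reduction split noted above can be illustrated with a minimal sketch. It is not the WEKA or M5PrimeLab implementation: the function names `sdr` and `best_split`, the use of the population standard deviation, and the single numeric attribute are all assumptions made for illustration. The idea is that a candidate split is scored by how much it lowers the size-weighted standard deviation of the target values.

```python
import statistics


def sdr(parent, left, right):
    """Standard deviation reduction from splitting parent into left and right.

    Assumption: population standard deviation (pstdev) is used here.
    """
    n = len(parent)
    weighted = sum(len(part) / n * statistics.pstdev(part) for part in (left, right))
    return statistics.pstdev(parent) - weighted


def best_split(xs, ys, min_leaf=1):
    """Greedy search over one numeric attribute for the threshold with the largest SDR.

    Returns (threshold, gain); threshold is None if no split improves on 0.
    """
    pairs = sorted(zip(xs, ys))
    targets = [y for _, y in pairs]
    best = (None, 0.0)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        gain = sdr(targets, targets[:i], targets[i:])
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

A full model tree would apply this search recursively to every attribute, fit a linear model at each leaf, and then prune and smooth; none of that is shown here.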

Week Three

  • Monday, June 15th
    • Read Data Mining: Practical Machine Learning Tools and Techniques and gained a fuller, more comprehensive understanding of data mining, model trees, and the WEKA software
  • Tuesday, June 16th
    • Attended required talk on Good Presentations, Good Technical Writing, and the Difference Between Them
    • Continued reading Data Mining: Practical Machine Learning Tools and Techniques
    • Organized research and created goals for upcoming weeks
  • Wednesday, June 17th
    • Tested data using the Weka software to compare linear regression model trees against standard linear regression
    • Found a plugin that will give MATLAB the capability of generating linear regression model trees
    • Began learning and practicing how to use MATLAB
    • Learned about and implemented the M5PrimeLab plugin for MATLAB
  • Thursday, June 18th
    • Attended talk on Responsible Conduct of Research
    • Did the interactive movie project that was assigned to accompany RCR training
    • Successfully implemented the M5PrimeLab plugin to create linear regression model trees
  • Friday, June 19th
    • Continued with reading of Data Mining: Practical Machine Learning Tools and Techniques
    • Used Weka to compare the M5P model against standard linear regression and also to compare the cross-validation techniques against the percentage split technique.
    • Downloaded and began learning how to use the typesetting software LaTeX
    • Met with Dr. Povinelli to discuss progress

Week Four

  • Monday, June 22nd
    • Converted all MATLAB data files to arff format
    • Began work on mini presentation
    • Began testing data using Weka to create model trees
  • Tuesday, June 23rd
    • Created and tested 10-fold cross-validation tests with a minimum of 10 instances at each leaf for each data set
    • Created and tested 66% split with a minimum of 10 instances at each leaf for each data set
    • Created and tested an LR model with training set and test set
    • Evaluated these various models
    • Worked on mini presentation
  • Wednesday, June 24th
    • Created and evaluated several more models of linear regression model trees
    • Generated models for various load factors
    • Created presentation for seminar
  • Thursday, June 25th
    • Presented research thus far at seminar
    • Met for lunch and discussed progress and challenges thus far
    • Began researching and comprehending various methods of error analysis
    • Met with Dr. Povinelli to discuss research and how to give a presentation
    • Began developing and evaluating trees with a minimum of 100 instances at each leaf
  • Friday, June 26th
    • Began developing and evaluating trees with a minimum of 350 instances at each leaf
    • Created and evaluated a model tree of only temperature and load
    • Created the appropriate graphs to visualize the above tree
    • Changed the temperature and load tree to unsmoothed and repeated the above graphing process
    • Compared the smoothed and unsmoothed trees
    • Normalized the real data so it can be presented
  • Saturday, June 27th
    • Worked on mini presentation
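
The two evaluation schemes used this week, a 66% percentage split and 10-fold cross-validation, differ only in how instances are divided into training and test sets. The sketch below shows that division on instance indices. It is an illustration under stated assumptions, not WEKA's procedure: the function names are hypothetical, and no shuffling or stratification is done.

```python
def percentage_split(n, train_fraction=0.66):
    """Return (train, test) index lists: the first part trains, the rest tests."""
    cut = round(n * train_fraction)
    return list(range(cut)), list(range(cut, n))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists so that each instance is tested exactly once.

    Assumption: folds are formed by striding through the indices, without shuffling.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train, test
```

With a percentage split the model is evaluated once on roughly a third of the data, while k-fold cross-validation builds k models and every instance contributes to the error estimate.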

Week Five

  • Monday, June 29th
    • Normalized a data set so results are appropriate for presenting
    • Wrote program to export actual and predicted values generated by Weka to an Excel file so appropriate statistics can be calculated
    • Wrote program to calculate the mean absolute percent error of the tests
    • Tested the data at various minimum numbers of instances and evaluated
    • Met with Dr. Povinelli to discuss progress
    • Continued work on mini presentation
  • Tuesday, June 30th
    • Prepared for seminar presentation and delivered presentation
    • Finished normalizing data
    • Calculated MAPE, RMSE, and total number of leaves for data with minimum number of instances of 350 and 500.
  • Wednesday, July 1st
    • Calculated MAPE, RMSE, and total number of leaves for data with minimum number of instances of 600, 750, 1000, 3500, and 5000.
    • Normalized remaining data sets and separated factors for individual testing.
    • Finalized mini presentation
  • Thursday, July 2nd
    • Mini Presentations
      • Informal (though serious) description of what we have been doing to receive feedback in preparation for the formal presentations in Week 10
    • Converted date to day of the week and day of the year with sine input
  • Friday, July 3rd
    • Added data points for weekend versus weekday, hourly, and previous day load data
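
The two error statistics calculated this week, MAPE and RMSE, can be written in a few lines. This is a minimal sketch, not the program mentioned in the log: the function names are assumptions, and MAPE as written requires every actual value to be nonzero.

```python
import math


def mape(actual, predicted):
    """Mean absolute percent error, in percent. Actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the load values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

MAPE is scale-free, which makes it convenient for comparing models across data sets, while RMSE penalizes large individual misses more heavily.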

Week Six

  • Monday, July 6th and Tuesday, July 7th
    • Read article on feed-forward neural networks and identified additional factors that contributed to reducing error
    • Analyzed dates with the highest error and found them to be holidays
    • Started creating methods to flag holidays and hopefully reduce error
    • Completed the RCR quizzes to finish training
  • Wednesday, July 8th
    • Analyzed largest errors in prediction to find trends in temperature and date
    • Read Philip Brierley's dissertation to get ideas on how to handle these anomalies
    • Added difference between previous day and current day as an input
  • Thursday, July 9th
    • Met for lunch and discussed progress and challenges thus far
    • Met with Dr. Povinelli to discuss progress thus far and how to write a research paper and create a poster
    • Implemented new method to handle holidays
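
The calendar inputs described in Weeks Five and Six (day of the week, a sine-transformed day of the year, a weekend flag, and a holiday flag) can be derived from a date as sketched below. This is an assumed illustration, not the method actually implemented: the `HOLIDAYS` set is a hypothetical placeholder, and the feature names and the 365.25-day period are choices made for this example.

```python
import math
from datetime import date

# Hypothetical holiday list for illustration only; a real one would cover
# every holiday observed in the service territory.
HOLIDAYS = {date(2015, 7, 4), date(2015, 12, 25)}


def date_features(d, holidays=HOLIDAYS):
    """Turn a calendar date into numeric and boolean model inputs."""
    day_of_year = d.timetuple().tm_yday
    return {
        "day_of_week": d.weekday(),  # Monday = 0, Sunday = 6
        "day_of_year_sin": math.sin(2 * math.pi * day_of_year / 365.25),
        "is_weekend": d.weekday() >= 5,
        "is_holiday": d in holidays,
    }
```

The sine transform keeps late December and early January close together numerically. A sine term alone maps two different days of the year to the same value, so a matching cosine term is often added alongside it.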

Week Seven

  • Thursday, July 16th
    • Meet for lunch and discuss progress and challenges thus far

Week Eight

  • Thursday, July 23rd
    • Meet for lunch and discuss progress and challenges thus far

Week Nine

  • Wednesday, July 29th
    • Electronic version of poster due
  • Thursday, July 30th
    • Meet for lunch and discuss progress and challenges thus far

Week Ten

  • Tuesday, August 4th
    • Poster Session
  • Wednesday, August 5th
    • First half of the formal presentations
  • Thursday, August 6th
    • Second half of the formal presentations
  • Friday, August 7th
    • Post REU Survey and Final Instructions
    • Research Papers Due